What's faster? I just made a web scraper that uses HTML Agility Pack, and it's consuming massive amounts of memory.
Profiling it with a memory profiler, I found that the HtmlDocument, HtmlNode, etc. instances are using the most memory.
I feel like it might be faster and more memory-efficient to use regex instead. Am I wrong?
A regex will be a lot faster than HTML Agility Pack.
But you should remember that HTML is not always well formed. Extracting the data you want using regex alone may fail, whereas browsers (and HTML parsers) are very forgiving about mistakes.
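To make the fragility concrete, here is a small sketch (in Python, purely for illustration; the same point applies to .NET regexes) showing how a naive regex misses markup that any real HTML parser handles without complaint. The sample strings are invented for the demo:

```python
import re
from html.parser import HTMLParser

# Naive regex that assumes href is double-quoted and is the first attribute.
naive = re.compile(r'<a href="([^"]*)">')

samples = [
    '<a href="/ok">link</a>',                    # matches the assumed shape
    "<a href='/single'>link</a>",                # single quotes: regex misses it
    '<a class="x" href="/reordered">link</a>',   # attribute order: regex misses it
]

print([naive.findall(s) for s in samples])  # [['/ok'], [], []]

# A real parser normalizes all three perfectly valid forms.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

collector = LinkCollector()
for s in samples:
    collector.feed(s)
print(collector.links)  # ['/ok', '/single', '/reordered']
```

Every one of those anchor tags is legal HTML, yet the regex only recognizes the one shape it was written for. You can keep patching the pattern, but each patch makes it slower and harder to read, which is exactly the complexity a parser already absorbs for you.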
Agility Pack is a great tool. It provides a lot of functionality for the memory it consumes.
Depending on what exactly you are doing, it really could be possible to speed things up and free some memory by using regex. The question is how rigid and well-formed the pages you are extracting data from are. Regex is much more easily confused by perfectly valid, but unexpected, HTML constructs that you might encounter in the wild.
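If the pages really are rigid, the win comes from never building a DOM at all: a targeted regex scans the raw string and allocates only the match, instead of one node object per element. A minimal sketch (Python for illustration; the page fragment and `id="price"` markup are hypothetical):

```python
import re

# Hypothetical rigid markup: assume the site always emits exactly this shape.
page = '<div id="price">USD 19.99</div><div id="other">text</div>'

# Pulls the single field of interest without parsing the rest of the page.
price_re = re.compile(r'<div id="price">USD ([0-9.]+)</div>')
match = price_re.search(page)
print(match.group(1))  # 19.99
```

The moment the site changes quoting, attribute order, or whitespace, this silently returns nothing, so it is a trade of robustness for memory and speed that only pays off on pages you control or monitor closely.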