How to load a dynamically generated webpage?

c# data-scrubbing html html-agility-pack

Question

I am trying to load the webpage http://www.artstation.com/artist/nicotine so that I can scrape it. Unfortunately, the page seems to be generated by code, so the tags I am looking for aren't in the HTML that gets downloaded.

Loading it with the following doesn't work, because it only retrieves the source JavaScript, not the content that the script generates:

using HtmlAgilityPack;

HtmlWeb htmlWeb = new HtmlWeb();
HtmlDocument imagepage = htmlWeb.Load("http://www.artstation.com/artist/nicotine"); // the URL must be a string literal

How can I load the page as it is shown in the browser, so that I can scrape it for the tags?

Popular Answer

You cannot use HtmlAgilityPack on its own for this. When HAP asks the server for the page, it receives the raw file. That content hasn't been parsed or executed by a web browser, so the JavaScript in it hasn't done anything yet.

There is a workaround for this. You can use Selenium or PhantomJS to get the content of dynamically generated tags. These tools include a browser stack, so they execute the JavaScript for you. You can find many other tools like these, and plenty of examples.
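As a rough illustration of the Selenium route, here is a minimal sketch rather than a definitive implementation. It assumes the Selenium.WebDriver, Selenium.Support, and HtmlAgilityPack NuGet packages are installed and that Chrome with a matching driver is available on the machine. The class name RenderedPageLoader, the 15-second timeout, and the choice to wait for an img element are all assumptions made for the example; the question doesn't say which tags are needed, so adjust the wait condition and the XPath accordingly.

```csharp
using System;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

// Hypothetical helper class, named for this example only.
class RenderedPageLoader
{
    static void Main()
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless"); // run Chrome without a visible window

        using (IWebDriver driver = new ChromeDriver(options))
        {
            // Let a real browser engine fetch the page and run its JavaScript.
            driver.Navigate().GoToUrl("http://www.artstation.com/artist/nicotine");

            // Assumption: the generated content is ready once at least one <img> exists.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(15));
            wait.Until(d => d.FindElements(By.TagName("img")).Count > 0);

            // Hand the browser's current markup to HtmlAgilityPack for parsing.
            var imagepage = new HtmlDocument();
            imagepage.LoadHtml(driver.PageSource);

            // SelectNodes returns null when nothing matches, so check before iterating.
            var images = imagepage.DocumentNode.SelectNodes("//img");
            if (images != null)
            {
                foreach (HtmlNode image in images)
                {
                    Console.WriteLine(image.GetAttributeValue("src", ""));
                }
            }
        }
    }
}
```

The idea is that the browser does the rendering and HAP is only used afterwards, so any existing HAP parsing code can stay mostly as it is. If you only need a handful of elements, you could also skip HAP and query them directly with driver.FindElements.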




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why