How can I get HTML that is generated via AJAX with HTML Agility Pack?

ajax asp.net c# html html-agility-pack

Question

I am trying to parse a web page. Part of that page is generated via AJAX. With WebClient.DownloadString I can get the whole HTML except the part that is generated via AJAX. Can someone help me, please?

My code is:

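// Requires: using System.Net;
var client = new WebClient();
// The User-Agent value must not include the header name itself.
client.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1");
client.Headers.Add(HttpRequestHeader.Cookie, "USER_PW=xxxxxxxxx; PHPSESSID=xxxxxxxxxxxxxxxxxx");
// Returns only the initial page source; content loaded later via AJAX is not included.
var html = client.DownloadString("xxxxxxxxxx");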

I need the list of vacancies...

Accepted Answer

It is possible to get content that is generated via AJAX, but it is not a straightforward task. All you get from the initial request is the page source (the same thing you see when you right-click and choose View Page Source).

To get the AJAX content, note the URL that the AJAX call hits and then make a second request to that URL to get the content. You can find this URL by inspecting the Network tab of the developer tools in any browser, or by reading the page's JavaScript code.
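
As a rough illustration of that second request, here is a minimal sketch. It assumes the AJAX endpoint returns an HTML fragment; the class and method names, the URL in the comment, the `vacancy` class name and the `X-Requested-With` header are all hypothetical choices for illustration (some servers check that header, many do not). Use whatever the Network tab actually shows for the real site.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class AjaxFragmentScraper
{
    // Requests the AJAX endpoint directly, the way the page's JavaScript would.
    public static string DownloadAjaxResponse(string ajaxUrl, string cookie)
    {
        using (var client = new WebClient())
        {
            client.Headers.Add(HttpRequestHeader.UserAgent,
                "Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1");
            client.Headers.Add(HttpRequestHeader.Cookie, cookie);
            // Assumption: the server may expect this header on AJAX requests.
            client.Headers.Add("X-Requested-With", "XMLHttpRequest");
            return client.DownloadString(ajaxUrl);
        }
    }

    // Parses an HTML fragment and returns the trimmed text of every node matching the XPath.
    public static List<string> ExtractItems(string fragmentHtml, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(fragmentHtml);

        var items = new List<string>();
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
            return items; // SelectNodes returns null when nothing matches

        foreach (var node in nodes)
            items.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        return items;
    }

    static void Main()
    {
        // Sample fragment standing in for a real response. With a real endpoint:
        // string fragment = DownloadAjaxResponse("https://example.com/ajax/vacancies", "PHPSESSID=...");
        string fragment = "<ul><li class=\"vacancy\">Backend Developer</li>" +
                          "<li class=\"vacancy\">QA Engineer</li></ul>";

        foreach (string title in ExtractItems(fragment, "//li[@class='vacancy']"))
            Console.WriteLine(title);
    }
}
```

If the endpoint returns JSON rather than HTML, HTML Agility Pack is not needed for that response; a JSON parser is the better fit.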

Disadvantages: You get only the raw response of the AJAX call. If the JavaScript manipulates that response (for example, by building a table from a JSON response), you will have to do the same work manually on your end.

That means you end up re-implementing the same logic as the JavaScript to get the resulting HTML, which is a lot of work and leaves plenty of room for error.
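
To give a sense of what re-implementing the JavaScript logic can look like, here is a small sketch that turns a JSON response into an HTML table. The JSON shape (an array of objects with `title` and `city` properties) and the class and method names are assumptions made up for this example; the real response and the real page's markup will differ. It uses `System.Text.Json`, which ships with .NET Core 3.0 and later.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.Json;

static class VacancyTableBuilder
{
    // Rebuilds the kind of table the page's JavaScript might create from a JSON array.
    // Assumed shape: [{"title": "...", "city": "..."}, ...]
    public static string BuildTable(string json)
    {
        var sb = new StringBuilder("<table>");
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement item in doc.RootElement.EnumerateArray())
            {
                string title = item.GetProperty("title").GetString();
                string city = item.GetProperty("city").GetString();

                // Encode the values so that special characters do not break the markup.
                sb.Append("<tr><td>").Append(WebUtility.HtmlEncode(title))
                  .Append("</td><td>").Append(WebUtility.HtmlEncode(city))
                  .Append("</td></tr>");
            }
        }
        sb.Append("</table>");
        return sb.ToString();
    }

    static void Main()
    {
        // Sample data standing in for a real AJAX response.
        string sample = "[{\"title\":\"Backend Developer\",\"city\":\"Kyiv\"}]";
        Console.WriteLine(BuildTable(sample));
    }
}
```

Every rule the site's script applies (sorting, filtering, formatting) would have to be mirrored here by hand, which is where the risk of errors comes from.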

Advantage: If you only care about the data in the HTML (for example, data from the website's database) and not the exact HTML itself, then this approach will work for you.



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow