Html-Agility-Pack not loading the page with full content?

asp.net html-agility-pack html-parsing scrape web-scraping

Question

I am using Html Agility Pack to scrape data from a website.

My problem is that the website I am fetching data from loads some of its content a few seconds after the initial page load.

So whenever I try to read data from a particular div, I get null.

In the code below, the page variable never contains the reviewBox div, because it has not been loaded yet.

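// Requires: using HtmlAgilityPack;
public void FetchAllLinks(String Url)
{
    Url = "http://www.tripadvisor.com/"; // hard-coded for this example; overrides the argument
    HtmlDocument page = new HtmlWeb().Load(Url);

    // SelectNodes returns null (not an empty collection) when nothing matches
    var link_list = page.DocumentNode.SelectNodes("//div[@class='reviewBox']");
    if (link_list == null)
    {
        return; // this is the case I keep hitting
    }

    foreach (var link in link_list)
    {
        // htmlpage is defined elsewhere (not shown); append so earlier matches are kept
        htmlpage.InnerHtml += link.InnerHtml;
    }
}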

Can anyone please tell me how to delay the request so that

HtmlDocument page = new HtmlWeb().Load(Url);

loads the full data into the page variable?

Popular Answer

It's not about delaying the request. That node is populated by JavaScript through the DOM after the page loads, and the Html Agility Pack is the wrong tool for that requirement: it isn't a web engine at all, it only loads the base HTML that the server returns and never runs scripts.
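To make that concrete, here is a minimal sketch of what the parser sees. The two HTML strings are made up for illustration (they are not TripAdvisor's real markup), and ReviewBoxProbe and CountReviewBoxes are hypothetical names. The first string stands in for the page as served, the second for the DOM after the script has run.

```csharp
using System;
using HtmlAgilityPack;

public static class ReviewBoxProbe
{
    // Counts div.reviewBox nodes in the given markup.
    // SelectNodes returns null when nothing matches, so guard before using it.
    public static int CountReviewBoxes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//div[@class='reviewBox']");
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Illustrative markup only: what the server sends before any script runs.
        string asServed =
            "<html><body><div id='reviews'></div>" +
            "<script>/* fills #reviews later */</script></body></html>";

        // Illustrative markup only: what a browser's DOM holds after the script ran.
        string afterScript =
            "<html><body><div id='reviews'>" +
            "<div class='reviewBox'>Great stay</div></div></body></html>";

        Console.WriteLine(CountReviewBoxes(asServed));
        Console.WriteLine(CountReviewBoxes(afterScript));
    }
}
```

No amount of waiting changes the first result, because the script inside the downloaded HTML is never executed by the parser.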

When I need to get at stuff that requires a full web engine to parse, I typically use WatiN. It's designed to help unit test actual web pages, but that means it allows programmatic access to web pages through a given browser engine and will load the full document. It comes with IE or Firefox drivers out of the box and I vaguely recall that Chrome wasn't hard to use, either.
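A rough sketch of that approach, assuming WatiN 2.x with its IE driver and the same reviewBox class from the question. RenderedPageFetcher is a hypothetical name, the 30-second timeout is an arbitrary choice, and I have not run this against the live site. Once the browser has rendered the page, the resulting HTML can still be handed to the Html Agility Pack for the XPath query.

```csharp
using System;
using HtmlAgilityPack;
using WatiN.Core;

public static class RenderedPageFetcher
{
    [STAThread] // WatiN's IE automation needs a single-threaded apartment
    public static void Main()
    {
        string url = "http://www.tripadvisor.com/";

        // Opens a real Internet Explorer window and navigates to the URL.
        using (var browser = new IE(url))
        {
            browser.WaitForComplete();

            // Wait (up to 30 seconds, an arbitrary choice) for the
            // script-generated div to show up in the live DOM.
            Div firstBox = browser.Div(Find.ByClass("reviewBox"));
            firstBox.WaitUntilExists(30);

            // Feed the rendered markup to the Html Agility Pack.
            var doc = new HtmlDocument();
            doc.LoadHtml(browser.Html);

            var boxes = doc.DocumentNode.SelectNodes("//div[@class='reviewBox']");
            if (boxes == null)
            {
                Console.WriteLine("No reviewBox divs found.");
                return;
            }

            foreach (HtmlNode box in boxes)
            {
                Console.WriteLine(box.InnerHtml);
            }
        }
    }
}
```

Inside an ASP.NET request you would have to run this on a separate STA thread, since request threads are not STA by default. That is part of why a browser-driving tool is a heavier choice than a plain HTML parser.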



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow