How to get the JavaScript code along with the actual source using Html Agility Pack

c# html-agility-pack javascript parsing xpath

Question

I am fetching the source of a website with Html Agility Pack, but it differs from the code I see when I inspect the page with Firebug. I have searched a lot but it is still not clear to me what I should do. Please tell me how to get the JavaScript code along with that HTML. Even when I disable JavaScript in my browser, I still cannot get the JavaScript code along with the source. I am using

string url = ""; // address of the target page
HtmlDocument doc = new HtmlDocument();
WebClient client = new WebClient();
string html = client.DownloadString(url); // raw response body; no scripts are executed
doc.LoadHtml(html);

to get the source. Please tell me whether I need a request and response method to get the JS code too.

Accepted Answer

To expand on @alecxe's answer, you can use Selenium* to load the target page as a real browser would, and then pass the result to HtmlAgilityPack for further processing:

using OpenQA.Selenium;

.....

IWebDriver driver = new PhantomJS.PhantomJSDriver();
driver.Navigate().GoToUrl(url);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(driver.PageSource);

Alternatively, you can run your query (XPath or CSS selector) directly with Selenium, for example:

var result = driver.FindElements(By.XPath("your query"));

//print HTML of the returned elements
foreach (var item in result)
{
    Console.WriteLine(item.GetAttribute("outerHTML"));
}

*) You need to install Selenium first, as well as a driver, e.g. PhantomJS, Firefox, etc. Selenium can be added to your project easily from NuGet.
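The same idea can be sketched with headless Chrome in place of PhantomJS, which is no longer actively developed. This is only an illustration under assumptions: the Selenium.WebDriver and HtmlAgilityPack NuGet packages are installed, Chrome and a matching ChromeDriver are available on the machine, and the class name, method name, and URL below are placeholders that do not come from the original answer.

```csharp
using System;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

public static class RenderedPageExample
{
    // Hypothetical helper: loads the page in headless Chrome and returns
    // the page source as the browser reports it after loading the page.
    public static string GetRenderedHtml(string url)
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless"); // run without a visible window

        // Disposing the driver closes the browser process.
        using (IWebDriver driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl(url);
            return driver.PageSource;
        }
    }

    public static void Main()
    {
        string url = "https://example.com/"; // placeholder URL (assumption)

        var doc = new HtmlDocument();
        doc.LoadHtml(GetRenderedHtml(url));

        // From here on, query the document with Html Agility Pack as usual.
        Console.WriteLine(doc.DocumentNode.OuterHtml.Length);
    }
}
```

If the page fills in content asynchronously after the initial load, the source may be read too early; in that case an explicit wait before reading PageSource is probably needed.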


Popular Answer

For that you would need a real browser, because Html Agility Pack only parses the markup it is given and does not execute JavaScript. Consider automating a browser (which can be headless - see PhantomJS) with the help of Selenium.
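To illustrate what is and is not available without a browser, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package. Inline script text is already part of the downloaded string and can be read from the script nodes; what is missing is the markup those scripts would generate when run. The class name, method name, and sample markup are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ScriptNodesExample
{
    // Hypothetical helper: returns the raw content of every inline <script> element.
    public static List<string> ExtractInlineScripts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection scripts = doc.DocumentNode.SelectNodes("//script");
        if (scripts == null)
        {
            return result;
        }

        foreach (HtmlNode script in scripts)
        {
            // Scripts referenced via a src attribute have no inline content.
            if (!string.IsNullOrWhiteSpace(script.InnerHtml))
            {
                result.Add(script.InnerHtml);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Sample markup standing in for the string returned by DownloadString (assumption).
        string html =
            "<html><body><div id=\"out\"></div>" +
            "<script>document.getElementById('out').textContent = 'filled by script';</script>" +
            "</body></html>";

        foreach (string code in ExtractInlineScripts(html))
        {
            Console.WriteLine(code);
        }

        // The div is still empty here, because nothing has executed the script.
        HtmlNode target = doc_SelectOut(html);
        Console.WriteLine(target.InnerHtml.Length);
    }

    // Small helper so Main stays readable: finds the div with id "out".
    private static HtmlNode doc_SelectOut(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.SelectSingleNode("//div[@id='out']");
    }
}
```

Scripts loaded through a src attribute are not covered by this sketch; their code would have to be downloaded separately from the referenced URL.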

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why