How to get the JavaScript code along with the actual source using Html Agility Pack

c# html-agility-pack javascript parsing xpath

Question

I am fetching the source of a website using Html Agility Pack, but the result is different from the markup I see when I inspect the page with Firebug. I have searched a lot but am still not clear on what I should do. Please tell me how to get the JavaScript code along with that HTML. Even when I disable JavaScript in my browser, I still cannot get the JavaScript code along with the source. I am using:

string url = "";
HtmlDocument doc = new HtmlDocument();
WebClient client = new WebClient();
// Downloads the raw HTML as sent by the server; scripts are not executed.
string html = client.DownloadString(url);
doc.LoadHtml(html);

to get the source. Please tell me whether I need a request and response method to get the JS code too.

4/2/2016 1:02:52 PM

Accepted Answer

To expand on @alecxe's answer, you can use Selenium* to load your target page like a real browser would, and then pass the result to HtmlAgilityPack for further processing:

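using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.PhantomJS;

.....

// Load the page in a headless browser so that its scripts run
IWebDriver driver = new PhantomJSDriver();
driver.Navigate().GoToUrl(url);
// Parse the rendered HTML with HtmlAgilityPack
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(driver.PageSource);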

Alternatively, you can just run your query (XPath or CSS selector) using Selenium directly, for example:

var result = driver.FindElements(By.XPath("your query"));

//print HTML of the returned elements
foreach (var item in result)
{
    Console.WriteLine(item.GetAttribute("outerHTML"));
}
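
Content that scripts add to the page may not be present the instant navigation returns, so it can help to wait for a known element before reading the page. The following is only a sketch: the class and method names (RenderedPage, GetSourceWhenReady) are made up for illustration, and it assumes the Selenium.Support NuGet package is installed alongside Selenium.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI; // assumed: Selenium.Support NuGet package

static class RenderedPage
{
    // Hypothetical helper: navigates to the URL, blocks until the XPath query
    // matches at least one element (or the timeout expires), then returns
    // the HTML of the page as the browser currently sees it.
    public static string GetSourceWhenReady(
        IWebDriver driver, string url, string xpath, TimeSpan timeout)
    {
        driver.Navigate().GoToUrl(url);

        var wait = new WebDriverWait(driver, timeout);
        // Until polls the condition and throws WebDriverTimeoutException
        // if it never becomes true within the timeout.
        wait.Until(d => d.FindElements(By.XPath(xpath)).Count > 0);

        return driver.PageSource;
    }
}
```

The returned string can be passed to doc.LoadHtml in the same way as driver.PageSource above.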

*) You need to download Selenium first, as well as a driver, e.g. PhantomJS, Firefox, etc. Selenium can be installed into your project easily from NuGet.
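
PhantomJS is no longer actively developed, and newer Selenium releases may not include a PhantomJS driver, so the same approach may need a different browser. A minimal sketch with headless Chrome follows. It assumes Chrome and a matching chromedriver are available on the machine; the URL is a placeholder, and the //script query is only an example that counts the script elements in the rendered page.

```csharp
using System;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class Program
{
    static void Main()
    {
        string url = "https://example.com/"; // placeholder URL (assumption)

        var options = new ChromeOptions();
        options.AddArgument("--headless"); // run Chrome without a visible window

        // Dispose the driver so the browser process is shut down afterwards.
        using (IWebDriver driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl(url);

            // Hand the rendered HTML to HtmlAgilityPack.
            var doc = new HtmlDocument();
            doc.LoadHtml(driver.PageSource);

            // SelectNodes returns null when nothing matches.
            var scripts = doc.DocumentNode.SelectNodes("//script");
            int count = scripts == null ? 0 : scripts.Count;
            Console.WriteLine("script elements: " + count);
        }
    }
}
```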

5/23/2017 11:50:39 AM

Popular Answer

For that you would need a real browser. Consider automating a browser (which can be headless - see PhantomJS) with the help of Selenium.


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow