HTML Agility Pack is returning JavaScript code instead of the actual HTML

c# html-agility-pack javascript parsing

Question

I want to get the links from a website in a C# console application using HTML Agility Pack, but the li and a tags contain JavaScript code instead of real URLs. I don't know why; it seems JavaScript changes the markup on click. How do I get the actual HTML?

<li onmouseover="activate_menu('top-menu-61', 61); void(0);" onmouseout="deactivate_menu('top-menu-61', 61);"><a href="javascript:void();

This is all I can see in my li and a tags. How can I resolve this and get the actual HTML so I can extract the links?

Popular Answer

Try a browser automation tool such as Selenium WebDriver to render the page fully in a real browser before passing it to HtmlAgilityPack for parsing. HtmlAgilityPack only parses the markup it is given and does not execute JavaScript, so anything the page builds with scripts is missing from the raw download. Using Selenium is fairly easy, as the example below shows. Just make sure the required tools (the Selenium library and the browser driver of your choice) are installed properly beforehand:

using HtmlAgilityPack;
using OpenQA.Selenium.Chrome;

// Initialize the Chrome driver (or any other supported browser)
using (var driver = new ChromeDriver())
{
    // Open the target page
    driver.Navigate().GoToUrl("the_target_page_url_here");

    // Add Selenium waits here if needed,
    // to wait until a certain element appears in the page

    // Pass the rendered HTML to HAP's HtmlDocument
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(driver.PageSource);
}
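
The wait and the link extraction hinted at in the comments above could look roughly like the sketch below. It is only an illustration under stated assumptions: the URL is a placeholder, the "li a" selector is a guess at what signals that the menu has rendered, and LinkExtractor / ExtractLinks are hypothetical names introduced here. WebDriverWait comes from the OpenQA.Selenium.Support.UI namespace.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

public static class LinkExtractor
{
    // Hypothetical helper: returns the href of every anchor in the markup,
    // skipping empty values and javascript: pseudo-links.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (var anchor in anchors)
        {
            string href = anchor.GetAttributeValue("href", "").Trim();
            if (href.Length > 0 &&
                !href.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase))
            {
                links.Add(href);
            }
        }
        return links;
    }

    public static void Main()
    {
        using (var driver = new ChromeDriver())
        {
            // Placeholder URL; replace with the real target page.
            driver.Navigate().GoToUrl("https://example.com/");

            // Assumption: the page is "ready" once at least one <li><a> exists.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
            wait.Until(d => d.FindElements(By.CssSelector("li a")).Count > 0);

            foreach (string link in ExtractLinks(driver.PageSource))
            {
                Console.WriteLine(link);
            }
        }
    }
}
```

If the anchors still carry href="javascript:..." after the page has rendered, the real destination is probably set by a click handler rather than by the attribute, and the target URLs would have to be recovered some other way (for example, by triggering the click in Selenium and reading the resulting URL).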

Selenium also provides ways to locate elements within a page, so it is possible to replace HAP completely with Selenium, if you want.
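
As a rough illustration of the Selenium-only route, the sketch below collects links directly through the driver without HtmlAgilityPack. The URL is a placeholder and the class name SeleniumOnlyLinks is hypothetical; treat it as one possible approach, not the only one.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

public static class SeleniumOnlyLinks
{
    public static void Main()
    {
        using (var driver = new ChromeDriver())
        {
            // Placeholder URL; replace with the real target page.
            driver.Navigate().GoToUrl("https://example.com/");

            // Locate every anchor that has an href attribute in the rendered page.
            foreach (IWebElement anchor in driver.FindElements(By.CssSelector("a[href]")))
            {
                string href = anchor.GetAttribute("href");

                // Skip javascript: pseudo-links, which carry no real URL.
                if (!string.IsNullOrEmpty(href) &&
                    !href.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase))
                {
                    Console.WriteLine(href);
                }
            }
        }
    }
}
```

Because the elements are read from the live browser, the values reflect the DOM after scripts have run, which is the main reason to prefer this over parsing a raw download.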




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow