How can I scrape a table that is created with JavaScript in C#?

c# html-agility-pack html-table webclient

Question

Using HtmlAgilityPack, I'm attempting to get a table from the website https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/.

My current code is:

WebClient webClient = new WebClient();
string page = webClient.DownloadString("https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/");

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(page);

List<List<string>> table = doc.DocumentNode.SelectSingleNode("//table[@class='list_result Result']")
    .Descendants("tr")
    .Skip(1)
    .Where(tr => tr.Elements("td").Count() > 1)
    .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
    .ToList();

My issue is that the website uses JavaScript to build the table. The HTML I download only contains a notice that JavaScript must be enabled, so SelectSingleNode finds no table and the code fails with a null reference exception.

I also tried a plain GET request with HttpWebRequest:

string Url = "https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/";
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(Url);
myRequest.Method = "GET";
WebResponse myResponse = myRequest.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();

with the same result. I have also changed the registry and enabled JavaScript in Internet Explorer:

// In a verbatim (@"...") string a backslash is not an escape character, so the path uses single backslashes.
if (Environment.Is64BitOperatingSystem)
    Regkey = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Wow6432Node\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION", true);
else  // For a 32-bit machine
    Regkey = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION", true);

If I use a WebBrowser component I can see the website without any issues, but I still can't retrieve the table.

6/24/2018 6:09:33 PM

Accepted Answer

In any browser, F12 is your friend.

Open the developer tools and select the Network tab. You will see that all of the table's data is loaded from this file:

https://www.belastingdienst.nl/data/douane_wisselkoersen/wks.douane.wisselkoersen.dd201806.xml

(I assume that a URL ending in dd201807.xml will contain the data for July 2018.)

There is no need for HtmlAgilityPack here. Instead, do a GET on that URL from C# and parse the response as XML. To build the correct URL, concatenate the current year and the current two-digit month.
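The steps above could be sketched roughly as follows. This is only a minimal sketch: the class and method names (WisselkoersenFeed, BuildUrl, Download, PrintLeafValues) are made up for illustration, the yyyyMM suffix is the assumption stated above and has only been observed for June 2018, and since the layout of the XML is not shown here, the code just prints every leaf element instead of reading specific fields. WebClient is used because the question already uses it.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Net;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class WisselkoersenFeed
{
    // Fixed part of the URL seen in the Network tab, up to and including "dd".
    private const string BaseUrl =
        "https://www.belastingdienst.nl/data/douane_wisselkoersen/wks.douane.wisselkoersen.dd";

    // Assumption: the file name ends in year + two-digit month, e.g. 201806.
    public static string BuildUrl(DateTime month)
    {
        return BaseUrl + month.ToString("yyyyMM", CultureInfo.InvariantCulture) + ".xml";
    }

    // Plain HTTP GET; no JavaScript engine is needed because the file is static XML.
    public static XDocument Download(DateTime month)
    {
        using (var client = new WebClient())
        using (Stream stream = client.OpenRead(BuildUrl(month)))
        {
            return XDocument.Load(stream);
        }
    }

    // The schema is unknown here, so simply list every element that has no child elements.
    public static void PrintLeafValues(XDocument doc)
    {
        foreach (XElement element in doc.Descendants())
        {
            if (!element.HasElements)
                Console.WriteLine(element.Name.LocalName + " = " + element.Value);
        }
    }
}
```

Calling WisselkoersenFeed.PrintLeafValues(WisselkoersenFeed.Download(DateTime.Today)) should show which element names the file actually uses, after which the generic loop can be replaced with queries for those specific elements.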

I can't make it any more fun than that!

6/24/2018 6:05:56 PM

Popular Answer

WebClient cannot run JavaScript because it is an HTTP client, not a web browser. To get the rendered page you need a headless web browser. The question linked below lists several headless browsers for .NET. I have not tested any of them, so I cannot recommend one.

C# (.NET) headless browser




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow