WPF C#: parse WebBrowser content with HtmlAgilityPack

c# html-agility-pack webbrowser-control wpf

Question

I need to scrape some data from a website. I created a WebBrowser control so the user can log in and use the site's search tool. Once he has searched and the results list is displayed, I want to grab that data for further analysis and offline access.

As I said, the easiest approach for me is a WebBrowser control: it works out of the box, login works, navigation works, and when I reach the appropriate page I have webBrowser.Document, which is an mshtml.HTMLDocumentClass (if I'm correct). But HtmlAgilityPack requires its own HtmlDocument.

What's the easiest way to convert from one to the other? Please note that the browser is the WPF WebBrowser.

1
0
5/7/2016 10:04:17 AM

Popular Answer

No temporary files are needed; just cast the document to the right class and parse its HTML.

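// HTMLDocument here is the COM type from mshtml (add a reference to Microsoft.mshtml).
string html = (webBrowser.Document as mshtml.HTMLDocument).documentElement.innerHTML;
// HtmlDocument here is HtmlAgilityPack's class; names are qualified to avoid confusing the two.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);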

From here on, happy scraping :)
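
As a minimal sketch of what could come next, the helper below takes the HTML string obtained from the browser and pulls the cell text out of every table row with HtmlAgilityPack. The class name ResultScraper, the method ExtractRows, and the assumption that the search results are rendered as an HTML table are all hypothetical; the question does not say how the results page is structured, so the XPath would need adjusting to the real markup.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original answer.
public static class ResultScraper
{
    // Returns the trimmed text of the <td> cells of each table row.
    // Assumes the results are laid out as an HTML table.
    public static List<string[]> ExtractRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//table//tr");
        if (rows == null)
            return new List<string[]>();

        return rows
            .Select(row => row.Elements("td")
                // Decode entities such as &amp; before trimming whitespace.
                .Select(cell => HtmlEntity.DeEntitize(cell.InnerText).Trim())
                .ToArray())
            // Skip rows without <td> cells, e.g. header rows made of <th>.
            .Where(cells => cells.Length > 0)
            .ToList();
    }
}
```

On the WPF side, one plausible place to call this is the WebBrowser's LoadCompleted event handler, so that the HTML is read only after the results page has finished loading. Content that scripts add after that event may still be missing.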

4
5/7/2016 10:14:28 AM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow