Use explorer.Document as the source HtmlDocument for HtmlAgilityPack

c# html-agility-pack internet-explorer microsoft.mshtml mshtml

Question

I want to use the webpage currently loaded in Internet Explorer as an HtmlDocument in HtmlAgilityPack. I access the explorer's document through mshtml as a COM object:

mshtml.HTMLDocument doc = explorer.Document as mshtml.HTMLDocument;

Then I tried to cast it to the HtmlDocument type used by HtmlAgilityPack:

HtmlAgilityPack.HtmlDocument hdoc = (HtmlAgilityPack.HtmlDocument)doc;

But it doesn't work: the cast throws an invalid cast exception.

Anyway, I want to use the currently loaded webpage as the source for HtmlAgilityPack. I know I could use the HtmlWeb class provided by HtmlAgilityPack to load the current URL, but I want to highlight elements in the loaded page (elements found using HtmlAgilityPack), and I don't think that can be done with that kind of implementation. Any ideas or support on how to implement this would be great. Thanks.

8/26/2014 7:30:15 AM

Accepted Answer

Of course you can't cast between mshtml.HTMLDocument and HtmlAgilityPack.HtmlDocument: they're completely distinct types from different libraries. The Agility Pack's HtmlDocument is purely managed, while mshtml.HTMLDocument is a managed wrapper around a COM object.

What you can do is grab the HTML from the mshtml.HTMLDocument and load it into the Agility Pack.

Probably something along these lines:

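  // explorer is the Internet Explorer instance whose page you want to parse
  mshtml.IHTMLDocument3 sourceDoc = (mshtml.IHTMLDocument3) explorer.Document;
  // outerHTML of the root <html> element serializes the page's current DOM (without the doctype)
  string documentContents = sourceDoc.documentElement.outerHTML;

  // Parse that markup with the Agility Pack
  HtmlAgilityPack.HtmlDocument targetDoc = new HtmlAgilityPack.HtmlDocument();
  targetDoc.LoadHtml(documentContents);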

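You could also query the document for its IPersistStream interface, call its Save method, and then feed the saved bytes to the Agility Pack (for example via HtmlDocument.Load with a MemoryStream). Note that Save writes to a COM IStream, not directly to a .NET MemoryStream, so you need an IStream around the MemoryStream, or a COM stream whose contents you copy into one.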
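A minimal sketch of that stream-based route, assuming a Windows host where the MSHTML document is available as `explorer.Document`. Two choices here are assumptions of the sketch, not part of the answer: it hand-declares the closely related COM interface IPersistStreamInit (same Save method; MSHTML documents commonly expose it for stream I/O) instead of IPersistStream, and rather than wrapping a MemoryStream it creates a memory-backed COM stream with ole32's CreateStreamOnHGlobal and copies the bytes into a MemoryStream afterwards. The helper names MshtmlStreamHelper and SaveToMemoryStream are hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes;

// Hand-written declaration of the COM IPersistStreamInit interface (assumption: the
// document object supports it). Method order must match the native vtable.
[ComImport]
[Guid("7FD52380-4E07-101B-AE2D-08002B2EC713")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IPersistStreamInit
{
    void GetClassID(out Guid pClassID);            // from IPersist
    [PreserveSig] int IsDirty();                   // S_OK or S_FALSE
    void Load(IStream pStm);
    void Save(IStream pStm, [MarshalAs(UnmanagedType.Bool)] bool fClearDirty);
    void GetSizeMax(out long pCbSize);
    void InitNew();
}

// Hypothetical helper, not part of mshtml or HtmlAgilityPack.
public static class MshtmlStreamHelper
{
    // Creates a COM IStream backed by growable global memory.
    [DllImport("ole32.dll")]
    private static extern int CreateStreamOnHGlobal(
        IntPtr hGlobal, [MarshalAs(UnmanagedType.Bool)] bool fDeleteOnRelease, out IStream ppstm);

    private const int STREAM_SEEK_SET = 0;

    // Saves a COM document object (e.g. explorer.Document) into a MemoryStream.
    public static MemoryStream SaveToMemoryStream(object comDocument)
    {
        // QueryInterface; throws InvalidCastException if the interface is not supported.
        var persist = (IPersistStreamInit)comDocument;
        Marshal.ThrowExceptionForHR(CreateStreamOnHGlobal(IntPtr.Zero, true, out IStream comStream));
        try
        {
            persist.Save(comStream, false);

            // Rewind the COM stream and copy its contents into managed memory.
            comStream.Seek(0, STREAM_SEEK_SET, IntPtr.Zero);
            var result = new MemoryStream();
            var buffer = new byte[8192];
            IntPtr bytesReadPtr = Marshal.AllocCoTaskMem(sizeof(int));
            try
            {
                while (true)
                {
                    comStream.Read(buffer, buffer.Length, bytesReadPtr);
                    int bytesRead = Marshal.ReadInt32(bytesReadPtr);
                    if (bytesRead == 0) break;
                    result.Write(buffer, 0, bytesRead);
                }
            }
            finally
            {
                Marshal.FreeCoTaskMem(bytesReadPtr);
            }
            result.Position = 0;
            return result;
        }
        finally
        {
            Marshal.ReleaseComObject(comStream);
        }
    }
}
```

Usage would then be along the lines of `targetDoc.Load(MshtmlStreamHelper.SaveToMemoryStream(explorer.Document));`. Whether MSHTML writes the original source or the current DOM has not been checked here, so the result may differ from what the outerHTML approach returns.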

6/11/2015 9:40:17 AM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow