Use explorer.document as source HtmlDocument for HtmlAgilityPack

c# html-agility-pack internet-explorer microsoft.mshtml mshtml

Question

I want to use the webpage currently loaded in Internet Explorer as an HtmlDocument in HtmlAgilityPack. I access the explorer document through mshtml as a COM object.

mshtml.HTMLDocument doc = explorer.Document as mshtml.HTMLDocument;

Then I tried to convert it to the HtmlDocument type used by HtmlAgilityPack:

HtmlAgilityPack.HtmlDocument hdoc = (HtmlAgilityPack.HtmlDocument)doc;

But this does not work: the cast fails with an invalid cast exception.

In any case, I want to use the currently loaded webpage as the source for HtmlAgilityPack. I know I can use the HtmlWeb class provided by HtmlAgilityPack and load the current URL, but I want to highlight elements in the loaded page (elements found using HtmlAgilityPack), and I don't think that can be done that way. Any ideas or help with implementing this would be appreciated. Thanks.

Accepted Answer

You can't cast between mshtml.HTMLDocument and HtmlAgilityPack.HtmlDocument: they are completely distinct types from different libraries. One is purely managed, and the other is a managed wrapper around a COM object.

What you can do is grab the HTML from the mshtml.HTMLDocument and load it into the Agility Pack.

Probably something along these lines:

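  // explorer.Document is a COM object; IHTMLDocument3 exposes documentElement.
  mshtml.IHTMLDocument3 sourceDoc = (mshtml.IHTMLDocument3)explorer.Document;
  // outerHTML serializes the current DOM from the root element down (the doctype is not included).
  string documentContents = sourceDoc.documentElement.outerHTML;

  // Parse the serialized markup with the Agility Pack.
  HtmlAgilityPack.HtmlDocument targetDoc = new HtmlAgilityPack.HtmlDocument();
  targetDoc.LoadHtml(documentContents);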

You could also query the document for IPersistStream, call its Save method to write the page into a stream (for example, a MemoryStream exposed as a COM IStream), and then feed that stream to HtmlAgilityPack.
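
To connect this back to the highlighting goal in the question, here is a minimal sketch of one possible approach, not a definitive implementation. It assumes explorer is a SHDocVw.InternetExplorer instance and that the elements of interest carry an id attribute, so a node found by HtmlAgilityPack can be looked up again in the live page with getElementById. The PageHighlighter class, the HighlightMatches method, and the yellow background are hypothetical names and choices made for illustration.

```csharp
using HtmlAgilityPack;

static class PageHighlighter
{
    // Hypothetical helper. Assumes 'explorer' is a SHDocVw.InternetExplorer
    // (as the question suggests) and 'xpath' is an XPath expression that
    // HtmlAgilityPack can evaluate. Returns the number of elements highlighted.
    public static int HighlightMatches(SHDocVw.InternetExplorer explorer, string xpath)
    {
        var liveDoc = (mshtml.IHTMLDocument3)explorer.Document;

        // Take a snapshot of the live DOM and parse it with the Agility Pack.
        var snapshot = new HtmlDocument();
        snapshot.LoadHtml(liveDoc.documentElement.outerHTML);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection matches = snapshot.DocumentNode.SelectNodes(xpath);
        if (matches == null) return 0;

        int highlighted = 0;
        foreach (HtmlNode node in matches)
        {
            // Assumption: the id attribute is the link between the snapshot and the live page.
            string id = node.GetAttributeValue("id", "");
            if (id.Length == 0) continue; // no id, so this sketch cannot map the node back

            mshtml.IHTMLElement liveElement = liveDoc.getElementById(id);
            if (liveElement == null) continue; // the page may have changed since the snapshot

            liveElement.style.backgroundColor = "yellow";
            highlighted++;
        }
        return highlighted;
    }
}
```

A call such as PageHighlighter.HighlightMatches(explorer, "//div[@id]") would then highlight every div that has an id. Elements without an id are skipped here; mapping those back to the live DOM would need another strategy, which this sketch does not attempt.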




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow