I've got an HTML file with a <script> in it:
    <html>
      <script type="application/custom+xml">
        <my><xml><goes><here/></goes></xml></my>
      </script>
    </html>
I parse it with HTML Agility Pack and then convert it to XML.
    HtmlDocument html;
    html.OptionOutputAsXml = true;
    html.Save(stream);
    ...
    XDocument xml = XDocument.Load(stream);
I then want to use LINQ to XML to look at the contents of the script tag, which should contain my XML as CDATA. But HTML Agility Pack messes it up somehow, and I end up with this escaped XML:
    <html>
      <script type="application/custom+xml">
        //<![CDATA[
        <my><xml><goes><here/></goes></xml></my>
        //]]>//
      </script>
    </html>
Does anyone know how I can tell HTML Agility Pack not to escape the contents of the <script> tag?
That's rather easy. By default, Agility Pack treats the content of script tags as CDATA. This is set up in the static constructor of the HtmlNode class, which registers script in the static HtmlNode.ElementsFlags table with the HtmlElementFlag.CData flag.
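To see that default from your own code rather than by reading the library source, something like the following should work. It is only a sketch: it assumes the HtmlAgilityPack package is referenced, and the class name ShowScriptFlag is made up for the example.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class, only for illustration.
class ShowScriptFlag
{
    static void Main()
    {
        // HtmlNode.ElementsFlags is a public static table that maps tag names
        // to parsing flags. It is filled in by HtmlNode's static constructor.
        if (HtmlNode.ElementsFlags.ContainsKey("script"))
        {
            // Prints whichever flag is currently registered for <script>.
            Console.WriteLine(HtmlNode.ElementsFlags["script"]);
        }
        else
        {
            Console.WriteLine("no flag registered for script");
        }
    }
}
```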
To change this you don't have to modify Agility Pack itself. All that's needed is to remove the script entry from HtmlNode.ElementsFlags before your code runs, or just once when your program starts.
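Here is a rough sketch of that change in context. The Remove call is the actual fix; everything around it is scaffolding I added so the example is complete. It loads the HTML from a string and saves to a StringWriter instead of the stream used in the question, so adapt that part to your own code.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // The fix: stop treating <script> content as CDATA.
        // ElementsFlags is static, so do this once, before parsing anything.
        HtmlNode.ElementsFlags.Remove("script");

        var html = new HtmlDocument();
        html.OptionOutputAsXml = true;
        html.LoadHtml(
            "<html><script type=\"application/custom+xml\">" +
            "<my><xml><goes><here/></goes></xml></my>" +
            "</script></html>");

        // Save as XML into a string, then hand it to LINQ to XML.
        var writer = new StringWriter();
        html.Save(writer);
        XDocument xml = XDocument.Parse(writer.ToString());

        // The script element's children should now be real XML elements.
        XElement script = xml.Descendants("script").FirstOrDefault();
        Console.WriteLine(script);
    }
}
```

Because the table is static, the change affects every HtmlDocument parsed afterwards in the same process, including ordinary JavaScript script blocks.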
Just add that before your code; with that in place it works for me.