Duplicating HtmlNode in HtmlAgilityPack?

c# html-agility-pack xpath


I'm using HTML Agility Pack to do two different things on the same page.
For the first one I need to remove elements such as script, style, etc. For the second one, however, I must keep all of the elements.

Since I can't do the second part before the first one, I'm looking for a way to duplicate the object first, so that I still have all of the elements for the second part. This is the code I tried, but for some reason I do not get the nodes inside it.

        HtmlDocument HTMLdoc = new HtmlDocument();

        //duplicate document node
        var webPage = HtmlNode.CreateNode("<html></html>");
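One way to keep an untouched copy is to parse the same markup into two separate HtmlDocument instances, so that removing nodes from one cannot affect the other. The following is only a minimal sketch under that assumption; the sample markup and the names `html`, `stripped` and `original` are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

class TwoCopiesSketch
{
    static void Main()
    {
        // Hypothetical sample markup standing in for the real page.
        string html = "<html><head><style>p { color: red; }</style></head>" +
                      "<body><p>Hello</p><script>var x = 1;</script></body></html>";

        // Copy used for the first task; nodes will be removed from it.
        var stripped = new HtmlDocument();
        stripped.LoadHtml(html);

        // Independent copy for the second task; it is never modified.
        var original = new HtmlDocument();
        original.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so guard the loop.
        var unwanted = stripped.DocumentNode.SelectNodes("//script | //style");
        if (unwanted != null)
        {
            foreach (HtmlNode node in unwanted)
                node.Remove();
        }

        // Print both trees to compare them.
        Console.WriteLine(stripped.DocumentNode.OuterHtml);
        Console.WriteLine(original.DocumentNode.OuterHtml);
    }
}
```

HtmlNode.CloneNode(true) can also make a deep copy of a node, but the copy probably stays attached to the original document, so it seems safer to query such a copy with relative paths (for example ".//script") than with absolute ones.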

Another way I've thought of is to invert the XPath that selects all the elements I wish to remove, so that I can select just the remaining elements without actually removing anything from the object. But I can't figure out how to use the XPath "not()" function to invert my query. This is my XPath query:

"//script | //style | //iframe | //select | //textarea | //comment() | //a[@href]"
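For the inversion, one possible approach is to test each element against the unwanted names with the self:: axis inside not(). The sketch below assumes that form; the sample markup and the `keep` constant are illustrative only. Note that "//*" matches elements only, so comments are already excluded, while children of an excluded element (such as an option inside a select) are still returned unless the test uses ancestor-or-self:: instead of self::.

```csharp
using System;
using HtmlAgilityPack;

class InvertedXPathSketch
{
    static void Main()
    {
        // Hypothetical sample markup standing in for the real page.
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><head><style>p { }</style></head><body>" +
                     "<p>Keep me</p><script>var x = 1;</script>" +
                     "<a href='#'>link</a></body></html>");

        // Every element that is NOT one of the unwanted kinds.
        const string keep =
            "//*[not(self::script or self::style or self::iframe or " +
            "self::select or self::textarea or self::a[@href])]";

        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(keep);
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
                Console.WriteLine(node.Name);
        }
    }
}
```

Because nothing is removed, the same document object remains usable for the second task afterwards.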

Thanks for your time and help :)

8/23/2012 2:35:07 PM

Popular Answer

I am doing something similar. I had to get this info and then convert it to XML. Here is what you need:

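        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        // Load the page into htmlDoc here (for example with htmlDoc.LoadHtml) before querying it.

        // Select whatever tags you are looking for in your document.
        HtmlNodeCollection page = htmlDoc.DocumentNode.SelectNodes("//table");

        // SelectNodes returns null when nothing matches, so guard the loop.
        if (page != null)
            foreach (HtmlNode value in page)
                richTxtboxFilteredHTML.Text += value.InnerText;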

If you're going to process this further, you will need to keep referencing each HtmlNode.

9/18/2013 4:37:33 PM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow