Duplicating HtmlNode in HtmlAgilityPack?

c# html-agility-pack xpath

Question

I'm using HTML Agility Pack to do two different things on the same page.
For the first one, I need to remove elements such as script and style. For the second one, however, I must keep all of the elements.

Since I can't do the second part before the first one, I'm looking for a way to duplicate the object first, so that I keep all of the elements for the second part. This is the code I tried, but for some reason the copy does not contain the nodes inside it.

        HtmlDocument HTMLdoc = new HtmlDocument();
        HTMLdoc.LoadHtml(sFetch);

        //duplicate document node
        var webPage = HtmlNode.CreateNode("<html></html>");
        webPage.CopyFrom(HTMLdoc.DocumentNode,true);

Another way I've thought of is to invert the XPath that selects all the elements I wish to remove, so that I can select just the remaining ones without actually removing anything from the object. But I can't figure out how to use the XPath "not()" function to invert my query. This is my XPath query:

"//script | //style | //iframe | //select | //textarea | //comment() | //a[@href]"

Thanks for your time and help :)

Popular Answer

I am doing something similar. I had to get this info and then convert it to XML. Here is what you need:

        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.LoadHtml(sfetch);

        // Select whatever tags you are looking for in your doc
        HtmlNodeCollection page = htmlDoc.DocumentNode.SelectNodes("//table");

        // SelectNodes returns null when nothing matches
        if (page != null)
        {
            foreach (HtmlNode value in page)
            {
                richTxtboxFilteredHTML.Text += value.InnerText;
            }
        }

If you're going to process this further, you will need to keep referencing each HtmlNode.
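
For the duplication part of the question, one simple option is to parse the same HTML string into two separate HtmlDocument instances and strip nodes from only one of them. The following is a minimal sketch under that assumption, not a definitive implementation: the class name DuplicateDemo, the constant UnwantedXPath, and the sample markup are made up for illustration, and sFetch stands in for the fetched page from the question. HtmlNode also has a CloneNode(bool deep) method that may work for copying, but check how it behaves on the document node in your version of the library before relying on it.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class DuplicateDemo
{
    // The XPath from the question, naming the nodes to strip (hypothetical constant name).
    private const string UnwantedXPath =
        "//script | //style | //iframe | //select | //textarea | //comment() | //a[@href]";

    public static void Main()
    {
        // Illustrative stand-in for the fetched page (sFetch in the question).
        string sFetch = "<html><head><script>var x = 1;</script></head>" +
                        "<body><p>Hello</p><a href=\"#\">link</a></body></html>";

        // First parse: left untouched, so every element stays available.
        var original = new HtmlDocument();
        original.LoadHtml(sFetch);

        // Second parse: an independent tree that is safe to modify.
        var stripped = new HtmlDocument();
        stripped.LoadHtml(sFetch);

        // SelectNodes returns null when nothing matches, so guard before looping.
        var unwanted = stripped.DocumentNode.SelectNodes(UnwantedXPath);
        if (unwanted != null)
        {
            // Copy to a list first so removal does not disturb the iteration.
            foreach (var node in unwanted.ToList())
            {
                node.Remove();
            }
        }

        Console.WriteLine(stripped.DocumentNode.OuterHtml); // stripped copy
        Console.WriteLine(original.DocumentNode.OuterHtml); // full copy
    }
}
```

Parsing twice costs a second pass over the string, but the two trees share nothing, so removing nodes from one cannot affect the other. If you only need to look at the unwanted nodes without removing them, passing the same XPath to SelectNodes on the untouched document already returns just those nodes.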



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow