Duplicating HtmlNode in HtmlAgilityPack?

c# html-agility-pack xpath

Question

I'm using HTML Agility Pack to do two separate tasks on the same page.
For the first task, I have to remove elements such as script and style from the document. For the second task, however, I need to keep every element.

Because I can't do the second task before finishing the first, I need a way to duplicate the document first, so that I still have all the elements for the second task. I tried the code below, but for some reason I can't see any nodes inside the copy.

        HtmlDocument HTMLdoc = new HtmlDocument();
        HTMLdoc.LoadHtml(sFetch);

        //duplicate document node
        var webPage = HtmlNode.CreateNode("<html></html>");
        webPage.CopyFrom(HTMLdoc.DocumentNode,true);
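One way to sidestep the copy problem, offered as a minimal sketch rather than a fix for CopyFrom itself, is to parse the same source string twice. Each call to LoadHtml builds an independent node tree, so removing nodes from one HtmlDocument leaves the other untouched. The class and method names (PageCopies, LoadTwice) are hypothetical; the code assumes the HtmlAgilityPack NuGet package and C# 7 or later for the tuple return.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper: builds two independent documents from the same HTML.
public static class PageCopies
{
    // The question's XPath for the nodes the first task should strip out.
    private const string UnwantedXPath =
        "//script | //style | //iframe | //select | //textarea | //comment() | //a[@href]";

    public static (HtmlDocument filtered, HtmlDocument full) LoadTwice(string html)
    {
        // Two separate parses give two separate node trees.
        var filtered = new HtmlDocument();
        filtered.LoadHtml(html);

        var full = new HtmlDocument();
        full.LoadHtml(html);

        // First task: remove the unwanted nodes from the first copy only.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection unwanted = filtered.DocumentNode.SelectNodes(UnwantedXPath);
        if (unwanted != null)
        {
            // SelectNodes returns a snapshot, so removing nodes while looping is safe.
            foreach (HtmlNode node in unwanted)
            {
                node.Remove();
            }
        }

        // The second copy still has every element for the second task.
        return (filtered, full);
    }
}
```

Usage would look like `var (filtered, full) = PageCopies.LoadTwice(sFetch);`, after which the first task works on `filtered` and the second on `full`. The trade-off is a second parse of the same string.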

I also considered inverting the XPath that selects all the elements I want to remove. That way I could select only the elements I want to keep, without actually removing anything from the document. But I can't work out how to invert my query using the XPath not() function. This is the XPath I used:

"//script | //style | //iframe | //select | //textarea | //comment() | //a[@href]"
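Inverting the union element by element does not quite work, because an element such as body passes a not(self::script) test while still containing the scripts in its subtree. A sketch that avoids this, assuming the goal is the text you want to keep, selects text nodes and applies not() to their ancestors instead. Comments are comment nodes rather than text nodes, so text() already skips them. The names KeptText, KeptTextXPath and Select are hypothetical; the code relies only on HtmlAgilityPack evaluating XPath 1.0 expressions through SelectNodes.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: collects text that is NOT inside the unwanted elements.
public static class KeptText
{
    // The question's union, inverted: a text node is kept only if none of its
    // ancestors is one of the unwanted elements. comment() needs no test here,
    // because comment nodes are never matched by text().
    public const string KeptTextXPath =
        "//text()[not(ancestor::script or ancestor::style or ancestor::iframe" +
        " or ancestor::select or ancestor::textarea or ancestor::a[@href])]";

    public static List<string> Select(HtmlDocument doc)
    {
        var result = new List<string>();

        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(KeptTextXPath);
        if (nodes == null)
        {
            return result;
        }

        foreach (HtmlNode node in nodes)
        {
            // Skip whitespace-only text nodes between tags.
            string text = node.InnerText.Trim();
            if (text.Length > 0)
            {
                result.Add(text);
            }
        }
        return result;
    }
}
```

For example, `KeptText.Select(HTMLdoc)` reads the wanted text from the original document without modifying it, so the same HtmlDocument is still intact for the task that needs every element.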

I appreciate your time and assistance.

8/23/2012 2:35:07 PM

Popular Answer

I'm doing something similar: I had to extract this data and then convert it to XML. This is what you need:

        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.LoadHtml(sFetch);

        // Use whatever XPath matches the tags you are looking for in your document
        HtmlNodeCollection page = htmlDoc.DocumentNode.SelectNodes("//table");

        // SelectNodes returns null, not an empty collection, when nothing matches
        if (page != null)
        {
            foreach (HtmlNode value in page)
            {
                richTxtboxFilteredHTML.Text += value.InnerText;
            }
        }

If you need to process the data further, keep working with each HtmlNode inside the loop.

9/18/2013 4:37:33 PM



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow