I'm using HTML Agility Pack to do two different things on the same page.
For the first one I need to remove elements such as script and style.
For the second one, however, I must keep all of the elements.
Since I can't do the second part before the first one, I'm looking for a way to duplicate the object first, so I can keep all of the elements for the second part. This is the code I tried, but for some reason the copy does not contain the child nodes:
HtmlDocument HTMLdoc = new HtmlDocument();
HTMLdoc.LoadHtml(sFetch);
//duplicate document node
var webPage = HtmlNode.CreateNode("<html></html>");
webPage.CopyFrom(HTMLdoc.DocumentNode,true);
Another approach I've thought of is to invert the XPath that selects all the elements I wish to remove, so that I can select only the elements I want to keep without actually removing anything from the object. But I can't figure out how to use the XPath not() function to invert my query. This is my XPath query:
"//script | //style | //iframe | //select | //textarea | //comment() | //a[@href]"
Thanks for your time and help :)
I am doing something similar. I had to get this info and then convert it to XML. Here is what you need:
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(sFetch);
// Use whatever tags you are looking for in your document.
HtmlNodeCollection page = htmlDoc.DocumentNode.SelectNodes("//table");
// SelectNodes returns null when nothing matches, so guard before iterating.
if (page != null)
{
    foreach (HtmlNode value in page)
    {
        richTxtboxFilteredHTML.Text += value.InnerText;
    }
}
If you're going to process the results further, you will need to keep working with each HtmlNode itself rather than only its text.
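For the original problem of needing both a stripped and an untouched version of the page, one simple option is to parse the same HTML string into two separate HtmlDocument instances and remove nodes from only one of them. The following is a minimal sketch, not a tested solution for your page: the sample HTML string and the names fullDoc and strippedDoc are made up for illustration, and sFetch stands in for the fetched page source from the question.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample input; in the question this is the fetched page (sFetch).
        string sFetch = "<html><head><style>p{}</style></head>" +
                        "<body><script>var x;</script><p>Hello</p></body></html>";

        // Copy 1: left untouched, for the step that needs every element.
        var fullDoc = new HtmlDocument();
        fullDoc.LoadHtml(sFetch);

        // Copy 2: parsed from the same string, used for the stripped version.
        var strippedDoc = new HtmlDocument();
        strippedDoc.LoadHtml(sFetch);

        // The XPath query from the question, unchanged.
        var unwanted = strippedDoc.DocumentNode.SelectNodes(
            "//script | //style | //iframe | //select | //textarea | //comment() | //a[@href]");

        // SelectNodes returns null when nothing matches.
        if (unwanted != null)
        {
            // Iterate over a snapshot so removing nodes cannot disturb the loop.
            foreach (HtmlNode node in unwanted.ToList())
            {
                node.Remove();
            }
        }

        Console.WriteLine(strippedDoc.DocumentNode.OuterHtml);
        Console.WriteLine(fullDoc.DocumentNode.OuterHtml);
    }
}
```

Parsing twice costs a second pass over the string, but the two documents share no nodes, so removing elements from strippedDoc cannot affect fullDoc.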