Remove all classes and ids from parsed HTML with HtmlAgilityPack

c# html html-agility-pack

Question

I parse various HTML pages with HtmlAgilityPack and extract the following elements:

HtmlNode bodyContent = document.DocumentNode.SelectSingleNode("//body");
var all_text = bodyContent.SelectNodes("//div | //ul | //p | //table");

I want to remove all id and class attributes from the output HTML. How can I do this? Every element has both a class and an id.

3/18/2015 6:00:34 PM

Accepted Answer

You may want to click on this link: link.

As far as I can tell, once you have an HtmlNode you can use its Attributes property. The Remove(string) method of this collection takes the name of the attribute you want to remove. I used it this way for a small project; I'm not sure whether it fits your case.

Basically then:

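HtmlNode bodyContent = document.DocumentNode.SelectSingleNode("//body");
// ".//" searches relative to <body>; a leading "//" would search the whole document
var all_text = bodyContent.SelectNodes(".//div | .//ul | .//p | .//table");

// SelectNodes returns null (not an empty collection) when nothing matches
if (all_text != null)
{
    foreach (var node in all_text)
    {
        node.Attributes.Remove("class"); // does nothing if the attribute is absent
        node.Attributes.Remove("id");
    }
}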
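A self-contained sketch of the same approach, assuming the HtmlAgilityPack NuGet package is referenced. It goes one step beyond the snippet above: the XPath `.//*[@class or @id]` selects every descendant of `<body>` that carries either attribute, so nested elements (for example a `<span>` inside a `<div>`) are cleaned too. The names `Program` and `StripClassAndId` are hypothetical, and falling back to the document root when there is no `<body>` is a choice made only for this sketch.

```csharp
using System;
using HtmlAgilityPack;

public static class Program
{
    // Hypothetical helper: removes class and id attributes from every element
    // under <body> (or under the document root if there is no <body>).
    public static string StripClassAndId(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Assumption of this sketch: fall back to the root for body-less fragments.
        HtmlNode root = document.DocumentNode.SelectSingleNode("//body") ?? document.DocumentNode;

        // Relative XPath: any descendant element that has a class or an id attribute.
        HtmlNodeCollection nodes = root.SelectNodes(".//*[@class or @id]");
        if (nodes != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode node in nodes)
            {
                node.Attributes.Remove("class");
                node.Attributes.Remove("id");
            }
        }

        return root.InnerHtml;
    }

    public static void Main()
    {
        string html = "<html><body><div id=\"a\" class=\"b\"><p class=\"c\">Hi</p></div></body></html>";
        Console.WriteLine(StripClassAndId(html));
    }
}
```

If only the div/ul/p/table elements from the question should be touched, the XPath can be switched back to `.//div | .//ul | .//p | .//table`; their descendants then keep their attributes.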
3/18/2015 6:35:17 PM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow