Remove all classes and ids from parsed HTML with HtmlAgilityPack

c# html html-agility-pack

Question

I use HtmlAgilityPack to parse an HTML page, and I extract tags from the page like this:

HtmlNode bodyContent = document.DocumentNode.SelectSingleNode("//body");
var all_text = bodyContent.SelectNodes("//div | //ul | //p | //table");

In the returned HTML, each tag has class and id attributes. I want to remove every id and every class attribute. How can I do this?

Accepted Answer

Maybe you should check this link: link.

As far as I can tell, once you have an HtmlNode you can use its Attributes property. This collection has a Remove(string) method that takes the name of the attribute you want to remove. I used it like this in one small project, so I am not sure whether it helps in your case.

So basically:

HtmlNode bodyContent = document.DocumentNode.SelectSingleNode("//body");
var all_text = bodyContent.SelectNodes("//div | //ul | //p | //table");

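// SelectNodes returns null when nothing matches, so guard before iterating.
if (all_text != null)
{
    foreach (var node in all_text)
    {
        // Remove(string) drops the named attribute; it does nothing if the attribute is absent.
        node.Attributes.Remove("class");
        node.Attributes.Remove("id");
    }
}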
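
If the goal is to strip these attributes from every element, not only div, ul, p and table, a minimal self-contained sketch could look like the one below. The sample HTML string and the class name StripAttributesExample are made up for illustration, and the XPath //*[@class or @id] is my own choice, not something from the question. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical example class; the name and the sample markup are illustrative only.
class StripAttributesExample
{
    static void Main()
    {
        var document = new HtmlDocument();
        document.LoadHtml(
            "<body><div id=\"main\" class=\"box\"><p class=\"note\">text</p></div></body>");

        // Select every element that carries a class or an id attribute.
        var nodes = document.DocumentNode.SelectNodes("//*[@class or @id]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes != null)
        {
            foreach (var node in nodes)
            {
                node.Attributes.Remove("class");
                node.Attributes.Remove("id");
            }
        }

        // Print the cleaned markup to inspect the result.
        Console.WriteLine(document.DocumentNode.OuterHtml);
    }
}
```

Selecting only the elements that actually have one of the two attributes keeps the loop short on large pages. If you need other attributes gone as well, add them to the XPath predicate and to the Remove calls.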



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow