Remove class name from tag nodes using HtmlAgilityPack

c# html html-agility-pack xpath


I need to get rid of specific class names from HTML, for example:

<table class="removeme"></table>

I need a code snippet that can remove the specified class "removeme", so that the HTML after cleaning looks like this:

<table></table>

Also bear in mind that the HTML passed in can contain something like:

<table class="removeme leaveme"></table>

and after cleaning it should appear as:

<table class="leaveme"></table>
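Both examples treat `class` as a whitespace-separated list of class names, so the cleanup is a per-token operation rather than a plain substring replace. A minimal sketch in plain C# (no HtmlAgilityPack needed), assuming a hypothetical helper `RemoveClassToken` that returns an empty string when no class is left, which a caller would take as the cue to drop the attribute:

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an HtmlAgilityPack API): removes every occurrence of
// one class token from a class attribute value. An empty result means the
// caller should drop the class attribute altogether.
static string RemoveClassToken(string classValue, string token)
{
    // An empty separator array makes Split break on any whitespace;
    // RemoveEmptyEntries discards the blanks left by values like " removeme ".
    var kept = classValue
        .Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries)
        .Where(c => c != token);
    return string.Join(" ", kept);
}

Console.WriteLine(RemoveClassToken("removeme leaveme", "removeme"));    // leaveme
Console.WriteLine(RemoveClassToken("removemetoo leaveme", "removeme")); // removemetoo leaveme
Console.WriteLine(RemoveClassToken(" removeme ", "removeme").Length);   // 0
```

Comparing whole tokens is what keeps an unrelated class such as `removemetoo` intact.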
9/2/2014 11:55:48 AM

Accepted Answer

You can address this in two steps. First, get all nodes whose class attribute contains only "removeme", then remove the entire class attribute from them:

    //*[normalize-space(@class)='removeme']

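To see what the step 1 XPath picks up, here is a sketch that runs the same expression through System.Xml's `XmlDocument.SelectNodes` instead of HtmlAgilityPack (a swapped-in library, assuming the markup is XML-compliant). The `id` attributes are hypothetical, added only to identify the matches:

```csharp
using System;
using System.Linq;
using System.Xml;

// Sample markup from the question; the id attributes are hypothetical,
// added only so the output can show which elements were selected.
var doc = new XmlDocument();
doc.LoadXml(@"<root>
  <table id=""a"" class=""removeme""></table>
  <table id=""b"" class=""removeme leaveme""></table>
  <table id=""c"" class="" removeme ""></table>
</root>");

// Step 1: normalize-space trims and collapses whitespace, so this matches
// elements whose class is exactly "removeme", padded or not (a and c).
// ToList() snapshots the matches before the document is modified.
var removemeOnly = doc.SelectNodes("//*[normalize-space(@class)='removeme']")
    .Cast<XmlElement>().ToList();
foreach (var el in removemeOnly)
    el.RemoveAttribute("class");

Console.WriteLine(string.Join(",", removemeOnly.Select(x => x.GetAttribute("id")))); // a,c
```

Element `b` is left alone here because its normalized class is `removeme leaveme`; it is step 2's job.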
Then, in the next step, get all nodes that have the removeme class along with other classes, and strip only removeme out of their class attribute:

    //*[normalize-space(@class)!='removeme' and contains(concat(' ', normalize-space(@class), ' '), ' removeme ')]

The first condition in the XPath above selects nodes that were not already handled in step 1, and the second condition is the XPath equivalent of the CSS selector .removeme (a whole-word match within the space-separated class list).
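The space padding in the second condition matters because a plain `contains(@class, 'removeme')` also matches class names that merely start with removeme. A sketch comparing both expressions, again using System.Xml rather than HtmlAgilityPack; the `removemetoo` class and the `id` values are hypothetical test data:

```csharp
using System;
using System.Linq;
using System.Xml;

// Hypothetical sample: "removemetoo" is a different class that merely
// starts with "removeme", which a naive substring test cannot tell apart.
var doc = new XmlDocument();
doc.LoadXml(@"<root>
  <p id=""b"" class=""removeme leaveme""/>
  <p id=""d"" class=""leaveme  removeme""/>
  <p id=""e"" class=""removemetoo""/>
</root>");

// Returns the ids of all elements matched by the given XPath, in document order.
string Ids(string xpath) => string.Join(",",
    doc.SelectNodes(xpath).Cast<XmlElement>().Select(x => x.GetAttribute("id")));

// Naive substring test: also (wrongly) matches "removemetoo".
Console.WriteLine(Ids("//*[contains(@class, 'removeme')]")); // b,d,e
// Token test from the answer: pad the normalized value with spaces on both
// sides and look for " removeme ", which only matches a whole class name.
Console.WriteLine(Ids("//*[contains(concat(' ', normalize-space(@class), ' '), ' removeme ')]")); // b,d
```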

Here is the complete console example:

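    using System;
    using System.Linq;
    using System.Xml.Linq;
    using HtmlAgilityPack;

    var xml = @"<root>
        <table class=""removeme""></table>
        <table class=""removeme leaveme""></table>
        <table class="" removeme ""></table>
    </root>";
    var doc = new HtmlDocument();
    doc.LoadHtml(xml);
    // step 1: class is exactly "removeme" -> drop the attribute (SelectNodes returns null when nothing matches)
    var removemeOnly = doc.DocumentNode.SelectNodes("//*[normalize-space(@class)='removeme']");
    foreach (HtmlNode node in removemeOnly ?? Enumerable.Empty<HtmlNode>())
        node.Attributes.Remove("class");
    // step 2: "removeme" among other classes -> remove only that token, not every "removeme" substring
    var containsRemoveme =
        doc.DocumentNode.SelectNodes("//*[normalize-space(@class)!='removeme' and contains(concat(' ', normalize-space(@class), ' '), ' removeme ')]");
    foreach (HtmlNode node in containsRemoveme ?? Enumerable.Empty<HtmlNode>())
        node.Attributes["class"].Value = string.Join(" ", node.Attributes["class"].Value
            .Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries).Where(c => c != "removeme"));
    //print formatted HTML output (don't use this for non XML-compliant HTML)
    Console.WriteLine(XDocument.Parse(doc.DocumentNode.OuterHtml).ToString());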
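If the input is XML-compliant (as the `<root>` sample is), the same two XPath steps can be sketched with LINQ to XML (`System.Xml.Linq` plus the `XPathSelectElements` extension from `System.Xml.XPath`) instead of HtmlAgilityPack. This is a substitute library, not the answer's code, and it would reject ordinary non-XML HTML:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Assumption: the markup is well-formed XML, so LINQ to XML can parse it.
var doc = XDocument.Parse(@"<root>
  <table class=""removeme""></table>
  <table class=""removeme leaveme""></table>
  <table class="" removeme ""></table>
</root>");

// Step 1: class is only "removeme" -> remove the whole attribute
// (SetAttributeValue with a null value removes the attribute).
foreach (var el in doc.XPathSelectElements("//*[normalize-space(@class)='removeme']").ToList())
    el.SetAttributeValue("class", null);

// Step 2: "removeme" among other classes -> keep every other class token.
foreach (var el in doc.XPathSelectElements(
    "//*[normalize-space(@class)!='removeme' and contains(concat(' ', normalize-space(@class), ' '), ' removeme ')]").ToList())
{
    var kept = ((string)el.Attribute("class"))
        .Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries)
        .Where(c => c != "removeme");
    el.SetAttributeValue("class", string.Join(" ", kept));
}

Console.WriteLine(doc.ToString(SaveOptions.DisableFormatting));
```

The `ToList()` calls snapshot each set of matches before any attribute is changed, so the tree is never modified while the XPath results are still being enumerated.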
5/23/2017 10:33:35 AM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow