Html Agility Pack - Remove Tags by ID Or Class

c# html-agility-pack

Question

Here is my simplified HTML:

<html>
  <body>
    <div id="mainDiv">
       <div id="divToRemove"></div>
       <div id="divToKeep"></div>
       <div class="divToRemove"></div>
       <div class="divToRemove"></div>
    </div>
  </body>
</html>

I want to remove the divs whose ID or class is "divToRemove", and then select only the div with the ID "mainDiv" (as an HtmlNode).

The results should be:

   <div id="mainDiv">
       <div id="divToKeep"></div>
   </div>

How can I do that using Html Agility Pack?

Thanks!

Accepted Answer

The following code is adapted from an Html Agility Pack forum page to fit your needs. Essentially, we grab all the divs, loop through them, and check each one's class or id for a match. If either one matches, we remove the node.

var divs = htmldoc.DocumentNode.SelectNodes("//div");
if (divs != null)
{
    foreach (var tag in divs)
    {
        if (tag.Attributes["class"] != null && string.Compare(tag.Attributes["class"].Value, "divToRemove", StringComparison.InvariantCultureIgnoreCase) == 0)
        {
            tag.Remove();
        } else if(tag.Attributes["id"] != null && string.Compare(tag.Attributes["id"].Value, "divToRemove", StringComparison.InvariantCultureIgnoreCase) == 0) {
            tag.Remove();
        }
    }
}

You can also combine these if statements into one large if statement, but I thought this read better for the answer.

Finally, select the node you were looking for...

var mainDiv = htmldoc.DocumentNode.SelectSingleNode("//div[@id='mainDiv']");
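
As a rough sketch of the combined-condition variant mentioned above, the two checks can be folded into one test. The helper name HasIdOrClass and the inline HTML string are assumptions made for illustration and are not part of the original answer. Note that, like the code above, it compares against the whole class attribute, so a div with class="divToRemove other" would not match.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    // Hypothetical helper (not from the original answer): true when the node's
    // id or its entire class attribute equals the given name, ignoring case.
    static bool HasIdOrClass(HtmlNode node, string name)
    {
        return string.Equals(node.GetAttributeValue("id", string.Empty), name, StringComparison.OrdinalIgnoreCase)
            || string.Equals(node.GetAttributeValue("class", string.Empty), name, StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        // Sample markup from the question, inlined so the sketch is self-contained.
        var htmldoc = new HtmlDocument();
        htmldoc.LoadHtml(
            "<html><body><div id=\"mainDiv\">" +
            "<div id=\"divToRemove\"></div>" +
            "<div id=\"divToKeep\"></div>" +
            "<div class=\"divToRemove\"></div>" +
            "<div class=\"divToRemove\"></div>" +
            "</div></body></html>");

        // SelectNodes returns null when nothing matches, hence the null check.
        var divs = htmldoc.DocumentNode.SelectNodes("//div");
        if (divs != null)
        {
            foreach (var tag in divs)
            {
                // One combined condition instead of the if / else-if pair.
                if (HasIdOrClass(tag, "divToRemove"))
                {
                    tag.Remove();
                }
            }
        }

        // Select the remaining container and print its markup.
        var mainDiv = htmldoc.DocumentNode.SelectSingleNode("//div[@id='mainDiv']");
        Console.WriteLine(mainDiv.OuterHtml);
    }
}
```

Whether the helper reads better than the explicit if / else-if pair is a matter of taste; the behaviour is intended to be the same.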

Popular Answer

Personally, I prefer to use the LINQ methods of HtmlAgilityPack. The select will be long, but relatively straightforward: just select the nodes with the right id and/or class and then call the Remove() method on each of them.

// Requires: using System.Linq;
// ToList() takes a snapshot first; removing nodes while Descendants() is still being enumerated can throw.
var nodesToRemove = doc.DocumentNode.Descendants("div")
    .Where(n => n.Id.Equals("divToRemove", StringComparison.InvariantCultureIgnoreCase)
        || n.GetAttributeValue("class", string.Empty).Equals("divToRemove", StringComparison.InvariantCultureIgnoreCase))
    .ToList();
foreach (var node in nodesToRemove)
    node.Remove();
HtmlNode mainNode = doc.DocumentNode.Descendants("div")
    .First(n => n.Id.Equals("mainDiv", StringComparison.InvariantCultureIgnoreCase));


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow