Get all the div IDs on an HTML page using Html Agility Pack

c# html-agility-pack

Question

How do I get all the div IDs on an HTML page using Html Agility Pack? I am trying to get all the IDs and put them into a collection.

<p>
    <div class='myclass1'>
        <div id='f'>
        </div>  
        <div id="myclass2">
            <div id="my"><div id="h"></div><div id="b"></div></div>
        </div>
    </div>
</p>

Code:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument(); 
htmlDoc.OptionFixNestedTags=true;
htmlDoc.Load(filePath);    
HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("div"); 

How do I get a collection of all the div IDs?

Accepted Answer

If you just want the IDs, you can select the id attributes of the div elements with XPath instead of collecting every div element node. For instance:

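using System.Collections.Generic;
using HtmlAgilityPack;

List<string> ids = new List<string>();
// HAP returns the owning <div> for each @id match (an attribute is not an HtmlNode),
// and SelectNodes returns null when nothing matches.
HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//div/@id");
if (nodes != null)
    foreach (HtmlNode node in nodes)
        ids.Add(node.GetAttributeValue("id", string.Empty));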

This will skip the div elements that don't have an ID, such as the <div class='myclass1'> element in your example.

"//div/@id" is an XPath expression. XPath is very handy to learn if you deal much with XML or, as in this case, with HTML through the Agility Pack library. It is a W3C standard for selecting matching nodes in an XML document.

  • // selects the following node at any depth: a child of the current node or any of its descendants. A leading // always starts from the root of the document, so this finds matching nodes anywhere in the document.
  • div is an element name we want to match. So, in this case, we are telling it to find all div elements anywhere in the document.
  • / separates one location step from the next. Here it moves from the div element to its id attribute. Strictly speaking, XPath does not treat attributes as children; @id is shorthand for attribute::id, which looks along the div element's attribute axis.
  • @id means we want to find all the id attributes. The @ symbol indicates that it is an attribute name instead of an element name.
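To see the pieces together, here is a minimal, self-contained sketch that parses the question's sample markup with `LoadHtml` and collects the IDs. It uses the predicate form `//div[@id]` ("div elements that have an id attribute") rather than `//div/@id`, because Html Agility Pack's `SelectNodes` returns `HtmlNode` elements, so reading the attribute from each matched div is the straightforward route. The class and method names (`DivIdCollector`, `CollectDivIds`) are hypothetical, chosen only for illustration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class for illustration; not part of Html Agility Pack.
public static class DivIdCollector
{
    // Returns the id of every <div> that has one, in document order.
    public static List<string> CollectDivIds(string html)
    {
        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true;
        doc.LoadHtml(html);

        var ids = new List<string>();

        // "//div[@id]" = any div, at any depth, that carries an id attribute.
        // By default SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection divs = doc.DocumentNode.SelectNodes("//div[@id]");
        if (divs == null)
        {
            return ids;
        }

        foreach (HtmlNode div in divs)
        {
            ids.Add(div.GetAttributeValue("id", string.Empty));
        }
        return ids;
    }
}
```

Called on the question's sample markup, `DivIdCollector.CollectDivIds(html)` should return the five IDs `f`, `myclass2`, `my`, `h` and `b`; the `myclass1` div is skipped because it has only a class.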

Popular Answer

You can get the collection of div elements by passing an XPath expression to SelectNodes, like this:

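using System.Linq;
using HtmlAgilityPack;

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.Load(filePath);

// SelectNodes returns null when nothing matches, so fall back to an empty sequence.
foreach (HtmlNode div in htmlDoc.DocumentNode.SelectNodes("//div") ?? Enumerable.Empty<HtmlNode>())
{
    // ... code here, e.g. string id = div.GetAttributeValue("id", string.Empty);
}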
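If you prefer LINQ over XPath, the same idea can be sketched with `HtmlNode.Descendants("div")`, which walks every descendant div and returns an empty sequence rather than null when there are none. This is a swapped-in technique, not part of the answer above, and the helper name `DivIdQuery.GetDivIds` is hypothetical. The sketch assumes an Html Agility Pack version that provides `Descendants(string)`.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of Html Agility Pack.
public static class DivIdQuery
{
    public static List<string> GetDivIds(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Descendants("div") yields every <div> below the document node.
        // GetAttributeValue returns the fallback "" when a div has no id,
        // so the Where clause drops those divs.
        return doc.DocumentNode
            .Descendants("div")
            .Select(div => div.GetAttributeValue("id", string.Empty))
            .Where(id => !string.IsNullOrEmpty(id))
            .ToList();
    }
}
```

The LINQ form avoids the null check that `SelectNodes` needs, at the cost of not being able to reuse an XPath expression you may already have.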



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why