How do I get the ids of all the divs on an HTML page using Html Agility Pack? I am trying to get all the ids and put them into a collection.
<p>
<div class='myclass1'>
<div id='f'>
</div>
<div id="myclass2">
<div id="my"><div id="h"></div><div id="b"></div></div>
</div>
</div>
</p>
Code:
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags=true;
htmlDoc.Load(filePath);
HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("div");
How do I get a collection of all the div ids?
If you just want the IDs, you can select the id attribute nodes directly instead of getting a collection of the div element nodes. For instance:
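// Requires: using System.Xml.XPath;
// SelectNodes returns HtmlNode objects, not attributes, so an XPathNavigator is used to reach the attribute nodes.
List<string> ids = new List<string>();
foreach (XPathNavigator attribute in htmlDoc.CreateNavigator().Select("//div/@id"))
{
    ids.Add(attribute.Value);
}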
This will skip the div elements that don't have an ID, such as the <div class='myclass1'> element in your example.
"//div/@id"
is an XPath string. XPath is a technology that is very handy to learn if you deal much with XML or, in this case, with HTML via the Agility Pack library. It is an industry standard that lets you select matching nodes in an XML document. The expression breaks down as follows:
// means you want to select the following node as a child of the current node or of any of its descendants. Since the current node is the root node of the document, this will find matching nodes anywhere in the document.
div is the element name we want to match. So, in this case, we are telling it to find all div elements anywhere in the document.
/ indicates that you want a child node. In this case the id attribute is treated as a child of the div element, so first we say we want the div element, then we need the forward slash to say we want one of the div element's child nodes.
@id means we want to find all the id attributes. The @ symbol indicates that it is an attribute name instead of an element name.

You can get the collection of div elements by passing an XPath expression to SelectNodes, like this:
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.Load(filePath);
foreach (HtmlNode div in htmlDoc.DocumentNode.SelectNodes("//div"))
{
    // ... code here, e.g. read div.GetAttributeValue("id", "")
}
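
To tie this back to the question, here is a minimal sketch of how the loop above could fill a collection of ids. It assumes the markup is available as a string (so it uses LoadHtml rather than Load), and the class name DivIdCollector and the method GetDivIds are hypothetical names chosen for illustration, not part of Html Agility Pack.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class DivIdCollector
{
    // Hypothetical helper: returns the id of every div that has one, in document order.
    public static List<string> GetDivIds(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        var ids = new List<string>();

        // SelectNodes can return null rather than an empty collection when nothing matches.
        HtmlNodeCollection divs = htmlDoc.DocumentNode.SelectNodes("//div");
        if (divs == null)
        {
            return ids;
        }

        foreach (HtmlNode div in divs)
        {
            // GetAttributeValue returns the supplied default ("") when the attribute is missing.
            string id = div.GetAttributeValue("id", "");
            if (!string.IsNullOrEmpty(id))
            {
                ids.Add(id);
            }
        }

        return ids;
    }

    public static void Main()
    {
        // Same div structure as the markup in the question.
        string html =
            "<div class='myclass1'><div id='f'></div>" +
            "<div id=\"myclass2\"><div id=\"my\"><div id=\"h\"></div><div id=\"b\"></div></div></div>" +
            "</div>";

        Console.WriteLine(string.Join(", ", GetDivIds(html)));
    }
}
```

The null check matters because iterating directly over the result of SelectNodes throws if the page contains no div at all. Divs without an id, such as the myclass1 one, are simply left out of the list.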