Traverse DOM with HTML Agility Pack

.net asp.net c# html html-agility-pack

Question

I'm parsing an HTML DOM in C# with the HTMLAgilityPack library and would like to know how to traverse the DOM once I get to a specific element.

For example, when I get to the td with a class of "some-class", I want to go to the third sibling td and grab the href of its nested anchor.

<td class="some-class">Content I care about</td>
<td>Content I don't want</td>
<td>Content I don't want</td>
<td>    
    <a href="http://www.the-url-I-want.com">Some Amazing URL</a>
</td>

Currently, I'm landing at the td I want via:

foreach(HtmlNode node in doc.DocumentNode.SelectNodes("//td"))
{
    HtmlAttribute nodeClass = node.Attributes["class"];

    if(nodeClass != null && nodeClass.Value == "some-class")
    {
        //Find the anchor that is 3 siblings away
        //Do something
    }
}

Does anyone know how I would use HTML Agility Pack to grab the related anchor for each matching td?

Popular Answer

Learning XPath will make this job a lot easier. For example, to select the <td> element whose class attribute equals "some-class", use this XPath:

//td[@class='some-class']

To move from that element to its third following <td> sibling, append:

/following-sibling::td[3]
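
If you already hold the matching td (as in the loop from the question), the same step can be evaluated relative to that node by dropping the leading slash. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class and method names are hypothetical and only for illustration:

```csharp
using System;
using HtmlAgilityPack;

public static class SiblingStep
{
    // Hypothetical helper: given a <td> node, return the <a> inside its
    // third following <td> sibling, or null if there is no such anchor.
    public static HtmlNode AnchorThreeCellsAway(HtmlNode td)
    {
        // No leading slash: the path is evaluated relative to td.
        return td.SelectSingleNode("following-sibling::td[3]/a");
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<table><tr><td class=\"some-class\">x</td><td>1</td><td>2</td>" +
                     "<td><a href=\"http://www.the-url-I-want.com\">link</a></td></tr></table>");

        HtmlNode td = doc.DocumentNode.SelectSingleNode("//td[@class='some-class']");
        HtmlNode a = AnchorThreeCellsAway(td);
        Console.WriteLine(a != null ? a.InnerText : "(no anchor found)");
    }
}
```

Inside the original foreach, the call would simply be AnchorThreeCellsAway(node), followed by a null check before using the result.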

So your loop can be rewritten as follows:

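var xpath = "//td[@class='some-class']/following-sibling::td[3]/a";
var anchors = doc.DocumentNode.SelectNodes(xpath);
// SelectNodes returns null (not an empty collection) when nothing matches
if (anchors != null)
{
    foreach (HtmlNode a in anchors)
    {
        // Do something with the anchor node a
    }
}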
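
For reference, here is a self-contained sketch that runs the combined XPath against the markup from the question. It assumes the HtmlAgilityPack NuGet package is referenced; the AnchorFinder class, the FindHrefs helper, and the surrounding table/tr wrapper are hypothetical additions for the demo. It guards against the null that SelectNodes returns when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class AnchorFinder
{
    // Hypothetical helper: returns the href of every anchor found in the
    // third following <td> sibling of a <td class="some-class">.
    public static List<string> FindHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();
        var anchors = doc.DocumentNode.SelectNodes(
            "//td[@class='some-class']/following-sibling::td[3]/a");

        // SelectNodes returns null when nothing matches.
        if (anchors == null) return hrefs;

        foreach (HtmlNode a in anchors)
            hrefs.Add(a.GetAttributeValue("href", ""));
        return hrefs;
    }

    public static void Main()
    {
        const string html = @"<table><tr>
<td class=""some-class"">Content I care about</td>
<td>Content I don't want</td>
<td>Content I don't want</td>
<td>
    <a href=""http://www.the-url-I-want.com"">Some Amazing URL</a>
</td>
</tr></table>";

        foreach (var href in FindHrefs(html))
            Console.WriteLine(href); // prints http://www.the-url-I-want.com
    }
}
```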

By the way, a safer way to read an attribute value is the GetAttributeValue() method:

var href = a.GetAttributeValue("href", "");

The second argument is the default value that is returned when the attribute is not found.
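
To illustrate why the default matters, the sketch below compares the attribute indexer with GetAttributeValue() on an anchor that has no href. The markup and the AttributeDefaults class are made up for the demo, and the HtmlAgilityPack package is assumed to be referenced.

```csharp
using System;
using HtmlAgilityPack;

public static class AttributeDefaults
{
    // Hypothetical helper: reads href, falling back to an empty string.
    public static string HrefOrEmpty(HtmlNode a)
    {
        return a.GetAttributeValue("href", "");
    }

    public static void Main()
    {
        // Hypothetical markup: this anchor has no href attribute.
        HtmlNode a = HtmlNode.CreateNode("<a name=\"top\">No link here</a>");

        // The indexer yields null for a missing attribute, so reading
        // a.Attributes["href"].Value directly would throw.
        HtmlAttribute raw = a.Attributes["href"];
        Console.WriteLine(raw == null);

        // GetAttributeValue falls back to the supplied default instead.
        Console.WriteLine(HrefOrEmpty(a) == "");
    }
}
```

Both lines should print True: the first because the attribute is absent, the second because the empty-string default is returned in its place.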




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow