I'm parsing an HTML DOM in C# with the HtmlAgilityPack library and would like to know how to traverse the DOM once I get to a specific element.
For example, when I get to the td with a class of "some-class", I want to go to the third sibling td and grab the href of its nested anchor.
<td class="some-class">Content I care about</td>
<td>Content I don't want</td>
<td>Content I don't want</td>
<td>
<a href="http://www.the-url-I-want.com">Some Amazing URL</a>
</td>
Currently, I'm landing at the td I want via:
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//td"))
{
    HtmlAttribute nodeClass = node.Attributes["class"];
    if (nodeClass != null && nodeClass.Value == "some-class")
    {
        // Find the anchor that is 3 siblings away
        // Do something
    }
}
Does anyone know how I would use HtmlAgilityPack to grab the related anchor for the individual td?
Learn XPath and your job will be a lot easier. For example, to select the <td> element whose class attribute equals "some-class", you can use this XPath:
//td[@class='some-class']
To step from that element to the third <td> sibling that follows it, append:
/following-sibling::td[3]
So your loop can be rewritten as follows:
var xpath = "//td[@class='some-class']/following-sibling::td[3]/a";
foreach (HtmlNode a in doc.DocumentNode.SelectNodes(xpath))
{
    // Do something with the anchor variable a
}
BTW, a safer way to read an attribute value is the GetAttributeValue() method:
var href = a.GetAttributeValue("href", "");
The second argument is the default value that is returned when the attribute is not found.
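Putting the pieces together, here is a minimal sketch of a complete console program. The sample markup is the snippet from the question, wrapped in a table row that I added so the cells have a parent; the Program class and the fallback messages are placeholders of mine, not anything from the original code. One thing worth guarding against: SelectNodes() returns null rather than an empty collection when nothing matches, so the sketch checks for that before looping. The last few lines show the same lookup done relative to a td you have already found, which is closer to the shape of the loop in the question.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Sample markup modeled on the question; the <table><tr> wrapper is an assumption.
        const string html = @"
<table><tr>
  <td class=""some-class"">Content I care about</td>
  <td>Content I don't want</td>
  <td>Content I don't want</td>
  <td><a href=""http://www.the-url-I-want.com"">Some Amazing URL</a></td>
</tr></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Absolute XPath: find the marked td, move to its third following td, take the anchor.
        var xpath = "//td[@class='some-class']/following-sibling::td[3]/a";
        var anchors = doc.DocumentNode.SelectNodes(xpath);

        // SelectNodes returns null when nothing matches, so check before iterating.
        if (anchors == null)
        {
            Console.WriteLine("No matching anchors found.");
            return;
        }

        foreach (HtmlNode a in anchors)
        {
            // Falls back to an empty string if the anchor has no href attribute.
            var href = a.GetAttributeValue("href", "");
            Console.WriteLine(href);
        }

        // Relative XPath: start from a td node you already hold.
        HtmlNode cell = doc.DocumentNode.SelectSingleNode("//td[@class='some-class']");
        HtmlNode related = cell?.SelectSingleNode("following-sibling::td[3]/a");
        Console.WriteLine(related?.GetAttributeValue("href", "") ?? "(no related anchor)");
    }
}
```

Note that following-sibling::td[3] counts only td elements, so whitespace text nodes between the cells do not affect the index.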