I am trying to parse the following HTML. I need to get the inner text of all links under an h4 tag with the value "Title".
<h4>Title</h4>
<ul>
<li>
<a>One</a>
</li>
<li>
<a>Two</a>
</li>
<li>
<a>Three</a>
</li>
</ul>
I can get the h4 element ok using the following code:
var links = document.DocumentNode.SelectNodes("//h4[contains(text(),'Title')]");
The problem comes when I try to get the a nodes. I have tried the following code, but it does not work:
var urls = member.SelectNodes(".//a");
foreach (var url in urls)
{
    Console.WriteLine(url.InnerText);
}
From what I can gather, it's not working because the XPath you're using expects the a nodes to be descendants of your h4 node, when in your HTML they sit inside the ul that follows it. I've not tested this, and I may be misinterpreting your requirements, but try:
var links = document.DocumentNode.SelectNodes("//h4[contains(text(),'Title')]/following-sibling::*[1]//a");
This would get all of the a nodes found inside the first element that follows the h4 node as a sibling. So in your example HTML, it should get all a nodes within the ul node.
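For completeness, here is a minimal sketch of how that query might be wrapped up and used. It assumes the HtmlAgilityPack NuGet package (which the `DocumentNode.SelectNodes` call in the question suggests), and the `LinkTexts.UnderHeading` helper is a hypothetical name of my own, not part of the library. As far as I know, `SelectNodes` returns null rather than an empty collection when nothing matches, so the sketch checks for that before iterating.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class LinkTexts
{
    // Hypothetical helper: returns the inner text of every <a> inside the
    // first element that follows an <h4> whose text contains `heading`.
    public static List<string> UnderHeading(string html, string heading)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // `heading` is interpolated directly into the XPath string,
        // so it must not contain a single quote.
        var xpath = $"//h4[contains(text(),'{heading}')]/following-sibling::*[1]//a";
        var anchors = document.DocumentNode.SelectNodes(xpath);

        // SelectNodes gives back null when there are no matches.
        if (anchors == null)
            return new List<string>();

        return anchors.Select(a => a.InnerText.Trim()).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        const string html =
            "<h4>Title</h4><ul><li><a>One</a></li><li><a>Two</a></li><li><a>Three</a></li></ul>";

        foreach (var text in LinkTexts.UnderHeading(html, "Title"))
            Console.WriteLine(text);
    }
}
```

If the element after the h4 might not always be the list, `following-sibling::ul[1]` is a stricter alternative to `following-sibling::*[1]`.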
Hope this helps