With the HTML Agility Pack, you can get links in class.

c# html-agility-pack

Question

There are a bunch of tr's with the class alt. I want to get all the links (or the first of last) yet i cant figure out how with html agility pack.

I tried variants of a but i only get all the links or none. It doesnt seem to only get the one in the node which makes no sense since i am writing n.SelectNodes

html.LoadHtml(page);
var nS = html.DocumentNode.SelectNodes("//tr[@class='alt']");
foreach (var n in nS)
{
  var aS = n.SelectNodes("a");
  ...
}
1
13
5/18/2010 1:57:20 PM

Accepted Answer

You can use LINQ:

var links = html.DocumentNode
           .Descendants("tr")
           .Where(tr => tr.GetAttributeValue("class", "").Contains("alt"))
           .SelectMany(tr => tr.Descendants("a"))
           .ToArray();

Note that this will also match <tr class="Malto">; you may want to replace the Contains call with a regex.

You could also use Fizzler:

html.DocumentNode.QuerySelectorAll("tr.alt a");

Note that both methods will also return anchors that aren't links.

15
5/18/2010 1:59:01 PM

Popular Answer

Why not select all links in single query:

html.LoadHtml(page);
var nS = html.DocumentNode.SelectNodes("//tr[@class='alt']//a");
foreach(HtmlNode linkNode in nS)
{
//do something
}

It's valid for html:

<table>
<tr class = "alt">
<td><'a href="link.html">Some Link</a></td>
</tr>
</table>


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow