Using HtmlAgilityPack, select only the elements inside a specific DIV

c# html-agility-pack

Question

I'm trying to use the HTML Agility Pack to extract all links on a page that are inside a div declared as <div class='content'>. With the code below, however, I get EVERY link on the page. This doesn't make sense to me, since I am calling SelectNodes on the sub-node I already selected (which, when viewed in the debugger, shows only the HTML from that specific div). It seems that every call to SelectNodes goes back to the root node. Here is the code I use:

HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load(@"http://example.com");
HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='content']");
foreach(HtmlNode link in node.SelectNodes("//a[@href]"))
{
    Console.WriteLine(link.Value);
}

Is this the expected behavior? If so, how can I get it to do what I want?

1
12
5/20/2010 3:38:42 PM

Accepted Answer

This will work:

node.SelectNodes("a[@href]")

You can also do it with a single selector:

doc.DocumentNode.SelectNodes("//div[@class='content']//a[@href]")

Also note that link.Value is not defined for HtmlNode, so your code does not compile.
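
To make the scoping difference concrete, here is a minimal sketch rather than a definitive implementation. An XPath that starts with // is absolute, so it is evaluated from the document root no matter which node SelectNodes is called on. A path such as a[@href] is relative to the current node but matches direct children only, and .//a[@href] matches links at any depth below the current node. The class ScopedLinks, the helper Hrefs and the SampleHtml markup are hypothetical names made up for this illustration, and reading the attribute with GetAttributeValue is just one way to replace link.Value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ScopedLinks
{
    // Hypothetical sample page: one link outside the div, one link that is a
    // direct child of the div, and one link nested deeper inside the div.
    public const string SampleHtml =
        "<html><body>" +
        "<a href='/outside'>outside</a>" +
        "<div class='content'>" +
        "<a href='/direct'>direct</a>" +
        "<span><a href='/nested'>nested</a></span>" +
        "</div>" +
        "</body></html>";

    // Hypothetical helper: evaluates xpath against the given context node and
    // returns the href value of every match.
    public static List<string> Hrefs(HtmlNode context, string xpath)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = context.SelectNodes(xpath);
        if (nodes == null)
        {
            return new List<string>();
        }
        return nodes.Select(a => a.GetAttributeValue("href", "")).ToList();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='content']");

        // Absolute path: searched from the document root, so the link outside
        // the div is included as well.
        Console.WriteLine(string.Join(", ", Hrefs(node, "//a[@href]")));

        // Relative path: only <a> elements that are direct children of the div.
        Console.WriteLine(string.Join(", ", Hrefs(node, "a[@href]")));

        // Relative descendant path: every <a> at any depth inside the div.
        Console.WriteLine(string.Join(", ", Hrefs(node, ".//a[@href]")));
    }
}
```

If the links in the real page are wrapped in other elements inside the div, the .// form is probably the one you want, since the plain a[@href] form would miss them.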

20
5/20/2010 5:43:15 PM



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow