HTML Agility Pack Parsing

asp.net-mvc html html-agility-pack

Question

I've just started using HTML Agility Pack, and I'm having trouble finding documentation for it.

I have the following HTML:

<div class="person">
<a href="blah1.html">Person 1</a>
</div>
<div class="person">
<a href="blah2.html">Person 2</a>
</div>
<div class="person">
<a href="blah3.html">Person 3</a>
</div>
<div class="person">
<a href="blah4.html">Person 4</a>
</div>

How can I use the parser to retrieve only the links inside divs with the class person?

Thanks in advance.

1/25/2013 8:58:06 PM

Accepted Answer

With HTML Agility Pack (available on NuGet), you can do this:

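// Requires: using System.Linq; using HtmlAgilityPack;
HtmlDocument html = new HtmlDocument();
html.Load(path_to_html); // or html.LoadHtml(html_string)
// Select each <a> that is a direct child of <div class="person">, then read its href.
var links = html.DocumentNode.SelectNodes("//div[@class='person']/a")
                .Select(n => n.GetAttributeValue("href", null));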

Returns:

"blah1.html"
"blah2.html"
"blah3.html"
"blah4.html"
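
Two caveats may be worth guarding against. By default, SelectNodes returns null rather than an empty collection when nothing matches, and @class='person' matches only when the class attribute is exactly person. The following is a minimal sketch under those assumptions; the PersonLinks class and its Extract method are illustrative names, not part of the original answer or of the library:

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class PersonLinks
{
    // Hypothetical helper: returns the href of every <a> directly under a div
    // whose class list contains "person".
    public static List<string> Extract(string htmlString)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlString);

        // The concat/normalize-space idiom matches "person" as a whole class token,
        // so class="person active" qualifies but class="personal" does not.
        const string xpath =
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' person ')]/a";

        // SelectNodes returns null (by default) when no node matches.
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
            return new List<string>();

        return nodes
            .Select(a => a.GetAttributeValue("href", null))
            .Where(href => href != null) // skip anchors without an href
            .ToList();
    }
}
```

Calling PersonLinks.Extract(html_string) on the markup from the question should give the same four href values, and an empty list for a page with no matching divs.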
1/25/2013 9:03:25 PM

Popular Answer

Based on your description, the following XPath is appropriate:

//div[@class='person']/a/@href

It returns the href attributes of the a elements that are direct children of any div element whose class attribute equals person.
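
As far as I know, SelectNodes in HTML Agility Pack yields element nodes, so an XPath ending in /@href may not hand back attribute values through that method. One possible way to evaluate the expression as written is through the XPathNavigator that HtmlDocument exposes. This is a sketch rather than a definitive approach; the HrefXPath class and its Evaluate method are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Xml.XPath;
using HtmlAgilityPack;

public static class HrefXPath
{
    // Hypothetical helper: evaluates the attribute-selecting XPath and
    // collects the string value of each matched href attribute.
    public static List<string> Evaluate(string htmlString)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlString);

        // HtmlDocument implements IXPathNavigable, so it can create a navigator.
        XPathNavigator navigator = doc.CreateNavigator();
        XPathNodeIterator iterator =
            navigator.Select("//div[@class='person']/a/@href");

        var hrefs = new List<string>();
        while (iterator.MoveNext())
        {
            // Current is positioned on the attribute; Value is its text.
            hrefs.Add(iterator.Current.Value);
        }
        return hrefs;
    }
}
```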

If you are more familiar with jQuery-style selectors, you may want to try CsQuery instead of HTML Agility Pack.
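
For comparison, here is a rough sketch of the same extraction with CsQuery, assuming its NuGet package is installed. The selector div.person > a mirrors the XPath above; the PersonLinksCsQuery class and its Extract method are illustrative names only:

```csharp
using System.Collections.Generic;
using CsQuery;

public static class PersonLinksCsQuery
{
    // Hypothetical helper: collects hrefs using a jQuery-style selector.
    public static List<string> Extract(string htmlString)
    {
        CQ dom = CQ.Create(htmlString);

        var hrefs = new List<string>();
        // "div.person > a" selects <a> elements that are direct children
        // of a div carrying the class "person".
        foreach (IDomObject element in dom["div.person > a"])
        {
            string href = element.GetAttribute("href");
            if (href != null) // skip anchors without an href
                hrefs.Add(href);
        }
        return hrefs;
    }
}
```

A class selector matches individual class tokens, so class="person active" is included here as well.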




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow