HTML Agility Pack Parsing

asp.net-mvc html html-agility-pack

Question

I am very new to HTML Agility Pack. I have been looking for documentation but am having some trouble.

I have the following HTML:

<div class="person">
<a href="blah1.html">Person 1</a>
</div>
<div class="person">
<a href="blah2.html">Person 2</a>
</div>
<div class="person">
<a href="blah3.html">Person 3</a>
</div>
<div class="person">
<a href="blah4.html">Person 4</a>
</div>

Using the parser, how can I grab only the links inside div elements that have the class person?

Thank you!

Accepted Answer

With Html Agility Pack (available on NuGet):

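using System.Linq;
using HtmlAgilityPack;
HtmlDocument html = new HtmlDocument();
html.Load(path_to_html); // or html.LoadHtml(html_string)
// SelectNodes returns null, not an empty collection, when nothing matches
var links = (html.DocumentNode.SelectNodes("//div[@class='person']/a")
             ?? Enumerable.Empty<HtmlNode>())
            .Select(n => n.GetAttributeValue("href", null));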

Returns:

"blah1.html"
"blah2.html"
"blah3.html"
"blah4.html"
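One detail of the question is worth checking: @class='person' compares the entire attribute string, so a div written as class="person featured" would be skipped even though it does have the class person. The sketch below assumes the HtmlAgilityPack NuGet package is referenced. It runs the accepted answer's expression against two of the question's divs plus a hypothetical third div (blah5.html is invented for illustration), and compares it with a common XPath 1.0 idiom that matches person as a whitespace-separated class token.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

// Two divs from the question plus a hypothetical multi-class div.
const string sample = @"
<div class=""person""><a href=""blah1.html"">Person 1</a></div>
<div class=""person""><a href=""blah2.html"">Person 2</a></div>
<div class=""person featured""><a href=""blah5.html"">Person 5</a></div>";

var doc = new HtmlDocument();
doc.LoadHtml(sample);

// Exact comparison of the whole class attribute string.
const string exact = "//div[@class='person']/a";
// Token comparison: pad the normalized class list with spaces, then look for " person ".
const string token = "//div[contains(concat(' ', normalize-space(@class), ' '), ' person ')]/a";

// SelectNodes returns null when nothing matches, so fall back to an empty sequence.
string[] Hrefs(string xpath) =>
    (doc.DocumentNode.SelectNodes(xpath) ?? Enumerable.Empty<HtmlNode>())
        .Select(a => a.GetAttributeValue("href", ""))
        .ToArray();

var exactLinks = Hrefs(exact);
var tokenLinks = Hrefs(token);
Console.WriteLine("exact: " + string.Join(", ", exactLinks));
Console.WriteLine("token: " + string.Join(", ", tokenLinks));
```

On the question's own four divs both expressions return the same links; they only diverge once a div carries more than one class.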

Popular Answer

The following XPath corresponds to your description:

//div[@class='person']/a/@href

It selects the href attribute of every a element that is a direct child of a div whose class attribute is exactly person.
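To see what this expression selects on its own, it can be evaluated with the XPath support built into .NET's System.Xml, with no HAP involved. This is a sketch under stated assumptions: the question's fragment is wrapped in a single body root so that it parses as well-formed XML (real-world HTML usually is not well-formed, which is why a forgiving parser like HAP is used in practice), and the fifth div holding two links (extra-a.html, extra-b.html) is hypothetical, added only to show that every a child matches, not just the first.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// The question's markup wrapped in one root element, plus a hypothetical fifth div with two links.
const string sample = @"<body>
<div class=""person""><a href=""blah1.html"">Person 1</a></div>
<div class=""person""><a href=""blah2.html"">Person 2</a></div>
<div class=""person""><a href=""blah3.html"">Person 3</a></div>
<div class=""person""><a href=""blah4.html"">Person 4</a></div>
<div class=""person""><a href=""extra-a.html"">A</a><a href=""extra-b.html"">B</a></div>
</body>";

var xml = new XmlDocument();
xml.LoadXml(sample);

// The expression ends on the attribute axis, so each match is an XmlAttribute node.
var hrefs = new List<string>();
foreach (XmlNode attr in xml.SelectNodes("//div[@class='person']/a/@href"))
    hrefs.Add(attr.Value);

foreach (var href in hrefs) Console.WriteLine(href);
```

Within HAP itself, selecting the a elements and reading href through GetAttributeValue, as the accepted answer does, is the more predictable route than selecting attribute nodes directly.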

If you are more comfortable with jQuery-style selectors, take a look at CsQuery instead of the HTML Agility Pack.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow