HtmlAgilityPack and selecting Nodes and Subnodes

c# html-agility-pack xpath

Question

Hope somebody can help me.

Let's say I have an HTML document that contains multiple divs like this example:

<div class="search_hit">

    <span prop="name">Richard Winchester</span>
    <span prop="company">Kodak</span>
    <span prop="street">Arlington Road 1</span>

</div>
<div class="search_hit">

    <span prop="name">Ted Mosby</span>
    <span prop="company">HP</span>
    <span prop="street">Arlington Road 2</span>

</div>

I'm using HtmlAgilityPack to load the HTML document. What I need to know is: how can I get the spans for each "search_hit" div?

My first thought was something like this:

foreach (HtmlAgilityPack.HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']"))
{
     foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes("//span[@prop]"))
     {

     }
}

Each div should become an object with its spans as properties, e.g.:

public class Record
{
    public string Name { get; set; }
    public string company { get; set; }
    public string street { get; set; }
}

This list should then be filled:

public List<Record> Results = new List<Record>();

But the XPath I'm using does not restrict the search to the subnode as I expected. It seems to search the whole document again on every iteration.

I already got it working in the sense that I can get all the spans of the whole page. But then I lose the relation between spans and divs: I no longer know which span belongs to which div.

Does somebody know a solution? I have played around with this so much that I'm totally confused now :)

Any help is appreciated!

Accepted Answer

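The following works for me. The important bit, as BeniBela noted, is to add a leading dot to the XPath in the second call to SelectNodes, so that it reads ".//span[@prop]" and is evaluated relative to the current div.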

List<Record> lstRecords=new List<Record>();
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']"))
{
  Record record=new Record();
  foreach (HtmlNode node2 in node.SelectNodes(".//span[@prop]"))
  {
    string attributeValue = node2.GetAttributeValue("prop", "");
    if (attributeValue == "name")
    {
      record.Name = node2.InnerText;
    }
    else if (attributeValue == "company")
    {
      record.company = node2.InnerText;
    }
    else if (attributeValue == "street")
    {
      record.street = node2.InnerText;
    }
  }
  lstRecords.Add(record);
}
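
One thing the loop above does not guard against: HtmlAgilityPack's SelectNodes returns null rather than an empty collection when nothing matches, so a page without any hits (or a hit without spans) would throw in the foreach. A minimal sketch of a null-safe variant follows. The ParseRecords helper, the Trim() call, and the inline sample markup are my own assumptions for illustration; the Record class is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public class Record
{
    public string Name { get; set; }
    public string company { get; set; }
    public string street { get; set; }
}

public static class Program
{
    // Hypothetical helper (not part of HtmlAgilityPack): one Record per search_hit div.
    public static List<Record> ParseRecords(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var records = new List<Record>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection hits = doc.DocumentNode.SelectNodes("//div[@class='search_hit']");
        if (hits == null) return records;

        foreach (HtmlNode hit in hits)
        {
            var record = new Record();
            // Leading dot: search only below the current div.
            HtmlNodeCollection spans = hit.SelectNodes(".//span[@prop]");
            if (spans != null)
            {
                foreach (HtmlNode span in spans)
                {
                    string text = span.InnerText.Trim(); // Trim is an assumption, not in the original
                    switch (span.GetAttributeValue("prop", ""))
                    {
                        case "name": record.Name = text; break;
                        case "company": record.company = text; break;
                        case "street": record.street = text; break;
                    }
                }
            }
            records.Add(record);
        }
        return records;
    }

    public static void Main()
    {
        // Sample markup taken from the question.
        string html =
            "<div class=\"search_hit\"><span prop=\"name\">Richard Winchester</span>" +
            "<span prop=\"company\">Kodak</span><span prop=\"street\">Arlington Road 1</span></div>" +
            "<div class=\"search_hit\"><span prop=\"name\">Ted Mosby</span>" +
            "<span prop=\"company\">HP</span><span prop=\"street\">Arlington Road 2</span></div>";

        foreach (Record r in ParseRecords(html))
            Console.WriteLine($"{r.Name} | {r.company} | {r.street}");
    }
}
```

With the sample markup this should yield two records, one per div, each holding only its own three spans.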

Popular Answer

If the XPath starts with //, the search begins at the document root, even when you call SelectNodes on a subnode.

Use .// to search all descendants of the current node:

 foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes(".//span[@prop]"))

Or drop the prefix entirely to search just for direct children:

 foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes("span[@prop]"))
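
To see the difference between the three forms, here is a small sketch that evaluates each XPath against the first search_hit div from the question and prints how many spans it finds. The XPathScopeDemo class and the Count helper are hypothetical names for this illustration only.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class XPathScopeDemo
{
    // Hypothetical helper: SelectNodes returns null when nothing matches, so map that to 0.
    static int Count(HtmlNode context, string xpath)
    {
        HtmlNodeCollection nodes = context.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Sample markup taken from the question.
        const string html = @"
<div class=""search_hit"">
    <span prop=""name"">Richard Winchester</span>
    <span prop=""company"">Kodak</span>
    <span prop=""street"">Arlington Road 1</span>
</div>
<div class=""search_hit"">
    <span prop=""name"">Ted Mosby</span>
    <span prop=""company"">HP</span>
    <span prop=""street"">Arlington Road 2</span>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Context node: the first search_hit div only.
        HtmlNode firstHit = doc.DocumentNode.SelectSingleNode("//div[@class='search_hit']");

        // "//" restarts at the document root: finds the spans of BOTH divs (6).
        Console.WriteLine(Count(firstHit, "//span[@prop]"));

        // ".//" stays below the context node: finds this div's spans only (3).
        Console.WriteLine(Count(firstHit, ".//span[@prop]"));

        // No prefix: direct children of the context node only (3 here,
        // because every span in the sample is a direct child of its div).
        Console.WriteLine(Count(firstHit, "span[@prop]"));
    }
}
```

The last two forms only differ when spans are nested deeper, for example inside another element within the div; in that case "span[@prop]" would miss them while ".//span[@prop]" would still find them.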



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow