Html Agility Pack loop through table rows and columns

.net c# html-agility-pack

Question

I have a table like this

<table border="0" cellpadding="0" cellspacing="0" id="table2">
    <tr>
        <th>Name
        </th>
        <th>Age
        </th>
    </tr>
        <tr>
        <td>Mario
        </td>
        <th>Age: 78
        </td>
    </tr>
            <tr>
        <td>Jane
        </td>
        <td>Age: 67
        </td>
    </tr>
            <tr>
        <td>James
        </td>
        <th>Age: 92
        </td>
    </tr>
</table>

And want to use HTML Agility Pack to parse it. I have tried this code to no avail:

foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr"))
{
    foreach (HtmlNode col in row.SelectNodes("//td"))
    { 
        Response.Write(col.InnerText); 
    }
}

What am I doing wrong?

1
8
2/19/2013 10:55:52 PM

Accepted Answer

I had to provide the full xpath. I got the full xpath by using Firebug from a suggestion by @Coda (https://stackoverflow.com/a/3104048/1238850) and I ended up with this code:

foreach (HtmlNode row in doc.DocumentNode.SelectNodes("/html/body/table/tbody/tr/td/table[@id='table2']/tbody/tr"))
{
    HtmlNodeCollection cells = row.SelectNodes("td");
    for (int i = 0; i < cells.Count; ++i)
    {
        if (i == 0)
        { Response.Write("Person Name : " + cells[i].InnerText + "<br>"); }
        else {
            Response.Write("Other attributes are: " + cells[i].InnerText + "<br>"); 
        }
    }
}

I am sure it can be written way better than this but it is working for me now.

1
5/23/2017 12:17:44 PM

Popular Answer

Why don't you just select the tds directly?

foreach (HtmlNode col in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr//td"))
    Response.Write(col.InnerText);

Alternately, if you really need the trs separately for some other processing, drop the // and do:

foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr"))
    foreach (HtmlNode col in row.SelectNodes("td"))
        Response.Write(col.InnerText);

Of course that will only work if the tds are direct children of the trs but they should be, right?


EDIT:

var cols = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr//td");
for (int ii = 0; ii < cols.Count; ii=ii+2)
{
    string name = cols[ii].InnerText.Trim();
    int age = int.Parse(cols[ii+1].InnerText.Split(' ')[1]);
}

There's probably a more impressive way to do this with LINQ.



Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow