How to count rows in a table in an HTML file with C#

c# html-agility-pack html-parsing linq

Question

When there is a compound table inside an HTML file, how can one count the rows of the parent table?

By a compound table I mean a table in which some cells contain other tables.

Here is my attempt. Note that I receive incorrect values:

        String htmlFile = "C:/Temp/Test_13.html";
        HtmlDocument doc = new HtmlDocument();
        doc.Load(htmlFile);

        HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
        HtmlNodeCollection rows = tables[1].SelectNodes(".//tr");
        Console.WriteLine(" Rows in second (Parent) table: " + rows.Count());

Please indicate which namespace is used in your answer.

Here is a representative sample file:

<html>
<body>
<table border="1">
<tr>
<td>Apps</td>
</tr>
<tr>
<td>Office Web Apps</td>
</tr>
</table>
<br/>
<table border="1">
<tr>
<td>Application</td>
<td>Status</td>
<td>Instances</td>
</tr>
<tr>
<td>PowerPoint</td>
<td>Online</td>
<td>
    <table border="1">
    <tr>
        <td>Server1</td>
        <td>Online</td>
    </tr>
    <tr>
        <td>Server2</td>
        <td>Disabled</td>
    </tr>
    </table>
</td>
</tr>
<tr>
<td>Word</td>
<td>Online</td>
<td>
    <table border="1">
    <tr>
        <td>Server1</td>
        <td>Online</td>
    </tr>
    <tr>
        <td>Server2</td>
        <td>Disabled</td>
    </tr>
    </table>
</td>
</tr>
</table>
</body>
</html>

Thank you.

Accepted Answer

If I understood correctly, this is what you want.

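// Namespaces used: HtmlAgilityPack (HtmlDocument, HtmlNode, HtmlNodeCollection),
// System.Linq (Count()), System.Windows.Forms (MessageBox)
int i = 1;
HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
foreach (HtmlNode table in tables)
{
    var tmp = table.ParentNode;
    // A table whose parent is a <td> is a sub-table: td -> tr -> parent table
    if (tmp.Name == "td")
        MessageBox.Show("The parent of table #" + i + " has " + tmp.ParentNode.ParentNode.Elements("tr").Count() + " rows.");
    i++;
}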

The MessageBox will pop up 2 times:

"The parent of table #3 has 3 rows."
"The parent of table #4 has 3 rows."

EDIT (ANSWERING QUESTIONS):

1) I started the counter at int i = 1. Writing var i = 1 is the same thing; the compiler infers the type int.

2) I have edited the code, so you should now get the same result as I do.

3) I start counting from 1, so you have table #1, table #2, table #3, and table #4. Your last two tables (#3 and #4) are sub-tables of table #2, and table #2 has 3 rows. The code above reports only tables that are sub-tables of another table. Can you show me what you want as the answer?

EDIT 2:

int i = 1;
HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
foreach (HtmlNode table in tables)
{
    // Skip sub-tables, i.e. tables whose parent is a <td>
    if (table.ParentNode.Name != "td")
        MessageBox.Show("Table #" + i + " has " + table.Elements("tr").Count() + " rows.");
    i++;
}

The MessageBox will pop up 2 times:

"Table #1 has 2 rows."
"Table #2 has 3 rows."
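
For reference, the attempt in the question goes wrong because the XPath ".//tr" selects every descendant row, including the rows of tables nested inside cells, whereas Elements("tr") looks only at direct children. The following is a minimal sketch of the same idea without MessageBox, assuming the HtmlAgilityPack NuGet package is referenced. The class TableRowCounter and the method CountTopLevelRows are hypothetical names chosen for illustration. Rows inside thead or tfoot are not counted here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class TableRowCounter
{
    // Hypothetical helper: returns the number of own rows of every table
    // that is not nested inside another table, in document order.
    public static List<int> CountTopLevelRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select only tables that have no <table> ancestor (no sub-tables).
        HtmlNodeCollection tables =
            doc.DocumentNode.SelectNodes("//table[not(ancestor::table)]");

        var counts = new List<int>();
        if (tables == null)
            return counts; // SelectNodes returns null when nothing matches

        foreach (HtmlNode table in tables)
        {
            // Rows that are direct children of the table...
            int direct = table.Elements("tr").Count();
            // ...plus rows wrapped in an explicit <tbody>, if the markup has one.
            int inBody = table.Elements("tbody").Sum(b => b.Elements("tr").Count());
            counts.Add(direct + inBody);
        }
        return counts;
    }
}
```

Calling TableRowCounter.CountTopLevelRows(File.ReadAllText("C:/Temp/Test_13.html")) on the sample file from the question should give the counts 2 and 3, matching the message boxes above.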

Popular Answer

I would recommend trying the CsQuery NuGet package. It is designed to take most of the headaches out of tasks exactly like this, and it lets you use CSS selector syntax, which most web developers are already familiar with. In this case, you could probably get away with body > table:nth-of-type(2) > tr, which returns all the matching tr elements; then just count them or check the length of the result. Going by the sample you gave, body > table ~ table > tr would work as well, as would br + table > tr. If the parser adds an implicit tbody around the rows, as HTML5-compliant parsers do, use > tbody > tr in place of > tr.




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow