I have the following table:
<table>
<tr><th>header1</th><th>header2</th><th>header3</th></tr>
<tr><td>value01</td><td>value02</td><td>value03</td></tr>
<tr><td>value11</td><td>value12</td><td>value13</td></tr>
<tr>
<td colspan="3">
<table>
<tr><td>subvalue01</td><td>subvalue02</td></tr>
</table>
</td>
</tr>
</table>
I'm using this code to save the main table's cell values into one ArrayList and the subtable's cell values into another ArrayList. But the ArrayList for the subtable cell values ends up holding all of the values, from both the table and the subtable:
foreach (HtmlNode table in hdoc.DocumentNode.SelectNodes("//table"))
{
    // This is the table.
    foreach (HtmlNode row in table.SelectNodes("tr").Skip(1))
    {
        // This is the row.
        foreach (HtmlNode cell in row.SelectNodes("th|td"))
        // can also use "th|td", but right now we ONLY need td
        {
            // This is the cell.
            if (cell.InnerHtml.Contains("<table>"))
            {
                foreach (HtmlNode subtable in cell.SelectNodes("//table"))
                {
                    foreach (HtmlNode subrow in subtable.SelectNodes("tr").Skip(1))
                    {
                        foreach (HtmlNode subcell in subrow.SelectNodes("th|td"))
                        {
                            arrSubList.Add(subcell.InnerText);
                        }
                    }
                }
            }
            else
            {
                arrList.Add(cell.InnerText);
            }
        }
    }
}
What is wrong with my code?
I believe your first line
foreach (HtmlNode table in hdoc.DocumentNode.SelectNodes("//table"))
will select ALL tables - at any level (including the nested tables).
Per: http://www.w3schools.com/XPath/xpath_syntax.asp
"//": "Selects nodes in the document from the current node that match the selection no matter where they are"
So, change your first line to
foreach (HtmlNode table in hdoc.DocumentNode.SelectNodes("/html/body/table"))
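And see how that goes. That path assumes the outer table sits directly under <body>. Note that cell.SelectNodes("//table") inside your loop has the same problem: a path that starts with // is evaluated against the whole document even when you call it on a cell, so it returns the outer table as well. A relative path such as "table" or ".//table" keeps the search inside the cell.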
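For illustration, here is a rough sketch of how both loops might look with the XPath changes applied. It is not tested against your real page. The class and method names (NestedTableSketch, Collect) are made up for the example, and it uses "//table[not(ancestor::table)]" to pick only the outer tables instead of "/html/body/table", so it does not depend on where the table sits in the document. It also assumes the nested table has no header row, so it does not call Skip(1) on the subtable rows (your sample subtable has a single row, which Skip(1) would drop).

```csharp
using System.Collections;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, named for this example only.
public static class NestedTableSketch
{
    public static void Collect(HtmlDocument hdoc, ArrayList arrList, ArrayList arrSubList)
    {
        // Only tables that have no <table> ancestor, i.e. the outer tables.
        HtmlNodeCollection tables =
            hdoc.DocumentNode.SelectNodes("//table[not(ancestor::table)]");
        if (tables == null) return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode table in tables)
        {
            HtmlNodeCollection rows = table.SelectNodes("tr");
            if (rows == null) continue;

            foreach (HtmlNode row in rows.Skip(1)) // skip the header row
            {
                HtmlNodeCollection cells = row.SelectNodes("th|td");
                if (cells == null) continue;

                foreach (HtmlNode cell in cells)
                {
                    // Relative path: tables that are direct children of this cell.
                    HtmlNodeCollection subtables = cell.SelectNodes("table");
                    if (subtables == null)
                    {
                        // Plain cell of the outer table.
                        arrList.Add(cell.InnerText);
                        continue;
                    }

                    foreach (HtmlNode subtable in subtables)
                    {
                        // Assumption: nested tables have no header row,
                        // so no rows are skipped here.
                        HtmlNodeCollection subcells =
                            subtable.SelectNodes("tr/th|tr/td");
                        if (subcells == null) continue;

                        foreach (HtmlNode subcell in subcells)
                        {
                            arrSubList.Add(subcell.InnerText);
                        }
                    }
                }
            }
        }
    }
}
```

Checking for a child table with SelectNodes("table") also avoids the InnerHtml.Contains("<table>") test, which would miss a nested table written with attributes, for example <table class="x">.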