I am just getting started with parsing HTML files using C# and HtmlAgilityPack.
For each row, I am trying to get the values of the two columns so that I can insert them into a database. But when I run the following:
    foreach (HtmlNode row in htmlDoc.DocumentNode.SelectNodes("//tr"))
    {
        foreach (HtmlNode cell in row.SelectNodes("//td"))
        {
            Console.WriteLine(cell.InnerText);
        }
    }
I get the wrong result: the inner loop iterates over every td in the document, not only the ones contained in the current tr.
My HTML looks like this:
    <table>
      <tr>
        <th align="center" width="50"><b>column 1</b></th>
        <th align="center" width="210"><b>column 2</b></th>
      </tr>
      <tr bgcolor="#ffffff">
        <td align="left"> </td>
        <td align="left"></td>
      </tr>
      <tr bgcolor="#dddddd">
        <td align="left"> </td>
        <td align="left"></td>
      </tr>
      <tr bgcolor="#ffffff">
        <td align="left"> </td>
        <td align="left"></td>
      </tr>
Maybe try this, which takes the child elements of each row instead of running a second XPath query:
    var rows = doc.DocumentNode
        .SelectNodes("//tr")
        .Select((z, i) => new
        {
            RowNumber = i,
            Cells = z.ChildNodes.Where(c => c.NodeType == HtmlNodeType.Element)
        })
        .ToList();
    rows.ForEach(row => Console.WriteLine("{0}: {1}", row.RowNumber, string.Join(", ", row.Cells.Select(z => z.InnerText))));
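The original nested loop can also be fixed directly. An XPath expression that starts with `//` is evaluated from the document root even when it is called on a row node, which is why `row.SelectNodes("//td")` returns every cell in the document. A relative path such as `td` (or `.//td`) stays inside the current row. Below is a minimal sketch, assuming the HtmlAgilityPack package is referenced; the inline HTML and its cell values (`a1`, `a2`, ...) are made-up placeholders, not data from the question.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample input; the cell values are placeholders for illustration.
        var html = @"<table>
<tr><th>column 1</th><th>column 2</th></tr>
<tr><td>a1</td><td>a2</td></tr>
<tr><td>b1</td><td>b2</td></tr>
</table>";

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = htmlDoc.DocumentNode.SelectNodes("//tr");
        if (rows == null) return;

        foreach (HtmlNode row in rows)
        {
            // "td" is relative to the current row; "//td" would search the whole document.
            var cells = row.SelectNodes("td");
            if (cells == null) continue; // e.g. the header row, which contains only th cells

            foreach (HtmlNode cell in cells)
            {
                Console.WriteLine(cell.InnerText);
            }
        }
    }
}
```

The null checks matter here because the header row has no td children, so the inner query matches nothing for that row.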