I'm trying to get all the cells from an HTML table using Html Agility Pack and LINQ. I have loaded the HTML source into an HtmlAgilityPack.HtmlDocument and selected the tags with LINQ. However, when I iterate over the result with foreach, it crashes on the second record.
This is a fragment of the HTML source:
<tr>
<td class='city'>New York</td>
<td>Card 1</td>
</tr>
<tr>
<td class='city'>London</td>
<td>Card 2</td>
</tr>
<tr>
<td class='city'>Tokyo</td>
<td>Card 3</td>
</tr>
<tr>
<td class='city'>Berlin</td>
<td>Card 4</td>
</tr>
And this is what I made:
htmlDoc.LoadHtml(await msgRecived.Content.ReadAsStringAsync());
var tds =
    from td in htmlDoc.DocumentNode.Descendants("td")
    where td.Attributes["class"].Value == "city"
    select td.InnerText;
foreach (var td in tds)
{
    citiesText = citiesText + " " + td;
}
It only returns the first element. For example, if instead of using foreach I do:
citiesText = tds.ElementAt(0);
It returns New York, but if I try ElementAt(1) it crashes with "Object reference not set to an instance of an object."
Any help? Thanks
You need to make sure that Attributes["class"] is not null:
var tds =
    from td in doc.DocumentNode.Descendants("td")
    where td.Attributes["class"] != null && td.Attributes["class"].Value == "city"
    select td.InnerText;
The second <td> retrieved has no class attribute, so when you access Attributes["class"] in that case, you're getting null. Calling .Value on null is causing the exception.
Alternatively, you could use GetAttributeValue:
var tds =
    from td in doc.DocumentNode.Descendants("td")
    where td.GetAttributeValue("class", null) == "city"
    select td.InnerText;
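If you are on C# 6 or later, the null-conditional operator is a third way to write the same check. Here is a minimal, self-contained sketch of it. The sample markup and the names html and cities are made up for illustration (the rows are taken from the question and wrapped in a <table>), and it assumes the HtmlAgilityPack NuGet package is referenced:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Illustrative markup only: two rows from the question, wrapped in a <table>.
        var html = @"<table>
<tr><td class='city'>New York</td><td>Card 1</td></tr>
<tr><td class='city'>London</td><td>Card 2</td></tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Attributes["class"] is null for cells without a class attribute;
        // ?. then yields null instead of throwing, and null == "city" is false.
        var cities =
            from td in doc.DocumentNode.Descendants("td")
            where td.Attributes["class"]?.Value == "city"
            select td.InnerText;

        // Joins the matches with single spaces, in place of the foreach concatenation.
        Console.WriteLine(string.Join(" ", cities));
    }
}
```

Note that string.Join does not add the leading space that the foreach loop in the question produces, so adjust if you rely on that.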
Just a guess, but you are probably only looking at the td in the first element. Maybe you need htmlDoc.DocumentNode.Descendants("table") instead.