Parsing HTML with LINQ

c# html html-agility-pack linq

Question

Using HTML Agility Pack and LINQ, I'm trying to get all cells from an HTML table. I loaded the HTML source into an HtmlAgilityPack.HtmlDocument and used LINQ to select the tags. However, when I iterate over the result with foreach, it crashes on the second record.

A portion of the HTML source is shown here:

<tr>
    <td class='city'>New York</td>
    <td>Card 1</td>
</tr>
<tr>
    <td class='city'>London</td>
    <td>Card 2</td>
</tr>
<tr>
    <td class='city'>Tokyo</td>
    <td>Card 3</td>
</tr>
<tr>
    <td class='city'>Berlin</td>
    <td>Card 4</td>
</tr>

This is what I created:

htmlDoc.LoadHtml(await msgRecived.Content.ReadAsStringAsync());

var tds =
    from td in htmlDoc.DocumentNode.Descendants("td")
    where td.Attributes["class"].Value == "city"
    select td.InnerText;

foreach (var td in tds)
{
    citiesText = citiesText + " " + td;
}

If I use the following syntax instead of the foreach command:

citiesText = tds.ElementAt(0);

It returns New York, but when I try ElementAt(1) it crashes with "Object reference not set to an instance of an object."

Any help? Thanks

8/14/2014 3:12:21 PM

Accepted Answer

You must make sure that Attributes["class"] is not null:

var tds =
    from td in doc.DocumentNode.Descendants("td")
    where td.Attributes["class"] != null && td.Attributes["class"].Value == "city"
    select td.InnerText;

The <td> that follows each city cell has no class attribute, so Attributes["class"] returns null for it, and calling .Value on null is the source of the exception.

As an alternative, you can use GetAttributeValue:

var tds =
    from td in doc.DocumentNode.Descendants("td")
    where td.GetAttributeValue("class", null) == "city"
    select td.InnerText;
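
For reference, here is a minimal self-contained sketch of the same fix, assuming a compiler that supports C# 6 or later so the null-conditional operator (?.) is available. The CityCells class, the GetCities method, and the <table> wrapper around the sample rows are hypothetical names and markup added for illustration; they are not part of the original question. The sketch uses method syntax rather than query syntax, but the filter is equivalent to the null check above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class CityCells // hypothetical helper, for illustration only
{
    // Returns the text of every <td> whose class attribute is exactly "city".
    public static List<string> GetCities(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("td")
            // Attributes["class"] is null for cells without a class attribute,
            // so ?. yields null there instead of throwing on .Value.
            .Where(td => td.Attributes["class"]?.Value == "city")
            .Select(td => td.InnerText.Trim())
            .ToList();
    }

    public static void Main()
    {
        // Sample markup modeled on the question; the <table> wrapper is assumed.
        const string html = @"
<table>
  <tr><td class='city'>New York</td><td>Card 1</td></tr>
  <tr><td class='city'>London</td><td>Card 2</td></tr>
</table>";

        // string.Join avoids building the result with repeated concatenation.
        Console.WriteLine(string.Join(" ", GetCities(html)));
    }
}
```

With the two sample rows this prints New York London; the unclassed "Card" cells are skipped rather than causing an exception.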
8/14/2014 3:22:50 PM

Popular Answer

Just a guess, but I think you are only targeting the td of the first element. Perhaps you need to use htmlDoc.DocumentNode.Descendants("table") instead.




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow