Parse html table using LINQ and HtmlAgilityPack

c# html-agility-pack linq

Question

On the web page http://cslh.cz/delegace.html?id_season=2013, I want to extract the date, the link text, and the link href from each row of the table with class "nice".

I created a DelegationLink class:

public class DelegationLink
{
   public string date { get; set; }
   public string link { get; set; }
   public string anchor { get; set; }
}

and tried to build a list of DelegationLink objects from the table using LINQ:

var parsedValues =
from table in htmlDoc.DocumentNode.SelectNodes("//table[@class='nice']")
from date in table.SelectNodes("tr//td")
from link in table.SelectNodes("tr//td//a")
   .Where(x => x.Attributes.Contains("href"))
select new DelegationLink
{
   date = date.InnerText,
   link = link.Attributes["href"].Value,
   anchor = link.InnerText,
};
return parsedValues.ToList();

I just want to take every row in the table and retrieve the date, href, and link text from that row. Instead, my query takes each date cell and combines it with the links from every row. I'm new to LINQ and spent four hours searching on Google with no luck. Any help is appreciated.
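
The pairing described above comes from how LINQ treats consecutive from clauses: each extra from is translated to SelectMany, so every element of the first sequence is combined with every element of the second. A minimal sketch that reproduces this without any HTML follows. The Row type and the sample values are made up for illustration and are not taken from the page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one parsed table row (not an HtmlAgilityPack type).
public record Row(string Date, string Href);

public static class CrossJoinDemo
{
    // Invented sample data: three rows, each with its own date and link.
    public static readonly List<Row> Rows = new()
    {
        new Row("1.9.2013", "/a.html"),
        new Row("8.9.2013", "/b.html"),
        new Row("15.9.2013", "/c.html"),
    };

    // Two independent 'from' clauses: every date is paired with every href.
    public static List<(string Date, string Href)> Crossed() =>
        (from d in Rows.Select(r => r.Date)
         from h in Rows.Select(r => r.Href)
         select (Date: d, Href: h)).ToList();

    // One projection per row: each date stays with its own href.
    public static List<(string Date, string Href)> PerRow() =>
        Rows.Select(r => (r.Date, r.Href)).ToList();

    public static void Main()
    {
        Console.WriteLine(Crossed().Count); // 9 = 3 dates x 3 links
        Console.WriteLine(PerRow().Count);  // 3 = one result per row
    }
}
```

So the fix is to iterate over rows once and read both values from the same row, rather than iterating over cells and links separately.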

5/11/2013 1:02:09 PM

Accepted Answer

That's fairly simple: select the tr elements in the SelectNodes call, so that you iterate over rows, and adjust your code slightly. Something like this:

// Select the rows directly; Skip(1) drops the header row.
var parsedValues = htmlDoc.DocumentNode
    .SelectNodes("//table[@class='nice']/tr")
    .Skip(1)
    .Select(r =>
    {
        // Look up the link once per row so the date and link stay paired.
        var linkNode = r.SelectSingleNode(".//a");
        return new DelegationLink
        {
            date = r.SelectSingleNode(".//td").InnerText,   // first cell of the row
            link = linkNode.GetAttributeValue("href", ""),
            anchor = linkNode.InnerText,
        };
    });
return parsedValues.ToList();
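
Note that the snippet above assumes every row after the header contains both a td and an a element, and that the table exists at all; SelectNodes and SelectSingleNode return null when nothing matches. A more defensive variant might look like the sketch below. The DelegationParser class and its Parse method are names I made up for illustration, and filtering out incomplete rows instead of using Skip(1) is my assumption about what you want, not something the page structure confirms.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public class DelegationLink
{
    public string date { get; set; }
    public string link { get; set; }
    public string anchor { get; set; }
}

// Hypothetical helper, not part of HtmlAgilityPack.
public static class DelegationParser
{
    public static List<DelegationLink> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so fall back to an empty sequence.
        IEnumerable<HtmlNode> rows =
            doc.DocumentNode.SelectNodes("//table[@class='nice']/tr")
            ?? Enumerable.Empty<HtmlNode>();

        return rows
            .Select(r => new
            {
                Date = r.SelectSingleNode(".//td"),      // first cell of the row
                Link = r.SelectSingleNode(".//a[@href]") // first link that has an href
            })
            // Rows missing a td or a link (typically the header) are dropped.
            .Where(x => x.Date != null && x.Link != null)
            .Select(x => new DelegationLink
            {
                date = HtmlEntity.DeEntitize(x.Date.InnerText).Trim(),
                link = x.Link.GetAttributeValue("href", ""),
                anchor = HtmlEntity.DeEntitize(x.Link.InnerText).Trim(),
            })
            .ToList();
    }
}
```

You would call it as DelegationParser.Parse(pageHtml), where pageHtml is the downloaded page source. DeEntitize and Trim are optional; they only tidy up entities such as &amp;nbsp; and surrounding whitespace in the cell text.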
5/11/2013 12:56:24 PM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow