Parse an HTML document using HtmlAgilityPack

c# html-agility-pack linq

Question

I'm trying to parse the following HTML snippet with HtmlAgilityPack:

<td bgcolor="silver" width="50%" valign="top">
 <table bgcolor="silver" style="font-size: 90%" border="0" cellpadding="2" cellspacing="0"
                                                width="100%">
   <tr bgcolor="#003366">
       <td>
           <font color="white">Info
        </td>
        <td>
           <font color="white">
              <center>Price
                   </td>
                      <td align="right">
                         <font color="white">Hourly
                         </td>
              </tr>
               <tr>
                 <td>
                     <a href='test1.cgi?type=1'>Bookbags</a>
                 </td>
                   <td>
                      $156.42
                    </td>
                    <td align="right">
                        <font color="green">0.11%</font>
                      </td>
                  </tr>
                  <tr>
                    <td>
                       <a href='test2.cgi?type=2'>Jeans</a>
                     </td>
                         <td>
                            $235.92
                               </td>
                                  <td align="right">
                                     <font color="red">100%</font>
                                  </td>
                   </tr>
               </table>
          </td>

My code looks something like this:

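private void ParseHtml(HtmlDocument htmlDoc)
{
    // Prices such as $156.42 have a fractional part, so the value type is decimal, not int
    var ItemsAndPrices = new Dictionary<string, decimal>();
    var findItemPrices = from links in htmlDoc.DocumentNode.Descendants("table")
                         // Compare attribute values: an HtmlAttribute object never equals a string,
                         // and Attributes["..."] is null when the attribute is missing
                         where links.GetAttributeValue("width", "") == "100%" &&
                               links.GetAttributeValue("bgcolor", "") == "silver"
                         select new
                         {
                             // select item and price
                         };
}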

In this instance, I would like to select the items (Jeans and Bookbags) together with their associated prices and store them in a dictionary.

E.g., Jeans at a price of $235.92.

Does anyone know how to do this properly with HtmlAgilityPack and LINQ?

Popular Answer

Assuming that there could be other rows and you don't specifically want only Bookbags and Jeans, I'd do it like this:

// Requires: using System.Globalization; using System.Linq; using HtmlAgilityPack;
var table = htmlDoc.DocumentNode
    .SelectSingleNode("//table[@bgcolor='silver' and @width='100%']");
var query =
    from row in table.Elements("tr").Skip(1) // skip the header row
    let columns = row.Elements("td").Take(2) // take only the first two columns
        .Select(col => col.InnerText.Trim())
        .ToList()
    select new
    {
        Info = columns[0],
        Price = Decimal.Parse(columns[1], NumberStyles.Currency),
    };
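
Since the question asks for a dictionary, the query above can be taken one step further. The following is only a sketch under a few assumptions: the class and method names (`PriceTableParser`, `ParsePrices`) are made up for illustration, prices are assumed to be formatted US-style (so an explicit `en-US` culture is passed instead of relying on the machine's current culture), and item names are assumed to be unique, because `ToDictionary` throws on duplicate keys.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using HtmlAgilityPack;

public static class PriceTableParser
{
    // Hypothetical helper: maps each item name (first column) to its price (second column).
    public static Dictionary<string, decimal> ParsePrices(HtmlDocument htmlDoc)
    {
        var table = htmlDoc.DocumentNode
            .SelectSingleNode("//table[@bgcolor='silver' and @width='100%']");

        // SelectSingleNode returns null when nothing matches the XPath expression.
        if (table == null)
            return new Dictionary<string, decimal>();

        // Assumption: prices look like "$156.42", so parse them with US currency rules.
        var usCulture = CultureInfo.GetCultureInfo("en-US");

        return table.Elements("tr")
            .Skip(1)                                    // skip the header row
            .Select(row => row.Elements("td")
                .Select(col => col.InnerText.Trim())
                .ToList())
            .Where(cells => cells.Count >= 2)           // ignore rows without a price column
            .ToDictionary(
                cells => cells[0],
                cells => decimal.Parse(cells[1], NumberStyles.Currency, usCulture));
    }
}
```

After loading the page with `HtmlDocument.LoadHtml` (or `Load`), calling `PriceTableParser.ParsePrices(htmlDoc)` returns the dictionary, and a price can then be looked up by item name, for example `prices["Jeans"]`.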

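One caveat about `Decimal.Parse(columns[1], NumberStyles.Currency)`: it throws a `FormatException` when a cell does not contain a valid currency string, and without an explicit culture it parses according to the current culture of the machine. If the real table may contain cells such as "N/A", a non-throwing variant could look like the sketch below. The helper name `TryParseUsPrice` is hypothetical, and the `en-US` culture is an assumption based on the `$` prices in the question.

```csharp
using System;
using System.Globalization;

public static class PriceText
{
    // Hypothetical helper: returns false instead of throwing when the text is not a US-style price.
    public static bool TryParseUsPrice(string text, out decimal price)
    {
        return decimal.TryParse(
            text?.Trim(),
            NumberStyles.Currency,                 // allows the currency symbol, thousands separators, decimals
            CultureInfo.GetCultureInfo("en-US"),   // assumption: prices are written as "$156.42"
            out price);
    }
}
```

Rows for which the helper returns false can then be skipped or logged instead of aborting the whole query.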


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow