I'm trying to parse the following HTML snippet with HtmlAgilityPack:
<td bgcolor="silver" width="50%" valign="top">
<table bgcolor="silver" style="font-size: 90%" border="0" cellpadding="2" cellspacing="0"
width="100%">
<tr bgcolor="#003366">
<td>
<font color="white">Info
</td>
<td>
<font color="white">
<center>Price
</td>
<td align="right">
<font color="white">Hourly
</td>
</tr>
<tr>
<td>
<a href='test1.cgi?type=1'>Bookbags</a>
</td>
<td>
$156.42
</td>
<td align="right">
<font color="green">0.11%</font>
</td>
</tr>
<tr>
<td>
<a href='test2.cgi?type=2'>Jeans</a>
</td>
<td>
$235.92
</td>
<td align="right">
<font color="red">100%</font>
</td>
</tr>
</table>
</td>
My code looks something like this:
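private void ParseHtml(HtmlDocument htmlDoc)
{
    // Prices such as $156.42 are not whole numbers, so the values are decimal.
    var ItemsAndPrices = new Dictionary<string, decimal>();
    // Attributes["..."] returns an HtmlAttribute (or null), not a string,
    // so compare the attribute's value instead.
    var findItemPrices = from links in htmlDoc.DocumentNode.Descendants()
                         where links.Name.Equals("table") &&
                               links.GetAttributeValue("width", "") == "100%" &&
                               links.GetAttributeValue("bgcolor", "") == "silver"
                         select new
                         {
                             // select item and price
                         };
}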
In this instance, I would like to select the items (Jeans and Bookbags) together with their associated prices and store them in a dictionary, e.g. Jeans at a price of $235.92.
Does anyone know how to do this properly with HtmlAgilityPack and LINQ?
Assuming that there could be other rows and you don't specifically want only Bookbags and Jeans, I'd do it like this:
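// Requires: using System.Globalization; using System.Linq;
var table = htmlDoc.DocumentNode
    .SelectSingleNode("//table[@bgcolor='silver' and @width='100%']");
var query =
    from row in table.Elements("tr").Skip(1)     // skip the header row
    let columns = row.Elements("td").Take(2)     // take only the first two columns
        .Select(col => col.InnerText.Trim())
        .ToList()
    select new
    {
        Info = columns[0],
        // Pass the culture explicitly so "$" parses regardless of the machine's locale
        // (en-US is an assumption based on the sample prices).
        Price = Decimal.Parse(columns[1], NumberStyles.Currency,
                              CultureInfo.GetCultureInfo("en-US")),
    };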
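Since the question asks for a dictionary, here is a rough sketch of how the same idea could be pushed into one. The class name PriceTableParser, the method name ParseItemPrices, and the en-US culture are my own assumptions for illustration, not anything from HtmlAgilityPack or the original code. It also guards against the table not being found, because SelectSingleNode returns null in that case.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using HtmlAgilityPack;

public static class PriceTableParser
{
    // Assumption: prices are formatted as US dollars, e.g. "$156.42".
    private static readonly CultureInfo PriceCulture = CultureInfo.GetCultureInfo("en-US");

    // Hypothetical helper: maps each item name (first column) to its price (second column).
    public static Dictionary<string, decimal> ParseItemPrices(HtmlDocument htmlDoc)
    {
        var table = htmlDoc.DocumentNode
            .SelectSingleNode("//table[@bgcolor='silver' and @width='100%']");

        // SelectSingleNode returns null when nothing matches the XPath.
        if (table == null)
            return new Dictionary<string, decimal>();

        return table.Elements("tr")
            .Skip(1)                                  // skip the header row
            .Select(row => row.Elements("td")
                .Take(2)                              // first two columns only
                .Select(col => col.InnerText.Trim())
                .ToList())
            .Where(columns => columns.Count == 2)     // ignore rows with fewer than two cells
            .ToDictionary(
                columns => columns[0],
                columns => decimal.Parse(columns[1], NumberStyles.Currency, PriceCulture));
    }
}
```

You would call it as var prices = PriceTableParser.ParseItemPrices(htmlDoc); and then look up prices["Jeans"]. Note that ToDictionary throws an ArgumentException if two rows share the same item name, and decimal.Parse throws a FormatException on a cell that is not a valid price, so you may want decimal.TryParse or a grouping step if your real data is messier than the sample.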