HTML Agility Pack: How to access HTML attributes?

c# html html-agility-pack parsing

Question

I have the following HTML code:

<tr>
    <td headers="header1"><b><a href="www.site.com">TITLE </a></b></td>
    <td headers="header2"></td>
    <td headers="header3" class="centrato">23/04/2014</td>
</tr> 

I need to store the following in a DataTable:

the href value in the "Link" column;
TITLE in the "Title" column;
23/04/2014 in the "Date" column.

I tried this:

int i = 0;
foreach (HtmlNode node in tmlDoc.DocumentNode.SelectNodes("//td[@headers='header1']"))
{
  table.Rows.Add();
  table.Rows[i]["Post"] = node.InnerText;
  i++;
}

This code lets me add every title to the DataTable, but I can't work out how to add the date and the href. Can you help me, please?

7/10/2014 8:56:27 PM

Accepted Answer

You can do it this way:

int i = 0;
//select every <tr> that contains the specific <td>
foreach (HtmlNode node in tmlDoc.DocumentNode.SelectNodes("//tr[td[@headers='header1']]"))
{
    table.Rows.Add();
    //get <td headers='header1'> in the current <tr>
    var header1 = node.SelectSingleNode("./td[@headers='header1']");

    table.Rows[i]["Title"] = header1.InnerText;
    //get the <a> inside header1, then read its href attribute value
    table.Rows[i]["Link"] = header1.SelectSingleNode(".//a").GetAttributeValue("href", "");
    //get the InnerText of <td headers='header3'> in the current <tr>
    table.Rows[i]["Date"] = node.SelectSingleNode("./td[@headers='header3']").InnerText;
    i++;
}
7/11/2014 12:42:29 AM
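
One thing the loop above does not guard against is a missing node. SelectSingleNode returns null when nothing matches, and SelectNodes does the same by default in the HtmlAgilityPack versions I have used, so a row without an <a> or without a header3 cell would throw a NullReferenceException. Here is a minimal sketch of a null-tolerant variant, not a definitive implementation. RowExtractor and BuildTable are hypothetical names introduced for illustration; the column names come from the question; trimming the text and running it through HtmlEntity.DeEntitize are my own assumptions about what should be stored.

```csharp
using System;
using System.Data;
using HtmlAgilityPack;

public static class RowExtractor
{
    // Hypothetical helper (not part of the original answer): builds a DataTable
    // with Link, Title and Date columns from the rows found in the HTML.
    public static DataTable BuildTable(string html)
    {
        var table = new DataTable();
        table.Columns.Add("Link", typeof(string));
        table.Columns.Add("Title", typeof(string));
        table.Columns.Add("Date", typeof(string));

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Guard against a null result when no <tr> matches.
        var rows = doc.DocumentNode.SelectNodes("//tr[td[@headers='header1']]");
        if (rows == null) return table;

        foreach (HtmlNode row in rows)
        {
            // Both queries start with "." so they stay inside the current row.
            HtmlNode anchor = row.SelectSingleNode("./td[@headers='header1']//a");
            HtmlNode dateCell = row.SelectSingleNode("./td[@headers='header3']");

            // Fall back to an empty string when a node is missing.
            string link = anchor?.GetAttributeValue("href", "") ?? "";
            string title = HtmlEntity.DeEntitize(anchor?.InnerText ?? "").Trim();
            string date = HtmlEntity.DeEntitize(dateCell?.InnerText ?? "").Trim();

            table.Rows.Add(link, title, date);
        }
        return table;
    }
}
```

Passing the three values straight to table.Rows.Add also removes the need for the separate counter i.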

Popular Answer

InnerText only gives you the text between the tags. To read href, id, or any other attribute, use the GetAttributeValue method.

int i = 0;
foreach (HtmlNode node in tmlDoc.DocumentNode.SelectNodes("//tr"))
{
    table.Rows.Add();
    //start each XPath with "." so the search stays inside the current <tr>;
    //a bare "//a" would search the whole document and always return the first match
    table.Rows[i]["Link"] = node.SelectSingleNode(".//a").GetAttributeValue("href", "");
    table.Rows[i]["Title"] = node.SelectSingleNode(".//a").InnerText;
    table.Rows[i]["Date"] = node.SelectSingleNode("./td[@headers='header3']").InnerText;
    i++;
}
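
A point that is easy to miss with SelectSingleNode on a child node: an XPath that starts with // is evaluated from the document root, not from the node you call it on, so inside a loop it keeps returning the first match in the whole document. Prefixing the expression with a dot makes it relative to the current node. The sketch below illustrates the difference; XPathScopeDemo and SecondRowHref are hypothetical names, and the two-row markup in the comment is invented for the demonstration.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // Hypothetical helper: looks up an <a> from the second <tr>, using either
    // a row-relative XPath (".//a") or a document-wide one ("//a").
    // Example input (invented):
    //   <table>
    //     <tr><td><a href='first.example'>A</a></td></tr>
    //     <tr><td><a href='second.example'>B</a></td></tr>
    //   </table>
    public static string SecondRowHref(string html, bool relative)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumes the input contains at least two <tr> elements.
        HtmlNode secondRow = doc.DocumentNode.SelectNodes("//tr")[1];

        string xpath = relative ? ".//a" : "//a";
        return secondRow.SelectSingleNode(xpath).GetAttributeValue("href", "");
    }
}
```

With the sample markup from the comment, the relative query should yield the second row's link, while the document-wide query yields the first link in the document even though it was called on the second row.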


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow