Getting data from HTML table into a datatable

c# html html-agility-pack linq xpath

Question

Ok so I need to query a live website to get data from a table, put this HTML table into a DataTable and then use this data. I have so far managed to use Html Agility Pack and XPath to get to each row in the table I need but I know there must be a way to parse it into a DataTable. (C#) The code I am currently using is:

string htmlCode = "";
using (WebClient client = new WebClient())
{
htmlCode = client.DownloadString("http://www.website.com");
}
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(htmlCode);

//My attempt at LINQ to solve the issue (not sure where to go from here)
var myTable = doc.DocumentNode
.Descendants("table")
.Where(t =>t.Attributes["summary"].Value == "Table One")
.FirstOrDefault();

//Finds all the odd rows (which are the ones I actually need but would prefer a
//DataTable containing all the rows!
foreach (HtmlNode cell in doc.DocumentNode.SelectNodes("//tr[@class='odd']/td"))
{
string test = cell.InnerText;
//Have not gone further than this yet!
}

The HTML table on the website I am querying looks like this:

<table summary="Table One">
<tbody>
<tr class="odd">
<td>Some Text</td>
<td>Some Value</td>
</tr>
<tr class="even">
<td>Some Text1</td>
<td>Some Value1</td>
</tr>
<tr class="odd">
<td>Some Text2</td>
<td>Some Value2</td>
</tr>
<tr class="even">
<td>Some Text3</td>
<td>Some Value3</td>
</tr>
<tr class="odd">
<td>Some Text4</td>
<td>Some Value4</td>
</tr>
</tbody>
</table>

I'm not sure whether it is better/easier to use LINQ + HAP or XPath + HAP to get the desired result, I tried both with limited success as you can probably see. This is the first time I have ever made a program to query a website or even interact with a website in any way so I am very unsure at the moment! Thanks for any help in advance :)

Accepted Answer

There's no such method out of the box from the HTML Agility Pack, but it shouldn't be too hard to create one. There's samples out there that do XML to Datatable from Linq-to-XML. These can be re-worked into what you need.

If needed I can help out creating the whole method, but not today :).

See also:


Popular Answer

Using some of Jack Eker's code above and some code from Mark Gravell (see post here) , I managed to come with a solution. This code snippet is used to obtain the public holidays for the year of 2012 in South Africa as of writing this article

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Web;
using System.Net;
using HtmlAgilityPack;



namespace WindowsFormsApplication
{
    public partial class Form1 : Form
    {
        private DataTable dt;
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {

            string htmlCode = "";
            using (WebClient client = new WebClient())
            {
                client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
                htmlCode = client.DownloadString("http://www.info.gov.za/aboutsa/holidays.htm");
            }
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

            doc.LoadHtml(htmlCode);

            dt = new DataTable();
            dt.Columns.Add("Name", typeof(string));
            dt.Columns.Add("Value", typeof(string));

            int count = 0;


            foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
            {

                foreach (HtmlNode row in table.SelectNodes("tr"))
                {

                    if (table.Id == "table2")
                    {
                        DataRow dr = dt.NewRow();

                        foreach (var cell in row.SelectNodes("td"))
                        {
                            if ((count % 2 == 0))
                            {
                                dr["Name"] = cell.InnerText.Replace("&nbsp;", " ");
                            }
                            else
                            {

                                dr["Value"] = cell.InnerText.Replace("&nbsp;", " ");

                                dt.Rows.Add(dr);
                            }
                            count++;

                        }


                    }

                }


                dataGridView1.DataSource = dt;

            }
        }

    }
}


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why