Not able to parse HTML table in C# using HtmlAgilityPack

c# html-agility-pack

Question

I want to read the table shown in this link.

When I tried to do this with HtmlAgilityPack, I got null:

    var nodes = document.DocumentNode.SelectNodes("//table[contains(@class, 'table')]");

Can you please tell me what the issue is? Am I doing it the wrong way?

Popular Answer

There is nothing wrong with your XPath. I am going to assume the real problem is that you don't know how to get the data out of the table, which comes down to learning a little more XPath.

    public static void Main(string[] args)
    {
        HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://www.manualslib.com/brand/A.html");
            request.Method = "GET";
            request.ContentType = "text/html;charset=utf-8";

            using (var response = (HttpWebResponse)request.GetResponse())
            {
                using (var stream = response.GetResponseStream())
                {
                    doc.Load(stream, Encoding.GetEncoding("utf-8"));
                }
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine(ex.Message);
        }
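        //Works fine
        HtmlNode tablebody = doc.DocumentNode.SelectSingleNode("//table[contains(@class, 'table')]/tbody");
        // SelectSingleNode and SelectNodes return null when nothing matches (for example, if the request above failed)
        HtmlNodeCollection rows = tablebody == null ? null : tablebody.SelectNodes("./tr");
        if (rows == null)
        {
            Console.WriteLine("No table rows found.");
            Console.ReadKey();
            return;
        }
        foreach(HtmlNode tr in rows)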
        {
            Console.WriteLine("\nTableRow: ");
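            // SelectNodes returns null, not an empty list, for a row without td cells (for example a header row of th cells)
            HtmlNodeCollection cells = tr.SelectNodes("./td");
            if (cells == null) continue;
            foreach(HtmlNode td in cells)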
            {
                if (td.GetAttributeValue("class", "null") == "col1")
                {
                    Console.Write("\t " + td.InnerText);
                }
                else
                {
                    HtmlNode temp = td.SelectSingleNode(".//div[@class='catel']/a");
                    if (temp != null)
                    {
                        Console.Write("\t " + temp.GetAttributeValue("href", "no url"));
                    }
                }


            }
        }
        Console.ReadKey();
    }

First we select the tbody node with the XPath below. The predicate restricts the match to a table whose class attribute contains the string 'table':

    //table[contains(@class, 'table')]/tbody
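
As a rough illustration of how that expression behaves, here is a small sketch that runs it against two made-up HTML strings. The class name `ClassMatchSketch`, the helper `CountMatches` and the sample markup are hypothetical and only there for the demonstration. It relies on two things worth knowing: `contains()` is a plain substring test, so a class such as 'sortable' also matches, and HtmlAgilityPack parses the markup as it is written, so the `/tbody` step only matches when a tbody tag is really present in the source.

```csharp
using System;
using HtmlAgilityPack;

public static class ClassMatchSketch
{
    // Hypothetical helper: counts the nodes an XPath selects in the given HTML.
    // SelectNodes returns null (not an empty collection) when nothing matches.
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Made-up markup, not taken from the page in the question.
        string withTbody = "<table class='table table-striped'><tbody><tr><td>x</td></tr></tbody></table>";
        string noTbody = "<table class='sortable'><tr><td>x</td></tr></table>";
        string xpath = "//table[contains(@class, 'table')]/tbody";

        Console.WriteLine(CountMatches(withTbody, xpath)); // 1
        Console.WriteLine(CountMatches(noTbody, xpath));   // 0: the table matches, but it has no tbody
        Console.WriteLine(CountMatches(noTbody, "//table[contains(@class, 'table')]")); // 1: 'sortable' contains 'table'
    }
}
```

If the expression unexpectedly returns null on a real page, checking whether the downloaded markup actually contains a tbody tag is a reasonable first step.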

Next we select all of its child nodes named tr (table row):

    ./tr

The leading dot means the search starts from the current context node, so this finds the tr nodes directly under the tbody we just selected. Then, inside each tr node, we find all the td nodes with:

    ./td
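
To show why the leading dot matters, the sketch below compares `./td` with `//td` when both are evaluated from the first row of a made-up table. The class `RelativePathSketch`, the helper `CountFromFirstRow` and the markup are hypothetical. The point it tries to illustrate is that an expression starting with `//` searches the whole document even when it is called on a row node, while `./td` stays inside that row.

```csharp
using System;
using HtmlAgilityPack;

public static class RelativePathSketch
{
    // Hypothetical helper: evaluates the XPath from the first tr in the document
    // and returns how many nodes it selects (0 when SelectNodes returns null).
    public static int CountFromFirstRow(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode firstRow = doc.DocumentNode.SelectSingleNode("//tr");
        if (firstRow == null) return 0;
        HtmlNodeCollection nodes = firstRow.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Made-up table: two cells in the first row, one in the second.
        string html =
            "<table><tbody>" +
            "<tr><td>a</td><td>b</td></tr>" +
            "<tr><td>c</td></tr>" +
            "</tbody></table>";

        Console.WriteLine(CountFromFirstRow(html, "./td")); // 2: only the cells of the first row
        Console.WriteLine(CountFromFirstRow(html, "//td")); // 3: every td in the document
    }
}
```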

Now we want the data from each table cell. The first td in a row has a class attribute equal to 'col1', so when a td has that class we take the text inside it.

Otherwise, we look for the anchor tag inside a div whose class attribute has the value 'catel'.

From that anchor tag we read the value of the href attribute.
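
The cell logic can be tried without a network request by running it on an inline string. The sketch below is a minimal version under that assumption: the class `CellExtractionSketch`, the helpers `ExtractRow` and `ExtractFirstRow`, and the sample markup (including the 'Acme' text and its URL) are made up for illustration and only mirror the class names mentioned above.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class CellExtractionSketch
{
    // Hypothetical helper: returns the text of 'col1' cells and the href of
    // links found inside a div with class 'catel' in the other cells.
    public static List<string> ExtractRow(HtmlNode tr)
    {
        var values = new List<string>();
        HtmlNodeCollection cells = tr.SelectNodes("./td");
        if (cells == null) return values; // no td cells in this row

        foreach (HtmlNode td in cells)
        {
            if (td.GetAttributeValue("class", "") == "col1")
            {
                values.Add(td.InnerText.Trim());
            }
            else
            {
                HtmlNode link = td.SelectSingleNode(".//div[@class='catel']/a");
                if (link != null)
                {
                    values.Add(link.GetAttributeValue("href", "no url"));
                }
            }
        }
        return values;
    }

    // Hypothetical helper: parses the HTML and extracts the first table row.
    public static List<string> ExtractFirstRow(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode row = doc.DocumentNode.SelectSingleNode("//table[contains(@class, 'table')]/tbody/tr");
        return row == null ? new List<string>() : ExtractRow(row);
    }

    public static void Main()
    {
        // Made-up markup that imitates the structure described in the answer.
        string html =
            "<table class='table'><tbody><tr>" +
            "<td class='col1'>Acme</td>" +
            "<td><div class='catel'><a href='/brand/acme/'>Manuals</a></div></td>" +
            "</tr></tbody></table>";

        Console.WriteLine(string.Join(" | ", ExtractFirstRow(html))); // Acme | /brand/acme/
    }
}
```

Returning a list instead of writing to the console is just a choice made here so the result is easy to inspect; the null checks are there because both SelectNodes and SelectSingleNode return null when nothing matches.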



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow