How to get a table from Wikipedia

c# html-agility-pack web-scraping xml

Question

I want to extract one table from a Wikipedia page with C# and store it as XML. Can it be done? If so, can I keep only the Title and Genre columns in the XML?

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://en.wikipedia.org/wiki/2012_in_film");

HtmlNode node = doc.DocumentNode.SelectSingleNode("//table[@class='wikitable']");
5/6/2017 9:08:16 PM

Accepted Answer

You can use the WinForms WebBrowser control to load the page and walk its DOM:

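// Cell text collected from the table, filled in once the page has loaded
List<string> Genre = new List<string>();
List<string> Title = new List<string>();

// Navigate is asynchronous, so the DOM must be read in DocumentCompleted,
// not on the line right after the Navigate call
webBrowser1.DocumentCompleted += (sender, e) =>
{
    foreach (HtmlElement table in webBrowser1.Document.GetElementsByTagName("table"))
    {
        // Only look at tables whose class is exactly "wikitable"
        if (table.GetAttribute("className").Equals("wikitable"))
        {
            foreach (HtmlElement tr in table.GetElementsByTagName("tr"))
            {
                // 1-based position of the current cell within its row
                int columncount = 1;
                foreach (HtmlElement td in tr.GetElementsByTagName("td"))
                {
                    // Title is the 4th cell of a row
                    if (columncount == 4)
                    {
                        Title.Add(td.InnerText);
                    }
                    // Genre is the 7th cell of a row
                    if (columncount == 7)
                    {
                        Genre.Add(td.InnerText);
                    }
                    columncount++;
                }
            }
        }
    }
};

// Start loading the page; the handler above runs when it is done
webBrowser1.Navigate("http://en.wikipedia.org/wiki/2012_in_film");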

You now have two lists (Genre and Title). They are easily convertible to an XML file.
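
One way the conversion could look is sketched below with LINQ to XML. The class name FilmXml and the element names Films, Film, Title and Genre are made up for this example; they are not part of the answer above. The sketch assumes the two lists line up index by index, which may not hold if some table rows have fewer cells than others, so it stops at the shorter list.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper, not from the original answer.
public static class FilmXml
{
    // Pairs titles[i] with genres[i] and stops at the shorter list,
    // so a row with a missing cell cannot cause an out-of-range access.
    public static XDocument Build(IList<string> titles, IList<string> genres)
    {
        var root = new XElement("Films");
        int count = Math.Min(titles.Count, genres.Count);

        for (int i = 0; i < count; i++)
        {
            // InnerText can be null for empty cells, so fall back to "".
            root.Add(new XElement("Film",
                new XElement("Title", (titles[i] ?? string.Empty).Trim()),
                new XElement("Genre", (genres[i] ?? string.Empty).Trim())));
        }

        return new XDocument(new XDeclaration("1.0", "utf-8", null), root);
    }
}
```

With the lists from the code above, FilmXml.Build(Title, Genre).Save("films.xml") would write the result to disk; the file name is only an example.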

12/26/2012 5:24:15 AM

Popular Answer

To focus on a specific piece of a Wikipedia article, have a look at the Wikipedia API as well.

The API documentation covers how to format the results for later processing.
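
As a rough illustration, the MediaWiki API's action=parse request can return the rendered HTML of a page wrapped in JSON. The sketch below only builds the request URL and downloads the raw response. The class name WikiApi, its method names, and the User-Agent string are made up for this example, and pulling the table out of the returned HTML (for instance with Html Agility Pack) is left out.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not from the original answer.
public static class WikiApi
{
    // Builds an action=parse request that asks for the page's rendered
    // HTML (prop=text) in a JSON envelope (format=json).
    public static string BuildParseUrl(string pageTitle)
    {
        return "https://en.wikipedia.org/w/api.php"
             + "?action=parse&prop=text&format=json"
             + "&page=" + Uri.EscapeDataString(pageTitle);
    }

    // Downloads the raw JSON response as a string.
    public static async Task<string> FetchAsync(string pageTitle)
    {
        using (var client = new HttpClient())
        {
            // Send an identifying User-Agent; this value is a placeholder.
            client.DefaultRequestHeaders.UserAgent.ParseAdd("TableScraperExample/0.1");
            return await client.GetStringAsync(BuildParseUrl(pageTitle));
        }
    }
}
```

For the page in the question, the call would be WikiApi.FetchAsync("2012_in_film").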




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow