I want to extract one table from a Wikipedia page into an XML file and then parse that file in C#. Is this possible? If so, can I save only the Title and Genre columns in the XML? Here is what I have so far, using HtmlAgilityPack:
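HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://en.wikipedia.org/wiki/2012_in_film");
// The class attribute can hold several classes (e.g. "wikitable sortable"), so match the "wikitable" token instead of the whole value
HtmlNode node = doc.DocumentNode.SelectSingleNode("//table[contains(concat(' ', normalize-space(@class), ' '), ' wikitable ')]");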
You can load the page in a Windows Forms WebBrowser control and read the table cells from its DOM:
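// First, navigate to the page (webBrowser1 is a System.Windows.Forms.WebBrowser control)
webBrowser1.Navigate("http://en.wikipedia.org/wiki/2012_in_film");
List<string> Genre = new List<string>();
List<string> Title = new List<string>();
// Navigate() returns immediately, so run the code below only once the page has loaded,
// e.g. in the webBrowser1.DocumentCompleted event handler
foreach (HtmlElement table in webBrowser1.Document.GetElementsByTagName("table"))
{
    // Wikipedia tables often carry several classes (e.g. "wikitable sortable"),
    // so look for the "wikitable" token rather than comparing the whole value
    string classes = " " + table.GetAttribute("className") + " ";
    if (classes.Contains(" wikitable "))
    {
        foreach (HtmlElement tr in table.GetElementsByTagName("tr"))
        {
            // 1-based position of the current cell within the row
            int columncount = 1;
            foreach (HtmlElement td in tr.GetElementsByTagName("td"))
            {
                // 4th cell: Title (adjust to the table's actual layout)
                if (columncount == 4)
                {
                    Title.Add(td.InnerText);
                }
                // 7th cell: Genre (adjust to the table's actual layout)
                if (columncount == 7)
                {
                    Genre.Add(td.InnerText);
                }
                columncount++;
            }
        }
    }
}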
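A caveat about the loop above: the fixed positions 4 and 7 only work while the table keeps exactly that layout, and cells with `rowspan` shift the positions of the cells after them. The sketch below is an alternative, not part of the original answer: it looks the columns up by their header text. It assumes you have already collected the header row and each data row as lists of cell texts (for example from the `th` and `td` elements in the loop above); the class and method names are hypothetical, and the header texts "Title" and "Genre" are assumed to match the table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: locate columns by header text instead of fixed positions
static class ColumnLookup
{
    // Returns the zero-based index of the header whose trimmed text equals name
    // (case-insensitive), or -1 if no such header exists.
    public static int FindColumn(IList<string> headers, string name)
    {
        for (int i = 0; i < headers.Count; i++)
        {
            if (string.Equals(headers[i].Trim(), name, StringComparison.OrdinalIgnoreCase))
                return i;
        }
        return -1;
    }

    // Extracts (Title, Genre) pairs from data rows, given the header row.
    // Rows whose cell count differs from the header count are skipped,
    // because their cells may be shifted (e.g. by rowspan).
    public static List<(string Title, string Genre)> Extract(
        IList<string> headers, IEnumerable<IList<string>> rows)
    {
        var result = new List<(string Title, string Genre)>();
        int titleCol = FindColumn(headers, "Title");
        int genreCol = FindColumn(headers, "Genre");
        if (titleCol < 0 || genreCol < 0)
            return result; // this table has no Title/Genre columns

        foreach (var row in rows)
        {
            if (row.Count != headers.Count)
                continue;
            result.Add((row[titleCol].Trim(), row[genreCol].Trim()));
        }
        return result;
    }

    static void Main()
    {
        // Made-up placeholder data, not taken from the real Wikipedia table
        var headers = new List<string> { "Opening", "Title", "Studio", "Genre" };
        var rows = new List<IList<string>>
        {
            new List<string> { "Jan 6", "Film A", "Studio X", "Drama" },
            new List<string> { "Short row" }
        };
        foreach (var (title, genre) in Extract(headers, rows))
            Console.WriteLine($"{title} | {genre}");
    }
}
```

Skipping mismatched rows is a simplification: handling `rowspan` properly means expanding each spanned cell into every row it covers before looking up the columns.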
With the loop above you now have two lists (Title and Genre), which you can easily write to an XML file.
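A minimal sketch of that conversion with LINQ to XML (`System.Xml.Linq`), assuming the two lists line up row by row. The element names (`films`, `film`, `title`, `genre`) and the file name `films.xml` are hypothetical choices; only the two columns you asked about end up in the file, and `FromXml` shows how to read them back into C#.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class FilmXml
{
    // Builds <films><film><title>..</title><genre>..</genre></film>...</films>
    // from two parallel lists; Zip stops at the shorter list if the counts differ.
    public static XDocument ToXml(List<string> titles, List<string> genres)
    {
        return new XDocument(
            new XElement("films",
                titles.Zip(genres, (t, g) =>
                    new XElement("film",
                        new XElement("title", t),
                        new XElement("genre", g)))));
    }

    // Reads the format written by ToXml back into (Title, Genre) pairs.
    public static List<(string Title, string Genre)> FromXml(XDocument doc)
    {
        return doc.Root.Elements("film")
            .Select(f => (Title: (string)f.Element("title"), Genre: (string)f.Element("genre")))
            .ToList();
    }

    static void Main()
    {
        // Placeholder data; in the answer above these lists come from the table
        var titles = new List<string> { "Film A", "Film B" };
        var genres = new List<string> { "Drama", "Comedy" };

        XDocument doc = ToXml(titles, genres);
        doc.Save("films.xml");                          // write the file

        XDocument loaded = XDocument.Load("films.xml"); // parse it back
        foreach (var (title, genre) in FromXml(loaded))
            Console.WriteLine($"{title}: {genre}");
    }
}
```

`Zip` pairs the lists by position, so if a table row might be missing a cell, check that both lists have the same count before saving.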
Also consider the Wikipedia (MediaWiki) API, which lets you request a particular section of a Wikipedia page instead of the whole page.
The API documentation describes how to format the results (for example as XML or JSON) for subsequent parsing.
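For example, the MediaWiki API's `action=parse` module accepts a `section` number, `prop=wikitext` to return that section's source markup, and `format=xml` (or `format=json`) for the response. The sketch below builds such a request and optionally fetches it. The section number 1 is a placeholder (request `prop=sections` first to see the page's numbering), and the `--fetch` switch and User-Agent string are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class WikiApi
{
    // Builds a MediaWiki action=parse request for one section of a page.
    // prop=wikitext returns the section's source markup; format=xml requests an XML response.
    public static string BuildSectionUrl(string page, int section)
    {
        return "https://en.wikipedia.org/w/api.php?action=parse"
            + "&page=" + Uri.EscapeDataString(page)
            + "&section=" + section
            + "&prop=wikitext&format=xml";
    }

    static async Task Main(string[] args)
    {
        // Section 1 is a placeholder; section numbers are specific to each page
        string url = BuildSectionUrl("2012_in_film", 1);
        Console.WriteLine(url);

        // Only hit the network when explicitly asked to (hypothetical --fetch switch)
        if (args.Length > 0 && args[0] == "--fetch")
        {
            using var http = new HttpClient();
            // Wikimedia asks API clients to send a descriptive User-Agent
            http.DefaultRequestHeaders.UserAgent.ParseAdd("FilmTableExample/0.1");
            string xml = await http.GetStringAsync(url);
            Console.WriteLine(xml);
        }
    }
}
```

The response contains wikitext rather than HTML, so you would still need to pick the Title and Genre cells out of the table markup.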