I need to build a list of the records on a specific web page. I have the page source in a text file, and I need to traverse this node, element by element:
HtmlNodeCollection resultContainer = doc.DocumentNode.SelectNodes("//div[@class='result-list divider-y-5']");
For each element I need to check its type (div, span, etc.) and its "id" and "class" attributes in order to build my list of records. I don't want a collection of all divs or all spans; that will not help, because I don't know which type of element I will encounter while looping through them. I have to check them all, because all the data I need are children of the node collection mentioned above. Any suggestions?
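foreach (HtmlNode node in resultContainer)
{
    // check node type
    switch (node.Name)
    {
        case "div":
            // handle a div
            break;
        case "p":
            // handle a paragraph (C# does not allow falling through, so break here too)
            break;
        // ... etc.
    }
    // get id ("" if the attribute is missing)
    string id = node.GetAttributeValue("id", "");
    // get class ("class" is a reserved word in C#, so the variable needs another name)
    string cssClass = node.GetAttributeValue("class", "");
}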
I think it's easier to have HtmlAgilityPack convert the HTML document to XML, e.g.:
// Parsing options need to be set before Load to take effect.
doc.OptionFixNestedTags = true;
doc.OptionAutoCloseOnEnd = true;
// Emit well-formed XML when saving.
doc.OptionOutputAsXml = true;
doc.Load(htmlStream, true);
doc.Save(/* your Xml stream or filename */);
Then use the regular .NET XML API (e.g. XmlDocument or XDocument) to process the contents.
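For the second step, here is a rough sketch of walking every element with XDocument and reading its tag name, "id" and "class". It assumes the saved XML is well-formed. The class name ResultWalker, the helper Describe, and the sample markup are made up for illustration and are not part of HtmlAgilityPack or of your page:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ResultWalker
{
    // Hypothetical helper: returns (tag name, id, class) for every element
    // below the root, in document order.
    public static List<(string Name, string Id, string Class)> Describe(string xml)
    {
        XDocument xdoc = XDocument.Parse(xml);
        return xdoc.Root
            .Descendants()
            .Select(e => (
                Name: e.Name.LocalName,
                Id: (string)e.Attribute("id"),        // null when the attribute is absent
                Class: (string)e.Attribute("class"))) // null when the attribute is absent
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample standing in for the XML saved by HtmlAgilityPack.
        const string sample =
            "<div class='result-list divider-y-5'>" +
            "<div id='r1' class='record'><span class='title'>First</span></div>" +
            "<p id='note'>Second</p>" +
            "</div>";

        foreach (var (name, id, cls) in Describe(sample))
        {
            string shownId = id ?? "(none)";
            string shownClass = cls ?? "(none)";
            Console.WriteLine($"{name} id={shownId} class={shownClass}");
        }
    }
}
```

For a file on disk you would probably use XDocument.Load(path) instead of XDocument.Parse. Casting an XAttribute to string returns null for a missing attribute, which avoids the exception you would get from calling .Value on it.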