Loop through all descendants of a node and inspect them one by one

c# html-agility-pack linq xpath

Question

I need to build a list of the records on a specific web page. I have the page source saved in a text file, and I need to traverse this node, element by element:

HtmlNodeCollection resultContainer = doc.DocumentNode.SelectNodes("//div[@class='result-list divider-y-5']");

For each element I need to check its type (div, span, etc.) and its "id" and "class" attributes in order to build my list of records. I don't want a collection of all divs or all spans; that will not help, because I don't know which type of element I will encounter while looping. I have to check them all, because all the data I need are descendants of the node collection above. Any suggestions?

Accepted Answer

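// SelectNodes returns null when nothing matches, so guard before looping.
if (resultContainer != null)
{
    foreach (HtmlNode container in resultContainer)
    {
        // Descendants() walks every nested node, not only the direct children.
        foreach (HtmlNode node in container.Descendants())
        {
            // Skip text and comment nodes.
            if (node.NodeType != HtmlNodeType.Element)
                continue;

            // Check the element type.
            switch (node.Name)
            {
                case "div":
                    break;
                case "p":
                    break;
                // ...etc.
            }

            // GetAttributeValue returns the supplied default when the attribute is missing.
            string id = node.GetAttributeValue("id", "");

            // "class" is a reserved word in C#, so the variable needs another name.
            string cssClass = node.GetAttributeValue("class", "");
        }
    }
}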
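
As a variation on the loop above, the same walk can be written with LINQ so that every descendant element becomes one record. This is only a sketch: the ElementRecord type, the DescendantWalker.Collect helper, and the sample markup are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical record type; not part of HtmlAgilityPack.
public sealed class ElementRecord
{
    public string Name { get; set; }
    public string Id { get; set; }
    public string CssClass { get; set; }
}

public static class DescendantWalker
{
    // Collects one record per element found anywhere below the nodes matched by the XPath.
    public static List<ElementRecord> Collect(HtmlDocument doc, string containerXPath)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection containers = doc.DocumentNode.SelectNodes(containerXPath);
        if (containers == null)
            return new List<ElementRecord>();

        return containers
            .SelectMany(c => c.Descendants())
            .Where(n => n.NodeType == HtmlNodeType.Element) // skip text and comment nodes
            .Select(n => new ElementRecord
            {
                Name = n.Name,
                Id = n.GetAttributeValue("id", ""),
                CssClass = n.GetAttributeValue("class", "")
            })
            .ToList();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        // Made-up sample markup standing in for the saved page source.
        doc.LoadHtml("<div class='result-list divider-y-5'>" +
                     "<div id='r1' class='row'><span class='name'>A</span></div>" +
                     "<p id='note'>B</p></div>");

        foreach (ElementRecord r in Collect(doc, "//div[@class='result-list divider-y-5']"))
            Console.WriteLine($"{r.Name} id='{r.Id}' class='{r.CssClass}'");
    }
}
```

Keeping the element name, id, and class together in one record means the decision about which elements matter can be made afterwards, once all of them have been seen.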

Popular Answer

I think it's easier to have HtmlAgilityPack convert the HTML document to XML, e.g.:

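var doc = new HtmlDocument();
// Set the parsing options before Load so they apply while the HTML is parsed.
doc.OptionFixNestedTags = true;
doc.OptionAutoCloseOnEnd = true;
// Emit XML instead of HTML when saving.
doc.OptionOutputAsXml = true;
doc.Load(htmlStream, true);
doc.Save(/* your Xml stream or filename */);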

Then use the regular .NET XML API (e.g. XmlDocument or XDocument) to process the contents.
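
Once the document has been saved as XML, its descendants can be walked with XDocument. The following is a minimal sketch that uses only System.Xml.Linq. The XmlRecordScanner name, the Scan helper, and the inline XML (standing in for the saved file) are assumptions for illustration; the real output of Save may contain extra wrapper elements or a different layout.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlRecordScanner
{
    // Returns (element name, id, class) for every element nested inside
    // any element whose class attribute equals containerClass.
    public static List<(string Name, string Id, string Class)> Scan(
        XContainer root, string containerClass)
    {
        return root.Descendants()
            .Where(e => (string)e.Attribute("class") == containerClass)
            .SelectMany(c => c.Descendants())
            // Casting a missing XAttribute to string yields null, hence the ?? "".
            .Select(e => (Name: e.Name.LocalName,
                          Id: (string)e.Attribute("id") ?? "",
                          Class: (string)e.Attribute("class") ?? ""))
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical stand-in for the XML file written by doc.Save(...).
        var xml = XDocument.Parse(
            "<html><body><div class='result-list divider-y-5'>" +
            "<div id='r1' class='row'><span class='name'>A</span></div>" +
            "<p id='note'>B</p></div></body></html>");

        foreach (var (name, id, cls) in Scan(xml, "result-list divider-y-5"))
            Console.WriteLine($"{name} id='{id}' class='{cls}'");
    }
}
```

For a file on disk, XDocument.Load(path) would take the place of XDocument.Parse.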




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow