Loop through all descendants of a node and inspect them one by one

c# html-agility-pack linq xpath

Question

I need to build a list of all the records on a certain web page. I have saved the page source to a text file. I have to go through each element inside this node:

HtmlNodeCollection resultContainer = doc.DocumentNode.SelectNodes("//div[@class='result-list divider-y-5']");

To build my list of records, I need to check the type (div, span, etc.) and the "id" and "class" attributes of each element. I don't want a collection of all the div or span elements; that won't help, because I don't know in advance what kind of element I will encounter as I loop through them. I have to examine each one, because the children of the node collection above hold all the information I need. Any recommendations?

1
0
11/9/2012 10:05:13 AM

Accepted Answer

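// SelectNodes returns null when nothing matches, so guard against that.
if (resultContainer != null)
{
    foreach (HtmlNode container in resultContainer)
    {
        // Descendants() walks every node below the container, at any depth.
        foreach (HtmlNode node in container.Descendants())
        {
            // Skip text and comment nodes; only elements have a tag name and attributes.
            if (node.NodeType != HtmlNodeType.Element)
                continue;

            // Check the element type.
            switch (node.Name)
            {
                case "div":
                    break;
                case "p":
                    break;
                // ...etc.
            }

            // GetAttributeValue returns the fallback instead of throwing when the attribute is missing.
            string id = node.GetAttributeValue("id", "");

            // "class" is a reserved word in C#, so use a different variable name.
            string className = node.GetAttributeValue("class", "");
        }
    }
}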
1
11/9/2012 10:18:10 AM

Popular Answer

I believe it is simpler to use HtmlAgilityPack to convert the HTML page to an XML file.

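HtmlDocument doc = new HtmlDocument();
// Set the parsing options before loading so they apply while the HTML is parsed.
doc.OptionFixNestedTags = true;
doc.OptionAutoCloseOnEnd = true;
doc.OptionOutputAsXml = true;   // makes Save() write XML
doc.Load(htmlStream, true);     // true: detect the encoding from byte order marks
doc.Save(/* your Xml stream or filename */);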

Then parse the data using the standard .NET XML API (for example, XmlDocument or XDocument).
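
As a rough sketch of that second step, assuming the saved XML is well formed and still contains the result container, something like the following could walk every descendant element with XDocument and report its tag name, "id" and "class". The RecordScanner class, the DescribeDescendants method, and the sample markup are made up for illustration and are not part of HtmlAgilityPack or the original page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RecordScanner
{
    // Hypothetical helper: describes every element below the first <div>
    // whose class attribute equals containerClass, in document order.
    public static List<string> DescribeDescendants(string xml, string containerClass)
    {
        XDocument xdoc = XDocument.Parse(xml);

        // Compare on LocalName so the lookup still works if the XML carries a namespace.
        XElement container = xdoc.Descendants()
            .FirstOrDefault(e => e.Name.LocalName == "div"
                              && (string)e.Attribute("class") == containerClass);

        var result = new List<string>();
        if (container == null)
            return result; // no container found, nothing to inspect

        // Descendants() yields elements only, at any depth below the container.
        foreach (XElement el in container.Descendants())
        {
            // Casting a missing attribute to string gives null, so fall back to "".
            string id = (string)el.Attribute("id") ?? "";
            string cls = (string)el.Attribute("class") ?? "";
            result.Add($"{el.Name.LocalName}|{id}|{cls}");
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample standing in for the XML written by doc.Save(...).
        string xml =
            "<html><body><div class='result-list divider-y-5'>" +
            "<div id='r1' class='record'><span class='title'>First</span></div>" +
            "<p id='note'>Second</p>" +
            "</div></body></html>";

        foreach (string line in DescribeDescendants(xml, "result-list divider-y-5"))
            Console.WriteLine(line);
    }
}
```

Because every element is visited regardless of its type, you can branch on the tag name inside the loop instead of querying for each kind of element separately.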




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow