Get specific content from nested divs using the HTML Agility Pack

.net c# html html-agility-pack

Question

I am new to the HTML Agility Pack and cannot work out how to extract content from the following piece of HTML:

<p>
    <div class='myclass1'>
        <div id='idXXXX'>content1<br>content2
        </div>  
        <div class="myclass2">
            <table>
                <tr>
                    <td align="left">content3 <b><a href="">content4</a></b></td>
                    <td align="right">content5 <b><a href="">content6</a></b></td>
                </tr>
            </table>
        </div>
    </div>
</p>

where XXXX is a random number.

I already have all the code needed to load the HTML file.

I want to extract content1 and content2 from the HTML above and, in a separate query, content4.

9/18/2011 9:17:49 AM

Accepted Answer

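using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.Load("test.htm");
// Outer container: the first div with class 'myclass1'.
var res = doc.DocumentNode.SelectSingleNode("//div[@class='myclass1']");
// Its first child div, i.e. the one with id 'idXXXX'.
var firstDiv = res.SelectSingleNode("div");
// The child nodes are: text "content1", <br>, text "content2".
var content1 = firstDiv.ChildNodes[0].InnerText.Trim();
var content2 = firstDiv.ChildNodes[2].InnerText.Trim();
// Separate query for content4: the link inside the left-aligned cell.
var content4 = res.SelectSingleNode(".//div[@class='myclass2']")
                  .SelectSingleNode(".//td[@align='left']/b/a")
                  .InnerText
                  .Trim();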

UPDATE:

If several divs have these classes and you want the content from each of them, you can do something like this:

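using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.Load("test.htm");
var res = doc.DocumentNode.SelectNodes("//div[@class='myclass1']");
// SelectNodes returns null when nothing matches, so guard before iterating.
if (res != null)
{
    foreach (var item in res)
    {
        // First child div of this container (the one with id 'idXXXX').
        var firstDiv = item.SelectSingleNode("div");
        var content1 = firstDiv.ChildNodes[0].InnerText.Trim();
        var content2 = firstDiv.ChildNodes[2].InnerText.Trim();
        // Link text inside the left-aligned cell of the 'myclass2' div.
        var content4 = item.SelectSingleNode(".//div[@class='myclass2']")
                           .SelectSingleNode(".//td[@align='left']/b/a")
                           .InnerText
                           .Trim();
    }
}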
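
The fixed indices 0 and 2 depend on the exact layout of the inner div (text, then a br, then text). As a rough sketch of a more tolerant variant, the helper below collects every non-empty text node instead of indexing. The class and method names (DivContentSketch, ExtractTextParts) are hypothetical, and matching the inner div with starts-with(@id, 'id') is an assumption based on the idXXXX pattern in the question:

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HTML Agility Pack itself.
public static class DivContentSketch
{
    // For every div with class 'myclass1', returns the trimmed, non-empty
    // text fragments of its child div whose id starts with "id".
    public static List<string[]> ExtractTextParts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string[]>();

        // SelectNodes returns null (not an empty collection) on no match.
        var outerDivs = doc.DocumentNode.SelectNodes("//div[@class='myclass1']");
        if (outerDivs == null)
            return results;

        foreach (var outer in outerDivs)
        {
            // Assumption: the random id always has the form "id" + digits.
            var inner = outer.SelectSingleNode("./div[starts-with(@id, 'id')]");
            if (inner == null)
                continue;

            // Keep only text nodes, so <br> and other elements are skipped.
            var parts = inner.ChildNodes
                .Where(n => n.NodeType == HtmlNodeType.Text)
                .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
                .Where(t => t.Length > 0)
                .ToArray();

            results.Add(parts);
        }

        return results;
    }
}
```

With the markup from the question, each entry in the returned list should hold the text pieces of one inner div (content1 and content2), regardless of how many br tags or whitespace-only nodes sit between them.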
9/18/2011 11:04:01 AM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow