Using HTML Agility Pack to get text next to image?

c# c#-4.0 html-agility-pack web-scraping

Question

I have this bit of HTML that I need to parse through:

    <p class="feature_list">
        <img src="candy.gif" alt="candy" title="candy"/>&nbsp;
                            x 3&nbsp;&nbsp;
        <img src="lollies.gif" alt="lollies" title="lollies"/>&nbsp;
                            1&nbsp;&nbsp;
        <img src="system.gif" alt="system" title="system"/>&nbsp;
                            x 1&nbsp;&nbsp;
        <img src="phone.gif" alt="phone" title="phone"/>&nbsp;
                            x 1&nbsp;&nbsp;
    </p>

As you can see, each image is followed by some text, such as "x 3".

What I want to do is go through each image, and record the text next to it. However, the text is outside the 'img' tag.

Is there any way of doing this using the HTML Agility Pack?

Accepted Answer

The following code:

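    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    // LoadHtml parses markup held in a string; use doc.Load(path) to read a file instead.
    doc.LoadHtml(yourHtml);

    // Each <img> is followed by a text node, which is its NextSibling.
    // Note: SelectNodes returns null (not an empty list) if no <img> is found.
    foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//img"))
    {
        // Decode entities such as &nbsp;, then trim the surrounding whitespace.
        Console.WriteLine(HtmlEntity.DeEntitize(node.NextSibling.InnerText).Trim());
    }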

Will output:

x 3
1
x 1
x 1

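Note the use of the HtmlEntity utility class, which eases the handling of HTML entities (like &nbsp;): DeEntitize turns each &nbsp; into a non-breaking space character, which Trim() then strips along with ordinary whitespace.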
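If you also need to know which image each piece of text belongs to, the same idea can be extended a little. The following is only a sketch, not part of the original answer: the `ReadFeatures` helper and the `Program` class are hypothetical names, the XPath assumes the images are direct children of the `feature_list` paragraph as in the question, and it assumes the markup is available as a string. It pairs each image's `alt` attribute with the text that follows it, and guards against the two null cases (no matching images, or an image with no following text node).

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class Program
{
    // Hypothetical helper: pairs each image's alt text with the text node after it.
    static List<KeyValuePair<string, string>> ReadFeatures(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses a string; doc.Load expects a file path or stream

        var results = new List<KeyValuePair<string, string>>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection images =
            doc.DocumentNode.SelectNodes("//p[@class='feature_list']/img");
        if (images == null)
        {
            return results;
        }

        foreach (HtmlNode img in images)
        {
            HtmlNode next = img.NextSibling;

            // Only use the sibling if it is a text node; otherwise record an empty label.
            string raw = (next != null && next.NodeType == HtmlNodeType.Text)
                ? next.InnerText
                : string.Empty;

            // Decode &nbsp; and friends, then trim whitespace (including non-breaking spaces).
            string text = HtmlEntity.DeEntitize(raw).Trim();

            results.Add(new KeyValuePair<string, string>(
                img.GetAttributeValue("alt", string.Empty), text));
        }

        return results;
    }

    static void Main()
    {
        // Shortened copy of the markup from the question.
        string html =
            "<p class=\"feature_list\">\n" +
            "<img src=\"candy.gif\" alt=\"candy\" title=\"candy\"/>&nbsp;\n" +
            "    x 3&nbsp;&nbsp;\n" +
            "<img src=\"lollies.gif\" alt=\"lollies\" title=\"lollies\"/>&nbsp;\n" +
            "    1&nbsp;&nbsp;\n" +
            "</p>";

        foreach (KeyValuePair<string, string> feature in ReadFeatures(html))
        {
            Console.WriteLine(feature.Key + ": " + feature.Value);
        }
    }
}
```

With the shortened markup above, this should print "candy: x 3" and "lollies: 1". Checking the sibling's node type means an image followed directly by another element gets an empty label rather than that element's inner text.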



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why