Remove "img" and "a" tags from node.InnerHtml

c# html html-agility-pack

Question

From my html, I want to extract just the text.

var sb = new StringBuilder();
var doc = new HtmlDocument();
doc.LoadHtml(inputHTml);

foreach (var node in doc.DocumentNode.ChildNodes)
{
    if (node.Name == "strong" || node.Name == "#text" 
        || node.Name == "br" || node.Name == "div" 
        || node.Name == "p" || node.Name != "img")
    {
        sb.Append(node.InnerHtml);
    }
}

Here is what my nodes contain right now. This HTML is the value of InnerHtml.

1.

<br><div>text</div><div>, text</div><div>text<br>
<img src="http://example.com/55.jpg" alt="" title="" height="100">
<img src="http://example.com/45.jpg" alt="text" title="text" height="100"></div>

2.

text&nbsp;text&nbsp;text.&nbsp;&nbsp;<a
 href="/content/essie-classics">text</a><br>
  <img src="" alt="" title="" height="100"><img
 src="http://example.com/img_8862.jpg"
 alt="" title="" height="100"> 

How can I get rid of the img and a tags?

Note that the img tags have no closing tag.

3/11/2014 1:22:36 AM

Accepted Answer

I'm not entirely sure what point 2 means, but if you want to remove every <img> element inside an HtmlNode, you can try this:

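// "//img" matches every <img> in the document; use ".//img" to stay inside node.
// SelectNodes returns null when nothing matches, so guard before iterating.
var imgs = node.SelectNodes("//img");
if (imgs != null)
{
    foreach (var img in imgs)
    {
        img.Remove();
    }
}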

Remove() detaches the HtmlNode from its parent. This works fine for me and removes <img> elements even when they have no closing tag.

UPDATE:

Use this XPath query to select all <img> and <a> elements in a single query:

node.SelectNodes("//*[self::img or self::a]");

You can then remove each of them by iterating over the result set once.
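Putting the query and the removal loop together, a minimal sketch might look like the following. The class and method names (TagStripper, StripImgAndAnchors) are hypothetical and not part of HtmlAgilityPack; the sketch also assumes the whole input string is loaded into a fresh HtmlDocument.

```csharp
using System;
using HtmlAgilityPack;

public static class TagStripper
{
    // Hypothetical helper: removes every <img> and <a> element
    // and returns the markup that is left.
    public static string StripImgAndAnchors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//*[self::img or self::a]");
        if (nodes != null)
        {
            foreach (var node in nodes)
            {
                // Remove() detaches the element together with its children.
                node.Remove();
            }
        }

        return doc.DocumentNode.InnerHtml;
    }
}
```

Because Remove() takes the element's children with it, the link text inside each <a> disappears along with the tag. If only plain text is wanted afterwards, reading doc.DocumentNode.InnerText instead of InnerHtml is one option.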

3/11/2014 1:33:04 AM

Popular Answer

The example on removing an html node (img) from an HtmlDocument is a useful guide. Alternatively, you could do this:

var sb = new StringBuilder();
doc.LoadHtml(inputHTml);

foreach (var node in doc.DocumentNode.ChildNodes)
{
    if (node.Name != "img" && node.Name != "a")
    {
        sb.Append(node.InnerHtml);
    }
}
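One caveat: the loop above only looks at the direct children of the document node, so an <img> nested inside a <div> (as in sample 1 of the question) is still part of that div's InnerHtml. A rough sketch that walks all descendants instead is shown below; NestedTagRemover and RemoveNested are hypothetical names chosen for illustration.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class NestedTagRemover
{
    // Hypothetical helper: removes all elements with the given tag names,
    // at any nesting depth, and returns the remaining markup.
    public static string RemoveNested(string html, params string[] tagNames)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (var name in tagNames)
        {
            // ToList() snapshots the matches first, so removing nodes
            // does not disturb the lazy Descendants enumeration.
            foreach (var node in doc.DocumentNode.Descendants(name).ToList())
            {
                node.Remove();
            }
        }

        return doc.DocumentNode.InnerHtml;
    }
}
```

A call such as NestedTagRemover.RemoveNested(inputHTml, "img", "a") would then cover both tags in one pass over the input.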


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow