Remove "img" and "a" tags from node.InnerHtml

c# html html-agility-pack

Question

I want to extract only the text from my HTML:

var sb = new StringBuilder();
doc.LoadHtml(inputHTml);

foreach (var node in doc.DocumentNode.ChildNodes)
{
    if (node.Name == "strong" || node.Name == "#text" 
        || node.Name == "br" || node.Name == "div" 
        || node.Name == "p" || node.Name != "img")
    {
        sb.Append(node.InnerHtml);
    }
}

Now node.InnerHtml contains this HTML:

1.

<br><div>text</div><div>, text</div><div>text<br>
<img src="http://example.com/55.jpg" alt="" title="" height="100">
<img src="http://example.com/45.jpg" alt="text" title="text" height="100"></div>

2.

text&nbsp;text&nbsp;text.&nbsp;&nbsp;<a
 href="/content/essie-classics">text</a><br>
  <img> src="" alt="" title="" height="100"><img
 src="http://example.com/img_8862.jpg"
 alt="" title="" height="100"> 

How can I remove the img and a tags?

The img tags do not have closing tags.

Accepted Answer

I'm not sure I understand what point 2 means, but if you want to remove all <img> elements from an HtmlNode, you can try this:

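// Note: "//img" matches every <img> in the document; use ".//img" to search only below node.
var imgs = node.SelectNodes("//img");
// SelectNodes returns null (not an empty collection) when nothing matches.
if (imgs != null)
{
    foreach (var img in imgs)
    {
        img.Remove();
    }
}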

The Remove() method removes an HtmlNode from its parent. This works fine for me for removing <img> elements, even ones without a closing tag.

UPDATE:

You can use this XPath expression to select all <img> and <a> elements in a single query:

node.SelectNodes("//*[self::img or self::a]");

Then you can iterate through the result set once and remove each node.
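
For illustration, here is a minimal sketch of that single pass. The HtmlCleaner class and the RemoveImagesAndLinks method are hypothetical names, not part of the original answer. It also assumes you want to keep the link text, so <a> elements are unwrapped with RemoveChild(node, true) rather than removed together with their content; if you want the link text gone as well, calling Remove() on every node should be enough. It assumes a reasonably recent HtmlAgilityPack version.

```csharp
using System;
using HtmlAgilityPack;

public static class HtmlCleaner
{
    // Hypothetical helper (not from the original answer): removes every <img>
    // and unwraps every <a> so that the link text stays in place.
    public static string RemoveImagesAndLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//*[self::img or self::a]");
        if (nodes != null)
        {
            foreach (var node in nodes)
            {
                if (node.Name == "a")
                {
                    // keepGrandChildren = true drops the <a> element itself
                    // but moves its children up into the parent.
                    node.ParentNode.RemoveChild(node, true);
                }
                else
                {
                    // <img> has no content worth keeping, so remove it outright.
                    node.Remove();
                }
            }
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        var input = "text&nbsp;text. <a href=\"/content/essie-classics\">text</a><br>"
                  + "<img src=\"http://example.com/img_8862.jpg\" alt=\"\" height=\"100\">";
        Console.WriteLine(RemoveImagesAndLinks(input));
    }
}
```

The null check matters because a fragment without any <img> or <a> elements would otherwise throw a NullReferenceException in the foreach.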


Popular Answer

Refer to this sample on removing an HTML node (img) from an HtmlDocument. You can also do it like this:

var sb = new StringBuilder();
doc.LoadHtml(inputHTml);

foreach (var node in doc.DocumentNode.ChildNodes)
{
    if (node.Name != "img" && node.Name != "a")
    {
        sb.Append(node.InnerHtml);
    }
}
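
Note that the loop above only compares the names of the direct children of DocumentNode, so an <img> nested inside a <div> still comes through in that div's InnerHtml. If the goal is just the text, as the question says, one possible alternative is to read InnerText instead. This is only a sketch under that assumption; TextExtractor and ExtractText are hypothetical names.

```csharp
using System;
using HtmlAgilityPack;

public static class TextExtractor
{
    // Hypothetical helper: returns the text content of an HTML fragment.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText concatenates the text of the descendant nodes and leaves
        // out element markup, so <img> tags contribute nothing.
        var text = doc.DocumentNode.InnerText;

        // Decode entities such as &nbsp; into the characters they stand for.
        return HtmlEntity.DeEntitize(text);
    }

    public static void Main()
    {
        var input = "<br><div>text</div><div>, text</div><div>text<br>"
                  + "<img src=\"http://example.com/55.jpg\" alt=\"\" height=\"100\"></div>";
        Console.WriteLine(ExtractText(input));
    }
}
```

With this approach the text inside <a> elements is kept, since only the markup is dropped.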



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow