HTMLagilitypack is not removing all html tags How can I solve this efficiently?

c# html-agility-pack string

Question

I'm using the technique below to remove all HTML from the string:

public static string StripHtmlTags(string html)
        {
            if (String.IsNullOrEmpty(html)) return "";
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(html);
            return doc.DocumentNode.InnerText;
        }

But it seems to be disregarding this tag:[…]

Thus, the string basically returns:

> A hungry thief who stole a rack of pork ribs from a grocery store has
> been sentenced to spend 50 years in prison. Willie Smith Ward felt the
> full force of the law after being convicted of the crime in Waco,
> Texas, on Wednesday. The 43-year-old may feel slightly aggrieved over
> the severity of the […]

How can I make sure that tags of this kind are removed?

Any assistance is much welcomed; thank you.

1
13
6/1/2013 5:53:35 PM

Accepted Answer

Try HttpUtility.HtmlDecode

public static string StripHtmlTags(string html)
{
    if (String.IsNullOrEmpty(html)) return "";
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    return HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);
}

HtmlDecode will translate.[…] to […]

42
6/1/2013 5:56:34 PM


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow