How can I remove commented text from HTML using HtmlAgilityPack?

html-agility-pack

Question

Is it possible to remove commented text from HTML using the HtmlAgilityPack library? I'm currently migrating a project from ASP to ASP.NET MVC, and the existing code uses Regex for this. Before I start trying it, I'd like to know whether I can achieve the same thing with HtmlAgilityPack.

Accepted Answer

You could find all the nodes of type HtmlCommentNode (which represents an HTML comment) and remove them from the document. Note, however, that HtmlAgilityPack also treats e.g. <!DOCTYPE html> as a comment node, so nodes like that should be excluded from deletion:

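using System;
using System.Linq;
using HtmlAgilityPack;

// `html` is the string containing the markup to clean.
var doc = new HtmlDocument();
doc.LoadHtml(html);

// Collect every comment node except the doctype declaration.
var comments = doc.DocumentNode.DescendantNodes()
    .OfType<HtmlCommentNode>()
    .Where(c =>
        !c.Comment.StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase)
    ).ToList();

// ToList() above snapshots the nodes, so removing them here is safe.
foreach (var comment in comments)
    comment.Remove();

var result = doc.DocumentNode.InnerHtml;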
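
The same idea can also be expressed with an XPath query, which some people find easier to read. The following is a minimal sketch rather than a definitive implementation: the class name HtmlCommentStripper and the method name RemoveComments are hypothetical, chosen only for illustration. It assumes HtmlAgilityPack's default behaviour, where SelectNodes returns null instead of an empty collection when nothing matches.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper; the class and method names are illustrative only.
public static class HtmlCommentStripper
{
    public static string RemoveComments(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//comment()" selects every comment node in the document.
        // By default SelectNodes returns null when there is no match.
        var nodes = doc.DocumentNode.SelectNodes("//comment()");
        if (nodes == null)
            return doc.DocumentNode.InnerHtml;

        // Snapshot the matches, skipping the doctype, which
        // HtmlAgilityPack also reports as a comment node.
        var comments = nodes
            .OfType<HtmlCommentNode>()
            .Where(c => !c.Comment.StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase))
            .ToList();

        foreach (var comment in comments)
            comment.Remove();

        return doc.DocumentNode.InnerHtml;
    }
}
```

Calling HtmlCommentStripper.RemoveComments(html) on each page during the migration would then replace the Regex-based step, with the doctype left in place.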



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow