How can I use HTML Agility Pack to remove comments from HTML without losing the DOCTYPE?

html-agility-pack

Question

I'm trying to strip unnecessary content from HTML; specifically, I want to remove comments. I found a reasonably good solution (using HTML Agility Pack to pick out meta tags and comments), but the DOCTYPE is treated as a comment and is therefore removed along with the real comments. How can I change the code below so that the DOCTYPE is preserved?

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(htmlContent);
var nodes = htmlDoc.DocumentNode.SelectNodes("//comment()");
if (nodes != null)
{
    foreach (HtmlNode comment in nodes)
    {
        comment.ParentNode.RemoveChild(comment);
    }
}
5/23/2017 12:00:31 PM

Accepted Answer

Check that the comment is not the DOCTYPE before removing it:

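  foreach (var comment in nodes)
  {
     // The DOCTYPE is parsed as a comment node; its raw markup is expected
     // to start with "<!DOCTYPE", so test OuterHtml and skip that node.
     if (!comment.OuterHtml.StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase))
         comment.ParentNode.RemoveChild(comment);
  }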
7/4/2011 5:20:12 AM

Popular Answer

doc.DocumentNode.Descendants()
 .Where(n => n.NodeType == HtmlAgilityPack.HtmlNodeType.Comment)
 .ToList()
 .ForEach(n => n.Remove());

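This removes every comment node in the document. Because Agility Pack treats the DOCTYPE as a comment, as the question notes, the DOCTYPE is removed as well unless it is filtered out first.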
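A minimal sketch that combines the two answers: the LINQ traversal from this answer plus a filter that skips the DOCTYPE. The class and method names (`CommentStripper`, `RemoveCommentsKeepDoctype`) are hypothetical, and the filter assumes the DOCTYPE comes back as a comment node whose `OuterHtml` starts with `<!DOCTYPE`; check that against the Agility Pack version you use.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class CommentStripper
{
    // Hypothetical helper, not part of HTML Agility Pack or the original answers.
    public static string RemoveCommentsKeepDoctype(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Collect the nodes into a list first so that nothing is removed
        // while the document tree is still being enumerated.
        var comments = doc.DocumentNode.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Comment)
            // Assumption: the DOCTYPE is a comment node whose markup
            // starts with "<!DOCTYPE"; keep it by filtering it out here.
            .Where(n => !n.OuterHtml.TrimStart()
                .StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase))
            .ToList();

        foreach (var comment in comments)
        {
            comment.Remove();
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        const string html =
            "<!DOCTYPE html><html><body><!-- remove me --><p>Hello</p></body></html>";
        Console.WriteLine(RemoveCommentsKeepDoctype(html));
    }
}
```

The case-insensitive comparison is a deliberate choice, since `<!doctype html>` is also valid HTML.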




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow