Has anyone done this? Basically, I want to sanitize the HTML: keep basic tags such as h1, h2, em, etc.; remove any non-http addresses from img and a tags; and HTML-encode every other tag.
I'm stuck on the HTML-encoding part. I know that to remove a node you call node.ParentNode.RemoveChild(node);, where node is an HtmlNode. Instead of removing the node, though, I want to HTML-encode it.
You can't encode a node in place. Instead, remove the element you don't want and re-add its encoded HTML as a text node at the same position.
If you don't need to process the children of the elements you're throwing away, you can just use OuterHtml, which contains the element's own tags plus everything inside it. Something like this might work:
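// HttpUtility lives in System.Web. HtmlTextNode has no public constructor, so create it through the document.
HtmlNode parent = nodeToDelete.ParentNode;
HtmlTextNode encoded = nodeToDelete.OwnerDocument.CreateTextNode(HttpUtility.HtmlEncode(nodeToDelete.OuterHtml));
parent.ReplaceChild(encoded, nodeToDelete); // the encoded text takes the element's original position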
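Putting the whole question together, here is a minimal sketch of a whitelist sanitizer built on HtmlAgilityPack (assumed to come from the HtmlAgilityPack NuGet package). The class name HtmlWhitelistSketch, the method Sanitize, and the exact tag and attribute lists are hypothetical and only illustrative. It uses System.Net.WebUtility.HtmlEncode instead of HttpUtility to avoid a System.Web dependency, and it also drops attributes outside a small whitelist (otherwise something like onclick on an allowed h1 would survive), which goes a bit beyond what was asked. It assumes the input is an HTML fragment; a full document's html/body elements would themselves be encoded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;       // WebUtility.HtmlEncode (avoids a System.Web dependency)
using HtmlAgilityPack;  // NuGet package: HtmlAgilityPack

// Hypothetical helper name; a sketch, not a hardened sanitizer.
public static class HtmlWhitelistSketch
{
    // Assumed whitelist: elements kept as real markup.
    private static readonly HashSet<string> AllowedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "h1", "h2", "em", "strong", "p", "a", "img" };

    // Assumed attribute whitelist; everything else (e.g. onclick, style) is dropped.
    private static readonly HashSet<string> AllowedAttributes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "href", "src", "alt", "title" };

    // Attributes whose values are URLs and must be absolute http/https.
    private static readonly HashSet<string> UrlAttributes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "href", "src" };

    public static string Sanitize(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        SanitizeChildren(doc.DocumentNode);
        return doc.DocumentNode.OuterHtml;
    }

    private static void SanitizeChildren(HtmlNode parent)
    {
        // Snapshot the children: the loop replaces nodes in parent.ChildNodes.
        foreach (HtmlNode node in parent.ChildNodes.ToList())
        {
            // Plain text is left as parsed.
            if (node.NodeType == HtmlNodeType.Text)
                continue;

            if (!AllowedTags.Contains(node.Name))
            {
                // Not whitelisted (this includes comments, whose Name is "#comment"):
                // swap the whole subtree for a text node holding its encoded markup.
                string encoded = WebUtility.HtmlEncode(node.OuterHtml);
                HtmlTextNode textNode = parent.OwnerDocument.CreateTextNode(encoded);
                parent.ReplaceChild(textNode, node);
                continue;
            }

            // Whitelisted element: drop disallowed attributes and non-http(s) URLs.
            foreach (HtmlAttribute attr in node.Attributes.ToList())
            {
                bool keep = AllowedAttributes.Contains(attr.Name)
                    && (!UrlAttributes.Contains(attr.Name) || IsHttpUrl(attr.Value));
                if (!keep)
                    node.Attributes.Remove(attr);
            }

            SanitizeChildren(node);
        }
    }

    // True only for absolute http:// or https:// URLs; relative URLs and other schemes fail.
    private static bool IsHttpUrl(string value) =>
        Uri.TryCreate(value, UriKind.Absolute, out Uri uri)
        && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
}
```

Calling HtmlWhitelistSketch.Sanitize(userHtml) should keep an em element and an http link, strip a javascript: href, and turn a script element into visible, encoded text. Rejecting anything that does not parse as an absolute http(s) URI fails closed: relative links are dropped too, so loosen IsHttpUrl if you need them.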