I'm using the Html Agility Pack to clean up input from a WYSIWYG editor. This might not be the best way to do it, but I'm working with developers who explode on contact with regex, so it will have to suffice.
My WYSIWYG content looks something like this, for example:

```html
<p></p>
<p></p>
<p><span><input id="textbox" type="text" /></span></p>
```
I need to strip the empty paragraph tags. Here's how I'm doing it at the moment:
```csharp
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//p");
if (nodes == null)
    return;

foreach (HtmlNode node in nodes)
{
    node.InnerHtml = node.InnerHtml.Trim();
    if (node.InnerHtml == string.Empty)
        node.ParentNode.RemoveChild(node);
}
```
However, because the HTML is not a complete document, the paragraph tags do not have a parent node, and the call to RemoveChild fails because ParentNode is null.
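Before working around it, it may be worth confirming what ParentNode actually holds for each paragraph. The sketch below is a minimal diagnostic, assuming the HtmlAgilityPack NuGet package and that the fragment is loaded with HtmlDocument.LoadHtml (the question doesn't show how doc is built). ParentInspector and ParagraphParentNames are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ParentInspector
{
    // Hypothetical helper: loads an HTML fragment and reports the name of each
    // <p> element's ParentNode, or "(null)" if a paragraph has no parent.
    public static List<string> ParagraphParentNames(string fragment)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(fragment);

        var names = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//p");
        if (nodes == null)
            return names;

        foreach (HtmlNode node in nodes)
            names.Add(node.ParentNode == null ? "(null)" : node.ParentNode.Name);

        return names;
    }
}
```

If a top-level paragraph reports "#document", it is attached to doc.DocumentNode rather than parentless.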
I can't find another way to remove a tag, though. Can anyone point me at an alternative method?
Technically, first-level elements are children of the document root (doc.DocumentNode), so the following code should work:
```csharp
if (node.InnerHtml == String.Empty) {
    HtmlNode parent = node.ParentNode;
    if (parent == null) {
        parent = doc.DocumentNode;
    }
    parent.RemoveChild(node);
}
```
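Combined with the loop from the question, this approach might look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package and a fragment loaded with LoadHtml; EmptyParagraphCleaner and RemoveEmptyParagraphs are hypothetical names, not part of the library.

```csharp
using HtmlAgilityPack;

public static class EmptyParagraphCleaner
{
    // Hypothetical helper: loads a fragment, removes <p> elements whose
    // trimmed InnerHtml is empty, and returns the remaining markup.
    public static string RemoveEmptyParagraphs(string fragment)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(fragment);

        // SelectNodes builds its own collection of matches (or returns null),
        // so removing nodes from the document inside the loop doesn't disturb it.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//p");
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
            {
                if (node.InnerHtml.Trim() != string.Empty)
                    continue;

                // Fall back to the document root when ParentNode is null.
                HtmlNode parent = node.ParentNode ?? doc.DocumentNode;
                parent.RemoveChild(node);
            }
        }
        return doc.DocumentNode.OuterHtml;
    }
}
```

One deliberate difference from the question's loop: this version only tests node.InnerHtml.Trim() instead of writing the trimmed value back, so the paragraphs that are kept are left as they were.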
You want to remove from the collection, right?
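```csharp
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//p");
if (nodes == null)
    return;

// Walk backwards over every index: "Count - 1" as the upper bound would skip
// the last paragraph, and a forward loop would skip the item after each removal.
for (int i = nodes.Count - 1; i >= 0; i--)
{
    nodes[i].InnerHtml = nodes[i].InnerHtml.Trim();
    if (nodes[i].InnerHtml == string.Empty)
        nodes.Remove(i);
}
```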
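Keep in mind that removing items from nodes changes the list SelectNodes returned, not the markup, so doc will still contain the empty paragraphs. If a filtered list is what you're after, here is a sketch that builds one without touching either the collection or the document, assuming the HtmlAgilityPack NuGet package (ParagraphFilter and NonEmptyParagraphs are hypothetical names):

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ParagraphFilter
{
    // Hypothetical helper: returns the <p> nodes whose trimmed InnerHtml is
    // not empty. Neither the document nor the SelectNodes result is modified.
    public static List<HtmlNode> NonEmptyParagraphs(HtmlDocument doc)
    {
        var result = new List<HtmlNode>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//p");
        if (nodes == null)
            return result;

        foreach (HtmlNode node in nodes)
        {
            if (node.InnerHtml.Trim() != string.Empty)
                result.Add(node);
        }
        return result;
    }
}
```

Building a new list sidesteps the index bookkeeping entirely; if the empty paragraphs also need to disappear from the output HTML, they still have to be removed from their parent node.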