How to delete a node if it has no parent node

.net asp.net c# html-agility-pack

Question

I'm using the HTML agility pack to clean up input to a WYSIWYG. This might not be the best way to do this but I'm working with developers who explode on contact with regex so it will have to suffice.

My WYSIWYG content looks something like this (for example):

<p></p>
<p></p>
<p><span><input id="textbox" type="text" /></span></p>

I need to strip the empty paragraph tags. Here's how I'm doing it at the moment:

HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//p");
if (nodes == null)
    return;

foreach (HtmlNode node in nodes)
{
    node.InnerHtml = node.InnerHtml.Trim();
    if (node.InnerHtml == string.Empty)
        node.ParentNode.RemoveChild(node);
}

However, because the HTML is not a complete document the paragraph tags do not have a parent node and RemoveChild will therefore fail since ParentNode is null.

I can't find another way to remove tag though, can anyone point me at an alternate method?

Accepted Answer

Technically, first-level elements are children of the document root, so the following code should work:

if (node.InnerHtml == String.Empty) {
    HtmlNode parent = node.ParentNode;
    if (parent == null) {
        parent = doc.DocumentNode;
    }
    parent.RemoveChild(node);
}

Popular Answer

You want to remove from the collection, right?

HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//p");
if (nodes == null)
    return;

for (int i = 0; i < nodes.Count - 1; i++)
{
    nodes[i].InnerHtml = nodes[i].InnerHtml.Trim();
    if (nodes[i].InnerHtml == string.Empty)
        nodes.Remove(i);
}



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why