HtmlAgilityPack - removing all nodes in a collection

c# html html-agility-pack windows-runtime windows-store-apps

Question

I'm trying to fix this weird nested HTML I get from using contentEditable:

<span lang="">
   <p>line one</p>
   <p>line two</p>
</span>

I want to replace each of these span nodes with its children:

<p>line one</p>
<p>line two</p>

Here's what I tried.

var spans = doc.DocumentNode.Descendants().Where(x => x.Name == "span" && x.Attributes["lang"] != null).ToList();
foreach (var span in spans)
{
    foreach (var child in span.ChildNodes)
    {
        var ch = doc.CreateElement(child.Name);
        ch.InnerHtml = child.InnerHtml;
        doc.DocumentNode.InsertBefore(ch, span);
    }            
    span.Remove();
}

This throws a System.ArgumentOutOfRangeException with the following message.

Node "<span lang=""></span>" was not found in the collection

I understand why this is happening. Editing the document voids my collection of span elements. So how do I go about doing this?

Also, how do I cope with text that is not contained in a child node? Suppose I found this element:

<span lang="">
   <p>line one</p>
   <p>line two</p>
   line three
</span>

How do I de-nest that?

PLEASE NOTE: This is HtmlAgilityPack for WinRT, so SelectSingleNode and all XPath commands are not available to me.

Accepted Answer

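The exception comes from the InsertBefore call itself: it looks for the reference node among the direct children of the node it is called on, and the span is not a direct child of doc.DocumentNode. The fix is to invoke InsertBefore on the span's parent node, not on the document root.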

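Moreover, I think you can directly "move" the nodes without creating new ones. Since ChildNodes contains text nodes as well as elements, this should also cover loose text such as "line three":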

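// 'spans' is the list built in the question (already materialized with ToList()).
foreach (var span in spans)
{
    foreach (var child in span.ChildNodes)
    {
        // Insert relative to the span's own parent, where the span is a direct child.
        span.ParentNode.InsertBefore(child, span);
    }
    // Drop the now-redundant wrapper.
    span.Remove();
}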
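
For completeness, here is a minimal end-to-end sketch of the same idea, assuming the regular HtmlAgilityPack API (HtmlDocument.LoadHtml, Descendants, InsertBefore, RemoveChild) and LINQ are available in your build. The class and method names (SpanUnwrapper, UnwrapLangSpans) are made up for illustration and are not part of the library. It avoids XPath, and it copies the child list before moving anything, which is a defensive assumption on my part rather than something the library is known to require.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class SpanUnwrapper
{
    // Replaces every <span lang="..."> with its own child nodes
    // (elements and text nodes alike) and returns the resulting HTML.
    public static string UnwrapLangSpans(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // No XPath: filter Descendants() with LINQ, as in the question.
        // ToList() snapshots the matches before the tree is edited.
        var spans = doc.DocumentNode.Descendants()
            .Where(x => x.Name == "span" && x.Attributes["lang"] != null)
            .ToList();

        foreach (var span in spans)
        {
            var parent = span.ParentNode;

            // Defensive copy of the children (an assumption: it keeps the loop
            // safe even if moving a node were to modify span.ChildNodes).
            foreach (var child in span.ChildNodes.ToList())
            {
                // The span is a direct child of 'parent', so the lookup succeeds.
                parent.InsertBefore(child, span);
            }

            // Remove the wrapper itself.
            parent.RemoveChild(span);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling SpanUnwrapper.UnwrapLangSpans on the second snippet from the question should leave the two paragraphs and the "line three" text in place of the span. Whitespace-only text nodes between the tags are moved along with everything else.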


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow