Removing an HtmlNode inside an HtmlNode with the HtmlAgilityPack

c# html html-agility-pack nodes removechild

Question

How do I remove the number node as well as its value from house?

Document:

<number>123456</number>
<house> <number> </number>Red</house>
<house> <number>12</number>Blue</house>
<number>345345</number>
etc...

Code:

private void getHouse(string houseHtml)
{
    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

    htmlDoc.Load(new System.IO.StringReader(houseHtml));

    foreach (HtmlAgilityPack.HtmlNode house in htmlDoc.DocumentNode.SelectNodes("//house"))
    {
        MessageBox.Show(house.InnerText);
    }
}

Result:

 Red
12Blue

Required result:

Red
Blue

I have been trying to use:

house.RemoveChild(house.SelectSingleNode("//number"));

and several variations of it, but I either run into a "Node was not found in collection" error or nothing happens. It also selects the topmost number, not the number inside the house tag.

Accepted Answer

Why don't you remove the nodes directly?

var numbers = htmlDoc.DocumentNode.SelectNodes("//house/number");
if (numbers != null) {
    foreach (var node in numbers)
        node.Remove();
}
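For completeness, here is a minimal, self-contained sketch of that approach applied to the sample document from the question. The class and method names (HouseCleaner, GetHouseNames) are made up for illustration, and Trim() is my addition: it assumes the whitespace left around each house name is not wanted.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class HouseCleaner // hypothetical name, for illustration only
{
    // Removes every <number> that is a direct child of a <house>,
    // then returns the remaining text of each <house>.
    public static List<string> GetHouseNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var numbers = doc.DocumentNode.SelectNodes("//house/number");
        if (numbers != null)
        {
            foreach (var node in numbers)
                node.Remove(); // detaches the node from its parent <house>
        }

        var names = new List<string>();
        var houses = doc.DocumentNode.SelectNodes("//house");
        if (houses != null)
        {
            foreach (var house in houses)
                names.Add(house.InnerText.Trim()); // Trim() is an assumption about the desired output
        }
        return names;
    }

    static void Main()
    {
        var html = "<number>123456</number>\n" +
                   "<house> <number> </number>Red</house>\n" +
                   "<house> <number>12</number>Blue</house>\n" +
                   "<number>345345</number>";

        foreach (var name in GetHouseNames(html))
            Console.WriteLine(name); // prints "Red" then "Blue"
    }
}
```

The top-level number elements are left alone because the XPath only matches number elements whose parent is a house.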

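As for why your XPath selects the topmost number: a path that starts with // is evaluated from the document root, even when you call it on house. To search only within the current node, make the path relative by starting it with a dot: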

house.SelectSingleNode("//number"); // wrong
house.SelectSingleNode(".//number"); // right
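If you would rather keep the per-house loop and RemoveChild from the question, a rough sketch could look like the following. It assumes each house has at most one number as a direct child, which is why it uses "./number" instead of ".//number". The names RelativeXPathDemo and HouseTexts are hypothetical, and Trim() is again my addition.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class RelativeXPathDemo // hypothetical name, for illustration only
{
    public static List<string> HouseTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        var houses = doc.DocumentNode.SelectNodes("//house");
        if (houses == null)
            return result; // no <house> elements in the document

        foreach (var house in houses)
        {
            // "./number" is relative to this house and matches direct children only,
            // so the node found here is guaranteed to be in house.ChildNodes.
            var number = house.SelectSingleNode("./number");
            if (number != null)
                house.RemoveChild(number);

            result.Add(house.InnerText.Trim());
        }
        return result;
    }

    static void Main()
    {
        var html = "<number>123456</number>\n" +
                   "<house> <number> </number>Red</house>\n" +
                   "<house> <number>12</number>Blue</house>\n" +
                   "<number>345345</number>";

        foreach (var text in HouseTexts(html))
            Console.WriteLine(text); // prints "Red" then "Blue"
    }
}
```

This also explains the "Node was not found in collection" error: RemoveChild only accepts a direct child of the node it is called on, and "//number" returned the first number in the whole document, which is not a child of house. If the number could be nested deeper (found with ".//number"), calling Remove() on the node itself is the safer choice, since it detaches the node from whatever its actual parent is.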



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow