Removing a HtmlNode inside a HtmlNode with the HtmlAgilityPack

c# html html-agility-pack nodes removechild

Question

How do I remove the number node as well as its value from house?

Document:

<number>123456</number>
<house> <number> </number>Red</house>
<house> <number>12</number>Blue</house>
<number>345345</number>
etc...

Code:

private void getHouse(string houseHtml)
{
    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

    htmlDoc.Load(new System.IO.StringReader(houseHtml));

    foreach (HtmlAgilityPack.HtmlNode house in htmlDoc.DocumentNode.SelectNodes("//house"))
    {
        MessageBox.Show(house.InnerText);
    }
}

Result:

 Red
12Blue

Required result:

Red
Blue

I have been trying to use:

house.RemoveChild(house.SelectSingleNode("//number"));

and some other combinations of this, but I either run into a "Node was not found in collection" error or nothing happens. It also selects the topmost number in the document, not the number inside the house tag.

7/19/2012 5:17:20 PM

Accepted Answer

Why don't you remove the nodes directly?

var numbers = htmlDoc.DocumentNode.SelectNodes("//house/number");
if (numbers != null) {
    foreach (var node in numbers)
        node.Remove();
}
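
For completeness, here is a minimal sketch that puts the removal together with the loop from the question. The class and method names (HouseText, GetHouseNames) are made up for illustration, and the Trim() call is an assumption on my part: the whitespace between <house> and <number> stays in the document after the number node is removed, so InnerText would otherwise start with a space.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class HouseText
{
    // Hypothetical helper: returns the text of every <house> element
    // after its direct <number> children have been removed.
    public static List<string> GetHouseNames(string houseHtml)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(houseHtml);

        // Select only the <number> nodes that sit directly inside a <house>.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var numbers = htmlDoc.DocumentNode.SelectNodes("//house/number");
        if (numbers != null)
        {
            foreach (var node in numbers)
                node.Remove(); // detaches the node and its text from its parent
        }

        var names = new List<string>();
        var houses = htmlDoc.DocumentNode.SelectNodes("//house");
        if (houses != null)
        {
            foreach (var house in houses)
                names.Add(house.InnerText.Trim()); // Trim drops leftover whitespace
        }
        return names;
    }
}
```

With the document from the question this should give "Red" and "Blue", while the top-level number nodes are left untouched.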

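Anyway, your XPath selects the topmost number because an expression that starts with // searches from the document root, no matter which node you call it on. To search only inside the current node, give a path relative to it by starting with a dot: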

house.SelectSingleNode("//number"); // wrong
house.SelectSingleNode(".//number"); // right
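
If you would rather keep the RemoveChild approach from the question, a sketch along these lines should work. The names HouseCleaner and StripNumber are hypothetical. It uses "./number" rather than ".//number" on the assumption that the number is a direct child of house, because RemoveChild expects a node from the parent's own child collection. Passing it the topmost number, which is not a child of house, is most likely what produced the "Node was not found in collection" error.

```csharp
using HtmlAgilityPack;

public static class HouseCleaner
{
    // Hypothetical helper: removes the direct <number> child of a <house>
    // node (if there is one) and returns the remaining text.
    public static string StripNumber(HtmlNode house)
    {
        // "./number" is evaluated relative to 'house', so it can never
        // match a <number> that lives elsewhere in the document.
        HtmlNode number = house.SelectSingleNode("./number");

        // SelectSingleNode returns null when nothing matches.
        if (number != null)
            house.RemoveChild(number);

        // Trim is an assumption: whitespace around the removed node remains.
        return house.InnerText.Trim();
    }
}
```

Inside the foreach loop from the question this would be called as MessageBox.Show(HouseCleaner.StripNumber(house));.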
7/19/2012 5:26:53 PM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow