Html doesn't get updated with Html Agility Pack

c# html html-agility-pack

Question

I'm trying to remove the img and map elements from a piece of HTML.

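using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

// Markup before any changes
var oldHtml = doc.DocumentNode.InnerHtml;

// Remove every <img> that references an image map
// (SelectNodes returns null when nothing matches)
HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img[@usemap]");
if (images != null)
{
    foreach (HtmlNode img in images)
        img.ParentNode.RemoveChild(img);
}

// Remove every <map> element (the HTML below contains two)
HtmlNodeCollection maps = doc.DocumentNode.SelectNodes("//map");
if (maps != null)
{
    foreach (HtmlNode map in maps)
        map.ParentNode.RemoveChild(map);
}

// Markup after the removals
var newHtml = doc.DocumentNode.InnerHtml;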

newHtml still contains the img and map elements. Do I need to do something else before the HTML is updated?

Here is the HTML I'm trying to strip:

<p><img src="/media/8301/HD00_498x299.jpg"  width="498"  height="299" alt="HD00.JPG" usemap="#imgmap201392714219"/><br />
<br />
 <a title="Download ZIP DWG"
href="/media/8103/detailtekeningen-dwg-unidek-aero.zip"
target="_blank">Klik hier om alle DWG&nbsp;bestanden in
een&nbsp;zipfile te downloaden.</a><br />
 <a title="Download DXF"
href="/media/8104/detailtekeningen-dxf-unidek-aero.zip"
target="_blank">Klik hier om alle DXF bestanden in een zipfile te
downloaden.</a><br />
 <a title="Download PDF"
href="/media/8116/detailtekeningen-pdf-unidek-aero.zip"
target="_blank">Klik hier om alle PDF bestanden in een zipfile te
downloaden.</a><br />
<br />
 <strong><a title="Bouwdetails berekende psi-waarden"
href="/{localLink:8014}" target="_blank">Link naar de technische
bouwdetails met verbeterde eigen ψ-waarden<br />
</a></strong> &nbsp;<map name="imgmap2012104102243"
id="imgmap2012104102243">
<area title="" href="/nl/producten/hellend-dak/unidek-aero/1"
shape="rect" coords="194,419,219,439" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/2"
shape="rect" coords="221,420,246,439" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/3"
shape="rect" coords="200,302,226,320" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/4"
shape="rect" coords="209,167,234,185" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/6"
shape="rect" coords="68,46,98,67" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/7"
shape="rect" coords="102,203,129,224" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/8"
shape="rect" coords="273,339,302,360" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/9"
shape="rect" coords="387,350,417,372" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/10"
shape="rect" coords="324,341,354,363" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/11"
shape="rect" coords="223,369,252,390" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/12"
shape="rect" coords="62,270,89,294" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/13"
shape="rect" coords="93,270,119,294" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/14"
shape="rect" coords="31,94,60,114" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/14"
shape="rect" coords="79,161,106,182" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/15"
shape="rect" coords="19,150,50,171" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/15"
shape="rect" coords="82,113,110,134" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/16"
shape="rect" coords="176,231,205,253" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/17"
shape="rect" coords="147,179,176,200" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/18"
shape="rect" coords="139,235,166,257" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/19"
shape="rect" coords="204,56,231,78" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/20"
shape="rect" coords="125,135,153,157" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/21"
shape="rect" coords="265,263,290,284" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/23"
shape="rect" coords="9,202,36,225" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/24"
shape="rect" coords="39,202,65,225" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/25"
shape="rect" coords="158,80,184,101" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/26"
shape="rect" coords="188,80,213,102" target="_blank" alt="" />
</map><map id="imgmap201392714219">
<area title="" href="/nl/producten/hellend-dak/unidek-aero/1"
shape="rect" coords="265,463,279,480" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/2"
shape="rect" coords="282,466,297,480" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/3"
shape="rect" coords="213,339,237,358" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/4"
shape="rect" coords="206,204,227,220" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/6"
shape="rect" coords="113,105,135,121" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/7"
shape="rect" coords="134,246,154,262" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/8"
shape="rect" coords="299,369,319,386" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/9"
shape="rect" coords="432,409,453,425" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/10"
shape="rect" coords="363,394,385,413" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/11"
shape="rect" coords="254,406,276,422" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/12"
shape="rect" coords="105,298,122,314" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/13"
shape="rect" coords="122,298,139,314" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/14"
shape="rect" coords="53,121,77,139" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/15"
shape="rect" coords="49,165,72,182" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/16"
shape="rect" coords="195,272,214,288" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/17"
shape="rect" coords="152,212,175,230" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/18"
shape="rect" coords="160,276,180,293" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/19"
shape="rect" coords="234,88,255,105" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/20"
shape="rect" coords="132,155,158,174" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/21"
shape="rect" coords="299,294,321,311" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/23"
shape="rect" coords="40,234,55,250" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/24"
shape="rect" coords="56,233,73,251" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/25"
shape="rect" coords="185,108,202,127" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/26"
shape="rect" coords="203,109,219,127" target="_blank" alt="" />
</map></p>

When I debug, the img and map elements are found, but calling RemoveChild doesn't change the HTML at all. The same happens when I try to change an attribute or make any other modification: nothing changes.

Popular Answer

I've just discovered a bug in Html Agility Pack: you can only read .InnerHtml once. After the first read, it no longer reflects changes to the document. You are reading it twice:

using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

// First read of InnerHtml
var oldHtml = doc.DocumentNode.InnerHtml;

HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img[@usemap]");
if (images != null)
{
    foreach (HtmlNode img in images)
        img.ParentNode.RemoveChild(img);
}

HtmlNodeCollection maps = doc.DocumentNode.SelectNodes("//map");
if (maps != null)
{
    foreach (HtmlNode map in maps)
        map.ParentNode.RemoveChild(map);
}

// Second read: still returns the markup from the first read
var newHtml = doc.DocumentNode.InnerHtml;

If you get rid of this line:

var oldHtml = doc.DocumentNode.InnerHtml;

It should work. This appears to be a bug in HtmlAgilityPack itself rather than a problem with your code.
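For reference, here is a minimal sketch of the whole change as a self-contained program, assuming the project references the HtmlAgilityPack NuGet package. The class ImageMapStripper, the method StripImageMaps and the short test HTML are hypothetical. Following this answer, the sketch reads InnerHtml only once, after all removals; it also removes every matching node, since the HTML in the question contains two map elements.

```csharp
using System;
using HtmlAgilityPack;

public static class ImageMapStripper
{
    // Hypothetical helper: removes every <img usemap="..."> and every <map>,
    // then reads InnerHtml a single time, after all modifications.
    public static string StripImageMaps(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // One XPath union query; SelectNodes returns null when nothing matches
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//img[@usemap] | //map");
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
            {
                node.ParentNode.RemoveChild(node);
            }
        }

        // The only read of InnerHtml, so no earlier read can leave it stale
        return doc.DocumentNode.InnerHtml;
    }

    public static void Main()
    {
        string html = "<p><img src=\"a.jpg\" usemap=\"#m\"/>text<map id=\"m\"><area href=\"/1\" /></map></p>";
        Console.WriteLine(StripImageMaps(html));
    }
}
```

Combining both expressions with the XPath | operator means SelectNodes runs once and returns the img and map nodes together in document order.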

Sniffdk's solution works because it reads .OuterHtml only once. The Html Agility Pack maintainers need to fix this.
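To illustrate why reading the value only once matters, here is a hypothetical model of the behaviour this answer describes: a node that caches its serialized children on the first read and never clears that cache when a child is removed. It is not Html Agility Pack's actual source; CachingNode and its members are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical model, NOT Html Agility Pack's real code: the serialized
// children are cached on the first read and the cache is never invalidated.
public class CachingNode
{
    private readonly List<string> _children = new List<string>();
    private string _cachedInnerHtml; // filled on first read, never cleared

    public void Add(string child) => _children.Add(child);

    // Bug being modelled: removing a child does not reset _cachedInnerHtml
    public void Remove(string child) => _children.Remove(child);

    public string InnerHtml
    {
        get
        {
            if (_cachedInnerHtml == null)
            {
                var sb = new StringBuilder();
                foreach (var child in _children) sb.Append(child);
                _cachedInnerHtml = sb.ToString();
            }
            return _cachedInnerHtml;
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        // Read twice: the first read fills the cache, so the removal is invisible
        var a = new CachingNode();
        a.Add("<img>");
        a.Add("<map></map>");
        var oldHtml = a.InnerHtml;
        a.Remove("<map></map>");
        Console.WriteLine(a.InnerHtml); // prints "<img><map></map>"

        // Read once: the cache is filled only after the removal
        var b = new CachingNode();
        b.Add("<img>");
        b.Add("<map></map>");
        b.Remove("<map></map>");
        Console.WriteLine(b.InnerHtml); // prints "<img>"
    }
}
```

In a model like this, the proper fix is to reset the cached string whenever the children change, rather than relying on callers to read the property only once.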




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow