HtmlAgilityPack C#: removing an element by class name

.net c# html-agility-pack xpath xslt

Question

I'm using the HTML Agility Pack to read the contents of my HTML page into a string, among other things. Once that is done, I want to remove certain elements from that content based on their class, but I'm running into a problem.

My HTML is as follows:

<div id="wrapper">
    <div class="maincolumn" >
        <div class="breadCrumbContainer">
            <div class="breadCrumbs">
            </div>
        </div>

        <div class="seo_list">
            <div class="seo_head">Header</div>
        </div>

Content goes here...
</div>

Now, to get all of the content inside the <div id="wrapper">, I used an XPath selection and the InnerHtml property as follows:

            node = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
            if (node != null)
            {
                pageContent = node.InnerHtml;
            }

From here I want to remove the div with the class "breadCrumbContainer", but when I use the code below, I get the error "Node "" was not found in the collection."

            node = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
            node = node.RemoveChild(node.SelectSingleNode("//div[@class='breadCrumbContainer']"));

            if (node != null)
            {
                pageContent = node.InnerHtml;
            }

Could someone please shed some light on this? I'm quite new to the HtmlAgilityPack library and to XPath.

Thanks,

Dave

3/8/2011 4:36:36 AM

Accepted Answer

This happens because RemoveChild can only remove a direct child of the node it is called on, and the breadCrumbContainer div is a grandchild of the wrapper div, not a child. Instead, try this:

    HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='breadCrumbContainer']");
    node.ParentNode.RemoveChild(node);
3/7/2011 3:03:14 PM
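
The approach above can be extended into a minimal sketch that removes every matching div below the wrapper and guards against missing nodes. The class `BreadcrumbRemover` and the method `RemoveByClass` are hypothetical names chosen for illustration; the HtmlAgilityPack calls used (`LoadHtml`, `SelectSingleNode`, `SelectNodes`, `ParentNode.RemoveChild`, `InnerHtml`) are the library's own. Note the `.//` prefix: an XPath starting with `//` searches the whole document even when called on a node, whereas `.//` searches only below that node.

```csharp
using HtmlAgilityPack;

public static class BreadcrumbRemover
{
    // Hypothetical helper: returns the inner HTML of the wrapper div after
    // removing every descendant div whose class attribute equals className.
    // Returns null if the wrapper div is not found.
    public static string RemoveByClass(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode wrapper = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
        if (wrapper == null)
        {
            return null;
        }

        // ".//" restricts the search to descendants of wrapper.
        // Assumption: className contains no quote characters.
        HtmlNodeCollection matches =
            wrapper.SelectNodes(".//div[@class='" + className + "']");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (matches != null)
        {
            foreach (HtmlNode match in matches)
            {
                // Remove each node from its own parent, however deep it sits.
                match.ParentNode.RemoveChild(match);
            }
        }

        return wrapper.InnerHtml;
    }
}
```

Calling `BreadcrumbRemover.RemoveByClass(html, "breadCrumbContainer")` on the HTML from the question should return the wrapper's content without the breadcrumb block. The `@class='...'` test matches only an exact attribute value, so an element with several classes (for example `class="breadCrumbContainer wide"`) would need a different predicate.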

Popular Answer

This is a really straightforward task with XSLT:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
  "div[@class='breadCrumbContainer'
     and
       ancestor::div[@id='wrapper']
      ]
  "/>
</xsl:stylesheet>

When this transformation is applied to the following XML document (the provided one, with an additional <div> and wrapped in an <html> top element to make it more challenging and realistic):

<html>
 <div id="wrapper">
    <div class="maincolumn" >
        <div class="breadCrumbContainer">
            <div class="breadCrumbs"></div>
        </div>
        <div class="seo_list">
            <div class="seo_head">Header</div>
        </div>  Content goes here...
    </div>
 </div>
 <div>
   Something else here
 </div>
</html>

the wanted, correct result is produced:

<html>
  <div id="wrapper">
    <div class="maincolumn">
      <div class="seo_list">
        <div class="seo_head">Header</div>
      </div>  Content goes here...
    </div>
  </div>
  <div>
   Something else here
 </div>
</html>
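
For readers who want to run such a stylesheet from C#, here is a minimal sketch using the .NET `XslCompiledTransform` class. The class `XsltRunner` and the method `Apply` are hypothetical names for illustration, and the sketch assumes the input is well-formed XML; real-world HTML often is not, in which case it would have to be converted to well-formed XML first.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltRunner
{
    // Hypothetical helper: applies an XSLT 1.0 stylesheet (given as a string)
    // to a well-formed XML document (also a string) and returns the result.
    public static string Apply(string stylesheet, string inputXml)
    {
        var transform = new XslCompiledTransform();

        // Compile the stylesheet.
        using (XmlReader styleReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            transform.Load(styleReader);
        }

        // Run the transformation; writing to a TextWriter lets the
        // stylesheet's xsl:output settings control serialization.
        using (XmlReader inputReader = XmlReader.Create(new StringReader(inputXml)))
        using (var output = new StringWriter())
        {
            transform.Transform(inputReader, null, output);
            return output.ToString();
        }
    }
}
```

Passing the stylesheet and the XML document shown above to `XsltRunner.Apply` should yield the document without the breadCrumbContainer div; the exact whitespace of the output may differ between XSLT processors.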


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow