Is there a way to replace html nodes with text nodes using HTMLAgilityPack?

c# dom html html-agility-pack

Question

I would like to use HtmlAgilityPack to replace a node within the document with a text node. The purpose of this is to remove tags surrounding the node itself. Currently, I do something like this:

//This code fixes redundant HTML formatting tags
//This is a snippet of code
foreach (var hChildNode in hd.DocumentNode.SelectNodes("//b//b | //i//i | //u//u") ?? Enumerable.Empty<HtmlNode>())
    hChildNode.Name = "remove";
StringBuilder sb = new StringBuilder(hd.DocumentNode.WriteTo());
sb.Replace("<remove>", string.Empty);
sb.Replace("</remove>", string.Empty);

Is there a better way to do this? If I try to create a new text node, and then do something like the code snippet below, I receive an invalid cast error:

foreach (var hChildNode in hd.DocumentNode.SelectNodes("//b//b | //i//i | //u//u") ?? Enumerable.Empty<HtmlNode>())
{
    HtmlNode hNewNode = hd.CreateTextNode(hChildNode.InnerHtml);
    hChildNode.ParentNode.ReplaceChild(hNewNode, hChildNode);
}

(Updated after a typo was pointed out; however, the problem still remains.)

Am I using the method wrong? Is there another method I am supposed to use to perform functions like this? Thanks.

Popular Answer

The purpose of this is to remove tags surrounding the node itself

Your second code snippet removes the tags exactly as intended, except for one typo (I guess):

HtmlNode hNewNode = hd.CreateTextNode(hNewNode.InnerHtml);

You should replace hNewNode.InnerHtml with hChildNode.InnerHtml; otherwise your code won't even compile (use of unassigned local variable).
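For reference, here is the loop with the typo fixed, wrapped in a small self-contained method. It is only a sketch: it assumes the HtmlAgilityPack NuGet package is referenced, and the names `TextNodeReplacement` and `ReplaceWithTextNodes` are made up for illustration. It returns the whole document so the resulting tree can be inspected afterwards.

```csharp
using System.Linq;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is referenced

public static class TextNodeReplacement
{
    // Hypothetical helper: replaces each redundant nested <b>, <i> or <u>
    // with a text node holding that element's inner HTML.
    public static HtmlDocument ReplaceWithTextNodes(string html)
    {
        var hd = new HtmlDocument();
        hd.LoadHtml(html);

        // By default SelectNodes returns null when nothing matches, hence the fallback.
        foreach (var hChildNode in hd.DocumentNode.SelectNodes("//b//b | //i//i | //u//u") ?? Enumerable.Empty<HtmlNode>())
        {
            // Fixed: read InnerHtml from the node being replaced (hChildNode), not from hNewNode.
            HtmlNode hNewNode = hd.CreateTextNode(hChildNode.InnerHtml);
            hChildNode.ParentNode.ReplaceChild(hNewNode, hChildNode);
        }

        return hd;
    }
}
```

The serialized output looks as if the tags were simply stripped, but any markup that was inside the replaced element now lives in a single text node.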

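I also want to mention that the newly created text node won't have the child nodes of the node it replaces. Instead, its InnerHtml property will have the same value as the replaced node's InnerHtml, so any nested tags are kept only as text inside the text node: they are written back out unchanged, but they are no longer nodes in the DOM.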
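If the child nodes should stay real nodes, one option is to move the children of each redundant tag into its parent and then remove the now-empty tag, instead of replacing it with a text node. A minimal sketch, again assuming the HtmlAgilityPack package; `RedundantTagCleaner`, `Unwrap` and `RemoveRedundantTags` are hypothetical names. HtmlAgilityPack also has an overload `RemoveChild(HtmlNode oldChild, bool keepGrandChildren)` intended for this kind of unwrapping; the explicit loop below just makes each step visible.

```csharp
using System.Linq;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is referenced

public static class RedundantTagCleaner
{
    // Hypothetical helper: moves every child of `node` into node's parent,
    // keeping their order, then removes `node` itself.
    public static void Unwrap(HtmlNode node)
    {
        HtmlNode parent = node.ParentNode;

        // Copy the child list first so we never enumerate a collection while it changes.
        foreach (HtmlNode child in node.ChildNodes.ToList())
        {
            // Each child goes directly before `node`, so the original order is preserved.
            parent.InsertBefore(child, node);
        }

        parent.RemoveChild(node);
    }

    // Hypothetical helper: unwraps every redundant nested <b>, <i> or <u>.
    public static HtmlDocument RemoveRedundantTags(string html)
    {
        var hd = new HtmlDocument();
        hd.LoadHtml(html);

        // By default SelectNodes returns null when nothing matches, hence the fallback.
        foreach (var hChildNode in hd.DocumentNode.SelectNodes("//b//b | //i//i | //u//u") ?? Enumerable.Empty<HtmlNode>())
        {
            Unwrap(hChildNode);
        }

        return hd;
    }
}
```

For `<i>a <i>b <u>c</u></i></i>` the serialized HTML comes out the same as with the text-node approach, but `<u>` remains an element that XPath queries such as `//u` can still find. Because children are moved rather than flattened into text, a match nested inside another match (e.g. `<b>x <b>y <b>z</b></b></b>`) is still attached to the document when its turn comes, so it gets unwrapped too.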




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow