Unable to set InnerText using Html-Agility-Pack

html-agility-pack

Question

Given an HTML document, I want to identify all the numbers in the document and wrap each number in a custom tag. Right now, I use the following:

HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");
MatchCollection numbersColl = Regex.Matches(bodyNode.InnerText, <some regex>);

Once I have numbersColl, I can iterate over each Match and get its index. However, I can't change InnerText because it is read-only. What I need is this: if match.Value = 25 and match.Index = 100, I want to replace that 25 with <span isIdentified='true'> 25 </span>

Any help would be greatly appreciated. Since I can't modify the inner text, I currently have to modify InnerHtml instead, but some elements might have 25 inside their own markup, and that should not be touched. How do I tell whether a number is inside an HTML tag? For example, <table border='1'> has 1 in the tag.

Accepted Answer

Here's how I worked around the read-only InnerText property of a text node: select the parent node of the text node, note the text node's position in the parent's child node collection, and then call ReplaceChild(...) on the parent.

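// Requires "using System.Linq;" for First(); htmlDocument is the HtmlDocument that owns the node.
private void WriteText(HtmlNode node, string text)
{
    if (node.ChildNodes.Count > 0)
    {
        // Replace the first child (assumed to be the text node) with a new text node.
        node.ReplaceChild(htmlDocument.CreateTextNode(text), node.ChildNodes.First());
    }
    else
    {
        // The node has no children yet, so just append the new text node.
        node.AppendChild(htmlDocument.CreateTextNode(text));
    }
}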

In your case, I believe you need to create a new element node that wraps the matched text, and then use it as the replacement for the text node.

Or even better, see if you can do something like the answer posted here: Replacing a HTML div InnerText tag using HTML Agility Pack
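
The wrapping approach described in this answer could look roughly like the sketch below. It is only an illustration, not the answerer's code: the class name NumberTagger, the method name WrapNumbers, and the \d+ pattern are all hypothetical stand-ins (use your own regex). The idea is to visit only text nodes, so numbers inside tags such as border='1' are never seen, and to split each text node into plain text pieces and span elements. One caveat: the text is matched as stored in the document, so digits inside a numeric entity such as &#160; would also match.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class NumberTagger
{
    // Hypothetical pattern (runs of digits); substitute your own regex.
    private static readonly Regex Numbers = new Regex(@"\d+");

    public static void WrapNumbers(HtmlDocument doc)
    {
        // text() selects text nodes only, so attribute values are never visited.
        // Text inside script and style elements is skipped as well.
        var textNodes = doc.DocumentNode.SelectNodes(
            "//body//text()[not(ancestor::script) and not(ancestor::style)]");
        if (textNodes == null) return; // SelectNodes returns null when nothing matches

        // Iterate over a snapshot because the tree is modified inside the loop.
        foreach (HtmlNode textNode in textNodes.ToList())
        {
            string text = textNode.InnerText;
            MatchCollection matches = Numbers.Matches(text);
            if (matches.Count == 0) continue;

            HtmlNode parent = textNode.ParentNode;
            int last = 0;
            foreach (Match m in matches)
            {
                // Keep the plain text that precedes this match.
                if (m.Index > last)
                {
                    parent.InsertBefore(
                        doc.CreateTextNode(text.Substring(last, m.Index - last)), textNode);
                }

                // Wrap the number itself in a span element.
                HtmlNode span = doc.CreateElement("span");
                span.SetAttributeValue("isIdentified", "true");
                span.AppendChild(doc.CreateTextNode(m.Value));
                parent.InsertBefore(span, textNode);

                last = m.Index + m.Length;
            }

            // Keep any text after the last match, then drop the original node.
            if (last < text.Length)
            {
                parent.InsertBefore(doc.CreateTextNode(text.Substring(last)), textNode);
            }
            parent.RemoveChild(textNode);
        }
    }
}
```

After loading a document with htmlDoc.LoadHtml(...), a call such as NumberTagger.WrapNumbers(htmlDoc) would rewrite the tree in place; the result can then be read back from htmlDoc.DocumentNode.OuterHtml.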


Popular Answer

Creating a text node does not do what it should in this case:

myParentNode.AppendChild(D.CreateTextNode("<script>alert('a');</script>"));
Console.Write(myParentNode.InnerHtml);

The result should be something like &lt;script....

Instead, the output is a working script tag, even though I added it as text and not as HTML. This is a security issue for me because the text would be input from an anonymous user.
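
One possible way around this, sketched here under the assumption that CreateTextNode stores the string verbatim (as the output above suggests), is to HTML-encode untrusted input yourself before creating the text node. The class SafeText and the method AppendEncodedText are hypothetical names used only for illustration; WebUtility.HtmlEncode is the standard .NET encoder from System.Net.

```csharp
using System.Net;
using HtmlAgilityPack;

public static class SafeText
{
    // Encodes untrusted input before it becomes a text node, so characters
    // such as '<' and '>' are stored as entities rather than as markup.
    public static HtmlNode AppendEncodedText(HtmlDocument doc, HtmlNode parent, string untrusted)
    {
        string encoded = WebUtility.HtmlEncode(untrusted);
        return parent.AppendChild(doc.CreateTextNode(encoded));
    }
}
```

With this helper, SafeText.AppendEncodedText(D, myParentNode, "<script>alert('a');</script>") would leave the angle brackets in myParentNode.InnerHtml encoded as &lt; and &gt;, so the string is no longer emitted as a script element.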



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why