Set node InnerText in HtmlAgilityPack

c# html-agility-pack html-parsing parsing

Question

I want to replace the inner text of HTML tags with different text. I'm using HTML Agility Pack.
This code lets me extract all the text:

HtmlDocument doc = new HtmlDocument();
doc.Load("some path");

// Select every text node that is not just whitespace
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']")) {
    // How to replace node.InnerText with some text?
}

However, InnerText is read-only. How can I change the text and then save the result to a file?

11/25/2011 9:34:51 PM

Accepted Answer

Try the code below. It filters out script nodes and selects only nodes without children; you may want to add more filters. In addition to the condition in your XPath expression, this one restricts the search to the body, selects only leaf nodes, and excludes the text inside <script> tags.

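var nodes = doc.DocumentNode.SelectNodes("//body//text()[(normalize-space(.) != '') and not(parent::script) and not(*)]");
if (nodes != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode htmlNode in nodes)
    {
        // Swap each text node for a new one carrying the modified text
        htmlNode.ParentNode.ReplaceChild(HtmlTextNode.CreateNode(htmlNode.InnerText + "_translated"), htmlNode);
    }
}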
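The answer does not show the "save it to a file" part of the question. Below is a minimal end-to-end sketch, assuming HtmlAgilityPack is referenced via NuGet. `input.html` and `output.html` are placeholder paths, and `ReplaceText`/`Translate` are hypothetical helper names. Instead of `ReplaceChild`, it assigns `HtmlTextNode.Text`, which (unlike `InnerText`) has a setter, and then writes the document with `HtmlDocument.Save`. It drops the answer's `not(*)` test, since text nodes never have child elements, so that test is always true for them.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

public static class Program
{
    // Hypothetical replacement logic; substitute your own (e.g. a translation lookup).
    public static string Translate(string text) => text + "_translated";

    // Rewrites every non-blank text node inside <body>, skipping <script> content.
    public static void ReplaceText(HtmlDocument doc, Func<string, string> transform)
    {
        var nodes = doc.DocumentNode.SelectNodes(
            "//body//text()[(normalize-space(.) != '') and not(parent::script)]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
            return;

        foreach (HtmlTextNode textNode in nodes)
        {
            // Text holds the raw HTML source, so decode entities before transforming
            // and re-encode afterwards so characters like '<' or '&' stay literal text.
            string plain = HtmlEntity.DeEntitize(textNode.Text);
            textNode.Text = WebUtility.HtmlEncode(transform(plain));
        }
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.Load("input.html"); // placeholder path

        ReplaceText(doc, Translate);

        // Writes the modified document to a new file, leaving the original untouched.
        doc.Save("output.html"); // placeholder path
    }
}
```

Setting `Text` in place avoids re-parsing a string into a new node for every replacement. The decode/encode step only matters when the original or replacement text contains entities or markup characters. Note that `WebUtility.HtmlEncode` may also turn some non-ASCII characters into numeric entities.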
11/26/2011 7:26:31 AM

Popular Answer

Interestingly, InnerHtml is not read-only. When I set it,

aElement.InnerHtml = "sometext";

the value of InnerText also changed to "sometext".
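A small sketch of this approach follows. It assumes HtmlAgilityPack is referenced, and `SetText` and `InnerHtmlDemo` are hypothetical names. One caveat worth knowing: `InnerHtml` is parsed as markup, so a replacement string containing `<` or `&` would produce elements or entities rather than literal text. Encoding it first with `System.Net.WebUtility.HtmlEncode` (not part of the answer) keeps it as plain text.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

public static class InnerHtmlDemo
{
    // Hypothetical helper: replaces all content of an element with literal text.
    public static void SetText(HtmlNode element, string text)
    {
        // InnerHtml is parsed as HTML, so encode the text to keep '<' and '&' literal.
        element.InnerHtml = WebUtility.HtmlEncode(text);
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<p><a href=\"#\">old <b>text</b></a></p>");

        HtmlNode aElement = doc.DocumentNode.SelectSingleNode("//a");
        SetText(aElement, "sometext");

        Console.WriteLine(aElement.InnerText); // sometext
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Setting `InnerHtml` replaces all children of the element, including nested tags such as the `<b>` above. It therefore suits leaf elements, while per-text-node edits are safer on mixed content.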

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow