HtmlAgilityPack: Remove tags, Replace with whitespace

c# html-agility-pack

Question

string url = "http://www.myurl.xxx";
HtmlWeb webGet = new HtmlWeb();
HtmlDocument doc = webGet.Load(url);

foreach (var script in doc.DocumentNode.Descendants("script").ToArray())
    script.Remove();

foreach (var style in doc.DocumentNode.Descendants("style").ToArray())
    style.Remove();

string mtext = doc.DocumentNode.InnerText;

The string mtext has no spacing between the pieces of text where tags have been removed. How can I remove the tags and also replace every removed tag instance with a line break or " "?

Accepted Answer

Your code only removes the nodes, so the text on either side of each removed node ends up joined together. Instead, replace each node with a new text node. The following replaces every <script> and <style> node with a single space:

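// SelectNodes returns null when nothing matches, so guard before iterating.
var nodes = doc.DocumentNode.SelectNodes("//script|//style");
if (nodes != null)
{
    // ToArray() takes a snapshot so the tree can be modified inside the loop.
    foreach (var node in nodes.ToArray())
    {
        var replacement = doc.CreateTextNode(" ");
        node.ParentNode.ReplaceChild(replacement, node);
    }
}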
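
For anyone who wants to try this without fetching a page, here is a minimal self-contained sketch of the same replacement. The sample markup and the Program class are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced. The intent is that the text before and after the <script> node stays separated by the inserted space.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup, standing in for the page loaded from the URL.
        const string html =
            "<html><body><div>Hello<script>var x = 1;</script>World</div></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when no node matches the XPath expression.
        var nodes = doc.DocumentNode.SelectNodes("//script|//style");
        if (nodes != null)
        {
            // Snapshot the matches first, because the loop modifies the tree.
            foreach (var node in nodes.ToArray())
            {
                // Swap the whole element for a text node holding one space.
                node.ParentNode.ReplaceChild(doc.CreateTextNode(" "), node);
            }
        }

        Console.WriteLine(doc.DocumentNode.InnerText);
    }
}
```

If you would rather have a line break than a space, passing Environment.NewLine to CreateTextNode instead of " " should work the same way.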



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow