How do I replace line breaks with valid HTML, but not when they are already inside an HTML element?

c# html-agility-pack

Question

I have some plain text which contains line breaks like this:

Dear Person,\r\nHello and welcome to this example.\r\nTodo: <ul><li>item 1</li>\r\n<li>item 2</li>\r\nThanks.

I would like to use Html Agility Pack (if needed) to clean the HTML and replace the line breaks with <br /> tags, except where they are already inside an HTML element (see the LI elements in the UL tag).

I can easily replace the line breaks using a regex or text.Replace(Environment.NewLine, "<br/>"), but how do I exclude the case where the line break is inside a tag?

Thanks.

Popular Answer

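It seems you only need to process the top-level HTML text nodes, i.e. the text nodes that are direct children of the document node. Line breaks inside an element such as the UL belong to that element's child nodes and are left untouched, and no recursion is needed because text nodes don't have child nodes: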

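using System.Linq;
using HtmlAgilityPack;

var html = "Dear Person,\r\nHello and welcome to this example.\r\nTodo: <ul><li>item 1</li>\r\n<li>item 2</li>\r\nThanks.";
var doc = new HtmlDocument();
doc.LoadHtml(html);

// Take only the text nodes that are direct children of the document node.
var textNodes = doc.DocumentNode.ChildNodes
    .OfType<HtmlTextNode>()
    .ToList();

// The input uses "\r\n"; Environment.NewLine is "\n" on non-Windows platforms and would leave stray "\r" characters.
foreach (var node in textNodes)
    node.Text = node.Text.Replace("\r\n", "<br />");

// Serialize the modified document back to a string.
var result = doc.DocumentNode.OuterHtml;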

This will produce something like this:

Dear Person,<br />Hello and welcome to this example.<br />Todo: <ul><li>item 1</li>\r\n<li>item 2</li>\r\nThanks.</ul>
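
If the input may mix line-ending styles, the same idea can be wrapped in a small helper. The following is only a sketch under assumptions: the class name LineBreakHelper and the method name ReplaceTopLevelLineBreaks are hypothetical, and matching "\r\n", "\r" and "\n" with a regex is my addition rather than part of the answer above. It relies on the Html Agility Pack members already used in the answer (HtmlDocument.LoadHtml, DocumentNode.ChildNodes, HtmlTextNode.Text) plus DocumentNode.OuterHtml to read the result back.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper; the class and method names are illustrative only.
public static class LineBreakHelper
{
    // Matches Windows (\r\n), bare carriage return (\r) and Unix (\n) line endings.
    private static readonly Regex NewLine = new Regex(@"\r\n|\r|\n");

    public static string ReplaceTopLevelLineBreaks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Snapshot the top-level text nodes before modifying them.
        var textNodes = doc.DocumentNode.ChildNodes
            .OfType<HtmlTextNode>()
            .ToList();

        // Only direct children of the document node are touched;
        // text nested inside elements such as <ul> is left as-is.
        foreach (var node in textNodes)
            node.Text = NewLine.Replace(node.Text, "<br />");

        return doc.DocumentNode.OuterHtml;
    }
}
```

Note that, as the output above shows, the unclosed <ul> in the sample input pulls the trailing "Thanks." into the list, so its line break is not converted. Closing the tag in the input (</ul> before the final line break) should make that text a top-level node again.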



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why