I have some plain text which contains line breaks like this:
Dear Person,\r\nHello and welcome to this example.\r\nTodo: <ul><li>item 1</li>\r\n<li>item 2</li>\r\nThanks.
I would like to use HtmlAgilityPack (if needed) to clean the HTML and replace the line breaks with <br/>, except where they already sit inside an HTML element (see the LI items in the UL tag).
I can easily replace the line breaks using a regex or text.Replace(Environment.NewLine, "<br/>"),
but how do I exclude the case where a line break is inside a tag?
Thanks.
It seems you only need to process the top-level text nodes, i.e. the text nodes that are direct children of the document node; text inside an element such as the UL is left untouched:
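// requires: using System; using System.Linq; using HtmlAgilityPack;
var html = "Dear Person,\r\nHello and welcome to this example.\r\nTodo: <ul><li>item 1</li>\r\n<li>item 2</li>\r\nThanks.";
var doc = new HtmlDocument();
doc.LoadHtml(html);
// Take only direct children of the document node, so text nested inside <ul> is skipped.
var textNodes = doc.DocumentNode.ChildNodes
    .OfType<HtmlTextNode>()
    .ToList();
foreach (var node in textNodes)
    node.Text = node.Text.Replace(Environment.NewLine, "<br />");
// Serialize the modified document back to a string.
var result = doc.DocumentNode.OuterHtml;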
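This will produce something like this (the UL is never closed in the input, so the trailing "Thanks." ends up inside it and its line break is not replaced):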
Dear Person,<br />Hello and welcome to this example.<br />Todo: <ul><li>item 1</li>\r\n<li>item 2</li>\r\nThanks.</ul>
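One caveat: Environment.NewLine is "\r\n" on Windows but "\n" on Unix-like systems, so the replacement above may miss or only partly match the line breaks when the code runs on a different platform than the one that produced the text. A minimal sketch of a platform-independent variant follows; LineBreakConverter and ConvertTopLevelLineBreaks are hypothetical names chosen for illustration, not part of HtmlAgilityPack.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class LineBreakConverter
{
    // Hypothetical helper: replaces line breaks with <br /> in top-level
    // text nodes only, leaving text nested inside elements untouched.
    public static string ConvertTopLevelLineBreaks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Snapshot the direct child text nodes before modifying them.
        var textNodes = doc.DocumentNode.ChildNodes
            .OfType<HtmlTextNode>()
            .ToList();

        foreach (var node in textNodes)
        {
            // Handle "\r\n" first so a Windows line break yields one <br />,
            // then catch any remaining lone "\n" or "\r".
            node.Text = node.Text
                .Replace("\r\n", "<br />")
                .Replace("\n", "<br />")
                .Replace("\r", "<br />");
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling LineBreakConverter.ConvertTopLevelLineBreaks(html) with the string from the question should behave like the snippet above on Windows, while also handling input that uses bare "\n" line breaks.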