Add newline in HTML source code using HTML Agility Pack

html html-agility-pack vb.net

Question

I am modifying an HTML file using the HTML Agility Pack.

Here is an example that works on an HTML file containing tables:

Dim document As New HtmlDocument
Dim tables As Array

document.Load(path_html)

Dim div1 As HtmlNode = HtmlNode.CreateNode("<div></div>")
Dim div2 As HtmlNode = HtmlNode.CreateNode("<div></div>")

tables = document.DocumentNode.Descendants("table").ToArray()

For Each tr As HtmlNode In tables.Descendants("tr").ToArray
   tr.AppendChild(div1)
   tr.AppendChild(div2)
Next

document.save(path_html)

And here is the result in the HTML file:

<div></div><div></div>

What I would like is:

<div></div>
<div></div>

I think this should be the default behavior, because the single-line output makes my HTML file hard to read.

I saw this question (which is my exact issue) here, but the answer is not working for me (maybe because I am using VB.NET and the answer is in C#).

Can anyone help?

Accepted Answer

I haven't written any VB.NET in a long time, so I first tried this in C#:

var document = new HtmlDocument();
var div = HtmlNode.CreateNode("<div></div>");
var newline = HtmlNode.CreateNode("\r\n");
div.AppendChild(newline);
for (int i = 0; i < 2; ++i)
{
    div.AppendChild(HtmlNode.CreateNode("<div></div>"));
    div.AppendChild(newline);
}
document.DocumentNode.AppendChild(div);
Console.WriteLine(document.DocumentNode.WriteTo());

Works great - the output:

<div>
<div></div>
<div></div>
</div>

Then I thought, "no way... it can't be". Note the commented lines:

Dim document = New HtmlDocument()
Dim div = HtmlNode.CreateNode("<div></div>")
' this writes the literal string...
Dim newline = HtmlNode.CreateNode("\r\n")
' this works!
' Dim newline = HtmlNode.CreateNode(Environment.NewLine)
div.AppendChild(newline)
For i = 1 To 2
    div.AppendChild(HtmlNode.CreateNode("<div></div>"))
    div.AppendChild(newline)
Next
document.DocumentNode.AppendChild(div)
Console.WriteLine(document.DocumentNode.WriteTo())

Unfortunately it is so: VB.NET writes the characters literally, which is probably why the question you linked to was never marked as answered. The output:

<div>\r\n<div></div>\r\n<div></div>\r\n</div>
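The literal output above follows from how VB.NET handles strings: unlike C#, it has no backslash escape sequences, so "\r\n" is four ordinary characters. A minimal sketch of the difference, using only the standard library (the module name NewlineDemo is made up for illustration):

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module NewlineDemo
    Sub Main()
        ' VB.NET does not interpret backslash escapes in string literals.
        Dim literal As String = "\r\n"
        Console.WriteLine(literal.Length)            ' 4: backslash, r, backslash, n

        ' These constants hold a real carriage return + line feed pair.
        Console.WriteLine(vbCrLf.Length)             ' 2
        Console.WriteLine(ControlChars.CrLf.Length)  ' 2

        ' Building the pair from character codes gives the same string.
        Console.WriteLine((ChrW(13) & ChrW(10)) = vbCrLf)  ' True
    End Sub
End Module
```

Any of vbCrLf, ControlChars.CrLf, or Environment.NewLine can therefore stand in for the C# "\r\n" when creating the text node.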

Finally, instead of the string "\r\n", I tried Environment.NewLine, which does work and outputs:

<div>
<div></div>
<div></div>
</div>

Works either way in C#.
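Applied to the loop from the question, the same idea might look like the sketch below. It is only a sketch under assumptions: the inline markup passed to LoadHtml is a stand-in for the file the question loads with document.Load(path_html), the module name is made up, and fresh nodes are created on every pass instead of reusing div1, div2, and a single newline node.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module AppendWithNewlines
    Sub Main()
        Dim document As New HtmlDocument()
        ' Stand-in markup (assumption); the question uses document.Load(path_html).
        document.LoadHtml("<table><tr></tr></table>")

        ' ToArray() takes a snapshot so the tree can be modified inside the loop.
        For Each tr As HtmlNode In document.DocumentNode.Descendants("tr").ToArray()
            For i As Integer = 1 To 2
                ' New node instances per row, each div followed by a real newline.
                tr.AppendChild(HtmlNode.CreateNode("<div></div>"))
                tr.AppendChild(HtmlNode.CreateNode(Environment.NewLine))
            Next
        Next

        Console.WriteLine(document.DocumentNode.WriteTo())
    End Sub
End Module
```

Calling Descendants("tr") on document.DocumentNode rather than on an Array of tables is a deliberate change here, since Array has no Descendants method.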


Popular Answer

Based on this answer, you would need to add a node that represents a carriage return (\r) and a line feed (\n):

Dim newLineNode As HtmlNode = HtmlNode.CreateNode("\r\n")

Based on your comment:

I tried this but it adds '\r\n' in my HTML, it's not going back to line.

You've already tried this, and it prints the string literal "\r\n" instead. I managed to reproduce the issue too.

Instead, look at using a <br> tag, which is a line break. Note that it breaks the line in the rendered page; the HTML source itself stays on one line:

Dim newLineNode As HtmlNode = HtmlNode.CreateNode("<br>")

Applied to your example code, it would look something like this:

Dim newLineNode As HtmlNode = HtmlNode.CreateNode("<br>")

For Each tr As HtmlNode In tables.Descendants("tr").ToArray
   tr.AppendChild(div1)
   tr.AppendChild(newLineNode)
   tr.AppendChild(div2)
Next

However, tables.Descendants("tr").ToArray gave me a compile error. As that's outside the scope of this question and you haven't raised it as an issue, I'll assume it works for you.




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow