I am modifying an HTML file using the HTML Agility Pack.
Here is an example on an HTML file containing tables:
Dim document As New HtmlDocument
Dim tables As Array
document.Load(path_html)
Dim div1 As HtmlNode = HtmlNode.CreateNode("<div></div>")
Dim div2 As HtmlNode = HtmlNode.CreateNode("<div></div>")
tables = document.DocumentNode.Descendants("table").ToArray()
For Each tr As HtmlNode In tables.Descendants("tr").ToArray
    tr.AppendChild(div1)
    tr.AppendChild(div2)
Next
document.Save(path_html)
And here is the result in the HTML file:
<div></div><div></div>
What I would like is:
<div></div>
<div></div>
I think this should be the default behavior, because having everything on one line makes my HTML file hard to read.
I saw this question (which is my exact issue) here, but the answer is not working for me (maybe because I am using VB.NET and the answer is in C#).
Can anyone help?
I haven't written any VB.NET in a long time, so I first tried this in C#:
var document = new HtmlDocument();
var div = HtmlNode.CreateNode("<div></div>");
var newline = HtmlNode.CreateNode("\r\n");
div.AppendChild(newline);
for (int i = 0; i < 2; ++i)
{
    div.AppendChild(HtmlNode.CreateNode("<div></div>"));
    div.AppendChild(newline);
}
document.DocumentNode.AppendChild(div);
Console.WriteLine(document.DocumentNode.WriteTo());
Works great - the output:
<div>
<div></div>
<div></div>
</div>
Then I thought, "no way... it can't be". Here is the VB.NET version; note the commented lines:
Dim document = New HtmlDocument()
Dim div = HtmlNode.CreateNode("<div></div>")
' this writes the literal string...
Dim newline = HtmlNode.CreateNode("\r\n")
' this works!
' Dim newline = HtmlNode.CreateNode(Environment.NewLine)
div.AppendChild(newline)
For i = 1 To 2
    div.AppendChild(HtmlNode.CreateNode("<div></div>"))
    div.AppendChild(newline)
Next
document.DocumentNode.AppendChild(div)
Console.WriteLine(document.DocumentNode.WriteTo())
Unfortunately it is so, which is probably why the question you linked to was never marked as answered. The output:
<div>\r\n<div></div>\r\n<div></div>\r\n</div>
Finally, instead of using \r\n as the newline string, I tried Environment.NewLine, which does work and outputs:
<div>
<div></div>
<div></div>
</div>
Works either way in C#.
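The difference comes down to how the two languages read string literals: C# processes backslash escapes in regular strings, while VB.NET has no backslash escapes, so "\r\n" there is just four ordinary characters. A minimal sketch to see this without HTML Agility Pack at all (the module name is made up for illustration, and it assumes the usual Microsoft.VisualBasic reference is available):

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module EscapeDemo
    Sub Main()
        ' VB.NET does not interpret backslashes: this is \, r, \, n
        Dim literal As String = "\r\n"
        Console.WriteLine(literal.Length)              ' 4

        ' vbCrLf and ControlChars.CrLf are the real CR+LF pair
        Console.WriteLine(vbCrLf.Length)               ' 2
        Console.WriteLine(vbCrLf = ControlChars.CrLf)  ' True
    End Sub
End Module
```

So in VB.NET, vbCrLf or ControlChars.CrLf should be usable wherever a fixed CR+LF is wanted. Environment.NewLine gives the platform's own line ending, which is not necessarily CR+LF outside Windows.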
Based on this answer, you would need to add a node that represents a carriage return (\r) and a line feed (\n):
Dim newLineNode As HtmlNode = HtmlNode.CreateNode("\r\n")
Based on your comment:
I tried this but it adds '\r\n' in my HTML, it's not going back to line.
So you've already tried this, and it prints the string literal "\r\n" instead. I have managed to reproduce this issue too.
Instead, look at using a <br> tag, which is a line break in the rendered page (it does not add a newline to the HTML source itself):
Dim newLineNode As HtmlNode = HtmlNode.CreateNode("<br>")
Based on your example code, your code would look something like this:
Dim newLineNode As HtmlNode = HtmlNode.CreateNode("<br>")
For Each tr As HtmlNode In tables.Descendants("tr").ToArray
    tr.AppendChild(div1)
    tr.AppendChild(newLineNode)
    tr.AppendChild(div2)
Next
However, tables.Descendants("tr").ToArray gave me a compile error. As that's outside the scope of this question and you haven't raised it as an issue, I'll assume that it works for you.
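For what it's worth, the compile error is most likely because Descendants is a method on HtmlNode, while tables is declared As Array. Here is a rough sketch of one way around it that also puts real newlines into the source. It is not tested against your file: it assumes the HtmlAgilityPack NuGet package is referenced, the inline sample markup and the module name are invented for the demo, and creating fresh nodes per row (rather than reusing div1 and div2) is my own choice.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module AppendWithNewlines
    Sub Main()
        Dim document As New HtmlDocument()
        ' Hypothetical sample markup, standing in for document.Load(path_html)
        document.LoadHtml("<table><tr><td>a</td></tr><tr><td>b</td></tr></table>")

        ' Call Descendants on DocumentNode (an HtmlNode), not on an Array.
        ' ToArray takes a snapshot so the tree can be changed inside the loop.
        For Each tr As HtmlNode In document.DocumentNode.Descendants("tr").ToArray()
            ' New node instances for every row instead of one shared instance
            tr.AppendChild(HtmlNode.CreateNode("<div></div>"))
            tr.AppendChild(HtmlNode.CreateNode(Environment.NewLine))
            tr.AppendChild(HtmlNode.CreateNode("<div></div>"))
            tr.AppendChild(HtmlNode.CreateNode(Environment.NewLine))
        Next

        ' Print the modified markup; use document.Save(path_html) to write a file
        Console.WriteLine(document.DocumentNode.WriteTo())
    End Sub
End Module
```

If you only want rows inside tables, iterating document.DocumentNode.Descendants("table") and calling Descendants("tr") on each table node should work the same way.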