HTML Agility Pack removes the closing slash from <br/> tags

asp.net html-agility-pack vb.net

Question

I'm using the HTML Agility Pack to create an HTML document. I load a template file and then add content to it. Everything works, but when I view the output file, the closing slash has been stripped from my <br/> tags, so they look like this: <br>. What causes this?

Dim doc As New HtmlDocument()
doc.Load(Server.MapPath("Template.htm"))

Dim title As HtmlNode = doc.DocumentNode.SelectSingleNode("//title")

title.InnerHtml = title.InnerHtml & "CEU Classes"
Dim topContent As HtmlAgilityPack.HtmlNode = doc.GetElementbyId("topContent")

topContent.InnerHtml = html.ToString
doc.OptionWriteEmptyNodes = True
doc.Save(outputFileName, Encoding.UTF8)

More details

It was also stripping the closing slash from my image tags. After I added doc.OptionWriteEmptyNodes = True, it stopped doing that.

Update

Here is my code as it currently stands. The closing slash of the BR tag is still being removed.

Dim html As String = "Words<br/>more words"
Dim doc As New HtmlDocument()
Dim title As HtmlNode
Dim topContent As HtmlNode

HtmlNode.ElementsFlags("br") = HtmlElementFlag.Empty
doc.Load(Server.MapPath("Template.htm"))

title = doc.DocumentNode.SelectSingleNode("//title")
title.InnerHtml = title.InnerHtml & "CEU Classes"

topContent = doc.GetElementbyId("topContent")
topContent.InnerHtml = html.ToString

doc.OptionWriteEmptyNodes = True
doc.Save(outputFileName, Encoding.UTF8)

Update 2

In the end, I read my template file in as a plain string, inserted my content into it, and then loaded the result as HTML.

Dim TemplateHTML As String = File.ReadAllText(Server.MapPath("Template.htm"))

TemplateHTML = TemplateHTML.Insert(TemplateHTML.IndexOf("<div id=""topContent"">") + "<div id=""topContent"">".Length, _
                                   html.ToString)

doc.LoadHtml(TemplateHTML)
4/27/2011 3:04:42 PM

Accepted Answer

This happens because the HTML Agility Pack handles BR in a special way. It still supports the old HTML 3.2 syntax, which remains in use on the web today and allows BR to be written without any closing tag (browsers still handle it gracefully, by the way).

To change this default behavior, modify the HtmlNode.ElementsFlags property, like this:

Dim doc As New HtmlDocument()
HtmlNode.ElementsFlags("br") = HtmlElementFlag.Empty
doc.LoadHtml("<test>before<br/>after</test>")
doc.OptionWriteEmptyNodes = True   
doc.Save(Console.Out)

This will output:

<test>before<br />after</test>
12/11/2013 4:44:15 PM

Popular Answer

Following @Simon Mourier's answer, this C# code works in version 1.4:

var doc = new HtmlDocument();
HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;
doc.OptionWriteEmptyNodes = true;
doc.LoadHtml("Lorem ipsum dolor sit<br/>Lorem ipsum dolor sit");

var postParsed = doc.DocumentNode.WriteTo();

After this runs, postParsed holds the following string:

"Lorem ipsum dolor sit<br />Lorem ipsum dolor sit"


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow