Html Agility Pack InnerHtml returns incorrect string with textboxes

html-agility-pack innerhtml

Question

The following test code:

[Test]
public void PossibleHtmlAgilityPackBug()
{
    const string html = @"<input type=""text"" name=""shouldNotTrim"" />";
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    Assert.That(doc.DocumentNode.InnerHtml, Is.EqualTo(html));
}

Results in:

Expected string length 42 but was 40. Strings differ at index 39.
Expected: "<input type="text" name="shouldNotTrim" />"
But was:  "<input type="text" name="shouldNotTrim">"
--------------------------------------------------^

Is this a bug? Or is there a config that I can change to output that extra "/" I need?

Thanks,

Chi

Accepted Answer

This is not a bug. The parser treats INPUT as an "empty" element (see, for example, the question "HTMLAgilityPack don't preserves original empty tags" on the subject of empty elements), and by default such elements are rendered without the closing /.

The reasons are historical and go back to HTML 3.2. Back in those days, INPUT was not required to be closed, although it looks like a bug today.

This will fix your problem:

[Test]
public void PossibleHtmlAgilityPackBug()
{
    const string html = @"<input type=""text"" name=""shouldNotTrim"" />";
    var doc = new HtmlDocument();
    // Write empty elements such as INPUT in self-closing form (" />").
    doc.OptionWriteEmptyNodes = true;
    doc.LoadHtml(html);

    Assert.That(doc.DocumentNode.InnerHtml, Is.EqualTo(html));
}

As a side note, Html Agility Pack does not always reproduce the original HTML text exactly, but it always tries to rebuild markup that renders the same way. Browsers handle an unclosed INPUT without a problem.
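
Since the serialized text is not guaranteed to match the input character for character, a test can be made less brittle by asserting on the parsed node instead of on the raw string. The following is only a sketch, assuming NUnit and a build of Html Agility Pack that includes XPath support (SelectSingleNode); the class and test names are hypothetical and not part of the original question.

```csharp
using HtmlAgilityPack;
using NUnit.Framework;

[TestFixture]
public class InputParsingTests // hypothetical fixture name
{
    [Test]
    public void InputAttributesSurviveParsing()
    {
        const string html = @"<input type=""text"" name=""shouldNotTrim"" />";
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Locate the parsed element rather than comparing serialized text.
        var input = doc.DocumentNode.SelectSingleNode("//input");

        Assert.That(input, Is.Not.Null);
        // GetAttributeValue returns the supplied default ("") if the attribute is missing.
        Assert.That(input.GetAttributeValue("type", ""), Is.EqualTo("text"));
        Assert.That(input.GetAttributeValue("name", ""), Is.EqualTo("shouldNotTrim"));
    }
}
```

A test written this way passes whether or not OptionWriteEmptyNodes is set, because it never looks at how the element is written back out.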




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why