How to get all input elements in a form with HtmlAgilityPack without getting a null reference error

c# html html-agility-pack parsing

Question

Example HTML:

 <html><body>
     <form id="form1">
       <input name="foo1" value="bar1" />
       <!-- Other elements -->
     </form>
     <form id="form2">
       <input name="foo2" value="bar2" />
       <!-- Other elements -->
     </form>   
 </body></html>

Test code:

HtmlDocument doc = new HtmlDocument();
doc.Load(@"D:\test.html");
foreach (HtmlNode node in doc.GetElementbyId("form2").SelectNodes(".//input"))
{
    Console.WriteLine(node.Attributes["value"].Value);            
}

The statement doc.GetElementbyId("form2").SelectNodes(".//input") gives me a null reference.

Did I do anything wrong? Thanks.

Accepted Answer

You can do the following:

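// Stop treating <form> as an empty element so that its children stay nested inside it.
// This must run before the document is loaded.
HtmlNode.ElementsFlags.Remove("form");

HtmlDocument doc = new HtmlDocument();
doc.Load(@"D:\test.html");

HtmlNode secondForm = doc.GetElementbyId("form2");

// Elements("input") returns direct children only; use Descendants("input") for nested inputs.
foreach (HtmlNode node in secondForm.Elements("input"))
{
    HtmlAttribute valueAttribute = node.Attributes["value"];

    // Skip inputs that have no value attribute.
    if (valueAttribute != null)
    {
        Console.WriteLine(valueAttribute.Value);
    }
}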

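By default, HTML Agility Pack parses a form as an empty node because form elements are allowed to overlap other HTML elements. The inputs therefore end up as siblings of the form rather than its children, so SelectNodes(".//input") finds nothing and returns null, and iterating over that null result throws the exception. The first line, HtmlNode.ElementsFlags.Remove("form");, disables this behavior, which lets you get the input elements inside the second form.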
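A defensive variant of the same idea can be sketched as follows. It is only an illustration: the inline HTML string stands in for D:\test.html, and the null checks are an addition that is not part of the original answer. They cover the two places where a null can appear: GetElementbyId when the id is missing, and SelectNodes when the XPath matches nothing.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Assumption: an inline copy of the question's markup replaces D:\test.html.
        const string html = @"<html><body>
<form id=""form1""><input name=""foo1"" value=""bar1"" /></form>
<form id=""form2""><input name=""foo2"" value=""bar2"" /></form>
</body></html>";

        // Must be called before parsing, otherwise the form is still parsed as empty.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // GetElementbyId returns null when no element has the given id.
        HtmlNode form = doc.GetElementbyId("form2");
        if (form == null)
        {
            Console.WriteLine("form2 not found");
            return;
        }

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection inputs = form.SelectNodes(".//input");
        if (inputs == null)
        {
            Console.WriteLine("no input elements found in form2");
            return;
        }

        foreach (HtmlNode input in inputs)
        {
            // GetAttributeValue falls back to the default when the attribute is absent.
            Console.WriteLine(input.GetAttributeValue("value", string.Empty));
        }
    }
}
```

Using GetAttributeValue with a default avoids the separate null check on the attribute; whether that is preferable to skipping such inputs depends on what you do with the values.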

Update: Example of form elements overlap:

<table>
<form>
<!-- Other elements -->
</table>
</form>

Here the form element begins inside the table but is closed outside it. Markup like this occurs in real-world pages and browsers tolerate it, so HTML Agility Pack has to deal with it.
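If you want to see how your installed version of the library treats form elements, you can inspect the static flags table before changing it. This is a small sketch; the flags it prints depend on the library version, so no particular output is assumed here.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // ElementsFlags maps element names to parsing flags (for example CanOverlap or Empty).
        if (HtmlNode.ElementsFlags.ContainsKey("form"))
        {
            Console.WriteLine("form flags: " + HtmlNode.ElementsFlags["form"]);
        }
        else
        {
            Console.WriteLine("form has no special parsing flags");
        }
    }
}
```

Because ElementsFlags is static, removing the entry affects every document parsed afterwards in the same process, not just one HtmlDocument instance.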


Popular Answer

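Alternatively, select them all at once with an XPath query over the whole document. Note that this query matches only elements that have type="text", and that SelectNodes returns null when nothing matches: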

HtmlNodeCollection resultCollection = doc.DocumentNode.SelectNodes("//*[@type='text']");
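A null-safe version of this document-wide query might look like the sketch below. The markup is a hypothetical variation of the question's HTML with type="text" added to the inputs, since the original inputs carry no type attribute and would not match the filter.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Assumption: the inputs declare type="text"; the question's markup does not.
        const string html = @"<html><body>
<form id=""form1""><input type=""text"" name=""foo1"" value=""bar1"" /></form>
<form id=""form2""><input type=""text"" name=""foo2"" value=""bar2"" /></form>
</body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Restricting the query to input elements avoids matching other tags with a type attribute.
        HtmlNodeCollection resultCollection =
            doc.DocumentNode.SelectNodes("//input[@type='text']");

        // Guard against the null that SelectNodes returns when there is no match.
        if (resultCollection == null)
        {
            Console.WriteLine("no matching inputs");
            return;
        }

        foreach (HtmlNode input in resultCollection)
        {
            Console.WriteLine(input.GetAttributeValue("name", string.Empty) + "="
                + input.GetAttributeValue("value", string.Empty));
        }
    }
}
```

This query collects inputs from every form in the document, so it does not answer the original question of restricting the result to form2.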



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow