How to use HtmlAgilityPack to obtain all input items in a form without receiving a null reference error

c# html html-agility-pack parsing

Question

Example HTML:

 <html><body>
     <form id="form1">
       <input name="foo1" value="bar1" />
       <!-- Other elements -->
     </form>
     <form id="form2">
       <input name="foo2" value="bar2" />
       <!-- Other elements -->
     </form>   
 </body></html>

Test code:

HtmlDocument doc = new HtmlDocument();
doc.Load(@"D:\test.html");
foreach (HtmlNode node in doc.GetElementbyId("form2").SelectNodes(".//input"))
{
    Console.WriteLine(node.Attributes["value"].Value);            
}

The expression doc.GetElementbyId("form2").SelectNodes(".//input") returns null, so the foreach loop fails with a null reference error.

What am I doing wrong? Thanks.

2/12/2016 4:00:18 PM

Accepted Answer

You can do the following:

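using System;
using HtmlAgilityPack;

// Stop HTML Agility Pack from treating <form> as an empty element.
// ElementsFlags is static, so do this before loading the document.
HtmlNode.ElementsFlags.Remove("form");

HtmlDocument doc = new HtmlDocument();
doc.Load(@"D:\test.html");

HtmlNode secondForm = doc.GetElementbyId("form2");

// Elements("input") returns only the direct <input> children of the form.
foreach (HtmlNode node in secondForm.Elements("input"))
{
    // Skip inputs that have no value attribute.
    HtmlAttribute valueAttribute = node.Attributes["value"];
    if (valueAttribute != null)
    {
        Console.WriteLine(valueAttribute.Value);
    }
}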

By default, HTML Agility Pack parses form elements as empty nodes (nodes with no children) because forms are allowed to overlap other HTML elements. The first line, HtmlNode.ElementsFlags.Remove("form");, disables this behavior, so the input elements become children of the second form and can be retrieved from it.

Update: Example of form elements overlap:

<table>
<form>
<!-- Other elements -->
</table>
</form>

The form element begins inside the table but is closed outside it. Markup like this appears in real-world pages and is tolerated by browsers, so HTML Agility Pack has to deal with it.
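
A minimal sketch of the behavior described above, assuming a console project that references the HtmlAgilityPack NuGet package. It counts the input nodes under form2 before and after removing the flag. The result may differ between library versions (the default flags for form are not necessarily the same in every release), so no output is shown. The inline HTML string and the CountInputs helper are illustrative and not part of the original answer.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class FormFlagDemo
{
    // Same markup as the question, inlined so the sketch needs no file on disk.
    const string Html =
        "<html><body>" +
        "<form id=\"form1\"><input name=\"foo1\" value=\"bar1\" /></form>" +
        "<form id=\"form2\"><input name=\"foo2\" value=\"bar2\" /></form>" +
        "</body></html>";

    // Hypothetical helper: parse Html and count the <input> descendants of form2.
    static int CountInputs()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);
        HtmlNode form = doc.GetElementbyId("form2");
        return form == null ? 0 : form.Descendants("input").Count();
    }

    static void Main()
    {
        Console.WriteLine("form flag registered: " + HtmlNode.ElementsFlags.ContainsKey("form"));
        Console.WriteLine("inputs before Remove: " + CountInputs());

        // ElementsFlags is static, so the change only affects documents parsed afterwards.
        HtmlNode.ElementsFlags.Remove("form");
        Console.WriteLine("inputs after Remove:  " + CountInputs());
    }
}
```

Because ElementsFlags is shared by the whole process, removing the entry once at startup is usually enough.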

3/5/2010 4:20:27 PM

Popular Answer

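Alternatively, select the nodes directly into a collection with an XPath query. This one matches every element in the document whose type attribute is text, not only the inputs of one form. The inputs in the example HTML have no type attribute, so for that document the query matches nothing and SelectNodes returns null: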

HtmlNodeCollection resultCollection = doc.DocumentNode.SelectNodes("//*[@type='text']");
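
Since SelectNodes returns null rather than an empty collection when nothing matches, the result is worth checking before looping over it. The following is a small sketch of that check, again assuming the HtmlAgilityPack package is referenced; the inline markup and the class name are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

class NullSafeSelect
{
    static void Main()
    {
        var doc = new HtmlDocument();
        // Illustrative markup: an input outside any form, so form handling does not matter here.
        doc.LoadHtml("<div><input type=\"text\" name=\"foo\" value=\"bar\" /></div>");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//input[@type='text']");
        if (nodes == null)
        {
            Console.WriteLine("No matching inputs.");
            return;
        }

        foreach (HtmlNode node in nodes)
        {
            // GetAttributeValue falls back to the given default when the attribute is missing.
            Console.WriteLine(node.GetAttributeValue("name", "") + " = " + node.GetAttributeValue("value", ""));
        }
    }
}
```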


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow