I am using HTML Agility Pack to select an element and return that element, together with everything it contains, from a loaded HTML string. To test my code, I ran it against the select tag example from w3schools:
    <select name="cars">
      <option value="volvo">Volvo XC90</option>
      <option value="saab">Saab 95</option>
      <option value="mercedes">Mercedes SLK</option>
      <option value="audi">Audi TT</option>
    </select>
When I try to select and return this with HTML Agility Pack, I get the following (the option closing tags are removed):
    <select name="cars">
      <option value="volvo">Volvo XC90
      <option value="saab">Saab 95
      <option value="mercedes">Mercedes SLK
      <option value="audi">Audi TT
    </select>
So I did some searching here and found an instruction to add the line: HtmlNode.ElementsFlags.Remove("option");
I did that, and now I get the following (the option text is moved outside of the option tags):
    <select name="cars">
      <option value="volvo"></option>Volvo XC90
      <option value="saab"></option>Saab 95
      <option value="mercedes"></option>Mercedes SLK
      <option value="audi"></option>Audi TT
    </select>
I would like the output to match the original HTML. What do I need to do to get that?
I was also experimenting with OptionWriteEmptyNodes: when I tested with input tags, their self-closing slash was being removed, and enabling that option seemed to fix it. I have commented it out for now to make sure it isn't affecting this issue.
This is my .NET C# code:
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(content);
    HtmlNode.ElementsFlags.Remove("option"); // otherwise, the closing tag is removed.
    //doc.OptionWriteEmptyNodes = true;
    var nodes = doc.DocumentNode.SelectNodes("//select");
    if (nodes == null)
        return "Not found";
    else
        return nodes[0].OuterHtml; // SelectNodes returns a collection, so take the first match
You need to set the ElementsFlags entry for the option tag before loading the document to make it work:
    HtmlNode.ElementsFlags["option"] = HtmlElementFlag.Closed;
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);
which should return your original HTML code.
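Putting the pieces together, here is a minimal sketch of the whole flow. It assumes the HtmlAgilityPack package is referenced; the class and method names (SelectExtractor, GetFirstSelectHtml) are hypothetical and only for illustration. It uses SelectSingleNode rather than SelectNodes because only the first match is returned.

```csharp
using HtmlAgilityPack;

public static class SelectExtractor
{
    // Hypothetical helper: returns the outer HTML of the first <select>
    // element in the given markup, or null if there is none.
    public static string GetFirstSelectHtml(string html)
    {
        // ElementsFlags is static and consulted while parsing,
        // so it has to be set before LoadHtml is called.
        HtmlNode.ElementsFlags["option"] = HtmlElementFlag.Closed;

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the XPath matches nothing.
        HtmlNode select = doc.DocumentNode.SelectSingleNode("//select");
        return select?.OuterHtml;
    }
}
```

Because ElementsFlags is a static member, the change applies to every document parsed afterwards in the same process, not just this one. If other code relies on the default behavior for option, that is worth keeping in mind.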
I believe HtmlAgilityPack behaves this way because the <option> tag is, ironically, an optional tag in HTML: its closing tag may be omitted.
Taken from the documentation of the HtmlNode class and its ElementsFlags field:
Gets a collection of flags that define specific behaviors for specific element nodes. The table contains a DictionaryEntry list with the lowercase tag name as the Key, and a combination of HtmlElementFlags as the Value.
A further look at the HtmlElementFlag enum reveals this:
Empty - The node is empty. META or IMG are example of such nodes.
Closed - The node will automatically be closed during parsing.
You can view the source code for the class HtmlNode to see what other tags are considered 'specific'.
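As an alternative to reading the source, the flags can also be inspected at runtime. The following is a rough sketch; FlagInspector and Describe are hypothetical names, and it assumes that ElementsFlags exposes ContainsKey and a string indexer (the exact collection type has differed between library versions, so treat that as an assumption).

```csharp
using HtmlAgilityPack;

public static class FlagInspector
{
    // Hypothetical helper: describes the flags registered for a tag name.
    public static string Describe(string tagName)
    {
        // The documentation states that keys are lowercase tag names.
        string key = tagName.ToLowerInvariant();

        if (!HtmlNode.ElementsFlags.ContainsKey(key))
            return key + ": (no flags registered)";

        // The value is a combination of HtmlElementFlag members.
        return key + ": " + HtmlNode.ElementsFlags[key];
    }
}
```

Calling FlagInspector.Describe("option") before and after changing the flag is a quick way to confirm which behavior is currently in effect.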