Using HtmlAgilityPack, I am searching for text nodes in a DOM structure consisting of a select:
<select>
<option>
one
</option>
<option>
two
</option>
</select>
The parent of each of those text nodes seems to be the <select> instead of an <option>. Why?
using System.IO;
using System.Linq;
using HtmlAgilityPack;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace Foo.Test
{
    [TestClass]
    public class HtmlAgilityTest
    {
        [TestMethod]
        public void TestTraverseTextNodesInSelect()
        {
            var html = "<select><option>one</option><option>two</option></select>";
            var doc = new HtmlDocument();
            doc.Load(new StringReader(html));
            var elements = doc.DocumentNode.Descendants().Where(n => n.Name == "#text");
            Assert.AreEqual(2, elements.Count());
            Assert.AreEqual("select", elements.ElementAt(0).ParentNode.Name);
            Assert.AreEqual("select", elements.ElementAt(1).ParentNode.Name);
        }
    }
}
That's because HtmlAgilityPack drops the closing </option> tag by default. HAP sees your HTML like this:
Console.WriteLine(doc.DocumentNode.OuterHtml);
//result :
//<select><option>one<option>two</select>
As mentioned in the linked question above, you can change that behavior by calling the following line before creating the HtmlDocument:
HtmlNode.ElementsFlags.Remove("option");
[TestMethod]
public void TestTraverseTextNodesInSelect()
{
    // Stop HAP from treating <option> as an empty element.
    HtmlNode.ElementsFlags.Remove("option");
    var html = "<select><option>one</option><option>two</option></select>";
    var doc = new HtmlDocument();
    doc.Load(new StringReader(html));
    var elements = doc.DocumentNode.Descendants().Where(n => n.Name == "#text");
    Assert.AreEqual(2, elements.Count());
    // With the flag removed, each text node is a child of its <option>.
    Assert.AreEqual("option", elements.ElementAt(0).ParentNode.Name);
    Assert.AreEqual("option", elements.ElementAt(1).ParentNode.Name);
}
You can try removing this flag. By default, HtmlAgilityPack is set to treat option tags as empty. The library registers the flag like this, and you need to remove that entry:
ElementsFlags.Add("option", HtmlElementFlag.Empty);
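
One thing to keep in mind is that HtmlNode.ElementsFlags is a static dictionary, so removing the entry changes parsing for every HtmlDocument created afterwards in the same process. The following is a minimal sketch, assuming a HAP version that still registers "option" as Empty by default; the class OptionParentDemo and the helper TextNodeParents are hypothetical names used only for illustration. It prints the parent names before and after removing the flag, then restores the previous setting.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class OptionParentDemo
{
    // Hypothetical helper: returns the parent element name of every text node.
    public static string[] TextNodeParents(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Text)
            .Select(n => n.ParentNode.Name)
            .ToArray();
    }

    public static void Main()
    {
        const string html = "<select><option>one</option><option>two</option></select>";

        // Parsed with whatever flags are currently registered for "option".
        Console.WriteLine(string.Join(",", TextNodeParents(html)));

        // ElementsFlags is shared static state: remember the old value so it
        // can be restored once we are done.
        bool hadFlag = HtmlNode.ElementsFlags.TryGetValue("option", out var oldFlag);
        HtmlNode.ElementsFlags.Remove("option");
        try
        {
            // Parsed with <option> treated as a normal container element.
            Console.WriteLine(string.Join(",", TextNodeParents(html)));
        }
        finally
        {
            // Put the original flag back (only if there was one).
            if (hadFlag)
            {
                HtmlNode.ElementsFlags["option"] = oldFlag;
            }
        }
    }
}
```

Restoring the flag in a finally block is a design choice for shared code such as test suites, where other tests may rely on the default parsing behavior; in a small program it may be simpler to remove the flag once at startup.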