Parent of HtmlAgilityPack text node is select instead of option?

html-agility-pack

Question

Using HtmlAgilityPack, I am searching for text nodes in a DOM structure that consists of a select element:

<select>
  <option>
    one
  </option>
  <option>
    two
  </option>
</select>

The parent of each of those text nodes seems to be the <select> instead of an <option>. Why? The following test passes:

using System.IO;
using System.Linq;
using HtmlAgilityPack;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace Foo.Test
{
  [TestClass]
  public class HtmlAgilityTest
  {
    [TestMethod]
    public void TestTraverseTextNodesInSelect()
    {
      var html = "<select><option>one</option><option>two</option></select>";

      var doc = new HtmlDocument();
      doc.Load(new StringReader(html));

      var elements = doc.DocumentNode.Descendants().Where(n=>n.Name == "#text");

      Assert.AreEqual(2, elements.Count());
      Assert.AreEqual("select", elements.ElementAt(0).ParentNode.Name);
      Assert.AreEqual("select", elements.ElementAt(1).ParentNode.Name);
    }
  }
}

Accepted Answer

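That's because HtmlAgilityPack (HAP) drops the closing tag of <option> by default, so the text that follows each option becomes a sibling of the option rather than its child. HAP sees your HTML like this: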

Console.WriteLine(doc.DocumentNode.OuterHtml);
//result :
//<select><option>one<option>two</select>

As mentioned in the linked question, you can change that behavior by calling the following line before creating the HtmlDocument:

HtmlNode.ElementsFlags.Remove("option");
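Because HtmlNode.ElementsFlags is a static table, removing the entry affects every document parsed afterwards in the same process. A minimal sketch of one way to limit that, assuming a HAP version where ElementsFlags is a Dictionary<string, HtmlElementFlag>; the class OptionFlagScope and the method LoadWithNestedOptions are hypothetical names, not part of the library:

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class OptionFlagScope
{
    // Hypothetical helper (not part of HtmlAgilityPack): parses the HTML with
    // the "option" entry temporarily removed from the static ElementsFlags table.
    public static HtmlDocument LoadWithNestedOptions(string html)
    {
        // Remember whether the flag was registered, and with which value.
        HtmlElementFlag saved;
        bool hadFlag = HtmlNode.ElementsFlags.TryGetValue("option", out saved);

        // Remove is a no-op if the key is absent.
        HtmlNode.ElementsFlags.Remove("option");
        try
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html); // the tree is built here, while the flag is absent
            return doc;
        }
        finally
        {
            // Put the previous setting back so other parsing code is unaffected.
            if (hadFlag)
            {
                HtmlNode.ElementsFlags["option"] = saved;
            }
        }
    }
}
```

With the HTML from the question, the parent of each text node in the returned document should then be an option element. The sketch is not thread-safe: another thread parsing HTML while the flag is removed would also see the changed setting.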

Popular Answer

    [TestMethod]
    public void TestTraverseTextNodesInSelect()
    {
      // Stop HAP from treating <option> as an empty element.
      // This must run before the document is loaded.
      HtmlNode.ElementsFlags.Remove("option");
      var html = "<select><option>one</option><option>two</option></select>";

      var doc = new HtmlDocument();
      doc.Load(new StringReader(html));

      var elements = doc.DocumentNode.Descendants().Where(n=>n.Name == "#text");

      // With the flag removed, each text node is a child of its <option>.
      Assert.AreEqual(2, elements.Count());
      Assert.AreEqual("option", elements.ElementAt(0).ParentNode.Name);
      Assert.AreEqual("option", elements.ElementAt(1).ParentNode.Name);
    }

You can try this. By default, HtmlAgilityPack is set to treat option tags as empty elements. The library registers that flag with the following line, so you need to remove the entry before loading the document:

ElementsFlags.Add("option", HtmlElementFlag.Empty);
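If changing the global flag is not an option, the text can still be matched to its option element under the default parsing, since the text node then directly follows its option as a sibling. A rough sketch under that assumption; OptionText and OwningOption are hypothetical names introduced here for illustration:

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class OptionText
{
    // Hypothetical helper: returns the <option> a text node belongs to, whether
    // the text was parsed as a child of the option (flag removed) or as the
    // node that directly follows it (option treated as empty). Returns null
    // when neither case applies.
    public static HtmlNode OwningOption(HtmlNode textNode)
    {
        var parent = textNode.ParentNode;
        if (parent != null && parent.Name == "option")
        {
            return parent;
        }

        var previous = textNode.PreviousSibling;
        return (previous != null && previous.Name == "option") ? previous : null;
    }
}
```

A usage example with the document from the question:

```csharp
var doc = new HtmlDocument();
doc.LoadHtml("<select><option>one</option><option>two</option></select>");

foreach (var text in doc.DocumentNode.Descendants()
                        .Where(n => n.NodeType == HtmlNodeType.Text))
{
    var option = OptionText.OwningOption(text);
    System.Console.WriteLine(text.InnerText + " -> " + (option != null ? option.Name : "(none)"));
}
```

Whether the flag is registered at all can be checked with HtmlNode.ElementsFlags.ContainsKey("option"), which may be worth doing since the default could differ between library versions.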



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow