Parent of HtmlAgilityPack text node is select instead of option?

html-agility-pack

Question

I'm looking for text nodes in a select-based DOM structure using HtmlAgilityPack.

<select>
  <option>
    one
  </option>
  <option>
    two
  </option>
</select>

The parents of those text nodes seem to be the <select> rather than the <option>. Why?

using System.IO;
using System.Linq;
using HtmlAgilityPack;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace Foo.Test
{
  [TestClass]
  public class HtmlAgilityTest
  {
    [TestMethod]
    public void TestTraverseTextNodesInSelect()
    {
      var html = "<select><option>one</option><option>two</option></select>";

      var doc = new HtmlDocument();
      doc.Load(new StringReader(html));

      var elements = doc.DocumentNode.Descendants().Where(n=>n.Name == "#text");

      Assert.AreEqual(2, elements.Count());
      Assert.AreEqual("select", elements.ElementAt(0).ParentNode.Name);
      Assert.AreEqual("select", elements.ElementAt(1).ParentNode.Name);
    }
  }
}
7/22/2014 11:17:46 AM

Accepted Answer

That is because HtmlAgilityPack drops the closing <option> tag automatically, so the text that follows each <option> ends up as a child of <select>. HAP interprets your HTML as follows:

Console.WriteLine(doc.DocumentNode.OuterHtml);
//result :
//<select><option>one<option>two</select>

As mentioned in the related question, you can change that behavior by calling the following line before initializing the HtmlDocument:

HtmlNode.ElementsFlags.Remove("option");
5/23/2017 10:25:04 AM

Popular Answer

    [TestMethod]
    public void TestTraverseTextNodesInSelect()
    {
      // Stop treating <option> as an empty element. This must run before the document is loaded.
      HtmlNode.ElementsFlags.Remove("option");
      var html = "<select><option>one</option><option>two</option></select>";

      var doc = new HtmlDocument();
      doc.Load(new StringReader(html));

      var elements = doc.DocumentNode.Descendants().Where(n=>n.Name == "#text");

      Assert.AreEqual(2, elements.Count());
      // With the flag removed, each text node should now sit under its <option>.
      Assert.AreEqual("option", elements.ElementAt(0).ParentNode.Name);
      Assert.AreEqual("option", elements.ElementAt(1).ParentNode.Name);
    }

Give this a try.

This is how the library is set up: by default, HtmlAgilityPack treats option tags as empty, so you have to remove that flag yourself. The library registers it like this:

ElementsFlags.Add("option", HtmlElementFlag.Empty);
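
Since HtmlNode.ElementsFlags is a static collection, removing the entry affects every document parsed afterwards in the same process. Here is a minimal sketch of one way to limit that, by restoring the entry once parsing is done. It assumes a HtmlAgilityPack version in which ElementsFlags is a Dictionary<string, HtmlElementFlag>; the class name OptionFlagDemo is made up for illustration, and the default flags may differ between library versions.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical demo class, not part of HtmlAgilityPack.
static class OptionFlagDemo
{
    static void Main()
    {
        const string html = "<select><option>one</option><option>two</option></select>";

        // Remember whether "option" currently has a flag so it can be restored later.
        // Assumption: ElementsFlags is a Dictionary<string, HtmlElementFlag>.
        bool hadFlag = HtmlNode.ElementsFlags.TryGetValue("option", out HtmlElementFlag saved);
        HtmlNode.ElementsFlags.Remove("option");

        try
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Print each text node together with the name of its parent element.
            var textNodes = doc.DocumentNode.Descendants()
                               .Where(n => n.NodeType == HtmlNodeType.Text);
            foreach (var text in textNodes)
            {
                Console.WriteLine($"{text.InnerText.Trim()} -> {text.ParentNode.Name}");
            }
        }
        finally
        {
            // Put the original flag back so other parsing code sees the default behavior.
            if (hadFlag)
            {
                HtmlNode.ElementsFlags["option"] = saved;
            }
        }
    }
}
```

If the parser behaves as described in the answers above, each line should report option as the parent. Note that this sketch is not thread-safe: another thread parsing HTML while the flag is removed would see the changed behavior too.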



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow