node.Descendants(0) seems to return all child nodes instead of first level

.net html-agility-pack

Question

I'm walking a document tree one level at a time using HTML Agility Pack. But it seems that calling node.Descendants(0) returns the complete node tree.

The SE parser didn't like my literal HTML string when I tried to paste it in, so I've included it as a snippet instead.

<html>
    <head>
    <meta name="generator"
    content="HTML Tidy for HTML5 (experimental) for Windows https://github.com/w3c/tidy-html5/tree/c63cc39" />
    <title></title>
    </head>
    <body>
    <p id="p1" class="newline">
        <span id="span1" class="bold">
        <span id="span2" class="literal">BOLD TEXT</span>
        </span>
    </p>
    </body>
</html>

// Assumes `html` is a string containing the markup above.
var doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(html);

var lines = doc.DocumentNode.Descendants().Where(x => x.HasClass("newline")).ToArray();

Console.WriteLine(string.Join("\r\n", lines[0].Descendants(0)
    .Select(x => $"{x.Name} {x.Id} {(x as HtmlTextNode)?.Text}")));

The code above gets the descendants of the first p tag. Whether I pass 0 or 1 as the argument, it walks the whole node tree and prints the following. That is a problem because the text node containing BOLD TEXT is nested three levels under the p tag. I would expect the code above to return only a text node, span1, and then another text node.

What am I doing wrong when I call .Descendants?

#text

span span1
#text

span span2
#text  BOLD TEXT
#text

#text

Edit: A temporary workaround is to keep only the descendants whose parent is the current node, but I'm still looking for a cleaner answer.

Console.WriteLine(string.Join("\r\n", lines[0].Descendants(0)
    .Where(x => x.ParentNode == lines[0])
    .Select(x => $"{x.Name} {x.Id} {(x as HtmlTextNode)?.Text}")));
7/4/2018 9:43:19 PM

Popular Answer

I ran into the same problem, searched online, and found your question :). Here is a condensed version of the answer:

The source code shows that the method behaves differently by design:

/// <summary>
/// Gets all Descendant nodes in enumerated list
/// </summary>
/// <returns></returns>
public IEnumerable<HtmlNode> Descendants(int level)
{
    if (level > HtmlDocument.MaxDepthLevel)
    {
        throw new ArgumentException(HtmlNode.DepthLevelExceptionMessage);
    }

    foreach (HtmlNode node in ChildNodes)
    {
        yield return node;

        foreach (HtmlNode descendant in node.Descendants(level + 1))
        {
            yield return descendant;
        }
    }
}

The method yields each child and then all of that child's descendants, raising the level by one on every recursive call, until there are no more descendants or the maximum level (int.MaxValue) is reached. So the level argument only tracks recursion depth as a guard against exceeding MaxDepthLevel; it never stops the traversal early. I agree with you, though, that it would make more sense to return descendants only down to the requested level. Unfortunately, we are unlikely to change this method, to preserve backward compatibility and avoid breaking existing applications.

In this case, however, Descendants(0) can be replaced with ChildNodes. The code would then look like this:

Console.WriteLine(string.Join("\r\n", lines[0].ChildNodes
    .Select(x => $"{x.Name} {x.Id} {(x as HtmlTextNode)?.Text}")));
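
If more than one level is ever needed, a depth-limited walk can be written by hand. The following is only a minimal sketch, not part of HTML Agility Pack: the class name TreeWalk and the method DescendantsUpTo are hypothetical, and the helper is kept library-independent by taking a function that returns a node's children.

```csharp
using System;
using System.Collections.Generic;

public static class TreeWalk
{
    // Hypothetical helper (not part of HTML Agility Pack): yields the nodes
    // below 'root' in document order, descending at most 'maxDepth' levels.
    // maxDepth = 1 yields direct children only; maxDepth <= 0 yields nothing.
    public static IEnumerable<T> DescendantsUpTo<T>(
        T root, Func<T, IEnumerable<T>> children, int maxDepth)
    {
        if (children == null) throw new ArgumentNullException(nameof(children));
        return Walk(root, children, maxDepth);
    }

    private static IEnumerable<T> Walk<T>(
        T node, Func<T, IEnumerable<T>> children, int remaining)
    {
        // Stop descending once the requested number of levels is used up.
        if (remaining <= 0) yield break;

        foreach (T child in children(node))
        {
            yield return child;
            foreach (T descendant in Walk(child, children, remaining - 1))
                yield return descendant;
        }
    }
}
```

Assuming the setup from the question, it could be called as TreeWalk.DescendantsUpTo(lines[0], n => n.ChildNodes, 1), which should yield only the direct children of the p element; passing 2 would also include the children of span1. For a single level, plain ChildNodes remains the simpler choice.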
7/24/2018 9:36:05 AM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow