How to get the inner text of a span that contains another, hidden span?

c# html html-agility-pack


I have a test HTML page:

<!DOCTYPE html>
<html lang="en" xmlns="">
    <meta charset="utf-8" />
    <title>Page for test</title>
    <div class="r_tr">
        <span class="r_rs">Inner text<span class="otherSpan" style="display: none">text</span></span>
    </div>
</html>

I want to get "Inner text". I am using HtmlAgilityPack, and I wrote this method:

public string GetInnerTextFromSpan(HtmlDocument doc)
{
    const string rowXPath = "//*[@class=\"r_tr\"]";
    const string spanXPath = "//*[@class=\"r_rs\"]";
    string text = null;
    HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(rowXPath);
    foreach (HtmlNode row in rows)
    {
        text = row.SelectSingleNode(spanXPath).InnerText;
        Console.WriteLine("textL {0}", text);
    }
    return text;
}


But this method returns "Inner texttext". I wrote a unit test to illustrate the problem:

public void TestGetInnerTextFromSpan()
{
    var client = new PromtTranslatorClient();
    var doc = new HtmlDocument();
    // The test page above has to be loaded into doc here; that step is not shown.
    var text = client.GetInnerTextFromSpan(doc);
    StringAssert.AreEqualIgnoringCase("Inner text", text);
}

and the result is:

Expected string length 10 but was 14. Strings differ at index 10.
  Expected: "Inner text", ignoring case
  But was:  "Inner texttext"
10/10/2012 10:18:38 AM

Accepted Answer

I don't know XPath, but here is a solution using LINQ:

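String inner = (from x in doc.DocumentNode.Descendants()
                where x.Name == "span"
                // GetAttributeValue avoids a NullReferenceException on spans without a class attribute
                && x.GetAttributeValue("class", "") == "r_rs"
                select
                      (from y in x.ChildNodes
                       where y.Name == "#text"
                       select y.InnerText).FirstOrDefault()
               ).FirstOrDefault();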
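
The query above returns only the first text node of the span. If the visible text happened to be split around the hidden span, a variant that joins all direct text children may be more robust. The following is a minimal sketch under that assumption; the class name `DirectText` and method `Of` are hypothetical helpers, not part of HtmlAgilityPack.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class DirectText
{
    // Concatenates the text nodes that are direct children of the given node.
    // Text inside nested elements (such as a hidden inner span) is skipped.
    public static string Of(HtmlNode node)
    {
        return string.Concat(
            node.ChildNodes
                .Where(child => child.NodeType == HtmlNodeType.Text)
                .Select(child => child.InnerText));
    }
}
```

Calling `DirectText.Of(span)` on the `r_rs` span from the question should give "Inner text", because the nested `otherSpan` element is an element node rather than a text node.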
10/10/2012 10:42:35 AM

Popular Answer

First, your spanXPath is incorrect. A leading // makes the expression absolute: it searches from the document root, so row.SelectSingleNode(spanXPath) always returns the first element with class r_rs in the document, not in the row. Drop the // to match direct children of the row, or use .// to match any of its descendants.
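
To make the scoping difference concrete, here is a small sketch that runs three lookups from the second of two rows. The two-row markup and the class name `XPathScopeDemo` are made up for this illustration; only the HtmlAgilityPack calls already used in the question are assumed.

```csharp
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // Hypothetical two-row page, invented for this illustration.
    private const string Html =
        "<div class=\"r_tr\"><span class=\"r_rs\">first</span></div>" +
        "<div class=\"r_tr\"><span class=\"r_rs\">second</span></div>";

    public static string[] SecondRowLookups()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);
        HtmlNode secondRow = doc.DocumentNode.SelectNodes("//*[@class=\"r_tr\"]")[1];

        return new[]
        {
            // Absolute path: searches the whole document, ignoring the row.
            secondRow.SelectSingleNode("//*[@class=\"r_rs\"]").InnerText,
            // Relative path: direct children of the row only.
            secondRow.SelectSingleNode("*[@class=\"r_rs\"]").InnerText,
            // Relative path: any descendant of the row.
            secondRow.SelectSingleNode(".//*[@class=\"r_rs\"]").InnerText,
        };
    }
}
```

The absolute lookup should come back with "first" even though it was called on the second row, while both relative forms should return "second".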

Then, text() is the XPath node test that selects text nodes. You can use

var span = row.SelectSingleNode(spanXPath);
var textNode = span.SelectSingleNode("text()");
text = textNode.InnerText;
Console.WriteLine("textL {0}", text);

in your foreach loop to get the first text node in the selected span.
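
Putting both points together, the method from the question might end up looking roughly like the sketch below. It is one possible reading of the advice, not the asker's final code: the wrapper class `SpanTextReader` is hypothetical, the span lookup uses the relative `.//` form, and null checks are added because `SelectNodes` and `SelectSingleNode` typically return null when nothing matches.

```csharp
using HtmlAgilityPack;

public static class SpanTextReader
{
    // Returns the first direct text node of the r_rs span in the last matching row,
    // or null if no row or span is found.
    public static string GetInnerTextFromSpan(HtmlDocument doc)
    {
        string text = null;
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//*[@class=\"r_tr\"]");
        if (rows == null)
        {
            return null; // no rows matched
        }

        foreach (HtmlNode row in rows)
        {
            // Relative XPath: search inside this row only.
            HtmlNode span = row.SelectSingleNode(".//*[@class=\"r_rs\"]");
            // text() matches direct child text nodes, so nested spans are skipped.
            HtmlNode textNode = span?.SelectSingleNode("text()");
            if (textNode != null)
            {
                text = textNode.InnerText;
            }
        }
        return text;
    }
}
```

For the test page in the question, loaded with `doc.LoadHtml(...)` or `doc.Load(...)`, this should return "Inner text" rather than "Inner texttext".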

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow