Here is the HTML snippet. How do I get its text normalized the way XPath's normalize-space() does it?
The goal is to match the same fragment that the XPath expression //*[normalize-space()='Text1 Text2'] finds.
<div>
  <div>
    <a></a>
    <a></a>
    <div><a><span></span>Text1</a></div>
  </div>
  <div>Text2</div>
</div>
Using this code:
var text = string.Empty;
// Concatenate the InnerText of every top-level element
var htmlNodes = htmlDoc.DocumentNode.SelectNodes("*");
foreach (var node in htmlNodes)
{
    text += node.InnerText;
}
I get this string:
"\r\n \r\n \r\n \r\n \r\n Text1\r\n Text2"
How can I get the normalized text instead?
"Text1 Text2"
You can take the InnerText property of each div descendant instead:
using System.Linq;

var texts = htmlDoc.DocumentNode.Descendants("div").Select(n => n.InnerText);
And combine them if you want:
var combined = string.Join(" ", texts);
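The joined string still carries the line breaks and indentation from the markup. XPath 1.0 defines normalize-space() as stripping leading and trailing whitespace and replacing each run of whitespace (space, tab, carriage return, line feed) with a single space. A minimal C# sketch of that rule for a plain string, assuming System.Text.RegularExpressions is available; the names XPathText and NormalizeSpace are hypothetical and not part of HtmlAgilityPack:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class XPathText
{
    // Runs of XPath 1.0 whitespace: space, tab, carriage return, line feed.
    private static readonly Regex WhitespaceRun = new Regex("[ \t\r\n]+");

    // Mirrors XPath 1.0 normalize-space(): collapse each whitespace run to a
    // single space, then strip the space left at either end.
    public static string NormalizeSpace(string s)
    {
        return WhitespaceRun.Replace(s, " ").Trim(' ');
    }
}
```

Called on the string from the question, NormalizeSpace("\r\n \r\n \r\n \r\n \r\n Text1\r\n Text2") returns "Text1 Text2". With HtmlAgilityPack the input would be a node's InnerText, for example htmlDoc.DocumentNode.InnerText.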
To filter out values that are empty or contain only whitespace:
var texts = htmlDoc.DocumentNode.Descendants("div")
    .Select(n => n.InnerText.Replace("\r\n", "").Trim()).Where(s => s.Length > 0);
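Putting the steps together, here is a sketch that works on plain strings standing in for the InnerText values (the sample inputs in any usage are made up): trim each value, drop the whitespace-only ones, join with a space, and collapse remaining whitespace runs as normalize-space() does. TextCleanup and CombineTexts are hypothetical names, not HtmlAgilityPack APIs.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class TextCleanup
{
    // Whitespace as XPath 1.0 defines it: space, tab, CR, LF.
    private static readonly char[] XPathWhitespace = { ' ', '\t', '\r', '\n' };
    private static readonly Regex WhitespaceRun = new Regex("[ \t\r\n]+");

    public static string CombineTexts(IEnumerable<string> innerTexts)
    {
        // Trim each value and drop the ones that were only whitespace.
        var nonEmpty = innerTexts
            .Select(t => t.Trim(XPathWhitespace))
            .Where(t => t.Length > 0);

        // Join with one space, then collapse whitespace runs still
        // inside individual values (e.g. "a\r\n   b" becomes "a b").
        return WhitespaceRun.Replace(string.Join(" ", nonEmpty), " ");
    }
}
```

Because an outer div's InnerText already contains the text of the divs nested inside it, feeding in every node from Descendants("div") repeats text; passing only the nodes of interest avoids that.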