How can I find the first element with a plain text inner text of 200 characters or greater, excluding any children?
I'm trying to develop an HTML parser similar to the one Embed.ly has. I've built up a system of fallbacks, and only when the earlier ones fail would I look for this occurrence.
This is because most websites even get that tag wrong: instead of describing the current page, it describes their website.
<html>
  <body>
    <div>some characters
      <p>200 characters <span>some more stuff</span></p>
    </div>
  </body>
</html>
What selector could I use to get the "200 characters" part of that HTML fragment? I don't want the "some more stuff" part either, and I don't care what element it is (except for <script> or <style>), as long as its first plain text is at least 200 characters long.
What structure should the XPath query have?
(//*[not(self::script or self::style)]/text()[string-length() > 200])[1]
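To make it concrete what that expression picks out, here is a rough Python sketch. Note that the standard-library ElementTree module does not support string-length() or the self:: axis, so this does not evaluate the XPath; it emulates it with a hand-written traversal. The helper names (iter_text_nodes, first_long_text), the min_length parameter, and the sample fragment are made up for illustration. It also uses >= so that exactly 200 characters qualifies, as the question asks; the XPath as written, with > 200, needs at least 201. It assumes the input is well-formed XML, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

# Text whose parent is one of these elements is ignored (hypothetical constant).
SKIPPED_PARENTS = {"script", "style"}


def iter_text_nodes(element):
    """Yield (parent, text) for every text node under element, in document order."""
    if element.text:
        yield element, element.text
    for child in element:
        yield from iter_text_nodes(child)
        # ElementTree keeps the text that follows a child in child.tail;
        # in XPath terms that text node is a child of `element`.
        if child.tail:
            yield element, child.tail


def first_long_text(root, min_length=200):
    """Return the first text node of at least min_length characters, or None."""
    for parent, text in iter_text_nodes(root):
        if parent.tag not in SKIPPED_PARENTS and len(text) >= min_length:
            return text
    return None


long_text = "x" * 200  # stand-in for the "200 characters" paragraph
fragment = (
    f"<html><body><script>{'s' * 300}</script>"
    f"<div>some characters <p>{long_text}<span>some more stuff</span></p></div>"
    "</body></html>"
)

root = ET.fromstring(fragment)
result = first_long_text(root)
```

Here result is the text directly inside the p element: the long script content is skipped because its parent is a script element, and "some more stuff" is left out because it is a separate text node belonging to the span.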
If the document is an XHTML document, which means that all elements are in the XHTML namespace, the following expression should be used instead:
(//*[not(self::x:script or self::x:style)]/text()[string-length() > 200])[1]
where the prefix x: must be bound to the XHTML namespace (or, as many XPath APIs put it, the namespace must be "registered" with this prefix).
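As a small illustration of why the prefix matters, here is a sketch using Python's standard-library ElementTree, whose findall accepts a prefix-to-namespace mapping. The sample document is made up, and the URI http://www.w3.org/1999/xhtml is the standard XHTML namespace rather than something taken from the text above. ElementTree only supports a subset of XPath, so this shows the prefix binding only, not the full expression.

```python
import xml.etree.ElementTree as ET

# Standard XHTML namespace URI (an assumption added for this example).
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical XHTML document: the default namespace puts every element in XHTML_NS.
doc = (
    f'<html xmlns="{XHTML_NS}">'
    "<head><style>/* css */</style></head>"
    "<body><p>hello</p></body>"
    "</html>"
)
root = ET.fromstring(doc)

# Without a prefix the name test looks for elements in no namespace: nothing matches.
unprefixed = root.findall(".//style")

# With the prefix "x" bound to the XHTML namespace, the element is found.
prefixed = root.findall(".//x:style", {"x": XHTML_NS})
```

The unprefixed query returns an empty list while the prefixed one returns the style element, which is the same reason self::script has to become self::x:script in the expression above.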
I had something like this in mind:
root.SelectNodes("html/body/.//*[(name() !='script') and (name()!='style')]/text()[string-length() > 200]")
Seems to work quite well.