Using XPath re:test() with HtmlAgilityPack (get all p tags whose inner text matches a regex)

c# html html-agility-pack xpath


I want every <p> tag whose content matches =.+= (that is, <p>=.+=</p>). The regex works on its own when tested without the <p> tags.

Here's my XPath: "//p[re:test(.,'^=.+=$', 'i')]"

But I get an exception when I plug it into SelectNodes:

HtmlNodeCollection pNodes = htmlDoc.DocumentNode.SelectNodes("//p[re:test(.,'^=.+=$', 'i')]");

The exception is:

Namespace Manager or XsltContext needed. This query has a prefix, variable, or user-defined function.

Edit: The HTML is generated by FCKEditor and has no namespace defined. Do I need to set something for this to work? Here is a sample:


<p><style type="text/css">
h2 a { color: black; }</style></p>
<h2>test <a href="http://searisen.com">link</a></h2>
<p>== Heading 2 ==</p>
<p>=== Heading [http://searisen.com SeaRisen.com] ===</p>
5/20/2011 12:09:58 AM

Accepted Answer

The error occurs because the expression re:test calls an XPath function named test (declared in a namespace whose prefix is re) that is unknown to the XSLT context.

I don't know where you got that expression from, but it is not part of standard XPath 1.0 (re:test looks like the EXSLT regular-expressions extension), so it means nothing in the Html Agility Pack context :-)

For an in-depth explanation, see the article Adding Custom Functions to XPath. Note that you could make re:test work using the techniques it describes.
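
As a rough sketch of that custom-function technique, the code below registers a function for the re prefix through a custom XsltContext and runs the query on the navigator that HtmlDocument.CreateNavigator() returns. The names RegexContext, RegexTestFunction, and RegexXPath.Select are made up for this illustration, only the 'i' flag is handled, and the code assumes a node-set argument such as . reaches Invoke as an XPathNodeIterator. Treat it as a starting point rather than a complete EXSLT implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.XPath;
using System.Xml.Xsl;
using HtmlAgilityPack;

// Hypothetical implementation of re:test(input, pattern[, flags]).
sealed class RegexTestFunction : IXsltContextFunction
{
    public int Minargs => 2;
    public int Maxargs => 3;
    public XPathResultType ReturnType => XPathResultType.Boolean;
    public XPathResultType[] ArgTypes =>
        new[] { XPathResultType.Any, XPathResultType.String, XPathResultType.String };

    public object Invoke(XsltContext xsltContext, object[] args, XPathNavigator docContext)
    {
        string input = AsString(args[0]);
        string pattern = AsString(args[1]);
        string flags = args.Length > 2 ? AsString(args[2]) : "";
        // Assumption: only the 'i' (ignore case) flag is supported in this sketch.
        var options = flags.Contains("i") ? RegexOptions.IgnoreCase : RegexOptions.None;
        return Regex.IsMatch(input, pattern, options);
    }

    // A node-set argument arrives as an iterator; use the string value of its first node.
    private static string AsString(object arg)
    {
        if (arg is XPathNodeIterator nodes)
            return nodes.MoveNext() ? nodes.Current.Value : "";
        return Convert.ToString(arg) ?? "";
    }
}

// Hypothetical context that knows the "re" prefix and resolves re:test.
sealed class RegexContext : XsltContext
{
    public RegexContext()
    {
        AddNamespace("re", "http://exslt.org/regular-expressions");
    }

    public override bool Whitespace => true;
    public override bool PreserveWhitespace(XPathNavigator node) => true;
    public override int CompareDocument(string baseUri, string nextBaseUri) => 0;
    public override IXsltContextVariable ResolveVariable(string prefix, string name) => null;

    public override IXsltContextFunction ResolveFunction(
        string prefix, string name, XPathResultType[] argTypes)
        => prefix == "re" && name == "test" ? new RegexTestFunction() : null;
}

static class RegexXPath
{
    // Compiles the XPath, attaches the custom context, and maps results back to HtmlNode.
    public static List<HtmlNode> Select(HtmlDocument doc, string xpath)
    {
        XPathNavigator nav = doc.CreateNavigator();
        XPathExpression expr = nav.Compile(xpath);
        expr.SetContext(new RegexContext());

        var result = new List<HtmlNode>();
        XPathNodeIterator it = nav.Select(expr);
        while (it.MoveNext())
            result.Add(((HtmlNodeNavigator)it.Current).CurrentNode);
        return result;
    }
}
```

With these pieces in place, RegexXPath.Select(htmlDoc, "//p[re:test(., '^=.+=$', 'i')]") should accept the original query. It returns an empty list rather than null when nothing matches.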

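That said, XPath 1.0 has no regular-expression support, so a "pure" Html Agility Pack / XPath implementation has to approximate the pattern with string functions:

var pNodes = htmlDoc.DocumentNode.SelectNodes("//p[starts-with(., '=') and substring(., string-length(.)) = '=' and string-length(.) > 2]");

It uses a filter (between [ and ]) and the standard XPath functions starts-with(), substring(), and string-length(), applied to ., the string value of the <p> element. Together they require the text to begin and end with = and to contain at least one character in between, which is what ^=.+=$ asks for on single-line text.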

5/20/2011 7:35:50 AM

Popular Answer

Apparently HtmlAgilityPack doesn't handle namespaces (not that I had one), so I came up with this workaround:

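// SelectNodes returns null when nothing matches, so guard before filtering.
var pNodes = (htmlDoc.DocumentNode.SelectNodes("//p") ?? Enumerable.Empty<HtmlNode>())
    .Where(node => Regex.IsMatch(node.InnerText, "^=.+=$"));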
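
To try the filter end to end, here is a small self-contained sketch that loads markup from a string and applies the same regex check. The HeadingFilter class and FindWikiHeadings method are names invented for this example. LoadHtml, SelectNodes, and InnerText are the Html Agility Pack members it relies on.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

static class HeadingFilter
{
    // Hypothetical helper: returns every <p> whose inner text matches ^=.+=$.
    public static List<HtmlNode> FindWikiHeadings(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes yields null (not an empty collection) when nothing matches.
        var paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
            return new List<HtmlNode>();

        return paragraphs
            .Where(p => Regex.IsMatch(p.InnerText, "^=.+=$"))
            .ToList();
    }
}
```

Calling HeadingFilter.FindWikiHeadings on the sample from the question should pick up the two wiki-style heading paragraphs and skip the one that only wraps the style block.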

If there is an HtmlAgilityPack solution I'd love to hear it!

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow