I have HTML I need to parse, and I'm using C# and the Html Agility Pack library to select nodes. My HTML will look something like either:
<input data-translate-attr-placeholder="FORGOT_PASSWORD.FORM.EMAIL">
or:
<h1 data-translate="FORGOT_PASSWORD.FORM.EMAIL"></h1>
where data-translate-attr-**** is the new pattern of attributes I need to find.
I could use something like this:
//[contains(@??,'data-translate-attr')]
but unfortunately, that only searches the value inside an attribute. How do I match on the attribute name itself, with a wildcard?
Update: @Mathias Muller
HtmlAgilityPack.HtmlDocument htmlDoc; // loaded elsewhere
// this is the old code (returns nodes)
var nodes = htmlDoc.DocumentNode.SelectNodes("//@data-translate");
// these suggestions return no nodes using the same data
var containsNodes = htmlDoc.DocumentNode.SelectNodes("//@*[contains(name(),'data-translate')]");
var startsWithNodes = htmlDoc.DocumentNode.SelectNodes("//@*[starts-with(name(),'data-translate')]");
Update 2
This appears to be an Html Agility Pack issue more than an XPath issue. I used Chrome to test my XPath expressions, and all of the following worked in Chrome but not in Html Agility Pack:
//@*[contains(local-name(),'data-translate')]
//@*[starts-with(name(),'data-translate')]
//attribute::*[starts-with(local-name(.),'data-translate')]
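One way to cross-check that the expressions themselves are valid XPath 1.0, independent of Html Agility Pack, is to run them through .NET's built-in XPath engine. The following is only a sketch: it assumes the markup can be written as well-formed XML, and the class name AttributeWildcardCheck, the helper MatchAttributeNames, and the sample markup are made up for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class AttributeWildcardCheck
{
    // Hypothetical helper: evaluates an XPath expression that selects
    // attribute nodes and returns the names of the matched attributes.
    public static List<string> MatchAttributeNames(string xml, string xpath)
    {
        XDocument doc = XDocument.Parse(xml);

        // For node-set results, XPathEvaluate returns an IEnumerable;
        // attribute nodes come back as XAttribute instances.
        var result = (IEnumerable)doc.XPathEvaluate(xpath);
        return result.Cast<XAttribute>()
                     .Select(a => a.Name.LocalName)
                     .ToList();
    }

    public static void Main()
    {
        // Assumed sample input: the question's markup, made well-formed.
        const string xml =
            "<root>" +
            "<input data-translate-attr-placeholder='FORGOT_PASSWORD.FORM.EMAIL' />" +
            "<h1 data-translate='FORGOT_PASSWORD.FORM.EMAIL'></h1>" +
            "<p class='unrelated'></p>" +
            "</root>";

        string[] expressions =
        {
            "//@*[contains(local-name(),'data-translate')]",
            "//@*[starts-with(name(),'data-translate')]",
            "//attribute::*[starts-with(local-name(.),'data-translate')]"
        };

        foreach (string xpath in expressions)
        {
            List<string> names = MatchAttributeNames(xml, xpath);
            Console.WriteLine(xpath + " -> " + string.Join(", ", names));
        }
    }
}
```

If all three expressions match here but not through SelectNodes, that points at the library's XPath handling rather than at the expressions.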
My Solution
I ended up just doing things the old-fashioned way:
var nodes = htmlDoc.DocumentNode.SelectNodes("//@*");
if (nodes != null) {
    foreach (HtmlNode node in nodes) {
        if (node.HasAttributes) {
            foreach (HtmlAttribute attr in node.Attributes) {
                if (attr.Name.StartsWith("data-translate")) {
                    // code in here to handle translation node
                }
            }
        }
    }
}
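The same manual filtering can also be written with LINQ, skipping XPath entirely. This is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name TranslateAttributeFinder, the helper Find, and the sample markup are hypothetical and not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TranslateAttributeFinder
{
    // Hypothetical helper: collects every attribute in the document
    // whose name starts with the given prefix.
    public static List<HtmlAttribute> Find(HtmlDocument doc, string prefix)
    {
        return doc.DocumentNode
            .Descendants()                          // walk all nodes below the root
            .Where(node => node.HasAttributes)      // skip text nodes and bare elements
            .SelectMany(node => node.Attributes)    // flatten to individual attributes
            .Where(attr => attr.Name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        // Assumed sample input, based on the markup in the question.
        doc.LoadHtml(
            "<input data-translate-attr-placeholder=\"FORGOT_PASSWORD.FORM.EMAIL\">" +
            "<h1 data-translate=\"FORGOT_PASSWORD.FORM.EMAIL\"></h1>");

        foreach (HtmlAttribute attr in Find(doc, "data-translate"))
        {
            // OwnerNode is the element that carries the attribute.
            Console.WriteLine(attr.OwnerNode.Name + ": " + attr.Name + " = " + attr.Value);
        }
    }
}
```

Because this works on HtmlAttribute objects directly, each match gives access to both the attribute and its owning element without a nested loop.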
Use the XPath functions contains() or starts-with(). You need an XPath expression like
//@*[contains(name(),'data-translate')]
or perhaps
//@*[starts-with(name(),'data-translate')]
which actually retrieves attribute nodes. Above, the @* is the attribute wildcard you were looking for.
Rather than using name(), use local-name(), such as:
var nodes = htmlDoc.DocumentNode.SelectNodes("//@*[starts-with(local-name(),'data-translate')]");
The difference is that name() should give you the attribute name including any prefix (such as a namespace prefix in XML), while local-name() omits that prefix if it's there. In your case name() and local-name() should behave the same way, because it's HTML and there are no namespaces, but it seems that they don't, which is probably a bug.
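To illustrate the general name() versus local-name() distinction outside Html Agility Pack, here is a small sketch using .NET's built-in XPath support on namespaced XML. The class name NameVersusLocalName, the helper Evaluate, and the xlink sample document are assumptions chosen only for the demonstration.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class NameVersusLocalName
{
    // Hypothetical helper: evaluates an XPath expression that yields a string.
    public static string Evaluate(string xml, string xpath)
    {
        XDocument doc = XDocument.Parse(xml);
        return (string)doc.XPathEvaluate(xpath);
    }

    public static void Main()
    {
        // Assumed sample: one attribute carrying a namespace prefix.
        const string xml =
            "<a xmlns:xlink='http://www.w3.org/1999/xlink' xlink:href='#target' />";

        // name() reports the qualified name, prefix included.
        Console.WriteLine(Evaluate(xml, "name(/*/@*[local-name()='href'])"));

        // local-name() reports only the part after the prefix.
        Console.WriteLine(Evaluate(xml, "local-name(/*/@*[local-name()='href'])"));
    }
}
```

For attributes without a prefix, such as the data-translate attributes in the question, both functions are expected to return the same string.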
Test:
var html = "<h3 x='foo'></h3>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var ElementByName = doc.DocumentNode.SelectSingleNode("//*[name()='h3']"); //Works
var ElementByLocalName = doc.DocumentNode.SelectSingleNode("//*[local-name()='h3']"); //Works
var ElementByAttributeLocalName = doc.DocumentNode.SelectSingleNode("//*[@*[local-name()='x']]"); //Works
var ElementByAttributeName = doc.DocumentNode.SelectSingleNode("//*[@*[name()='x']]"); //Does NOT
// Mathias's way (selecting the attribute nodes directly)
var ElementByAttributeLocalName_ = doc.DocumentNode.SelectSingleNode("//@*[local-name() = 'x']"); //Works
var ElementByAttributeName_ = doc.DocumentNode.SelectSingleNode("//@*[name() = 'x']"); //Does NOT