Using XPath to select attributes with wildcards

c# html html-agility-pack xpath

Question

I have some HTML I need to parse, and I'm using C# and the Html Agility Pack library to select nodes. My HTML will look something like either:

<input data-translate-attr-placeholder="FORGOT_PASSWORD.FORM.EMAIL">

or :

<h1 data-translate="FORGOT_PASSWORD.FORM.EMAIL"></h1>

where data-translate-attr-**** is the new attribute pattern I need to find.

I could use something like this :

//[contains(@??,'data-translate-attr')]

but unfortunately, that only searches the VALUE of an attribute. How do I match on the attribute name itself, with a wildcard?

Update : @Mathias Muller

HtmlAgilityPack.HtmlDocument htmlDoc; // assumed to be loaded with the HTML above
// This is the old code (returns nodes)
var oldNodes = htmlDoc.DocumentNode.SelectNodes("//@data-translate");
// These suggestions return no nodes for the same data
var containsNodes = htmlDoc.DocumentNode.SelectNodes("//@*[contains(name(),'data-translate')]");
var startsWithNodes = htmlDoc.DocumentNode.SelectNodes("//@*[starts-with(name(),'data-translate')]");

Update 2

This appears to be an Html Agility Pack issue more than an XPath issue. I used Chrome to test my XPath expressions, and all of the following worked in Chrome but not in Html Agility Pack:

//@*[contains(local-name(),'data-translate')]
//@*[starts-with(name(),'data-translate')]
//attribute::*[starts-with(local-name(.),'data-translate')]

My Solution

I ended up just doing things the old-fashioned way: select every node that has attributes, then filter on the attribute name in C#.

var nodes = htmlDoc.DocumentNode.SelectNodes("//@*");

if (nodes != null) {
    foreach (HtmlNode node in nodes) {
        if (node.HasAttributes) {
            foreach (HtmlAttribute attr in node.Attributes) {
                if (attr.Name.StartsWith("data-translate")) {
                    // code in here to handle translation node
                }
            }
        }
    }
}
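
The same filtering can also be written without XPath at all. The following is only a minimal sketch, assuming Html Agility Pack's Descendants(), HasAttributes and Attributes members behave as documented; the class and method names (TranslateAttributeFinder, Find) are hypothetical and not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TranslateAttributeFinder
{
    // Hypothetical helper: returns every attribute in the document
    // whose name starts with the given prefix.
    public static List<HtmlAttribute> Find(HtmlDocument doc, string prefix)
    {
        return doc.DocumentNode
            .Descendants()                          // every node below the root
            .Where(node => node.HasAttributes)      // skip text nodes and bare elements
            .SelectMany(node => node.Attributes)    // flatten to individual attributes
            .Where(attr => attr.Name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Calling TranslateAttributeFinder.Find(htmlDoc, "data-translate") would then replace the nested loops above; passing "data-translate-attr" narrows the match to the new attribute pattern only.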

Accepted Answer

Use the XPath functions contains() or starts-with(). You need an XPath expression like

//@*[contains(name(),'data-translate')]

or perhaps

//@*[starts-with(name(),'data-translate')]

which actually retrieves attribute nodes. Above, the @* is the attribute wildcard you were looking for.
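
To see the @* wildcard in isolation, here is a small sketch that runs the starts-with() expression against the standard System.Xml XPath engine rather than Html Agility Pack. It assumes the markup is well-formed XML, which real HTML often is not; the class and method names (AttributeWildcardDemo, FindAttributes) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class AttributeWildcardDemo
{
    // Returns "name=value" for every attribute whose name starts with the prefix.
    // The prefix is concatenated into the XPath string, so it must not contain
    // a single quote.
    public static List<string> FindAttributes(string xml, string prefix)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var found = new List<string>();
        // @* matches any attribute; the predicate filters on the attribute name.
        XmlNodeList attrs = doc.SelectNodes("//@*[starts-with(name(), '" + prefix + "')]");
        foreach (XmlNode attr in attrs)
        {
            found.Add(attr.Name + "=" + attr.Value);
        }
        return found;
    }
}
```

For a document containing both a data-translate-attr-placeholder and a data-translate attribute, the prefix "data-translate" selects both attribute nodes, while "data-translate-attr" selects only the first.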


Popular Answer

Rather than using name(), use local-name(), for example:

var nodes = htmlDoc.DocumentNode.SelectNodes("//@*[starts-with(local-name(),'data-translate')]");

The difference is that name() returns the attribute name including any prefix, such as a namespace prefix in XML, whereas local-name() omits that prefix if it is there. In your case name() and local-name() should behave the same way, because this is HTML and there are no namespaces, but it seems that they don't, which is probably a bug in Html Agility Pack.

Test:

    var html = "<h3 x='foo'></h3>";
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    var elementByName = doc.DocumentNode.SelectSingleNode("//*[name()='h3']");                        // works
    var elementByLocalName = doc.DocumentNode.SelectSingleNode("//*[local-name()='h3']");             // works
    var elementByAttributeLocalName = doc.DocumentNode.SelectSingleNode("//*[@*[local-name()='x']]"); // works
    var elementByAttributeName = doc.DocumentNode.SelectSingleNode("//*[@*[name()='x']]");            // does NOT work

    // Mathias's way: select the attribute directly
    var attributeByLocalName = doc.DocumentNode.SelectSingleNode("//@*[local-name() = 'x']"); // works
    var attributeByName = doc.DocumentNode.SelectSingleNode("//@*[name() = 'x']");            // does NOT work
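
For comparison, the name() versus local-name() distinction can be checked against a namespaced attribute using the standard System.Xml XPath engine. This is just an illustrative sketch and says nothing about Html Agility Pack's internals; the class and method names (NameVersusLocalName, NameOfHref, LocalNameOfHref) are hypothetical.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public static class NameVersusLocalName
{
    // name() keeps the namespace prefix of the attribute.
    public static string NameOfHref(string xml)
    {
        return Evaluate(xml, "name(//@*[local-name()='href'])");
    }

    // local-name() drops the namespace prefix.
    public static string LocalNameOfHref(string xml)
    {
        return Evaluate(xml, "local-name(//@*[local-name()='href'])");
    }

    // Evaluates an XPath expression that yields a string.
    private static string Evaluate(string xml, string expression)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XPathNavigator nav = doc.CreateNavigator();
        return (string)nav.Evaluate(expression);
    }
}
```

Given `<a xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#top" />`, NameOfHref returns "xlink:href" and LocalNameOfHref returns "href". For an attribute without a prefix, such as x in the test above, both functions are expected to return the same string, which is why the differing results in Html Agility Pack look like a bug.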



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow