'HTML Agility Pack' XPath query with logical AND

c# html-agility-pack xpath

Question

I'm trying to find a table in an HTML document whose first 2 rows each contain 3 columns with text in them.

As an experiment I tried the following query, which I want to return the table node whose first 2 rows contain text in the first column:

string xpath = @"//table//table[//tr[1]//td[1]//*[contains(text(), *)] and //tr[2]//td[1]//*[contains(text(), *)]]";
HtmlNode temp = doc.DocumentNode.SelectSingleNode(xpath);

It doesn't work properly.

Here is some sample HTML, which is the table I'm trying to match:

    <table width="100%" cellpadding="0" border="0">
       <tbody>
       <tr>
          <td width="27%" valign="center"><b><font size="1" face="Helvetica">SOME TEXT<br></font></b></td>
          <td width="1%"></td>
          <td width="9%" valign="center"><font size="1" face="Helvetica">SOME TEXT<br></font></td>
          <td width="1%"></td>
          <td width="25%" valign="center"><font size="1" face="Helvetica">SOME TEXT<br></font></td>
          <td width="37%"></td>
       </tr>
       <tr>
          <td valign="center"><font size="1" face="Helvetica">SOME TEXT<br></font></td>
          <td></td>
          <td valign="center"><font size="1" face="Helvetica">1<br></font></td>
          <td></td>
          <td valign="center"><font size="1" face="Helvetica">SOME TEXT<br></font></td>
          <td></td>
       </tr>
       </tbody>
    </table>

Notice that columns 1, 3 and 5 have text in the first 2 rows. That's what I'm trying to match.

Accepted Answer

//table//table[//tr[1]//td[1]//*[contains(text(), *)] and //tr[2]//td[1]//*[contains(text(), *)]]

There are many problems with this XPath expression:

  1. //table//table selects any table that is a descendant of a table. However, in the provided document there are no nested tables.

  2. table[//tr[1]//td[1]//*[contains(text(), *)]] . The //tr inside the predicate is an absolute XPath expression -- it selects all tr elements in the whole document -- not only in the subtree rooted by this table element. Most probably you want .//tr instead of //tr.

  3. //td[1] selects any td element that is the first td child of its parent -- but most probably you want only the first descendant td element. If so, you need to use this XPath expression: (//td)[1] (or (.//td)[1] when the search should stay inside the current node).

  4. //*[contains(text(), *)] this selects any element whose first text node child contains the string value of the first element child -- but you simply want to verify that a td has a descendant text child node -- this can correctly be selected with: td[.//text()]
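Points 2 and 3 can be checked with a small experiment. The following is only a sketch: it uses System.Xml's XmlDocument instead of HTML Agility Pack so that it runs without extra packages, and the two-table sample and the names XPathScopeDemo, Count and FromTableB are made up for illustration.

```csharp
using System;
using System.Xml;

public static class XPathScopeDemo
{
    // Hypothetical two-table document, kept well-formed so XmlDocument can parse it.
    private const string Sample =
        "<root>" +
        "<table id='a'><tr><td>A1</td><td>A2</td></tr></table>" +
        "<table id='b'><tr><td>B1</td><td>B2</td></tr></table>" +
        "</root>";

    private static XmlDocument Load()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        return doc;
    }

    // How many nodes the expression selects when evaluated from the document node.
    public static int Count(string xpath)
    {
        return Load().SelectNodes(xpath).Count;
    }

    // Evaluates the expression with table 'b' as the context node and
    // returns the text of the first node it selects.
    public static string FromTableB(string xpath)
    {
        XmlNode tableB = Load().SelectSingleNode("//table[@id='b']");
        return tableB.SelectSingleNode(xpath).InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(Count("//td[1]"));        // 2: the first td of each row
        Console.WriteLine(Count("(//td)[1]"));      // 1: only the first td in the document
        Console.WriteLine(FromTableB("//tr/td"));   // A1: the leading // ignores the context node
        Console.WriteLine(FromTableB(".//tr/td"));  // B1: .// stays inside table 'b'
    }
}
```

Even though the last two queries are evaluated on table 'b', the one starting with // still finds a cell of table 'a', which is the same effect as the //tr inside the predicate of the original expression.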

Combining the corrections of all these issues, what you probably want is something like:

  //table
     [(.//tr)[1]/td[1][.//text()]
    and
      (.//tr)[2]/td[1][.//text()]
     ]
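As a rough check of this expression, here is a minimal sketch that evaluates it with System.Xml's XmlDocument rather than HTML Agility Pack, so it runs without extra packages. The markup is a cut-down, well-formed variant of the table from the question (with <br/> closed) plus an invented "decoy" table; the id attributes and the names TableMatchDemo and FindTableId are assumptions added for illustration.

```csharp
using System;
using System.Xml;

public static class TableMatchDemo
{
    // The corrected expression from the answer, written on one line.
    public const string Xpath =
        "//table[(.//tr)[1]/td[1][.//text()] and (.//tr)[2]/td[1][.//text()]]";

    // Hypothetical sample: 'decoy' has an empty first cell in row 1,
    // 'target' has text in the first cell of both rows.
    public const string Sample = @"<html><body>
  <table id='decoy'>
    <tr><td></td><td>text</td></tr>
    <tr><td>text</td><td></td></tr>
  </table>
  <table id='target'>
    <tbody>
      <tr><td><b><font>SOME TEXT<br/></font></b></td><td></td></tr>
      <tr><td><font>SOME TEXT<br/></font></td><td></td></tr>
    </tbody>
  </table>
</body></html>";

    // Returns the id of the first matching table, or null when nothing matches.
    public static string FindTableId(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        XmlNode match = doc.SelectSingleNode(Xpath);
        return match?.Attributes["id"]?.Value;
    }

    public static void Main()
    {
        Console.WriteLine(FindTableId(Sample) ?? "(no match)"); // target
    }
}
```

With HTML Agility Pack the same string would be passed to doc.DocumentNode.SelectSingleNode(xpath), as in the question. One caveat, stated as an assumption rather than tested behavior: a parser that keeps whitespace-only text nodes would treat a cell containing only whitespace as having text, in which case td[1][normalize-space()] may be the safer test.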

Alternatively, one could write an equivalent but more understandable and less error-prone expression like this:

//table
  [descendant::tr[1]/td[1][descendant::text()]
 and
   descendant::tr[2]/td[1][descendant::text()]
  ]



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why