Trouble selecting nodes with Html Agility Pack

c# html html-agility-pack

Question

I have the following HTML layout:

<table> //table[1]
</table>
<table> //table[2]
<tbody>
   <tr>
      <td>
         <p>
            &nbsp;
         </p>
      </td>
   </tr>
   <tr>
      <td>
         <table> //table[1]//table[1]
            <tbody>
               <tr>
                  <td>
                     <p>
                        INFO 1
                     </p>
                  </td>
                  <td>
                     <p>
                        INFO 2
                     </p>
                  </td>
                  <td>
                     <p>
                        INFO 3
                     </p>
                  </td>
                  <td>
                     <p>
                        INFO 4
                     </p>
                  </td>
               </tr>
            </tbody>
         </table>
      </td>
   </tr>
   <tr>
      <td>
         <table> //table[1]//table[2]
            <tbody>
               <tr>
                  <td>
                     <p><strong>Name</strong></p>
                  </td>
                  <td>
                     <p><strong>Quantity</strong></p>
                  </td>
               </tr>
               <tr>
                  <td>
                     <p>Apples </p>
                  </td>
                  <td>10</td>
               </tr>
            </tbody>
         </table>
      </td>
   </tr>
   <tr>
      <td>
         <table>  //table[1]//table[3]
         </table>
      </td>
   </tr>
</tbody>
</table>

I am trying to get the data within //table[1]//table[2], yet I keep getting a null HtmlNode (System.NullReferenceException) for the following:

doesn't work: doc.DocumentNode.SelectSingleNode("//table[2]//tbody//tr//td//table[2]//tbody//tr");

I am not sure why this occurs, because when I try to get the data for //table[1]//table[1], it works just fine with this syntax:

works: doc.DocumentNode.SelectSingleNode("//table[2]//tbody//tr//td//table[1]//tbody//tr");

Am I misunderstanding how the indexing works with Html Agility Pack?

Accepted Answer

//table[2] returns the 2nd <table> element within the same parent, because in XPath:

The predicate ([]) has a higher precedence (priority) than the path operators (// and /). [For Reference]

In your case, there is only one <table> in each <td>, so the XPath expression returned nothing. One possible solution is to add parentheses to alter the precedence:

(//table[2]//tbody//tr//td//table)[2]//tbody//tr

The XPath above gets the 2nd <table> element from all the <table>s returned by the inner XPath //table[2]//tbody//tr//td//table. Then, from that <table>, it continues on to return the descendant //tbody//tr elements.
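
To make the precedence point concrete, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The class name XPathPrecedenceDemo, the Select helper, and the trimmed-down HTML are made up for illustration and only mimic the question's layout. It may help to read //table[2] as its unabbreviated form, /descendant-or-self::node()/child::table[2]: the [2] filters the child::table step per parent, not the combined result.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathPrecedenceDemo
{
    // Illustrative HTML (not the asker's real page): two top-level tables,
    // the second one holding three nested tables, each alone in its own <td>.
    public const string Html = @"
<table></table>
<table><tbody>
  <tr><td><table><tbody><tr><td>INFO 1</td></tr></tbody></table></td></tr>
  <tr><td><table><tbody><tr><td>Apples</td><td>10</td></tr></tbody></table></td></tr>
  <tr><td><table><tbody><tr><td>OTHER</td></tr></tbody></table></td></tr>
</tbody></table>";

    // Hypothetical helper: parses the sample and runs one XPath query.
    public static HtmlNode Select(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);
        // SelectSingleNode returns null when nothing matches.
        return doc.DocumentNode.SelectSingleNode(xpath);
    }

    public static void Main()
    {
        // table[2] here means "a <table> that is the 2nd <table> child of its
        // parent". No <td> has two <table> children, so nothing matches.
        var perParent = Select("//table[2]//tbody//tr//td//table[2]//tbody//tr");
        Console.WriteLine(perParent == null); // True

        // The parentheses collect every nested <table> first, then [2] picks
        // the 2nd one from that combined set, in document order.
        var grouped = Select("(//table[2]//tbody//tr//td//table)[2]//tbody//tr");
        Console.WriteLine(grouped != null && grouped.InnerText.Contains("Apples"));
    }
}
```

The first query is the one that ends in a NullReferenceException as soon as the null result is dereferenced; the second is the parenthesized form from this answer.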


Popular Answer

I ended up basing this off of the tr's instead. At the time I was not sure why my other way did not work, but this way does.

I basically moved my indexing to the level above my tables. Within the first tbody, each nested table sits in its own tr/td, so I constructed my HtmlNode to index off of the tr's, which are siblings under the same tbody, rather than off of the nested tables, which are not siblings of each other.

Anyways...

For table[2]//table[1] I used:

HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[2]//tbody//tr[2]//table");
foreach (var cell in table.SelectNodes(".//tr//td/p"))
...

I selected tr[2] because the first tr/td only holds a blank space (see the example HTML above).

For table[2]//table[2] I used:

HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[2]//tbody//tr[3]//table[1]");
foreach (var cell in table.SelectNodes(".//tr//td"))
...

For anyone having similar issues: try moving the index up to a level where the elements are actually siblings (here, the tr's) instead of indexing the nested tag itself.
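
Here is a rough, self-contained version of the row-index approach, assuming the HtmlAgilityPack NuGet package is referenced. RowIndexDemo, CellsInRow and SampleHtml are names invented for this sketch, and the sample HTML is a cut-down stand-in for the layout in the question. It also guards against the null that SelectSingleNode and SelectNodes return when nothing matches, which is what caused the original NullReferenceException.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class RowIndexDemo
{
    // Illustrative HTML: row 1 is a blank spacer, rows 2 and 3 each hold one nested table.
    public const string SampleHtml = @"
<table></table>
<table><tbody>
  <tr><td>&nbsp;</td></tr>
  <tr><td><table><tbody><tr><td>INFO 1</td><td>INFO 2</td></tr></tbody></table></td></tr>
  <tr><td><table><tbody><tr><td>Apples</td><td>10</td></tr></tbody></table></td></tr>
</tbody></table>";

    // Returns the trimmed text of every <td> in the nested table found in the
    // given 1-based row of the second table; an empty list if nothing matches.
    public static List<string> CellsInRow(string html, int row)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var cells = new List<string>();

        // The index sits on tr, and the tr's are siblings under one tbody.
        HtmlNode table = doc.DocumentNode.SelectSingleNode(
            $"//table[2]//tbody//tr[{row}]//table");
        if (table == null) return cells;   // no nested table in that row

        HtmlNodeCollection tds = table.SelectNodes(".//tr//td");
        if (tds == null) return cells;     // SelectNodes also returns null on no match

        foreach (HtmlNode td in tds) cells.Add(td.InnerText.Trim());
        return cells;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CellsInRow(SampleHtml, 3))); // Apples, 10
        Console.WriteLine(CellsInRow(SampleHtml, 1).Count);              // 0
    }
}
```

Returning an empty list instead of null is just one design choice for this sketch; throwing a descriptive exception would work as well if a missing table should be treated as an error.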




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow