Unable to figure out XPath in HtmlAgilityPack

c# html-agility-pack xpath

Question

I am trying to build my first C# application (one that can do more than just say "Hello world").

The HTML file has lots of tags (but only the two h4 tags shown below). Here is the part that I am interested in:

<table width="100%" height="400" border="0" align="center" cellpadding="0" cellspacing="0" bordercolor="#111111" background="images/page_bg.gif" style="BORDER-COLLAPSE: collapse">

<tbody valign="top">
<tr>
<td>

<table width="80%" border="0" valign=top background="images/page_bg.gif">
 <tr>
 <td>

  <div align="center">
   <h4 align="center">
      <font face="Verdana, Arial, Helvetica, sans-serif" size="2">
      <b>
      <font size="4" face="Arial, Helvetica, sans-serif">
      UNWANTED TEXT
       </font></b></font></h4>

  <p><br />
  Name  :  {NAME HERE} <br>Number : {NUMBERS HERE}<br>Number2 : {NUMBERS2}<br><br><h4>UNWANTED TEXT</h4><br>detail NO.  :  <span class=style7>{NUmbers3}</span><br><br><a href=http://test.xom>UNWANTED TEXT</a><br><br>                    
  </p>
  <p class="content"><em><strong>
  <p>&nbsp;</p>

I wish to get NAME, Numbers1, Numbers2 and Numbers3, so I guess I have to do something like this:

 //div[@align = "centre"]/h4/followingsibling::Text();

but surely it is incomplete. Any ideas on how I should do it? I got this XPath from Firebug: /html/body/table/tbody/tr[2]/td/table/tbody/tr/td/table/tbody/tr[2]/td/div/table/tbody/tr/td/table/tbody/tr/td/div/h4

I have also tried this (just to get the raw data first, then trim it further):

 HtmlNodeCollection node = doc.DocumentNode.SelectNodes("//table[@height='400']//div[@align='centre']//p");
 foreach (HtmlNode node1 in node)
     textBox1.Text += node1.InnerText;

But node comes back as null here. Any help is greatly appreciated.

Accepted Answer

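Firefox adds a tbody tag to every table, even when the original HTML does not contain one, so an XPath copied from Firebug may not match the document that HtmlAgilityPack parses. I would suggest not writing out the whole path: find the most distinctive part of it and use // to skip the rest. For example, //div[@class='data']/table//tr/td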
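
As a rough illustration of the difference, here is a minimal sketch. The markup in SampleHtml, the class name and the FirstText helper are made up for this example and are not taken from the asker's page; it only assumes the HtmlAgilityPack NuGet package is referenced. The browser-style absolute path depends on a tbody element that is not in the raw HTML, while the relative path anchors on distinctive attributes instead.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class RelativeXPathSketch
{
    // Hypothetical, simplified markup. The source contains no <tbody>,
    // although a browser's DOM inspector would display one.
    public const string SampleHtml = @"
<html><body>
<table height='400'>
  <tr><td>
    <div align='center'>
      <h4>UNWANTED TEXT</h4>
      detail NO. : <span class='style7'>12345</span>
    </div>
  </td></tr>
</table>
</body></html>";

    // Returns the trimmed text of the first node matching the XPath,
    // or null when nothing matches.
    public static string FirstText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node?.InnerText.Trim();
    }

    public static void Main()
    {
        // Absolute path as a browser tool would report it (includes tbody).
        string absolute = FirstText(SampleHtml, "/html/body/table/tbody/tr/td/div/span");
        Console.WriteLine(absolute ?? "(no match)");

        // Relative path: anchor on distinctive attributes and let // skip the rest.
        string relative = FirstText(SampleHtml, "//table[@height='400']//span[@class='style7']");
        Console.WriteLine(relative ?? "(no match)");
    }
}
```

The second query keeps working whether or not a tbody is present, which is the main reason to prefer short, attribute-anchored paths over a copied absolute one.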


Popular Answer

Did you notice that you have @align="centre" but the HTML has align="center" (as in, British vs US spelling)?
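
This also explains the null: HtmlAgilityPack's SelectNodes returns null, not an empty collection, when the XPath matches nothing, so the foreach in the question fails before it starts. Note too that the XPath axis is spelled following-sibling and the node test is lowercase text(). A small sketch of both points follows; the markup, the class name and the TextAfterHeading helper are invented for illustration and assume the HtmlAgilityPack NuGet package.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class SpellingSketch
{
    // Hypothetical fragment, loosely modelled on the structure in the question.
    public const string SampleHtml =
        "<div align='center'><h4>UNWANTED TEXT</h4>Name : Alice<br>Number : 42<br></div>";

    // Collects the non-empty text nodes that follow the <h4> inside the
    // <div> whose align attribute equals alignValue.
    public static List<string> TextAfterHeading(string html, string alignValue)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        string xpath = "//div[@align='" + alignValue + "']/h4/following-sibling::text()";
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);

        var result = new List<string>();
        if (nodes == null)
        {
            // No match: SelectNodes gives null, so guard before iterating.
            return result;
        }

        foreach (HtmlNode node in nodes)
        {
            // Decode entities such as &nbsp; and drop whitespace-only nodes.
            string text = HtmlEntity.DeEntitize(node.InnerText).Trim();
            if (text.Length > 0)
            {
                result.Add(text);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Misspelled attribute value: nothing matches.
        Console.WriteLine(TextAfterHeading(SampleHtml, "centre").Count);

        // Spelling that matches the HTML.
        foreach (string line in TextAfterHeading(SampleHtml, "center"))
        {
            Console.WriteLine(line);
        }
    }
}
```

On the real page, each collected line would still need splitting on the colon to separate the label from the value.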




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow