Parse HTML table with HtmlAgilityPack?

html-agility-pack vb.net

Question

I'm tearing my hair out trying to figure out this HTML Agility Pack business. None of the examples I can find work with my table, no matter what I modify. Here's the table I'm working with:

<td class="trow1"><strong><a href="NEED1"><span style="color:#383838">NEED2</span></a></strong></td>
<td class="trow1">NEED3</td>
<td class="trow1" align="center"" alt="" /></td>
<td class="trow1" align="center"><strong>NEED4</strong></td>
</tr><tr>
<td class="trow2"><strong><a href="NEED1"><span class="group9">NEED2</span></a></strong></td>
<td class="trow2">NEED3</td>
<td class="trow2" align="center"" alt="" /></td>
<td class="trow2" align="center"><strong>NEED4</strong></td>
</tr><tr>
<td class="trow1"><strong><a href="NEED1"><span class="group0">NEED2</span></a></strong></td>
<td class="trow1">NEED3</td>
<td class="trow1" align="center"" alt="" /></td>
<td class="trow1" align="center"><strong>NEED4</strong></td>
</tr><tr>
<td class="trow2"><strong><a href="NEED1"><span class="group7">NEED2</span></a></strong></td>
<td class="trow2">NEED3</td>
<td class="trow2" align="center"" alt="" /></td>
<td class="trow2" align="center"><strong>NEED4</strong></td>
</tr><tr>
<td class="trow1"><strong><a href="NEED1"><span class="group0">NEED2</span></a></strong></td>
<td class="trow1">NEED3</td>
<td class="trow1" align="center"" alt="" /></td>
<td class="trow1" align="center"><strong>NEED4</strong></td>
</tr>

I've replaced the values I need with NEED1 through NEED4 in each row. I'm looking to populate a ListView with them (I've already built that part), but I'm lost on how to extract the values.

Any help? Thank you.

Popular Answer

Translating this code to VB.NET is not difficult. You can do it as follows:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
  • NEED1

    var value = doc.DocumentNode.SelectSingleNode("//td[@class='trow1']/strong/a").Attributes["href"].Value;
    
  • NEED2

    var value = doc.DocumentNode.SelectSingleNode("//td[@class='trow1']/strong/a/span").InnerText;
    
  • NEED3

    var innerText = doc.DocumentNode.SelectSingleNode("//td[@class='trow1' and not(*)]").InnerText;
    
  • NEED4

    var innerText = doc.DocumentNode.SelectSingleNode("//td[@class='trow1']/strong[not(a)]").InnerText;
    

The snippets above use SelectSingleNode, which returns only the first match (here, the first trow1 row). To select all matching nodes at once, use the SelectNodes method instead.
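To read every row rather than only the first match, a SelectNodes-based loop might look like the sketch below. It is only an illustration under some assumptions: the `RowData` class and the `TableScraper.ParseRows` helper are hypothetical names that are not part of HtmlAgilityPack, and the XPath expressions assume each row starts with the `<td>` that contains `<strong><a>` and that the NEED4 cell is the only later cell in the row with a `<strong>` child. It matches on structure rather than on the class name, so it should cover both the trow1 and trow2 rows.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical container for the four values of one table row.
public sealed class RowData
{
    public string Href;   // NEED1
    public string Name;   // NEED2
    public string Info;   // NEED3
    public string Value;  // NEED4
}

public static class TableScraper
{
    public static List<RowData> ParseRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<RowData>();

        // The first cell of each row is the one holding <strong><a>...</a></strong>.
        var firstCells = doc.DocumentNode.SelectNodes("//td[strong/a]");
        if (firstCells == null)
            return rows; // SelectNodes returns null when nothing matches

        foreach (var cell in firstCells)
        {
            // Relative XPath: evaluated from the current cell.
            var link = cell.SelectSingleNode("strong/a");
            var infoCell = cell.SelectSingleNode("following-sibling::td[1]");
            var valueNode = cell.SelectSingleNode("following-sibling::td/strong");

            rows.Add(new RowData
            {
                Href = link.GetAttributeValue("href", ""),
                Name = link.InnerText.Trim(),
                // Fall back to an empty string if a cell is missing.
                Info = infoCell?.InnerText.Trim() ?? "",
                Value = valueNode?.InnerText.Trim() ?? ""
            });
        }

        return rows;
    }
}
```

Each `RowData` could then be turned into one ListView row, for example with `Name` as the item text and the other three fields as sub-items. The same calls (`LoadHtml`, `SelectNodes`, `SelectSingleNode`, `GetAttributeValue`, `InnerText`) are available unchanged from VB.NET.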

I hope this helps.




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow