I am trying to grab an HTML table from a remote page and display its contents in an HtmlTable on my site. I am using the HTML Agility Pack. Here is my code so far:
    Imports HtmlAgilityPack

    Partial Class ContentGrabExperiment
        Inherits System.Web.UI.Page

        Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
            'fetch the remote html page
            Dim web As New HtmlWeb()
            Dim html As HtmlAgilityPack.HtmlDocument = web.Load("http://www.thesite.com/page.html")

            'Create table
            Dim outputTable As New HtmlTable
            Dim tableRow As New HtmlTableRow
            Dim tableCell As New HtmlTableCell

            'Target the <table> tag
            For Each table As HtmlNode In html.DocumentNode.SelectNodes("//table")
                'Target the <tr> tags within the table
                For Each row As HtmlNode In table.SelectNodes("//tr")
                    'Target the <td> tags within the <tr> tags
                    For Each cell As HtmlNode In row.SelectNodes("//td")
                        'Set the value to that of the <td>
                        tableCell.InnerText = cell.InnerHtml
                        'Add the cell to the row
                        tableRow.Cells.Add(tableCell)
                    Next
                    'Add row to the outputTable
                    outputTable.Rows.Add(tableRow)
                Next
            Next

            'Add the table to the page
            PlaceHolderTable.Controls.Add(outputTable)
        End Sub
    End Class
From this I was expecting to get the full table from the remote page, with the inner text of each cell, as an HtmlTable which I can then manipulate. What I actually get out of this code is:
    <table>
        <tr>
            <td>&nbsp;</td>
        </tr>
    </table>
Please can someone point out where I am going wrong with my syntax. Any help much appreciated!
1) You only create one HtmlTableRow and one HtmlTableCell, so every iteration overwrites and re-adds the same two objects. You need a new one for each row and each cell. You can reuse the variables, but you have to assign a "New" object to them inside the loops.
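2) You might need to select with a relative path such as ./td (note the leading dot) to get only the rows and cells in the current table / row. An XPath expression that starts with // is evaluated from the document root, so row.SelectNodes("//td") returns every <td> on the page rather than just the cells of that row.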
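
Putting both points together, the loop could be sketched roughly as follows. This is an untested sketch rather than a definitive fix: it assumes the same PlaceHolderTable control and URL as in the question, uses .//tr for rows so that a <tbody> wrapper in the source does not hide them, and adds Nothing checks because SelectNodes returns Nothing when no node matches. The call to HtmlEntity.DeEntitize is my own addition, so that entities such as &nbsp; are not encoded a second time when assigned to InnerText.

```vbnet
Imports HtmlAgilityPack
Imports System.Web.UI.HtmlControls

Partial Class ContentGrabExperiment
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        'Fetch the remote html page
        Dim web As New HtmlWeb()
        Dim html As HtmlAgilityPack.HtmlDocument = web.Load("http://www.thesite.com/page.html")

        Dim outputTable As New HtmlTable()

        'SelectNodes returns Nothing when nothing matches, so check before looping
        Dim tables As HtmlNodeCollection = html.DocumentNode.SelectNodes("//table")
        If tables IsNot Nothing Then
            For Each table As HtmlNode In tables
                'Leading dot: search only below the current <table>
                Dim rows As HtmlNodeCollection = table.SelectNodes(".//tr")
                If rows Is Nothing Then Continue For

                For Each row As HtmlNode In rows
                    'A new row object for every <tr>
                    Dim tableRow As New HtmlTableRow()

                    'Leading dot: only the <td> children of this row
                    Dim cells As HtmlNodeCollection = row.SelectNodes("./td")
                    If cells IsNot Nothing Then
                        For Each cell As HtmlNode In cells
                            'A new cell object for every <td>
                            Dim tableCell As New HtmlTableCell()
                            'Decode entities first; InnerText encodes them again on render
                            tableCell.InnerText = HtmlEntity.DeEntitize(cell.InnerText)
                            tableRow.Cells.Add(tableCell)
                        Next
                    End If

                    outputTable.Rows.Add(tableRow)
                Next
            Next
        End If

        'Add the table to the page
        PlaceHolderTable.Controls.Add(outputTable)
    End Sub
End Class
```

Note that "//table" still matches every table on the remote page, including nested ones, so all of their rows end up in the single outputTable. If you only want one specific table, a narrower XPath for the outer loop would be needed.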