Extracting a table from HTML into an HtmlTable in ASP.NET VB (HtmlAgilityPack)

asp.net html-agility-pack html-table vb.net web-scraping

I am trying to fetch an HTML table from a remote page and display its contents in an HtmlTable on my own site. I am using HtmlAgilityPack. Here is my code so far:

Imports HtmlAgilityPack
Partial Class ContentGrabExperiment
    Inherits System.Web.UI.Page
    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        'fetch the remote html page
        Dim web As New HtmlWeb()
        Dim html As HtmlAgilityPack.HtmlDocument = web.Load("http://www.thesite.com/page.html")

        'Create table
        Dim outputTable As New HtmlTable
        Dim tableRow As New HtmlTableRow
        Dim tableCell As New HtmlTableCell


        'Target the <table> tag 
        For Each table As HtmlNode In html.DocumentNode.SelectNodes("//table")
            'Target the <tr> tags within the table
            For Each row As HtmlNode In table.SelectNodes("//tr")
                'Target the <td> tags within the <tr> tags
                For Each cell As HtmlNode In row.SelectNodes("//td")
                    'Set the value to that of the <td>
                    tableCell.InnerText = cell.InnerHtml
                    'Add the cell to the row
                    tableRow.Cells.Add(tableCell)
                Next
                'Add row to the outputTable 
                outputTable.Rows.Add(tableRow)
            Next
        Next
        'Add the table to the page
        PlaceHolderTable.Controls.Add(outputTable)
    End Sub
End Class

I expected this to give me the complete table from the remote page, with its inner text, as an HtmlTable that I could then manipulate. That is not what this code produces.

Could someone please point out where my syntax is going wrong? Any help is much appreciated!

Top answer

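1) You only have one TableRow and one TableCell. You need to create a new one for each row/cell. You can reuse the variables, but you need to assign a New object to them each time.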

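2) You probably need to select ./tr and ./td to get only the rows and cells in the current table/row. An XPath expression that starts with // searches the whole document, no matter which node SelectNodes is called on, so table.SelectNodes("//tr") returns every row on the page, not just the rows of that table.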
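
Putting both points together, the code-behind from the question might look like the sketch below. It is only an illustration under some assumptions: the URL and the PlaceHolderTable control are the ones from the question, the remote tables are assumed to have their tr elements directly under table (a page that uses an explicit tbody would need ./tbody/tr instead), and the Nothing checks and the HtmlEntity.DeEntitize call are additions that the answer does not mention.

```vb
Imports System.Web.UI.HtmlControls
Imports HtmlAgilityPack

Partial Class ContentGrabExperiment
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        ' Fetch the remote HTML page (placeholder URL taken from the question)
        Dim web As New HtmlWeb()
        Dim html As HtmlAgilityPack.HtmlDocument = web.Load("http://www.thesite.com/page.html")

        Dim outputTable As New HtmlTable()

        ' SelectNodes returns Nothing when nothing matches, so each result is checked
        Dim tables As HtmlNodeCollection = html.DocumentNode.SelectNodes("//table")
        If tables IsNot Nothing Then
            For Each table As HtmlNode In tables
                ' "./tr" is relative to the current table (assumes no explicit <tbody>)
                Dim rows As HtmlNodeCollection = table.SelectNodes("./tr")
                If rows Is Nothing Then Continue For

                For Each row As HtmlNode In rows
                    ' Point 1: a new row object for every source row
                    Dim tableRow As New HtmlTableRow()

                    ' Point 2: "./td" only matches cells of the current row
                    Dim cells As HtmlNodeCollection = row.SelectNodes("./td")
                    If cells IsNot Nothing Then
                        For Each cell As HtmlNode In cells
                            ' Point 1 again: a new cell object for every source cell
                            Dim tableCell As New HtmlTableCell()
                            ' Decode entities first, because setting InnerText encodes the text again
                            tableCell.InnerText = HtmlEntity.DeEntitize(cell.InnerText)
                            tableRow.Cells.Add(tableCell)
                        Next
                    End If

                    outputTable.Rows.Add(tableRow)
                Next
            Next
        End If

        ' Add the table to the page
        PlaceHolderTable.Controls.Add(outputTable)
    End Sub
End Class
```

Because ./td skips th elements, header cells are dropped in this sketch; selecting "./td | ./th" would be one way to keep them, if that is what you want.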




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow