HtmlAgilityPack - SelectNodes

html-agility-pack vb.net

I am trying to retrieve the <p class> element.

<div class="thread-plate__details">
    <h3 class="thread-plate__title">(S) HexHunter BOW</h3>
    <p class="thread-plate__summary">created by Aazoth</p>  <!-- (THIS ONE) -->
</div>

But I have had no luck.

The code I am using is as follows:

' the example url to scrape
            Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
            Dim source As String = GetSource(url)

            If source IsNot Nothing Then
                ' create a new html document and load the pages source
                Dim htmlDocument As New HtmlDocument
                htmlDocument.LoadHtml(source)

                ' Create a new collection of all href tags
                Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")

                ' Using LINQ get all href values that start with http://
                ' of course there are others such as www.
                Dim links =
                    (
                        From node
                        In nodes
                        Let attribute = node.Attributes("class")
                        Where attribute.Value.StartsWith("created by ")
                        Select attribute.Value
                    )

                Me.ListBox1a.Items.AddRange(links.ToArray)
                Dim o, j As Long
                For o = 0 To ListBox1a.Items.Count - 1
                    For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
                        If ListBox1a.Items(o) = ListBox1a.Items(j) Then
                            ListBox1a.Items.Remove(ListBox1a.Items((j)))
                        End If
                    Next
                Next
                For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
                    Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")

                Next

                For Each s As String In ListBox1a.Items
                    Dim lvi As New NetSeal.NSListView
                    lvi.Text = s
                    NsListView1.Items.Add(lvi.Text)

                Next

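It runs, but I cannot get the 'created by XXX' text. I have tried many approaches without success, so a hand would be much appreciated.

Thanks in advance, everyone.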

Accepted answer

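It looks like you are testing for the wrong string in attribute.Value. That property holds the value of the class attribute, not the paragraph's text, so attribute.Value.StartsWith("created by ") must be changed to attribute.Value.StartsWith("thread-plate__summary").

To get the node's inner content, select the node's text rather than the attribute value: Select node.InnerText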

' the example url to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)

If source IsNot Nothing Then
    ' create a new html document and load the page's source
    Dim htmlDocument As New HtmlDocument
    htmlDocument.LoadHtml(source)

    ' Collect every <p> element that has a class attribute
    ' (SelectNodes returns Nothing when no node matches)
    Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")

    If nodes IsNot Nothing Then
        ' Using LINQ, keep the summary paragraphs and take their text,
        ' not the value of the class attribute
        Dim links =
            (
                From node
                In nodes
                Let attribute = node.Attributes("class")
                Where attribute.Value.StartsWith("thread-plate__summary")
                Select node.InnerText
            )

        Me.ListBox1a.Items.AddRange(links.ToArray)
    End If

    ' Remove duplicates: scan backwards and drop the later copy by index
    Dim o, j As Integer
    For o = 0 To ListBox1a.Items.Count - 1
        For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
            If ListBox1a.Items(o).ToString() = ListBox1a.Items(j).ToString() Then
                ListBox1a.Items.RemoveAt(j)
            End If
        Next
    Next

    ' Strip the "created by " prefix so only the name remains
    For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
        Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")
    Next

    ' Copy the names into the list view
    For Each s As String In ListBox1a.Items
        NsListView1.Items.Add(s)
    Next
End If
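
As a possible variation on the code above, the class filter can be moved into the XPath expression itself, which leaves only the text handling to LINQ. The following is a minimal console sketch, not the original poster's program: the module name SummarySketch, the function ExtractAuthors and the inline sample HTML are made up for illustration, and it assumes the HtmlAgilityPack package is referenced.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module SummarySketch
    ' Hypothetical helper: returns the distinct names found in
    ' <p class="thread-plate__summary">created by NAME</p> elements.
    Function ExtractAuthors(html As String) As String()
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Match the class value directly in the XPath expression.
        Dim nodes As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//p[@class='thread-plate__summary']")

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If nodes Is Nothing Then Return New String() {}

        Const prefix As String = "created by "

        ' Take the text of each paragraph, keep those with the prefix,
        ' strip the prefix and drop duplicates.
        Dim texts = nodes.Select(Function(n) n.InnerText.Trim())
        Dim authors = texts.
            Where(Function(t) t.StartsWith(prefix, StringComparison.Ordinal)).
            Select(Function(t) t.Substring(prefix.Length)).
            Distinct()

        Return authors.ToArray()
    End Function

    Sub Main()
        ' Sample markup copied from the question.
        Dim sample As String =
            "<div class=""thread-plate__details"">" &
            "<h3 class=""thread-plate__title"">(S) HexHunter BOW</h3>" &
            "<p class=""thread-plate__summary"">created by Aazoth</p>" &
            "</div>"

        For Each author As String In ExtractAuthors(sample)
            Console.WriteLine(author)
        Next
    End Sub
End Module
```

For the sample markup this prints Aazoth. The exact match on @class only works while the element carries that single class; if the page adds further classes, an expression such as //p[contains(@class, 'thread-plate__summary')] would be the looser alternative.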

I hope this works for you.



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow