HtmlAgilityPack - SelectNodes

html-agility-pack vb.net

Question

I'm trying to retrieve the text of a <p> element selected by its class.

<div class="thread-plate__details">
    <h3 class="thread-plate__title">(S) HexHunter BOW</h3>
    <p class="thread-plate__summary">created by Aazoth</p>  <!-- (THIS ONE) -->
</div>

But with no luck.

The code I am using is below:

' the example url to scrape
            Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
            Dim source As String = GetSource(url)

            If source IsNot Nothing Then
                ' create a new html document and load the pages source
                Dim htmlDocument As New HtmlDocument
                htmlDocument.LoadHtml(source)

                ' Create a new collection of all href tags
                Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")

                ' Using LINQ get all href values that start with http://
                ' of course there are others such as www.
                Dim links =
                    (
                        From node
                        In nodes
                        Let attribute = node.Attributes("class")
                        Where attribute.Value.StartsWith("created by ")
                        Select attribute.Value
                    )

                Me.ListBox1a.Items.AddRange(links.ToArray)
                Dim o, j As Long
                For o = 0 To ListBox1a.Items.Count - 1
                    For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
                        If ListBox1a.Items(o) = ListBox1a.Items(j) Then
                            ListBox1a.Items.Remove(ListBox1a.Items((j)))
                        End If
                    Next
                Next
                For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
                    Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")

                Next

                For Each s As String In ListBox1a.Items
                    Dim lvi As New NetSeal.NSListView
                    lvi.Text = s
                    NsListView1.Items.Add(lvi.Text)

                Next

It runs, but I can't get the 'created by XXX' text. I've tried many approaches with no luck; a hand would be appreciated.

Thanks in advance everyone.

1/10/2018 4:43:57 PM

Accepted Answer

It looks like you are testing the wrong string in attribute.Value. The attribute is class, so its value is the class name, not the paragraph text. That means attribute.Value.StartsWith("created by ") never matches and should be changed to attribute.Value.StartsWith("thread-plate__summary").

To grab the inner content of the node, select node.InnerText instead of attribute.Value:

' The example URL to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)

If source IsNot Nothing Then
    ' Create a new HTML document and load the page source
    Dim htmlDocument As New HtmlDocument
    htmlDocument.LoadHtml(source)

    ' Select every <p> element that has a class attribute
    ' (SelectNodes returns Nothing when no node matches)
    Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")

    If nodes IsNot Nothing Then
        ' Filter on the class name, then take the paragraph text
        Dim summaries =
            (
                From node
                In nodes
                Let attribute = node.Attributes("class")
                Where attribute.Value.StartsWith("thread-plate__summary")
                Select node.InnerText
            )

        Me.ListBox1a.Items.AddRange(summaries.ToArray)

        ' Remove duplicate entries, scanning backwards so indexes stay valid
        Dim o, j As Integer
        For o = 0 To ListBox1a.Items.Count - 1
            For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
                If ListBox1a.Items(o).ToString() = ListBox1a.Items(j).ToString() Then
                    ListBox1a.Items.RemoveAt(j)
                End If
            Next
        Next

        ' Strip the "created by " prefix so only the author name remains
        For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
            Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")
        Next

        ' Copy the cleaned names into the list view
        For Each s As String In ListBox1a.Items
            NsListView1.Items.Add(s)
        Next
    End If
End If
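
As a possible alternative to filtering in LINQ, the class test can be moved into the XPath expression itself. The following is only a minimal sketch: the module name SummaryScraper, the helper GetThreadAuthors, and the sample HTML string are hypothetical and not part of the original code. It assumes the HtmlAgilityPack package is referenced and that the class attribute is exactly thread-plate__summary (an exact-match XPath will not match if the element carries additional classes).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module SummaryScraper
    ' Hypothetical helper: returns the distinct author names found in
    ' <p class="thread-plate__summary">created by NAME</p> elements.
    Function GetThreadAuthors(html As String) As List(Of String)
        Const Prefix As String = "created by "

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Let XPath do the class filtering (exact attribute match).
        Dim nodes As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//p[@class='thread-plate__summary']")

        ' SelectNodes returns Nothing, not an empty collection, when nothing matches.
        If nodes Is Nothing Then Return New List(Of String)()

        Return nodes.
            Select(Function(n) HtmlEntity.DeEntitize(n.InnerText).Trim()).
            Where(Function(t) t.StartsWith(Prefix)).
            Select(Function(t) t.Substring(Prefix.Length)).
            Distinct().
            ToList()
    End Function

    Sub Main()
        ' Sample markup copied from the question, used here only for illustration.
        Dim sample As String =
            "<div class=""thread-plate__details"">" &
            "<h3 class=""thread-plate__title"">(S) HexHunter BOW</h3>" &
            "<p class=""thread-plate__summary"">created by Aazoth</p>" &
            "</div>"

        For Each author As String In GetThreadAuthors(sample)
            Console.WriteLine(author) ' prints "Aazoth"
        Next
    End Sub
End Module
```

Doing the prefix removal and de-duplication in the query keeps the UI code down to a single AddRange call on the returned list, at the cost of only de-duplicating the newly scraped names rather than everything already in the list box.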

I hope this will work for you.

1/10/2018 4:50:44 PM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow