I'm trying to retrieve a <p class>
element.
<div class="thread-plate__details">
<h3 class="thread-plate__title">(S) HexHunter BOW</h3>
<p class="thread-plate__summary">created by Aazoth</p> <!-- (THIS ONE) -->
</div>
But with no luck.
The code I am using is below:
' the example url to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)
If source IsNot Nothing Then
' create a new html document and load the pages source
Dim htmlDocument As New HtmlDocument
htmlDocument.LoadHtml(source)
' Create a new collection of all href tags
Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")
' Using LINQ get all href values that start with http://
' of course there are others such as www.
Dim links =
(
From node
In nodes
Let attribute = node.Attributes("class")
Where attribute.Value.StartsWith("created by ")
Select attribute.Value
)
Me.ListBox1a.Items.AddRange(links.ToArray)
Dim o, j As Long
For o = 0 To ListBox1a.Items.Count - 1
For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
If ListBox1a.Items(o) = ListBox1a.Items(j) Then
ListBox1a.Items.Remove(ListBox1a.Items((j)))
End If
Next
Next
For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")
Next
For Each s As String In ListBox1a.Items
Dim lvi As New NetSeal.NSListView
lvi.Text = s
NsListView1.Items.Add(lvi.Text)
Next
It runs but I can't get the 'created by XXX' text. I've tried many ways but got no luck, an hand would be appreciated.
Thanks in advance everyone.
Looks like you are looking wrong string in the attribute.Value
. What I see is that attribute.Value.StartsWith("created by ")
must be changed to this one attribute.Value.StartsWith("thread-plate__summary")
.
And to grab inner content of node you have to do this: Select node.InnerText
;
' the example url to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)
If source IsNot Nothing Then
' create a new html document and load the pages source
Dim htmlDocument As New HtmlDocument
htmlDocument.LoadHtml(source)
' Create a new collection of all href tags
Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")
' Using LINQ get all href values that start with http://
' of course there are others such as www.
Dim links =
(
From node
In nodes
Let attribute = node.Attributes("class")
Where attribute.Value.StartsWith("thread-plate__summary")
Select node.InnerText
)
Me.ListBox1a.Items.AddRange(links.ToArray)
Dim o, j As Long
For o = 0 To ListBox1a.Items.Count - 1
For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
If ListBox1a.Items(o) = ListBox1a.Items(j) Then
ListBox1a.Items.Remove(ListBox1a.Items((j)))
End If
Next
Next
For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")
Next
For Each s As String In ListBox1a.Items
Dim lvi As New NetSeal.NSListView
lvi.Text = s
NsListView1.Items.Add(lvi.Text)
Next
I hope this will work for you.