My app uses HtmlAgilityPack. Right now I can get all the input elements on a form. The problem is that I am getting ALL the input elements by ID. I want to narrow this down to only the input elements whose preceding label has a specific, exact inner text.
Example:
<label for="email">Email Address:</label>
<input type="text" class="textbox" name="email" id="email" maxlength="50" value="" dir="ltr" tabindex="1">
I am trying to get the input that has a preceding label with the inner text of "Email Address".
How would I word this?
Here is my app that grabs ALL input elements by ID.
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim doc As HtmlDocument
    Dim web As New HtmlWeb
    doc = web.Load("http://shaggybevo.com/board/register.php")
    Dim docNode As HtmlNode = doc.DocumentNode
    'SelectNodes takes an XPath expression
    Dim nodes As HtmlNodeCollection = docNode.SelectNodes("//input")
    For Each node As HtmlNode In nodes
        'Get the id and name of each input element (empty string if the attribute is missing)
        Dim id As String = node.GetAttributeValue("id", "")
        Dim name As String = node.GetAttributeValue("name", "")
        'Print each input's id and name to the Form2 RichTextBox
        Form2.RichTextBox1.Text = Form2.RichTextBox1.Text & Environment.NewLine & id & " " & name
    Next
    Form2.Show()
End Sub
Thanks guys....I have to say I've been studying VB.NET for a while and to date this forum has been awesome...glad I found it..
The basic concept here is to get the labels whose for attribute matches the id of the associated input.
So, we cycle through the labels first and record each label's text in a dictionary that is keyed by the for value. Then we cycle through the inputs, and if the id of an input is in the dictionary, we retrieve the value from the dictionary (which is the label text) and show it.
Note that I have also modified how the data is collected to be more efficient (when you concatenate strings repeatedly in a loop, a StringBuilder is usually the better choice).
Here's the rewritten code:
Dim web As HtmlAgilityPack.HtmlWeb = New HtmlWeb()
Dim doc As HtmlAgilityPack.HtmlDocument = web.Load("http://shaggybevo.com/board/register.php")
Dim nodes As HtmlNodeCollection
' Keeps track of the labels by the associated control id
Dim labelText As New System.Collections.Generic.Dictionary(Of String, String)
' First, get the labels
nodes = doc.DocumentNode.SelectNodes("//label")
If nodes IsNot Nothing Then
    For Each node In nodes
        If node.Attributes.Contains("for") Then
            Dim sFor As String
            ' Extract the for value
            sFor = node.Attributes("for").Value
            ' If it does not exist in our dictionary, add it
            If Not labelText.ContainsKey(sFor) Then
                labelText.Add(sFor, node.InnerText)
            End If
        End If
    Next
End If
nodes = doc.DocumentNode.SelectNodes("//input")
Dim sbText As New System.Text.StringBuilder(500)
If nodes IsNot Nothing Then
    For Each node In nodes
        ' See if this input is associated with a label
        If labelText.ContainsKey(node.Id) Then
            ' If it is, add it to our collected information
            sbText.Append("Label = ").Append(labelText(node.Id))
            sbText.Append(", Id = ").Append(node.Id)
            sbText.AppendLine()
        End If
    Next
End If
Form2.RichTextBox1.Text = sbText.ToString
Form2.Show()
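If you only need the input for one specific label, as the question asks, a possible variation is to let XPath do the matching instead of building the dictionary. The following is only a minimal sketch and has not been run against the actual registration page. FindInputByLabelText is a hypothetical helper name, and the HTML is the snippet from the question loaded through LoadHtml rather than fetched from the site. It also assumes the label text contains no double quotes and the for value contains no single quotes, because both are embedded in XPath string literals.

```vbnet
Imports System
Imports HtmlAgilityPack

Module LabelLookupSketch
    ' Hypothetical helper: returns the <input> whose <label> has exactly the given text,
    ' or Nothing when no matching label or input is found.
    Function FindInputByLabelText(ByVal doc As HtmlDocument, ByVal labelText As String) As HtmlNode
        ' Assumption: labelText contains no double quotes (it is placed inside an XPath literal).
        Dim labelXPath As String = "//label[@for and normalize-space(.)=""" & labelText & """]"
        Dim label As HtmlNode = doc.DocumentNode.SelectSingleNode(labelXPath)
        If label Is Nothing Then Return Nothing

        ' The label's for attribute holds the id of the input it describes.
        ' Assumption: the id contains no single quotes.
        Dim targetId As String = label.GetAttributeValue("for", String.Empty)
        Return doc.DocumentNode.SelectSingleNode("//input[@id='" & targetId & "']")
    End Function

    Sub Main()
        ' Sample markup taken from the question, shortened to the relevant attributes.
        Dim html As String = "<label for=""email"">Email Address:</label>" & "<input type=""text"" name=""email"" id=""email"" />"
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim input As HtmlNode = FindInputByLabelText(doc, "Email Address:")
        If input IsNot Nothing Then
            ' Writes the name attribute of the matched input.
            Console.WriteLine(input.GetAttributeValue("name", String.Empty))
        End If
    End Sub
End Module
```

Note that the match is exact after whitespace normalization, so the trailing colon in "Email Address:" has to be part of the text you pass in. If the labels on the real page vary, swapping normalize-space(.)= for contains(., ...) in the XPath would be a looser alternative.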