How do I get the inner text of a label before an input element?

html-agility-pack vb.net

Question

My app uses the HTML Agility Pack. Right now I can get all the input elements on a form, but I am getting ALL of them by ID. I want to narrow this down to only the input elements whose preceding label has a specific, exact inner text.

Example:

<label for="email">Email Address:</label>
<input type="text" class="textbox" name="email" id="email" maxlength="50" value="" dir="ltr" tabindex="1" />

I am trying to get the input that has a preceding label with the inner text "Email Address:".

How would I word this?

Here is my app that grabs ALL input elements by ID.

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

    Dim doc As HtmlDocument
    Dim web As New HtmlWeb
    doc = web.Load("http://shaggybevo.com/board/register.php")
    Dim docNode As HtmlNode = doc.DocumentNode
    'SelectNodes takes an XPath expression and returns Nothing when there is no match
    Dim nodes As HtmlNodeCollection = docNode.SelectNodes("//input")
    If nodes Is Nothing Then Return

    For Each node As HtmlNode In nodes
        'Get the id and name attributes of each input element ("" if missing)
        Dim id As String = node.GetAttributeValue("id", "")
        Dim name As String = node.GetAttributeValue("name", "")

        'Print each input's id and name to the Form2 RichTextBox
        Form2.RichTextBox1.Text = Form2.RichTextBox1.Text & Environment.NewLine & id & " " & name
    Next

    Form2.Show()

End Sub

Thanks guys....I have to say I've been studying VB.NET for a while and to date this forum has been awesome...glad I found it..

Accepted Answer

The basic concept here is to get the labels whose for attribute matches the id of the associated input.

So, we cycle through the labels first and record each label's text in a dictionary keyed by its for value. Then we cycle through the inputs: if an input's id is in the dictionary, we retrieve the matching value (the label text) and show it.

Note that I have also changed how the output is collected to be more efficient: when you concatenate strings repeatedly in a loop, you should almost always use a StringBuilder.

Here's the rewritten code:

    Dim web As HtmlAgilityPack.HtmlWeb = New HtmlWeb()
    Dim doc As HtmlAgilityPack.HtmlDocument = web.Load("http://shaggybevo.com/board/register.php")
    Dim nodes As HtmlNodeCollection

    ' Keeps track of the labels by the associated control id
    Dim labelText As New System.Collections.Generic.Dictionary(Of String, String)

    ' First, get the labels
    nodes = doc.DocumentNode.SelectNodes("//label")

    If nodes IsNot Nothing Then
        For Each node In nodes
            If node.Attributes.Contains("for") Then
                Dim sFor As String

                ' Extract the for value
                sFor = node.Attributes("for").Value

                ' If it does not exist in our dictionary, add it
                If Not labelText.ContainsKey(sFor) Then
                    labelText.Add(sFor, node.InnerText)
                End If
            End If
        Next
    End If

    nodes = doc.DocumentNode.SelectNodes("//input")

    Dim sbText As New System.Text.StringBuilder(500)

    If nodes IsNot Nothing Then
        For Each node In nodes
            ' See if this input is associated with a label
            If labelText.ContainsKey(node.Id) Then
                ' If it is, add it to our collected information
                sbText.Append("Label = ").Append(labelText(node.Id))
                sbText.Append(", Id = ").Append(node.Id)

                sbText.AppendLine()
            End If
        Next
    End If

    Form2.RichTextBox1.Text = sbText.ToString
    Form2.Show()
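
If you only need the one input whose label has an exact text (the "Email Address:" case from the question), the same for/id association can be used in the other direction. The following is a minimal sketch, not part of the original answer: the module name, the FindInputByLabel helper and the inline sample HTML are made up for illustration, and it assumes the label is tied to the input through its for attribute, as in the example markup.

```vbnet
Imports System
Imports HtmlAgilityPack

Module LabelLookupSketch

    ' Hypothetical helper: returns the element whose id matches the "for"
    ' attribute of the first label with the given text, or Nothing.
    Function FindInputByLabel(doc As HtmlDocument, labelText As String) As HtmlNode
        ' Only consider labels that actually carry a "for" attribute
        Dim labels As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//label[@for]")
        If labels Is Nothing Then Return Nothing

        For Each label As HtmlNode In labels
            ' Decode entities and trim whitespace before comparing
            Dim caption As String = HtmlEntity.DeEntitize(label.InnerText).Trim()
            If String.Equals(caption, labelText, StringComparison.OrdinalIgnoreCase) Then
                Dim targetId As String = label.GetAttributeValue("for", "")
                ' GetElementbyId is spelled with a lowercase "b" in the Html Agility Pack
                Return doc.GetElementbyId(targetId)
            End If
        Next

        Return Nothing
    End Function

    Sub Main()
        ' Sample markup modeled on the snippet in the question
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<div><label for=""email"">Email Address:</label>" &
                     "<input type=""text"" name=""email"" id=""email"" /></div>")

        Dim match As HtmlNode = FindInputByLabel(doc, "Email Address:")
        If match IsNot Nothing Then
            ' Writes the name attribute of the matched input
            Console.WriteLine(match.GetAttributeValue("name", ""))
        End If
    End Sub

End Module
```

The comparison here ignores case and surrounding whitespace but keeps the trailing colon, so pass the label text exactly as it appears on the page. For a page loaded with HtmlWeb.Load, pass the resulting document in place of the inline sample.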



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow