Get the inner text between two tags and output it to two labels - VB.NET - HtmlAgilityPack

html html-agility-pack innertext vb.net

Question

I have searched for examples at length, but nothing seems to work. I am using HtmlAgilityPack and I want to get the inner text between two specific tags.

Example:

<br>Terms of Service<br></br>Developers<br>

I want the InnerText between the first <br> and the next <br> to go into Label1, and the text between </br> and the following <br> to go into Label2,

so that the result is:

Label1.text = "Terms of Service"
Label2.text = "Developers"

How can I achieve this? P.S. I am not very familiar with HtmlAgilityPack, so code showing how to do this would help most. :-)

Thanks

Accepted answer

It's a bit dirty, but it should work.

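Imports System.Text.RegularExpressions

    ' Place the statements below inside a Sub, e.g. a button click handler.
    Dim mystring As String = "<br>Terms of Service<br></br>Developers<br>"

    ' Text between the first <br> and the next <br>
    Dim pattern1 As String = "(?<=<br>)(.*?)(?=<br>)"
    ' Text between </br> and the following <br> (lazy, so it stops at the nearest <br>)
    Dim pattern2 As String = "(?<=</br>)(.*?)(?=<br>)"

    Dim m1 As MatchCollection = Regex.Matches(mystring, pattern1)
    Dim m2 As MatchCollection = Regex.Matches(mystring, pattern2)
    ' Note: indexing (0) throws if a pattern finds no match.
    MsgBox(m1(0).ToString)
    MsgBox(m2(0).ToString)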
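
The snippet above indexes the first match directly, which throws when the markup does not contain the expected tags. A minimal sketch of a safer variant follows, assuming the same input format as in the question. ExtractBetween and LabelTextSketch are hypothetical names introduced here for illustration, and the console output stands in for the assignments to Label1.Text and Label2.Text.

```vb
Imports System
Imports System.Text.RegularExpressions

Module LabelTextSketch
    ' Hypothetical helper: returns the text found between the literal markers
    ' openTag and closeTag, or an empty string when nothing matches.
    Function ExtractBetween(input As String, openTag As String, closeTag As String) As String
        ' Lookbehind/lookahead keep the tags themselves out of the match;
        ' the lazy .*? stops at the nearest closing marker.
        Dim pattern As String = "(?<=" & Regex.Escape(openTag) & ")(.*?)(?=" & Regex.Escape(closeTag) & ")"
        Dim m As Match = Regex.Match(input, pattern, RegexOptions.IgnoreCase)
        If m.Success Then Return m.Value
        Return String.Empty
    End Function

    Sub Main()
        Dim html As String = "<br>Terms of Service<br></br>Developers<br>"

        ' In a WinForms handler these two values would be assigned to
        ' Label1.Text and Label2.Text instead of being printed.
        Dim first As String = ExtractBetween(html, "<br>", "<br>")
        Dim second As String = ExtractBetween(html, "</br>", "<br>")

        Console.WriteLine(first)
        Console.WriteLine(second)
    End Sub
End Module
```

RegexOptions.IgnoreCase is used so that the same helper also handles the upper-case <BR> variant of the string. Like any regex over HTML, this only holds up while the markup stays as simple as in the example.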

Popular answer

The short answer is that HAP is not well suited to this task. My notes are below:

Imports HtmlAgilityPack

Public Class Form1
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim mystring As String = "<BR>Terms of Service<BR></BR>Developers<BR>"
        Dim myDoc As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument
        myDoc.LoadHtml(mystring)
        ' here we notice HAP immediately discards the junk tag </br>
        MsgBox(myDoc.DocumentNode.OuterHtml)

        ' Below we notice that HAP did not close the BR tag, because it only
        ' attempts to close certain nested tags associated with tables
        ' (th, tr, td) and lists (li).
        ' If this were a supported tag that HAP could fix, the fixed output
        ' would be as follows:
        ' <br>Terms of Service<br></br>Developers<br></br></br>
        ' This string would be parsed as if the last tag closes the first and
        ' each set of inner tags closes itself without any text in between.
        ' This means that even if you changed BR to TD, or some other tag HAP
        ' fixes nesting on, it still would not help to parse this correctly.
        ' Also, HAP does not appear to support XHTML in this .NET 2.0 version.

        myDoc.OptionFixNestedTags = True
        MsgBox(myDoc.DocumentNode.OuterHtml)

        ' Here we put the BR tags into a collection. As we iterate through
        ' them, we notice there is no inner text on the BR tag, presumably
        ' for two reasons:
        ' 1. HAP will not close a BR.
        ' 2. It does not fix your broken nested tags as you expect or require.

        Dim myBR As HtmlNodeCollection = myDoc.DocumentNode.SelectNodes("//BR")
        If Not myBR Is Nothing Then
            For Each br In myBR
                MsgBox(br.InnerText)
            Next
        End If
    End Sub

End Class


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow