How do I remove specific elements from HTML with HTML Agility Pack for ASP.NET (vb)

asp.net html-agility-pack vb.net

Question

There seems to be no documentation on the CodePlex page, and for some reason IntelliSense doesn't show me any available members for HtmlAgilityPack (for example, when I type MyHtmlDocument.DocumentNode. there is no IntelliSense to tell me what I can do next).

I need to know how to remove ALL <a> tags and their content from the body of the HTML document. I cannot just use Node.InnerText on the body, because that still returns the content of the <a> tags.

Here is example HTML

<html>
    <body>
        I was born in <a name=BC>Toronto</a> and now I live in barrie
    </body>
</html>

I need to return

I was born in and now I live in barrie

Thanks, I appreciate the help!

Thomas

Accepted Answer

This gets you the result you require. It uses a recursive method to drill down through all the HTML nodes, and you can remove other kinds of nodes by adding another If statement.

Imports HtmlAgilityPack

Public Sub Test()
    Dim document = New HtmlDocument() With { _
        .OptionOutputAsXml = True _
    }
    document.LoadHtml("<html><body>I was born in <a name=BC>Toronto</a> and now I live in barrie</body></html>")

    For i As Integer = 0 To document.DocumentNode.ChildNodes.Count - 1
        RecursiveMethod(document.DocumentNode.ChildNodes(i))
    Next

    ' Removing the link leaves a doubled space behind, so collapse it.
    Console.Out.WriteLine(document.DocumentNode.InnerHtml.Replace("  ", " "))
End Sub

Public Sub RecursiveMethod(child As HtmlNode)
    ' Walk the children backwards: removing a node shifts the indexes after it,
    ' so a forward loop would skip siblings or run past the end of the collection.
    For x As Integer = child.ChildNodes.Count - 1 To 0 Step -1
        Dim node = child.ChildNodes(x)
        If node.Name = "a" Then
            node.RemoveAll() ' removes all the child nodes of "a"
            node.Remove()    ' removes the actual "a" node
        ElseIf node.HasChildNodes Then
            RecursiveMethod(node)
        End If
    Next
End Sub

Popular Answer

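Something along these lines (sorry, my code is C#, but I hope it will help nonetheless):

HtmlDocument doc = new HtmlDocument();

doc.LoadHtml("some html markup here");

// "//a[@name]" matches only anchors that have a name attribute;
// use "//a" to match every <a> element.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@name]");

// SelectNodes returns null (not an empty collection) when nothing matches.
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        link.Remove();
    }
}

// Then use one of the many doc.Save(...) overloads to actually get the result of the operation.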
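
Since the question asks for VB, here is a rough VB.NET sketch of the same XPath idea. It is only an illustration under a few assumptions: the HtmlAgilityPack assembly is referenced, every <a> element should go (hence "//a" rather than "//a[@name]"), and the body text is what you want back. The module name RemoveAnchorsExample and the function GetBodyTextWithoutAnchors are made-up names for this example, not part of the library.

```vb
Imports System
Imports HtmlAgilityPack

Module RemoveAnchorsExample

    ' Hypothetical helper (not part of HTML Agility Pack): removes every <a>
    ' element, including its content, and returns the remaining body text.
    Public Function GetBodyTextWithoutAnchors(html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when the XPath matches no nodes.
        Dim links As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a")
        If links IsNot Nothing Then
            For Each link As HtmlNode In links
                link.Remove()
            Next
        End If

        ' Fall back to the whole document if the markup has no <body> element.
        Dim body As HtmlNode = doc.DocumentNode.SelectSingleNode("//body")
        If body Is Nothing Then body = doc.DocumentNode

        Return body.InnerText.Trim()
    End Function

    Sub Main()
        Dim html = "<html><body>I was born in <a name=BC>Toronto</a> and now I live in barrie</body></html>"
        Console.WriteLine(GetBodyTextWithoutAnchors(html))
    End Sub

End Module
```

The text on either side of the removed link keeps its own whitespace, so expect a doubled space where the link used to be. The Replace("  ", " ") call from the accepted answer is one way to tidy that up.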


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow