Parsing links and tables with the HTML Agility Pack in VB.NET

.net html-agility-pack vb.net

Question

I found the HTML Agility Pack while trying to do some screen scraping, but I'm having trouble figuring out how to use it from VB.NET.

The first thing I want to do is find the URL string of an HREF tag when I know the text it contains.

I also want to parse an HTML table, looping over each row and extracting the data (after some basic analysis) so that I can store it in a database.

4/18/2011 3:36:24 AM

Accepted Answer

Here is a good place to start on SO: Utilizing the HTML Agility Pack

See this as well: The HTMLAgilityPack example for altering links does not work. How do I go about doing this?

And this one: Finding every A HREF URL in an HTML page (even in malformed HTML)

The XPath syntax for finding a particular HREF is "//a[@href='your url']", which means "select every A tag, anywhere in the document, whose HREF attribute equals 'your url'".
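
Since the question asks about VB.NET, here is a minimal sketch of that attribute lookup in VB.NET. It assumes the HtmlAgilityPack package is referenced; the sample markup and the module name are invented for illustration. It uses the `//a` form of the expression so that the match is not limited to an A tag at the document root.

```vbnet
Imports System
Imports HtmlAgilityPack

Module FindLinkByHref
    Sub Main()
        ' Sample markup invented for this illustration.
        Dim html As String =
            "<p><a href=""homepage.html"">Cars</a> <a href=""about.html"">About</a></p>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' //a matches <a> elements at any depth; the predicate filters on href.
        Dim matches As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//a[@href='homepage.html']")

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If matches IsNot Nothing Then
            For Each link As HtmlNode In matches
                Console.WriteLine(link.InnerText & " -> " & link.GetAttributeValue("href", ""))
            Next
        End If
    End Sub
End Module
```

The `Nothing` check matters in practice: iterating the result of `SelectNodes` without it throws when the page contains no matching link.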

EDIT:

You can also find an HREF when you only have the link text. For instance, given the HTML text <a href="homepage.html">Cars</a>, this is how you would find homepage.html by searching for "Cars":

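        // Requires: using System; using HtmlAgilityPack;
        string s = @"<a href=""homepage.html"">Cars</a>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(s);

        // Select the first <a> element whose text is exactly "Cars".
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[text()='Cars']");
        // SelectSingleNode returns null when nothing matches.
        if (node != null)
            Console.WriteLine("href=" + node.GetAttributeValue("href", null));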
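
The snippet above is C#, while the question asks about VB.NET and also about walking a table row by row. The following is a rough VB.NET sketch of both steps, not a definitive implementation. The helper names `FindHrefByText` and `ReadTableRows`, the `stock` table id, and the sample markup are all hypothetical, and the database step is left as a comment because the question does not describe the schema.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module ScrapeSketch
    ' Hypothetical helper: href of the first <a> whose text equals linkText, or Nothing.
    ' Assumes linkText contains no apostrophes, because it is spliced into the XPath.
    Function FindHrefByText(doc As HtmlDocument, linkText As String) As String
        Dim node As HtmlNode =
            doc.DocumentNode.SelectSingleNode("//a[text()='" & linkText & "']")
        If node Is Nothing Then Return Nothing
        ' A String default avoids overload ambiguity that a bare Nothing could cause in VB.
        Return node.GetAttributeValue("href", String.Empty)
    End Function

    ' Hypothetical helper: one list of trimmed cell texts per data row of the table.
    Function ReadTableRows(doc As HtmlDocument, tableXPath As String) As List(Of List(Of String))
        Dim result As New List(Of List(Of String))()
        Dim rows As HtmlNodeCollection = doc.DocumentNode.SelectNodes(tableXPath & "//tr")
        If rows Is Nothing Then Return result

        For Each row As HtmlNode In rows
            Dim cells As HtmlNodeCollection = row.SelectNodes("td")
            ' Rows without <td> cells (for example header rows using <th>) are skipped.
            If cells Is Nothing Then Continue For
            result.Add(cells.Select(Function(c) HtmlEntity.DeEntitize(c.InnerText).Trim()).ToList())
        Next
        Return result
    End Function

    Sub Main()
        ' Sample markup invented for this illustration.
        Dim html As String =
            "<a href=""homepage.html"">Cars</a>" &
            "<table id=""stock""><tr><th>Make</th><th>Price</th></tr>" &
            "<tr><td>Audi</td><td>30000</td></tr>" &
            "<tr><td>Fiat</td><td>12000</td></tr></table>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Console.WriteLine("href=" & FindHrefByText(doc, "Cars"))

        For Each cells As List(Of String) In ReadTableRows(doc, "//table[@id='stock']")
            ' A database insert would go here; this sketch only prints the values.
            Console.WriteLine(String.Join(" | ", cells))
        Next
    End Sub
End Module
```

Returning plain lists of strings keeps the parsing separate from whatever analysis and storage come next, so the same rows could be handed to a parameterized insert command or inspected first.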
5/23/2017 12:18:59 PM



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow