Parsing links and tables using VB.NET and HTML Agility Pack

.net html-agility-pack vb.net

Question

I'm trying to do some screen scraping, and discovered the HTML AgilityPack, but am having some trouble figuring out how to use it with VB.net.

The first thing I want to do is find the URL in the HREF attribute of a link when I know the text enclosed by the link.

The second thing I want to do is parse an HTML table, going through each row and pulling out the data so I can save it to a database (after some basic analysis).

Accepted Answer

A good starting point here on SO: How to use HTML Agility pack

See also this: HtmlAgilityPack example for changing links doesn't work. How do I accomplish this?

And this: Finding all the A HREF Urls in an HTML document (even in malformed HTML)

To find a specific HREF, the XPath syntax would be "//a[@href='your url']", meaning: "get any A tag that has an HREF attribute equal to 'your url'".
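
As a rough illustration of that XPath, here is a minimal C# sketch. The sample markup, the class name HrefLookup, and the homepage.html value are made up for the example.

```csharp
using System;
using HtmlAgilityPack;

class HrefLookup
{
    static void Main()
    {
        // Hypothetical sample markup, for illustration only.
        var doc = new HtmlDocument();
        doc.LoadHtml("<p><a href=\"homepage.html\">Cars</a> <a href=\"about.html\">About</a></p>");

        // Match the <a> element whose href attribute equals the given value.
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//a[@href='homepage.html']");

        // SelectSingleNode returns null when nothing matches, so check before use.
        Console.WriteLine(link != null ? link.InnerText : "(no match)");
    }
}
```

The same XPath string can be passed to SelectNodes if more than one link may share the URL.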

EDIT:

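To find an HREF when you only know the link text: for example, given the HTML '<a href="homepage.html">Cars</a>', you know the text Cars and want to get homepage.html. This is how you would do it (the example is C#, but the same HtmlAgilityPack calls are available from VB.NET):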

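        using System;
        using HtmlAgilityPack;

        string s = @"<a href=""homepage.html"">Cars</a>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(s);

        // SelectSingleNode returns null when no <a> has exactly this text.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[text()='Cars']");
        if (node != null)
            Console.WriteLine("href=" + node.GetAttributeValue("href", null));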
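
For the second part of the question (walking a table row by row), a similar approach should work. Below is a minimal sketch, assuming a plain table with one <tr> per record. The markup, the class name TableRows, and the column meanings are invented for the example, and the database step is left out.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class TableRows
{
    static void Main()
    {
        // Hypothetical sample table, for illustration only.
        string html = @"<table>
  <tr><th>Make</th><th>Price</th></tr>
  <tr><td>Ford</td><td>20000</td></tr>
  <tr><td>Fiat</td><td>15000</td></tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table//tr");
        if (rows == null) return;

        foreach (HtmlNode row in rows)
        {
            // Relative XPath: only the <td> cells of this row; header rows have none.
            HtmlNodeCollection cells = row.SelectNodes("./td");
            if (cells == null) continue;

            // Decode entities such as &amp; and trim surrounding whitespace.
            string[] values = cells
                .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                .ToArray();

            // Each array is one record, ready for analysis or a database insert.
            Console.WriteLine(string.Join(" | ", values));
        }
    }
}
```

On a real page the XPath would probably need to be narrower (for example, selecting the table by its id) so that rows from unrelated tables are not picked up.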



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow