I'm trying to do some screen scraping and discovered the HTML Agility Pack, but I'm having trouble figuring out how to use it with VB.NET.
The first thing I want to do is find the URL in the HREF of a link when I know the text that the link encloses.
The second thing I want to do is parse an HTML table, going through each row and pulling out the data so I can save it to a database (after some basic analysis).
A good starting point here on SO: How to use HTML Agility pack
To find a specific HREF, the XPath syntax would be "//a[@href='your url']", meaning: "get any A tag that has an HREF attribute equal to 'your url'".
To find an HREF when you only know the link text, for example if you have the HTML '<a href="homepage.html">Cars</a>', know the text Cars, and want to get homepage.html, this is how you would do it:
    string s = @"<a href=""homepage.html"">Cars</a>";
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(s);
    HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[text()='Cars']");
    Console.WriteLine("href=" + node.GetAttributeValue("href", null));
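For the table part of the question, the same XPath approach should work row by row. The following is only a minimal sketch, not a definitive implementation: the `ReadTable` helper, the `cars` table id, and the sample HTML are made up for illustration, and it assumes the table has no nested tables (otherwise `.//tr` would also pick up the inner rows). It is in C# like the snippet above; the calls are the same from VB.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

class TableExample
{
    // Hypothetical helper: returns one list of cell texts per <tr>.
    // Header cells (<th>) and data cells (<td>) are both included.
    static List<List<string>> ReadTable(string html, string tableXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<List<string>>();
        HtmlNode table = doc.DocumentNode.SelectSingleNode(tableXPath);
        if (table == null) return rows;

        // SelectNodes returns null rather than an empty collection
        // when nothing matches, so check before iterating.
        HtmlNodeCollection trs = table.SelectNodes(".//tr");
        if (trs == null) return rows;

        foreach (HtmlNode tr in trs)
        {
            // "th|td" selects the direct child cells of this row.
            HtmlNodeCollection cells = tr.SelectNodes("th|td");
            if (cells == null) continue;

            // Decode entities such as &amp; and trim surrounding whitespace.
            rows.Add(cells
                .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                .ToList());
        }
        return rows;
    }

    static void Main()
    {
        // Sample markup, assumed for illustration only.
        string html = @"<table id=""cars"">
            <tr><th>Make</th><th>Price</th></tr>
            <tr><td>Ford</td><td>100</td></tr>
        </table>";

        foreach (List<string> row in ReadTable(html, "//table[@id='cars']"))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

Each inner list can then be analyzed and written to the database with whatever data access code you already use. The same null check applies to the link example: SelectSingleNode returns null when no A tag has the given text.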