I found the HTML Agility Pack while trying to do some screen scraping, but I'm having trouble figuring out how to use it with VB.NET.
The first thing I want to do is find the URL string of an HREF when I know the text contained in the A tag.
I also want to parse an HTML table, looping over each row and pulling out the data (after some basic analysis) so I can store it in a database.
This is a nice place to start on SO: Utilizing the HTML Agility Pack
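The XPath syntax for finding a particular HREF is "//a[@href='your url']", which means "retrieve any A tag that has an HREF attribute equal to 'your url'." The leading double slash matters: it matches A tags at any depth in the document, while a single slash would only match an A tag at the document root.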
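As a minimal sketch of that attribute lookup, assuming the HtmlAgilityPack NuGet package is referenced (the sample markup, the second link, and the class name are made up for illustration):

```csharp
using System;
using HtmlAgilityPack;

class HrefLookup
{
    static void Main()
    {
        // Hypothetical sample markup; in practice this would be the scraped page.
        string html = @"<p><a href=""homepage.html"">Cars</a> <a href=""about.html"">About</a></p>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//a" matches A tags at any depth; the predicate filters on the href value.
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//a[@href='homepage.html']");

        // SelectSingleNode returns null when nothing matches, so check before using the node.
        Console.WriteLine(link != null ? link.InnerText : "no match");
    }
}
```

The Agility Pack is an ordinary .NET library, so the same calls are available from VB.NET; only the surrounding syntax changes.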
If you only have the link text, you can still find the HREF. Take, for example, the HTML <a href="homepage.html">Cars</a>.
If you have the text "Cars" and are looking for homepage.html, this is how you would do it:
    string s = @"<a href=""homepage.html"">Cars</a>";
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(s);
    HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[text()='Cars']");
    Console.WriteLine("href=" + node.GetAttributeValue("href", null));
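For the table part of the question, the same XPath approach should carry over. The following is only a sketch under assumptions: the table markup, column names, and class name are invented, and the Console.WriteLine call stands in for whatever analysis and database insert you actually need.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class TableRows
{
    static void Main()
    {
        // Hypothetical table; replace with the HTML you scraped.
        string html = @"<table>
<tr><th>Make</th><th>Price</th></tr>
<tr><td>Audi</td><td>30000</td></tr>
<tr><td>Ford</td><td>20000</td></tr>
</table>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select only rows that contain at least one TD, which skips the header row.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table//tr[td]");
        if (rows == null) return;

        foreach (HtmlNode row in rows)
        {
            var cells = new List<string>();

            // "td" is relative to the current row, so only its own cells are returned.
            foreach (HtmlNode cell in row.SelectNodes("td"))
            {
                // Decode entities such as &amp; and trim surrounding whitespace.
                cells.Add(HtmlEntity.DeEntitize(cell.InnerText).Trim());
            }

            // Placeholder for the analysis and database insert.
            Console.WriteLine(string.Join(" | ", cells));
        }
    }
}
```

If the page has more than one table, a narrower XPath such as one keyed on the table's id attribute would be needed; which attribute to use depends on the page being scraped.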