Get links from webpage to textbox (vb.net + html agility pack)

html-agility-pack screen-scraping vb.net

Question

I'm making a VB.NET app and I'm using HtmlAgilityPack. I need HAP to get the profile links from yellowpages.ca.

Here is an example of the HTML:

<a href="/bus/Ontario/Brampton/A-Safe-Self-Storage/17142.html?what=af&amp;where=Ontario&amp;le=1238793c7aa%7Ccf8042ceaa%7C2ae32e5a2a" onmousedown="utag.link({link_name:'busname', link_attr1:'in_listing_left', listing_link:'18063_lpp|busname_af', headdir_link:'01252110|092202,00891210|092202,00184200|092202', position_address:'l_y', position_number:'l_6'});" id="mapLink5" title="See detailed information for A Safe Self Storage"><span class="listingTitle">A Safe Self Storage</span></a>

This is the link I want to extract: "/bus/Ontario/Brampton/A-Safe-Self-Storage/17142.html?what=af&where=Ontario&le=1238793c7aa%7Ccf8042ceaa%7C2ae32e5a2a".

A little help would be appreciated.

Accepted Answer

You need to read the HtmlAgilityPack documentation.

Here is a C# sample that reads an HTML file stored on the PC and rewrites every link in it:

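    // Requires the HtmlAgilityPack package (using HtmlAgilityPack;)
    HtmlDocument doc = new HtmlDocument();
    doc.Load("file.htm");
    // SelectNodes returns null when nothing matches, so check before looping
    HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
    if (links != null)
    {
        foreach (HtmlNode link in links)
        {
            HtmlAttribute att = link.Attributes["href"];
            att.Value = FixLink(att); // FixLink is your own method, not part of HAP
        }
    }
    doc.Save("file.htm");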

Use a C#-to-VB.NET converter (or translate it by hand) to get VB.NET code. This is the key line:

    foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
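A rough hand translation of the same idea into VB.NET might look like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced, and it returns the `href` values instead of rewriting them. `LinkLister` and `ExtractHrefs` are illustrative names, not part of HAP; `LoadHtml` parses a string, where `doc.Load("file.htm")` would read the file as in the C# sample.

```vbnet
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module LinkLister
    ' Hypothetical helper: returns the decoded href of every <a href="..."> in the markup.
    Function ExtractHrefs(html As String) As List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)   ' use doc.Load("file.htm") to read a file instead

        Dim result As New List(Of String)()

        ' Same XPath as the C# sample; SelectNodes returns Nothing when no element matches
        Dim links As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        If links Is Nothing Then Return result

        For Each link As HtmlNode In links
            ' By default HAP returns the attribute text undecoded (&amp; stays &amp;), so decode it
            Dim href As String = link.GetAttributeValue("href", String.Empty)
            result.Add(HtmlEntity.DeEntitize(href))
        Next
        Return result
    End Function
End Module
```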

Again, you need to read the documentation and understand how to parse the HTML DOM.

Here is an example of loading and parsing a web page. You'll need to use `HttpWebRequest` to stream the web page from a web server.
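A minimal VB.NET sketch of that approach for the question's case follows. It assumes HtmlAgilityPack is referenced and that every profile link starts with `/bus/`, as in the sample markup. `ProfileLinkScraper`, `DownloadHtml`, and `GetProfileLinks` are hypothetical names, and the user-agent string is an assumption, since some sites refuse requests without one.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports HtmlAgilityPack

Module ProfileLinkScraper
    ' Hypothetical helper: streams the page from the web server with HttpWebRequest
    Function DownloadHtml(url As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.UserAgent = "Mozilla/5.0"   ' assumption: the site may reject requests without a user agent
        Using response As WebResponse = request.GetResponse()
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    ' Hypothetical helper: returns each distinct profile link (href starting with "/bus/"),
    ' with entities such as &amp; decoded
    Function GetProfileLinks(html As String) As List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim result As New List(Of String)()
        Dim anchors As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//a[starts-with(@href, '/bus/')]")
        If anchors Is Nothing Then Return result   ' no matching links on the page

        For Each anchor As HtmlNode In anchors
            Dim href As String = HtmlEntity.DeEntitize(anchor.GetAttributeValue("href", String.Empty))
            If Not result.Contains(href) Then result.Add(href)   ' skip duplicate links to the same profile
        Next
        Return result
    End Function
End Module
```

In a form, something like `TextBox1.Text = String.Join(Environment.NewLine, GetProfileLinks(DownloadHtml(searchUrl)))` would fill a multiline TextBox, where `TextBox1` and `searchUrl` stand for your own control and results-page URL. The links stay relative; `New Uri(New Uri(searchUrl), link).ToString()` turns one into an absolute URL if you need it.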

Additional reading here



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why