Extract emails from HTML by using HtmlAgilityPack

c# html-agility-pack selectnodes

Question

How can I use HtmlAgilityPack to extract the email address and the website address from this HTML?

<a class="email" href="mailto:babaie@irandoc.ac.ir">

<a class="" href="http://www.babaie.ir" target="_blank">www.babaie.ir</a>

This code didn't work for email when I tried it:

doc.DocumentNode.SelectNodes("//a[@href= ' ' ]");
10/31/2014 9:49:27 AM

Popular Answer

Extracting the email address:

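using System;
using HtmlAgilityPack;
var a = doc.DocumentNode.SelectSingleNode("//a[@class='email']");
if (a != null)
{
    // GetAttributeValue returns the default ("") instead of throwing when href is missing
    string href = a.GetAttributeValue("href", "");
    string email = href.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase)
        ? href.Substring("mailto:".Length)
        : "";
}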
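Real-world mailto: links are not always as clean as the one in the question: the scheme may be written in uppercase, and the href can carry extra fields such as ?subject=... or percent-encoded characters. As a sketch, assuming you want to handle those cases too, a small helper could look like this. MailtoParser and ExtractEmail are hypothetical names, not part of HtmlAgilityPack; only System is used.

```csharp
using System;

// Hypothetical helper (not part of HtmlAgilityPack): turns the value of an
// href attribute into a bare email address, or returns null when the value
// is missing, empty, or not a mailto: link.
public static class MailtoParser
{
    private const string Scheme = "mailto:";

    public static string ExtractEmail(string href)
    {
        // A missing or blank attribute yields no address.
        if (string.IsNullOrWhiteSpace(href))
            return null;

        string value = href.Trim();

        // URI schemes are case-insensitive, so "MAILTO:" is accepted too.
        if (!value.StartsWith(Scheme, StringComparison.OrdinalIgnoreCase))
            return null;

        string address = value.Substring(Scheme.Length);

        // Drop header fields such as "?subject=Hi" that follow the address.
        int query = address.IndexOf('?');
        if (query >= 0)
            address = address.Substring(0, query);

        // Decode percent-encoded characters, e.g. "%40" becomes "@".
        address = Uri.UnescapeDataString(address);

        return address.Length > 0 ? address : null;
    }
}
```

With HtmlAgilityPack you could then call MailtoParser.ExtractEmail(a.GetAttributeValue("href", "")), which gives null rather than an exception when the href is missing or is not a mailto: link. Note that a mailto: URI may list several comma-separated addresses; this sketch returns them as a single string.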

For the website, it's unclear what distinguishes your URL from any other anchor (there is no special class or id here), so the following code returns the href of every anchor in your HTML, skipping mailto: links:

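using System;
using System.Linq;
using HtmlAgilityPack;
var urls = (doc.DocumentNode.SelectNodes("//a[@href]")   // SelectNodes returns null when nothing matches
                ?? Enumerable.Empty<HtmlNode>())
              .Select(a => a.Attributes["href"].Value)  // safe: the XPath guarantees href exists
              .Where(href => !href.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase)) // skip emails
              .ToList();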
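Since nothing in the markup marks the website link, one possible heuristic (an assumption, not something the question specifies) is to keep only hrefs that parse as absolute http or https URLs. That drops mailto:, javascript: and fragment-only links in one pass. A minimal sketch using System.Uri, where WebsiteFilter and AbsoluteWebLinks are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: filters raw href values down to absolute web links.
public static class WebsiteFilter
{
    public static List<string> AbsoluteWebLinks(IEnumerable<string> hrefs)
    {
        return hrefs
            // Ignore missing or blank attribute values.
            .Where(h => !string.IsNullOrWhiteSpace(h))
            .Select(h => h.Trim())
            // Keep only values that parse as absolute URIs with an http or
            // https scheme; mailto:, javascript:, "#top" etc. are rejected.
            .Where(h => Uri.TryCreate(h, UriKind.Absolute, out var uri)
                        && (uri.Scheme == Uri.UriSchemeHttp
                            || uri.Scheme == Uri.UriSchemeHttps))
            // Remove exact duplicates, keeping the first occurrence.
            .Distinct()
            .ToList();
    }
}
```

Fed with the href values collected from //a[@href], the markup in the question would leave only http://www.babaie.ir. Relative links such as /about are dropped as well; if you need them, resolve each one against the page address with new Uri(baseUri, href) before filtering.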
11/7/2013 11:37:52 AM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow