Extract emails from HTML by using HtmlAgilityPack

c# html-agility-pack selectnodes

Question

How can I extract the email address and the website address from this HTML using HtmlAgilityPack?

<a class="email" href="mailto:babaie@irandoc.ac.ir">

<a class="" href="http://www.babaie.ir" target="_blank">www.babaie.ir</a>

I tried this code, but it doesn't work for the email:

doc.DocumentNode.SelectNodes("//a[@href= ' ' ]");
10/31/2014 9:49:27 AM

Popular Answer

Getting email:

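var a = doc.DocumentNode.SelectSingleNode("//a[@class='email']");
if (a != null)
{
    // GetAttributeValue returns the fallback ("") when the href attribute is missing
    string href = a.GetAttributeValue("href", "");
    // Strip only the leading "mailto:" scheme, not later occurrences
    string email = href.StartsWith("mailto:") ? href.Substring("mailto:".Length) : href;
}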
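
If the email link cannot be relied on to carry class="email", one alternative is to match on the href prefix instead. The following is a minimal sketch, not part of the original answer: the class name EmailExtractor and the method ExtractEmails are hypothetical, and it assumes the HtmlAgilityPack package is referenced. Note that the XPath starts-with function is case-sensitive, so an href written as "MAILTO:..." would not match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class EmailExtractor
{
    // Hypothetical helper: returns the addresses of all mailto: links in the given HTML.
    public static List<string> ExtractEmails(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match on the href prefix rather than on the class attribute.
        var nodes = doc.DocumentNode.SelectNodes("//a[starts-with(@href, 'mailto:')]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
            return new List<string>();

        return nodes
            .Select(a => a.GetAttributeValue("href", ""))
            .Select(href => href.Substring("mailto:".Length)) // drop the scheme prefix only
            .Select(addr => addr.Split('?')[0])               // drop any ?subject=... part
            .Where(addr => addr.Length > 0)
            .ToList();
    }
}
```

Called with the HTML from the question, ExtractEmails should return a list containing only babaie@irandoc.ac.ir.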

It's not clear how your website link differs from any other anchor tag (there is no specific class or id here), so the following code returns the href values of all anchors in your HTML, skipping mailto links:

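// SelectNodes returns null when no node matches, so guard before using LINQ
var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
var urls = anchors == null
    ? new List<string>()
    : anchors.Select(a => a.Attributes["href"].Value)
             .Where(href => !href.StartsWith("mailto:")) // skip emails
             .ToList();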
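
Since excluding mailto: links still lets through other non-website values (for example javascript: links, fragment-only links such as "#top", or relative paths), a stricter filter may be useful. The sketch below is one possible approach, not the answer's own method: the names WebsiteExtractor and ExtractWebsites are hypothetical, and it assumes that only absolute http and https URLs count as website addresses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class WebsiteExtractor
{
    // Hypothetical helper: returns the distinct absolute http/https links in the given HTML.
    public static List<string> ExtractWebsites(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var nodes = doc.DocumentNode.SelectNodes("//a[@href]");

        // SelectNodes returns null when the document contains no matching anchors.
        if (nodes == null)
            return new List<string>();

        return nodes
            .Select(a => a.GetAttributeValue("href", "").Trim())
            // Assumption: keep only values that parse as absolute http or https URIs.
            .Where(href => Uri.TryCreate(href, UriKind.Absolute, out Uri uri)
                           && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
            .Distinct()
            .ToList();
    }
}
```

For the HTML in the question, this should yield a single entry, http://www.babaie.ir, because the mailto: link fails the scheme check.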
11/7/2013 11:37:52 AM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow