Extract emails from HTML by using HtmlAgilityPack

c# html-agility-pack selectnodes

Question

How can I extract the email address and the website address from the following HTML using HtmlAgilityPack?

<a class="email" href="mailto:babaie@irandoc.ac.ir">

<a class="" href="http://www.babaie.ir" target="_blank">www.babaie.ir</a>

I tried this code, but it doesn't work for the email:

doc.DocumentNode.SelectNodes("//a[@href= ' ' ]");

Popular Answer

Getting email:

var a = doc.DocumentNode.SelectSingleNode("//a[@class='email']");
if (a != null)
{
    // GetAttributeValue returns the fallback ("") if the href attribute is missing
    string href = a.GetAttributeValue("href", "");
    string email = href.Replace("mailto:", "");
}
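
As a side note, the XPath in the question, //a[@href= ' ' ], only matches anchors whose href is literally a single space, so it finds nothing here. If the class attribute cannot be relied on, one alternative is to match on the href prefix with the XPath starts-with() function. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is installed. The class name MailtoByXPath and the method ExtractEmail are hypothetical helpers, and the sample markup closes the first anchor with placeholder link text that is not in the question.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class MailtoByXPath // hypothetical helper, not part of HtmlAgilityPack
{
    // Returns the address of the first mailto: link, or null if there is none.
    public static string ExtractEmail(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // starts-with() selects on the href prefix instead of the class attribute.
        var node = doc.DocumentNode.SelectSingleNode("//a[starts-with(@href, 'mailto:')]");
        if (node == null)
            return null;

        string href = node.GetAttributeValue("href", "");
        // Drop the "mailto:" prefix and any "?subject=..." query part.
        return href.Substring("mailto:".Length).Split('?')[0];
    }

    static void Main()
    {
        // Markup from the question; "mail" and the closing </a> are added for illustration.
        string html =
            "<a class=\"email\" href=\"mailto:babaie@irandoc.ac.ir\">mail</a>" +
            "<a class=\"\" href=\"http://www.babaie.ir\" target=\"_blank\">www.babaie.ir</a>";

        Console.WriteLine(ExtractEmail(html));
    }
}
```

Note that XPath string comparison is case-sensitive, so an href written as "MAILTO:..." would not be matched by this expression.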

It's not clear how your website address differs from any other anchor tag (there is no specific class or id here), so the following code returns the href value of every anchor in your HTML, except mailto links:

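// Requires: using System.Linq;
// SelectNodes returns null (not an empty collection) when nothing matches
var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
var urls = (nodes ?? Enumerable.Empty<HtmlNode>())
              .Select(a => a.Attributes["href"].Value)
              .Where(href => !href.StartsWith("mailto:")) // skip emails
              .ToList();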
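
If "website address" is taken to mean an absolute http or https link, the filter can be narrowed further. The sketch below rests on that assumption, which the question does not confirm, and on the HtmlAgilityPack NuGet package being installed. LinkExtractor and WebsiteUrls are hypothetical names, and the sample markup closes the first anchor with placeholder text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class LinkExtractor // hypothetical helper, not part of HtmlAgilityPack
{
    // Returns the href of every anchor that points to an absolute http/https URL.
    public static List<string> WebsiteUrls(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when no node matches, so guard before using LINQ.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return new List<string>();

        return anchors
            .Select(a => a.GetAttributeValue("href", ""))
            // Keep only http/https URLs; this drops mailto:, tel: and relative links.
            .Where(href => Uri.TryCreate(href, UriKind.Absolute, out Uri uri)
                           && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
            .ToList();
    }

    static void Main()
    {
        // Markup from the question; "mail" and the closing </a> are added for illustration.
        string html =
            "<a class=\"email\" href=\"mailto:babaie@irandoc.ac.ir\">mail</a>" +
            "<a class=\"\" href=\"http://www.babaie.ir\" target=\"_blank\">www.babaie.ir</a>";

        foreach (string url in WebsiteUrls(html))
            Console.WriteLine(url);
    }
}
```

Checking the parsed scheme rather than the raw string prefix means other non-web schemes are excluded without listing each one.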



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow