Parse international phone numbers from web pages

c# html-agility-pack phone-number regex


I am using HtmlAgilityPack to parse web pages. Once the document is loaded, I want to extract possible phone numbers from the HTML. Currently, I am using a regex for this purpose. I have the following piece of code that checks the page text for phone number matches:

    private static string phoneReg =
    private static Regex phoneRegex = new Regex(phoneReg, RegexOptions.IgnoreCase);
    var phoneMatches = phoneRegex.Matches(doci.DocumentNode.InnerText);

where doci is the HtmlDocument object from HtmlAgilityPack. The problem is that it fails to match some phone numbers, such as 08450 211 211 and +44 (0) 1246 733 000.

Is there a generic regular expression that is suitable for crawling websites and matches most forms of international phone numbers?

1/31/2018 7:17:16 PM

Accepted Answer

You cannot match those phone numbers (08450 211 211 and +44 (0) 1246 733 000) because your regex simply doesn't match them.

The first thing you have to do when writing a regular expression is to identify the pattern you want to match.

So, my suggestion is to write down a list of the different phone number formats you need to support and update your question; then we will be able to help you. Otherwise, I can always come up with a new phone number that your regex does not match, or the regex will match more than what you want.
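
One way to keep such a list honest is to check every candidate pattern against it. The following is only a minimal sketch: PatternCheck and Unmatched are hypothetical names, not part of .NET or HtmlAgilityPack, and the pattern in the usage line is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Returns the samples that the pattern does NOT match in full.
    // (Hypothetical helper, for illustration only.)
    public static List<string> Unmatched(string pattern, IEnumerable<string> samples)
    {
        // \A and \z anchor the pattern to the whole sample string,
        // so a partial match inside a sample does not count.
        var anchored = new Regex(@"\A(?:" + pattern + @")\z");
        var missed = new List<string>();
        foreach (string sample in samples)
        {
            if (!anchored.IsMatch(sample))
                missed.Add(sample);
        }
        return missed;
    }
}
```

For example, PatternCheck.Unmatched(@"\d{3}-\d{4}", new[] { "555-1234", "08450 211 211", "+44 (0) 1246 733 000" }) returns the last two samples, which tells you exactly which formats the pattern still misses.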

Here is a regex that will match the above phone numbers:



According to your comment, I would just use this permissive regex, and then remove the matches that are not phone numbers:

    (?:\+\d+\s+\(\d+\)\s+)?[\d -]+
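
The "match loosely, then filter" idea could look roughly like the sketch below. PhoneCandidates and Extract are hypothetical names, and the digit-count thresholds are my own assumption: 15 is the maximum length of an E.164 number, while the minimum of 7 is an arbitrary cut-off that you should tune for your data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhoneCandidates
{
    // The permissive pattern from above: an optional "+44 (0) " style prefix
    // followed by a run of digits, spaces, and hyphens.
    private static readonly Regex Loose =
        new Regex(@"(?:\+\d+\s+\(\d+\)\s+)?[\d -]+");

    // Assumed thresholds, not part of the original regex.
    private const int MinDigits = 7;
    private const int MaxDigits = 15;

    public static List<string> Extract(string text)
    {
        var results = new List<string>();
        foreach (Match m in Loose.Matches(text))
        {
            // The pattern also picks up stray spaces and hyphens around a number.
            string candidate = m.Value.Trim(' ', '-');
            int digits = candidate.Count(c => char.IsDigit(c));

            // Drop matches that are too short or too long to be a phone number.
            if (digits >= MinDigits && digits <= MaxDigits)
                results.Add(candidate);
        }
        return results;
    }
}
```

Calling PhoneCandidates.Extract("Call 08450 211 211 or +44 (0) 1246 733 000 today.") yields 08450 211 211 and +44 (0) 1246 733 000. With your code, you would pass doci.DocumentNode.InnerText instead. Note that a digit-count filter still lets through things like dates or order numbers, so you will probably need extra checks on top of it.
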
3/2/2013 8:12:10 PM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow