How to extract meta tags of a series of URLs without downloading whole html in c#

asp.net c# html html-agility-pack metadata

Question

I want to extract title , description and keywords of a seris of URLs
I have this code

 WebClient x = new WebClient();
 string  pageSource = (x.DownloadString(url));     
 query.title = Regex.Match(pageSource, @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>", RegexOptions.IgnoreCase).Groups["Title"].Value;

But I do not want to download whole page because It is so time consuming for a series of URLs. Is there any way to get get these information without downloading whole page?
I should mention that I get these URLs in google search result page buy sending query to google.

Popular Answer

You can request and download partial result using HttpClient by specifying range header. You can define the buffer length you want to download and read:

    static void Main()
    {
        Test().GetAwaiter().GetResult();
    }

    private static async Task Test()
    {
        const string url = "http://google.com";
        const int bytesToRead = 2000;

        using (var httpclient = new HttpClient())
        {
            httpclient.DefaultRequestHeaders.Range = new RangeHeaderValue(0, bytesToRead);

            var response = await httpclient.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);

            using (var stream = await response.Content.ReadAsStreamAsync())
            {
                var buffer = new byte[bytesToRead];
                stream.Read(buffer, 0, buffer.Length);

                var partialHtml = Encoding.UTF8.GetString(buffer);
                //extract required info from partial html
            }
        }
    }

Same result could be achieved using "old" WebClient




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why