How to check whether a page is a 404 error page (page does not exist) using HtmlAgilityPack

c# html-agility-pack

Question

I'm reading URLs and collecting the images from a website. I need to stop collecting images from 404 error pages and skip any page that returns one. How can I do that using HtmlAgilityPack? My code is below.

var document = new HtmlWeb().Load(completeurl);
var urls = document.DocumentNode.Descendants("img")
          .Select(e => e.GetAttributeValue("src", null))
          .Where(s => !String.IsNullOrEmpty(s)).ToList();
1/9/2016 2:22:49 PM

Accepted Answer

You have to attach a handler to the PostRequestHandler event on HtmlWeb. It is raised after each document is downloaded, and it gives you access to the HttpWebResponse object, which has a StatusCode property.

HtmlWeb web = new HtmlWeb();
HttpStatusCode statusCode = HttpStatusCode.OK;

// Raised after each document download; remember the response's status code.
web.PostRequestHandler += (request, response) =>
{
    if (response != null)
    {
        statusCode = response.StatusCode;
    }
};

var doc = web.Load(completeUrl);
if (statusCode == HttpStatusCode.OK)
{
    // The page exists; process the document here.
}

It's even easier if you look at HtmlAgilityPack's source on GitHub: HtmlWeb has a StatusCode property that is set after each request.

var web = new HtmlWeb();
var document = web.Load(completeurl);

if (web.StatusCode == HttpStatusCode.OK)
{
    var urls = document.DocumentNode.Descendants("img")
          .Select(e => e.GetAttributeValue("src", null))
          .Where(s => !String.IsNullOrEmpty(s)).ToList();
}
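Building on the StatusCode property, here is a minimal sketch of the loop the question describes: load each page, skip it when HtmlWeb.StatusCode is not 200 OK (which covers 404), and otherwise collect the img src values. It assumes the HtmlAgilityPack NuGet package is referenced; the ImageCollector class and its method names are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper class for illustration; not part of HtmlAgilityPack.
public static class ImageCollector
{
    // Treat only 200 OK as a usable page; 404 and every other code are skipped.
    public static bool ShouldProcess(HttpStatusCode statusCode) =>
        statusCode == HttpStatusCode.OK;

    // Maps each page URL that loaded with 200 OK to the image sources found on it.
    public static Dictionary<string, List<string>> CollectImages(IEnumerable<string> pageUrls)
    {
        var web = new HtmlWeb();
        var imagesByPage = new Dictionary<string, List<string>>();

        foreach (var pageUrl in pageUrls)
        {
            // Network-level failures (e.g. DNS errors) may still throw; they are not handled here.
            HtmlDocument document = web.Load(pageUrl);

            // HtmlWeb.StatusCode holds the status of the request that Load just made.
            if (!ShouldProcess(web.StatusCode))
            {
                Console.WriteLine($"Skipping {pageUrl}: {(int)web.StatusCode} {web.StatusCode}");
                continue;
            }

            imagesByPage[pageUrl] = document.DocumentNode.Descendants("img")
                .Select(e => e.GetAttributeValue("src", null))
                .Where(s => !string.IsNullOrEmpty(s))
                .ToList();
        }

        return imagesByPage;
    }
}
```

Because a single HtmlWeb instance is reused, StatusCode has to be read right after each Load call, before the next page is requested.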

Update

The HtmlAgilityPack API has since been updated, but the approach remains the same:

var htmlWeb = new HtmlWeb();
var lastStatusCode = HttpStatusCode.OK;

htmlWeb.PostResponse = (request, response) =>
{
    if (response != null)
    {
        lastStatusCode = response.StatusCode;
    }
};
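The snippet above only records the status code. A minimal sketch of how it might be used after Load is shown here, assuming HtmlAgilityPack v1.5.1 or later (where PostResponse is a field); the PageLoader class and its methods are hypothetical. One deliberate difference: the captured code starts as null instead of HttpStatusCode.OK, so a request for which the handler never records a response is not mistaken for a found page.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack; // assumption: HtmlAgilityPack v1.5.1+ NuGet package is referenced

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class PageLoader
{
    // Only a recorded 200 OK counts as usable; null means no response was recorded.
    public static bool IsUsable(HttpStatusCode? statusCode) =>
        statusCode == HttpStatusCode.OK;

    // Returns the img src values on the page, or an empty list if the page was not 200 OK.
    public static List<string> LoadImageUrls(string pageUrl)
    {
        var htmlWeb = new HtmlWeb();

        // Start from "unknown" rather than OK.
        HttpStatusCode? lastStatusCode = null;

        // PostResponse is invoked after the response for a request is received.
        htmlWeb.PostResponse = (request, response) =>
        {
            if (response != null)
            {
                lastStatusCode = response.StatusCode;
            }
        };

        HtmlDocument document = htmlWeb.Load(pageUrl);

        // Skip 404 pages (and any other non-200 result).
        if (!IsUsable(lastStatusCode))
        {
            return new List<string>();
        }

        return document.DocumentNode.Descendants("img")
            .Select(e => e.GetAttributeValue("src", null))
            .Where(s => !string.IsNullOrEmpty(s))
            .ToList();
    }
}
```

With this shape, a caller can pass completeurl to LoadImageUrls and simply move on when the returned list is empty.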
5/18/2018 7:54:06 PM

Popular Answer

Know the version you're using!

I use HtmlAgilityPack v1.5.1, and there is no PostRequestHandler event.

In v1.5.1 you must use the PostResponse field instead. See the example below.

var htmlWeb = new HtmlWeb();
var lastStatusCode = HttpStatusCode.OK;

htmlWeb.PostResponse = (request, response) =>
{
    if (response != null)
    {
        lastStatusCode = response.StatusCode;
    }
};

The differences are small, but they do exist.

I hope this saves someone some time.



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow