HTML Agility Pack-Get always the first element details

c# html-agility-pack

Question

I'm using HTML Agility Pack to fetch element's details from this url:Link

I'm using this code in C# (windows Form Application):

var webGet = new HtmlWeb();
    doc = webGet.Load("http://www.trendyol.com/Butik/Liste/Kadin");

    HtmlNodeCollection butiks = doc.DocumentNode.SelectNodes("//div[contains(@class,'butik')]");
    richTextBox1.Text = butiks.Count().ToString();

    if (butiks != null)
    {
        foreach (HtmlNode element in butiks)
        {
            var butikUrl = element.SelectSingleNode("//div[@class='butik-large-image']/a").GetAttributeValue("href", null);
            var butikTitle = element.SelectSingleNode("//div[@class='butik-large-image']/a").GetAttributeValue("title", null);
            var butikImg = element.SelectSingleNode("//div[@class='butik-large-image']//a/img").GetAttributeValue("src", null);
            var butikEndTime = element.SelectSingleNode("//div[@class='butik-name']/div[@class='butikTimeLine']/a/div[@class='timelineMain']/h1").GetAttributeValue("enddate", null);
            dataGridView1.Rows.Add("", butikUrl, butikTitle, butikImg, butikEndTime);
        }

    }
    else
    {
        MessageBox.Show("Null Obeject...!");
    }

This code always return me the element details. Can you help?

I also have used the following code but the following error occurs:

var butikUrl = element.SelectSingleNode(".//div[@class='butik-large-image']/a").GetAttributeValue("href", null);
                        var butikTitle = element.SelectSingleNode(".//div[@class='butik-large-image']/a").GetAttributeValue("title", null);
                        var butikImg = element.SelectSingleNode(".//div[@class='butik-large-image']//a/img").GetAttributeValue("src", null);
                        var butikEndTime = element.SelectSingleNode(".//div[@class='butik-name']/div[@class='butikTimeLine']/a/div[@class='timelineMain']/h1").GetAttributeValue("enddate", null);

This error is for var butikUrl = element.SelectSingleNode(".//div[@class='butik-large-image']/a").GetAttributeValue("href", null);

Error: Additional information: Object reference not set to an instance of an object.

Accepted Answer

The XPath predicate to populate butiks variable seems too general. contains(@class,'butik') expression will also match butik-large-image, butik-name, etc. which don't have certain descendant element you're trying to access in the foreach loop body, that's possibly the cause of the exception. Try to use a more specific predicate, for example by matching div having class exactly equals 'butik large' (XPath tested in Firefox's FirePath) :

doc.DocumentNode.SelectNodes("//div[@class='butik large']");

Popular Answer

Change

HtmlNodeCollection butiks = doc.DocumentNode.SelectNodes("//div[contains(@class,'butik')]");

To

HtmlNodeCollection butiks = doc.DocumentNode.SelectNodes("//div[contains(@class,'butik-large-image')]");

This should return the 20 stacked advertisement elements.

You can then grab another NodeCollection of the other advertisements with

HtmlNodeCollection butiks2 = doc.DocumentNode.SelectNodes("//div[contains(@class,'butik small left')]");

I have some HtmlAgilityPack web scrapping code at home, that I can shoot your way they may help as well.

Edit: You can join the two lists with LINQ

butiks.Union(butiks2);



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why