I'm using HTML Agility Pack to fetch element details from a trendyol.com listing page (the URL loaded in the code below).
I'm using this code in C# (Windows Forms application):
using HtmlAgilityPack; // at the top of the form's code file

var webGet = new HtmlWeb();
HtmlDocument doc = webGet.Load("http://www.trendyol.com/Butik/Liste/Kadin");
HtmlNodeCollection butiks = doc.DocumentNode.SelectNodes("//div[contains(@class,'butik')]");
if (butiks != null)
{
    // Read the count only after the null check; SelectNodes can return null
    richTextBox1.Text = butiks.Count.ToString();
    foreach (HtmlNode element in butiks)
    {
        // Note: paths starting with "//" are evaluated from the document root, not from element
        var butikUrl = element.SelectSingleNode("//div[@class='butik-large-image']/a").GetAttributeValue("href", null);
        var butikTitle = element.SelectSingleNode("//div[@class='butik-large-image']/a").GetAttributeValue("title", null);
        var butikImg = element.SelectSingleNode("//div[@class='butik-large-image']//a/img").GetAttributeValue("src", null);
        var butikEndTime = element.SelectSingleNode("//div[@class='butik-name']/div[@class='butikTimeLine']/a/div[@class='timelineMain']/h1").GetAttributeValue("enddate", null);
        dataGridView1.Rows.Add("", butikUrl, butikTitle, butikImg, butikEndTime);
    }
}
else
{
    MessageBox.Show("Null object...!");
}
This code always returns the details of the same (first) element on every iteration of the loop. Can you help?
I also tried the following code, using relative paths (.//), but it throws an error:
var butikUrl = element.SelectSingleNode(".//div[@class='butik-large-image']/a").GetAttributeValue("href", null);
var butikTitle = element.SelectSingleNode(".//div[@class='butik-large-image']/a").GetAttributeValue("title", null);
var butikImg = element.SelectSingleNode(".//div[@class='butik-large-image']//a/img").GetAttributeValue("src", null);
var butikEndTime = element.SelectSingleNode(".//div[@class='butik-name']/div[@class='butikTimeLine']/a/div[@class='timelineMain']/h1").GetAttributeValue("enddate", null);
The error is thrown on this line:
var butikUrl = element.SelectSingleNode(".//div[@class='butik-large-image']/a").GetAttributeValue("href", null);
Error (NullReferenceException): Additional information: Object reference not set to an instance of an object.
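To see why the two versions behave differently, here is a minimal sketch using System.Xml, which ships with .NET, so it runs without HtmlAgilityPack. The assumption is that HtmlAgilityPack's SelectSingleNode applies the same XPath rules, and the markup is hypothetical, loosely modeled on the class names above. A path starting with // is absolute: it ignores the context node and returns the first match in the whole document. A path starting with .// only searches the context node's descendants and yields null when nothing matches.

```csharp
using System;
using System.Xml;

// Hypothetical markup, loosely modeled on the class names in the question.
var doc = new XmlDocument();
doc.LoadXml(
    "<body>" +
    "<div class='butik large'><div class='butik-large-image'><a href='/b1' title='First'/></div></div>" +
    "<div class='butik large'><div class='butik-large-image'><a href='/b2' title='Second'/></div></div>" +
    "<div class='butik-name'>no link in here</div>" +
    "</body>");

XmlNodeList containers = doc.SelectNodes("//div[@class='butik large']")!;
XmlNode nameDiv = doc.SelectSingleNode("//div[@class='butik-name']")!;

// "//..." starts at the document root whatever the context node is,
// so every container yields the first link in the whole document.
string? AbsoluteHref(XmlNode context) =>
    context.SelectSingleNode("//div[@class='butik-large-image']/a")?.Attributes?["href"]?.Value;

// ".//..." searches only the descendants of the context node; with no match,
// SelectSingleNode returns null (the "?." operators avoid the exception).
string? RelativeHref(XmlNode context) =>
    context.SelectSingleNode(".//div[@class='butik-large-image']/a")?.Attributes?["href"]?.Value;

foreach (XmlNode c in containers)
    Console.WriteLine($"absolute: {AbsoluteHref(c)}  relative: {RelativeHref(c)}");

// A div without the expected descendant: the relative lookup finds nothing.
Console.WriteLine($"butik-name relative: {RelativeHref(nameDiv) ?? "null"}");
```

Both containers report /b1 in the absolute column, which matches the "always the same element" symptom, while the butik-name div reports null, which is what a bare GetAttributeValue call would trip over.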
The XPath predicate used to populate the butiks variable seems too general. The contains(@class,'butik') expression also matches butik-large-image, butik-name, etc., which don't have the descendant elements you're trying to access in the foreach loop body. For those nodes SelectSingleNode returns null, and calling GetAttributeValue on that null is probably what throws the exception. Try a more specific predicate, for example by matching div elements whose class exactly equals 'butik large' (XPath tested in Firefox's FirePath):
doc.DocumentNode.SelectNodes("//div[@class='butik large']");
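A small sketch of the difference between the two predicates, again with System.Xml standing in for HtmlAgilityPack and hypothetical markup (the real page's structure may differ). The third query is a common XPath 1.0 idiom for matching a single whitespace-separated class token, shown only as an alternative to the exact comparison.

```csharp
using System;
using System.Xml;

// Hypothetical markup: one container div holding two inner divs whose
// class names also contain the substring "butik".
var doc = new XmlDocument();
doc.LoadXml(
    "<body>" +
    "<div class='butik large'>" +
    "<div class='butik-large-image'><a href='/b1'/></div>" +
    "<div class='butik-name'/>" +
    "</div>" +
    "</body>");

int CountMatches(string xpath) => doc.SelectNodes(xpath)!.Count;

// contains() is a plain substring test, so the inner divs match too.
int loose = CountMatches("//div[contains(@class,'butik')]");

// Exact comparison: only a div whose class attribute is exactly "butik large".
int exact = CountMatches("//div[@class='butik large']");

// XPath 1.0 idiom for "has the class token butik", whatever other tokens it has.
int token = CountMatches(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' butik ')]");

Console.WriteLine($"contains(): {loose}, exact: {exact}, token: {token}");
```

This prints "contains(): 3, exact: 1, token: 1". The exact comparison breaks if the site reorders or adds class names, which is the usual reason to prefer the token idiom.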
Change
HtmlNodeCollection butiks = doc.DocumentNode.SelectNodes("//div[contains(@class,'butik')]");
To
HtmlNodeCollection butiks = doc.DocumentNode.SelectNodes("//div[contains(@class,'butik-large-image')]");
This should return the 20 stacked advertisement elements.
You can then grab another HtmlNodeCollection containing the other advertisements with:
HtmlNodeCollection butiks2 = doc.DocumentNode.SelectNodes("//div[contains(@class,'butik small left')]");
I have some HtmlAgilityPack web scraping code at home that I can send your way; it may help as well.
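Edit: You can join the two lists with LINQ (requires using System.Linq;). Union returns a new sequence and leaves both collections unchanged, so store the result:
var allButiks = butiks.Union(butiks2).ToList();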
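A sketch of combining the two result sets, with System.Xml standing in for HtmlAgilityPack and made-up markup. It also guards against a null result, which the question's own null check on butiks anticipates; the helper name AsSequence is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Hypothetical markup with one "large" and two "small" advertisements.
var doc = new XmlDocument();
doc.LoadXml(
    "<body>" +
    "<div class='butik large'><a href='/large1'/></div>" +
    "<div class='butik small left'><a href='/small1'/></div>" +
    "<div class='butik small left'><a href='/small2'/></div>" +
    "</body>");

// Hypothetical helper: turn a possibly-null node list into a (possibly empty) sequence.
IEnumerable<XmlNode> AsSequence(XmlNodeList? nodes) =>
    nodes?.Cast<XmlNode>() ?? Enumerable.Empty<XmlNode>();

var large = AsSequence(doc.SelectNodes("//div[@class='butik large']"));
var small = AsSequence(doc.SelectNodes("//div[@class='butik small left']"));
var missing = AsSequence(doc.SelectNodes("//div[@class='no-such-class']"));

// Union returns a new sequence (inputs are not modified), so keep the result.
// XmlNode uses reference equality, so a node present in both inputs is kept once.
List<XmlNode> all = large.Union(small).Union(missing).ToList();

foreach (XmlNode ad in all)
    Console.WriteLine(ad.SelectSingleNode("./a")?.Attributes?["href"]?.Value ?? "(no link)");
```

This prints /large1, /small1 and /small2 in document order; the empty third query simply contributes nothing instead of throwing.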