To select all <img> elements, use HtmlAgilityPack SelectNodes.

c# html html-agility-pack regex xpath

Question

For a game that involves picture searches, I'm writing a C# project that is essentially an image screen scraper. I'm trying to use HtmlAgilityPack to select every <img> element and add it to an HtmlNodeCollection, as shown below:

// declared up front so it can be checked in the Autos window

HtmlNodeCollection imgs = new HtmlNodeCollection(doc.DocumentNode.ParentNode);
imgs = doc.DocumentNode.SelectNodes("//img");

foreach (HtmlNode img in imgs)
{
    HtmlAttribute src = img.Attributes["@src"];
    urls.Add(src.Value);
}

Note that urls is a public List<string>:

public List<string> urls = new List<string>();

My foreach loop throws an exception:

Object reference not set to an instance of an object.

Checking the Autos window, imgs is indeed null. Is there a better way for me to track down the cause of this? I'm not sure whether it's my XPath or something else.

The most annoying part is that I had everything working before, but I messed up my file versions and lost my work. Derp.

10/25/2011 12:31:46 AM

Accepted Answer

The problem may be in the following line:

HtmlAttribute src = img.Attributes["@src"];

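(Note the position of the @.) Inside the quotes, the @ becomes part of the attribute name being looked up; in front of the quotes it only marks a verbatim string, so @"src" is simply "src". This worked for me: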

HtmlAttribute src = img.Attributes[@"src"];
2/1/2012 12:43:17 AM
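
To illustrate the attribute lookup discussed above, here is a minimal sketch. The class name AttributeLookupSketch and the helper FirstImageSrc are hypothetical, made up for this example; it assumes the HtmlAgilityPack package is referenced and that the attribute indexer yields null when the name is not found, which is why it null-checks rather than reading .Value directly.

```csharp
using HtmlAgilityPack;

public static class AttributeLookupSketch
{
    // Hypothetical helper: returns the src of the first <img>,
    // or null if there is no <img> or it has no src attribute.
    public static string FirstImageSrc(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode img = doc.DocumentNode.SelectSingleNode("//img");
        if (img == null)
        {
            return null;
        }

        // The attribute is named "src". Asking for "@src" would look for
        // an attribute literally called "@src" and find nothing.
        HtmlAttribute src = img.Attributes["src"];

        // Guard against a missing attribute instead of dereferencing null.
        return src?.Value;
    }
}
```

Calling FirstImageSrc("<img src='a.png'>") should give back "a.png", while an <img> without a src gives null instead of throwing.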

Popular Answer

It works for me. Your document may not have loaded properly, in which case the XPath matches nothing.

HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml("<html><head></head><body><div><img /><div><img /><img/></div></div><img/></body></html>");

var nodes = htmlDocument.DocumentNode.SelectNodes("//img");
// 4 nodes found
foreach (var node in nodes)
{
    // do stuff
}
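
If the document did not load or simply contains no images, SelectNodes gives back null rather than an empty collection, so the foreach itself throws. A defensive version of the loop might look like the sketch below. The names ImageScraperSketch and CollectImageUrls are hypothetical, and the XPath "//img[@src]" is a choice made for this example so that images without a src are skipped.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ImageScraperSketch
{
    // Hypothetical helper: gathers the src of every <img> that has one.
    public static List<string> CollectImageUrls(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<string>();

        // "//img[@src]" matches only <img> elements carrying a src attribute.
        HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");

        // No match means null, so return the empty list instead of iterating.
        if (imgs == null)
        {
            return urls;
        }

        foreach (HtmlNode img in imgs)
        {
            // GetAttributeValue falls back to the given default if src is absent.
            urls.Add(img.GetAttributeValue("src", string.Empty));
        }

        return urls;
    }
}
```

With this shape, an empty result points at the loading step or the XPath, which is easier to diagnose than a NullReferenceException inside the loop.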


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow