Inner node data with HtmlAgilityPack C#

.net c# html html-agility-pack parsing


I am using HtmlAgilityPack to read data (strings) from a web page.

My HTML is here in a fiddle.

Here is my code:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
if (htmlDoc.DocumentNode != null)
{
    HtmlAgilityPack.HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");
    if (bodyNode != null)
    {
        // what to do here to get title and href?
        var inputs = from input in htmlDoc.DocumentNode.Descendants("div")
                     where input.Attributes["class"].Value == "results-data-price-btn"
                     select input;
    }
}


Please guide me on how to get div values via their classes.

6/20/2013 11:48:48 AM

Accepted Answer

Note: the below is untested; I've just quickly looked at the HTML of the page and tried to understand how it 'fits' together.

Each car 'result' has a div with the class search-results-box. So....

var rootNode = htmlDoc.DocumentNode;
var allCarResults = rootNode.SelectNodes("//div[normalize-space(@class)='search-results-box']");
foreach (var carResult in allCarResults) // note: SelectNodes returns null when nothing matches
{
    // the per-result lookups shown below go here
}

You now have each 'car result' (as in, each item is now the entire section that represents one of the cars), so you dig deeper....

Within each of these, the data of the car is within another div, with the class search-results-data:

var dataNode = carResult.SelectSingleNode(".//div[@class='search-results-data']");

Within this, you dig even deeper. The title of the car is in the link inside a child h2, and the same link carries the href:

var carNameNode = dataNode.SelectSingleNode(".//h2/a");
string carName = carNameNode.InnerText.Trim();
string carLink = carNameNode.GetAttributeValue("href", string.Empty);

The price of the car is the most difficult part, thanks to the horrible markup in the HTML.

It sits within a font element which is inside another div...

var carPriceNode = dataNode.SelectSingleNode(".//div[@class='results-data-price-btn']/font");
// This gives you "AED 24,500"; split it up if you only want the number.
string carPrice = carPriceNode.InnerText.Trim();

The problem is that the price is stuck together as "AED 24,500" in one element. You can easily get the element, but if you want just the number, you'll need to write the logic that separates it from the currency yourself.
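
As a rough sketch of that separating logic, one option is to cut the string at the first space and parse what is left using the invariant culture. TryParsePrice is a hypothetical helper name, not something HtmlAgilityPack provides, and the code assumes the text always looks like a currency code, one space, then a number with comma thousands separators:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: splits text such as "AED 24,500" into a currency
// code and a numeric amount. Returns false when the text does not fit.
static bool TryParsePrice(string text, out string currency, out decimal amount)
{
    currency = string.Empty;
    amount = 0m;
    if (string.IsNullOrWhiteSpace(text)) return false;

    // Assumption: the currency code comes first and a plain space follows it.
    string trimmed = text.Trim();
    int space = trimmed.IndexOf(' ');
    if (space < 0) return false;

    currency = trimmed.Substring(0, space);

    // NumberStyles.Number allows thousands separators and surrounding whitespace;
    // the invariant culture treats ',' as the group separator.
    return decimal.TryParse(trimmed.Substring(space + 1), NumberStyles.Number,
                            CultureInfo.InvariantCulture, out amount);
}

if (TryParsePrice("AED 24,500", out string code, out decimal value))
{
    Console.WriteLine($"{code} {value}");
}
```

With the carPrice string from above you would pass carPrice in place of the literal. If the site ever formats prices differently (no space, a different separator), the helper returns false rather than throwing, so you can decide what to do with that result.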

The image itself is fine. It's a level up in the markup, as a child of carResult, so back up we go:

var carImageNode = carResult.SelectSingleNode(".//div[@class='search-results-img']/descendant::img");
string carImageSource = carImageNode.GetAttributeValue("src", string.Empty);


All of the 'More Details about this used car' information is stuffed into one place, so the below will work for your example but may not work for all of them:

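// Needs: using System; using System.Linq;
var descriptionNode = rootNode.SelectSingleNode("//div[@id='description']");

var entireDescription = descriptionNode.InnerText.Trim();

// Split on the field labels; each remaining piece is the value that followed a label.
// Keep the label strings exactly as they appear on the page.
var splitUpDescriptionParts = entireDescription.Split(
        new[]
        {
            "More Details about this Used Car:", "Body Condition:", "Mechanical Condition:", "Doors:", "Cylinders:", "Body Style:",
            "Drive Type:", "Warrenty:", "Description:"
        },
        StringSplitOptions.RemoveEmptyEntries)
    .Select(s => s.Trim())
    .Where(s => !string.IsNullOrWhiteSpace(s))
    .ToList();

// These rely on every field being present and in this order.
string bodyCondition = splitUpDescriptionParts.First();
string mechanicalCondition = splitUpDescriptionParts.ElementAt(1);
string amountOfDoors = splitUpDescriptionParts.ElementAt(2);
string amountOfCylinders = splitUpDescriptionParts.ElementAt(3);
string bodyStyle = splitUpDescriptionParts.ElementAt(4);
string driveType = splitUpDescriptionParts.ElementAt(5);
string warranty = splitUpDescriptionParts.ElementAt(6);
string description = splitUpDescriptionParts.Last();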
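
Because the positional lookups above break as soon as one field is missing, here is a hedged sketch of a label-keyed alternative. SplitByLabels is a hypothetical helper, the sample text is made up rather than taken from the real page, and it assumes no label is contained in or overlaps another label:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: maps each label found in the text to the text that
// follows it, up to the next label. Labels that are absent are left out.
static Dictionary<string, string> SplitByLabels(string text, IEnumerable<string> labels)
{
    // Record where each label first occurs, ordered by position in the text.
    var hits = labels
        .Select(label => new { Label = label, Index = text.IndexOf(label, StringComparison.Ordinal) })
        .Where(hit => hit.Index >= 0)
        .OrderBy(hit => hit.Index)
        .ToList();

    var result = new Dictionary<string, string>();
    for (int i = 0; i < hits.Count; i++)
    {
        // The value runs from the end of this label to the start of the next one.
        int start = hits[i].Index + hits[i].Label.Length;
        int end = i + 1 < hits.Count ? hits[i + 1].Index : text.Length;
        result[hits[i].Label] = text.Substring(start, end - start).Trim();
    }
    return result;
}

// Made-up sample text; "Mechanical Condition:" is deliberately missing.
string sample = "Body Condition: Good Doors: 4 Description: One owner";
var fields = SplitByLabels(sample, new[] { "Body Condition:", "Mechanical Condition:", "Doors:", "Description:" });

foreach (var field in fields)
{
    Console.WriteLine($"{field.Key} {field.Value}");
}
```

You would pass entireDescription and the same label list used above, then read values with TryGetValue so that a missing field does not shift the others.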
6/24/2013 10:04:17 AM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow