Inner node data with HtmlAgilityPack C#

.net c# html html-agility-pack parsing

Question

I am using HtmlAgilityPack to read data/string from a webpage.

My HTML is in this fiddle:

http://jsfiddle.net/7DWfa/1/

Here is my code

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
HtmlNode.ElementsFlags.Remove("option");
htmlDoc.LoadHtml(s);
if (htmlDoc.DocumentNode != null)
{
    HtmlAgilityPack.HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");
    if (bodyNode != null)
    {
        // what to do here to get title and href?
        var inputs = from input in htmlDoc.DocumentNode.Descendants("div")
                     where input.Attributes["class"].Value == "results-data-price-btn"
                     select input;
    }
}

Please guide me on how to get div values by class.

Accepted Answer

Note: the below is untested, I've just quickly looked at the HTML of the page and tried to understand how it 'fits' together.

Each car 'result' has a div with the class search-results-box. So....

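var rootNode = htmlDoc.DocumentNode;
// SelectNodes returns null (not an empty collection) when nothing matches.
var allCarResults = rootNode.SelectNodes("//div[normalize-space(@class)='search-results-box']");
if (allCarResults != null)
{
    foreach (var carResult in allCarResults)
    {
        // the per-car code from the following steps goes here
    }
}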

You now have each 'car result' (as in, each item is the entire section that represents one of the cars), so dig deeper....

Within each of these, the data of the car is within another div, with the class search-results-data...so....

var dataNode = carResult.SelectSingleNode(".//div[@class='search-results-data']");
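
One caveat with the @class='...' comparisons: they only match when the attribute is exactly that string, so a div with a second class on it would be missed. A common XPath 1.0 workaround is to pad the attribute with spaces and look for the class as a whole token. The sketch below is just that idea; ClassXPath, ForClass and CountMatches are names I've made up, and System.Xml is used only so it runs without extra packages. I'm assuming HtmlAgilityPack accepts the same expression, since its SelectNodes goes through the standard .NET XPath engine.

```csharp
using System;
using System.Xml;

public static class ClassXPath
{
    // Builds an XPath 1.0 expression matching <tag> elements whose class
    // attribute contains cssClass as a whole, whitespace-separated token.
    // Assumes cssClass contains no quote characters.
    public static string ForClass(string tag, string cssClass)
    {
        return "//" + tag
            + "[contains(concat(' ', normalize-space(@class), ' '), ' "
            + cssClass + " ')]";
    }

    // Counts matches in a well-formed XML snippet. This is only a stand-in
    // for a real HTML document so the expression can be tried in isolation.
    public static int CountMatches(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes(xpath).Count;
    }
}
```

With HtmlAgilityPack you would pass the generated string straight in, for example carResult.SelectSingleNode("." + ClassXPath.ForClass("div", "search-results-data")) to keep the search relative to the current car.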

Within this, you will now dig even deeper. The title of the car is within another element, specifically within a child h2...

var carNameNode = dataNode.SelectSingleNode(".//h2/a");
string carName = carNameNode.InnerText.Trim();

The price of the car is the most difficult part, thanks to the horrible markup in the HTML.

It sits within a font element which is inside another div...

var carPriceNode = dataNode.SelectSingleNode(".//div[@class='results-data-price-btn']/font");
string carPrice = carPriceNode.InnerText.Trim(); // this will give you "AED 24,500"; split that up if you just want the number

The problem is that the price is stuck together as "AED 24,500" in one element. You can easily get the element, but if you want just the number, that's logic you'll need to work out yourself.
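
One possible way to do that split is sketched here. It is untested against the real page and assumes the text always looks like a currency code followed by an amount with comma thousands separators, as in "AED 24,500". PriceParser and TryParsePrice are names I've made up; only the standard library is used.

```csharp
using System;
using System.Globalization;

public static class PriceParser
{
    // Splits text such as "AED 24,500" into a currency code and an amount.
    // Returns false when no parseable number is found.
    public static bool TryParsePrice(string text, out string currency, out decimal amount)
    {
        currency = string.Empty;
        amount = 0m;
        if (string.IsNullOrWhiteSpace(text))
        {
            return false;
        }

        string trimmed = text.Trim();

        // Everything before the first digit is treated as the currency code.
        int firstDigit = trimmed.IndexOfAny("0123456789".ToCharArray());
        if (firstDigit < 0)
        {
            return false;
        }

        currency = trimmed.Substring(0, firstDigit).Trim();

        // NumberStyles.Number allows thousands separators; the invariant
        // culture fixes "," as that separator regardless of machine settings.
        return decimal.TryParse(
            trimmed.Substring(firstDigit).Trim(),
            NumberStyles.Number,
            CultureInfo.InvariantCulture,
            out amount);
    }
}
```

You would call it with the carPrice string from above and check the return value before using the amount, since a listing without a numeric price would come back as false.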

The image itself is fine. It's a level up in the markup, as a child of the carResult, so up we go:

var carImageNode = carResult.SelectSingleNode(".//div[@class='search-results-img']/descendant::img");
string carImageSource = carImageNode.GetAttributeValue("src", string.Empty);

Re-edit

All of the 'More Details about this used car' information is stuffed into one place, so the below will work for your example but may not work for all of them:

var descriptionNode = rootNode.SelectSingleNode("//div[@id='description']");

var entireDescription = descriptionNode.InnerText.Trim();

var splitUpDescriptionParts =
    entireDescription.Split(
        new[]
            {
                "More Details about this Used Car:", "Body Condition:", "Mechanical Condition:", "Doors:", "Cylinders:", "Body Style:",
                "Drive Type:", "Warrenty:", "Description:"
            },
        StringSplitOptions.RemoveEmptyEntries)
        .Select(part => part.Trim())
        .Where(part => !string.IsNullOrWhiteSpace(part))
        .ToList(); // materialise once so the lookups below don't re-run the split

// The values come back in the same order as the labels appear in the text.
string bodyCondition = splitUpDescriptionParts.First();
string mechanicalCondition = splitUpDescriptionParts.ElementAt(1);
string amountOfDoors = splitUpDescriptionParts.ElementAt(2);
string amountOfCylinders = splitUpDescriptionParts.ElementAt(3);
string bodyStyle = splitUpDescriptionParts.ElementAt(4);
string driveType = splitUpDescriptionParts.ElementAt(5);
string warranty = splitUpDescriptionParts.ElementAt(6);
string description = splitUpDescriptionParts.Last();
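
Picking the parts out by position breaks as soon as one label is missing from a listing, because every later index shifts. A slightly more forgiving sketch is to look the values up by label instead. DescriptionParser and SplitByLabels are hypothetical names, and this assumes each label appears at most once in the text and that no label is contained inside another.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DescriptionParser
{
    // Returns a map from label (without its trailing colon) to the text
    // that follows that label, up to the next label found or the end.
    public static Dictionary<string, string> SplitByLabels(string text, IEnumerable<string> labels)
    {
        // Keep only the labels actually present, ordered by position.
        var found = labels
            .Select(label => new { Label = label, Index = text.IndexOf(label, StringComparison.OrdinalIgnoreCase) })
            .Where(x => x.Index >= 0)
            .OrderBy(x => x.Index)
            .ToList();

        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < found.Count; i++)
        {
            int start = found[i].Index + found[i].Label.Length;
            int end = i + 1 < found.Count ? found[i + 1].Index : text.Length;
            if (end < start)
            {
                continue; // overlapping labels; skip rather than throw
            }

            result[found[i].Label.TrimEnd(':')] = text.Substring(start, end - start).Trim();
        }

        return result;
    }
}
```

Called with entireDescription and the same label array as above, a missing "Cylinders:" then simply means the "Cylinders" key is absent, which you can check with TryGetValue instead of getting the wrong value from a shifted index.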



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow