Problem choosing subnode in HTML Agility Pack

asp.net-mvc c# html-agility-pack

Question

Since Asics does not provide a way to export my Asics running schedule to iCal, I created a little scraper for my own use. I want to build an iCal feed from all the scheduled runs in my plan. I'm using HTML Agility Pack and C#.

I want to iterate through all of my planned runs (they are div nodes) and then, within each run node, select a few child nodes. This is what my code looks like:

foreach (var run in doc.DocumentNode.SelectSingleNode("//div[@id='scheduleTable']").SelectNodes("//div[@class='pTdBox']"))
{
    number++;
    string date = run.SelectSingleNode("//div[@class='date']").InnerText;
    string type = run.SelectSingleNode("//span[@class='menu']").InnerHtml;
    string distance = run.SelectSingleNode("//span[@class='distance']").InnerHtml;
    string description = run.SelectSingleNode("//div[@class='description']").InnerHtml;
    ViewData["result"] += "Dato: " + date + "<br />";
    ViewData["result"] += "Tyep: " + type + "<br />";
    ViewData["result"] += "Distance: " + distance + "<br />";
    ViewData["result"] += "Description: " + description + "<br />";
    ViewData["result"] += run.InnerHtml.Replace("<", "&lt;").Replace(">", "&gt;") + "<br />" + "<br />" + "<br />";
}

My issue is this: run.SelectSingleNode("//div[@class='date']").InnerText does not select the node inside the current run node that matches the XPath. It selects the first node in the whole page that matches the XPath.

How can I choose the specific node inside the current node that matches the provided XPath?

I'm grateful.

Update

I attempted to change my XPath string to the following:

string date = run.SelectSingleNode(".div[@class='date']").InnerText;

This should select the <div class="date"></div> element within the current node, correct? I tried it, but I get the following error:

Expression must evaluate to a node-set. Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.Xml.XPath.XPathException: Expression must evaluate to a node-set.

Any recommendations?

1
28
3/1/2013 12:14:34 AM

Accepted Answer

Here are a few tips for using the HtmlAgilityPack and XPath expressions.

If run is an HtmlNode, then:

  1. run.SelectNodes("//div[@class='date']")
    Works exactly like doc.DocumentNode.SelectNodes("//div[@class='date']"): the whole document is searched, not just the run node.

  2. run.SelectNodes("./div[@class='date']")
    Returns all the <div> nodes with that class attribute that are direct children of the run node. Only the next depth level is searched, nothing deeper.

  3. run.SelectNodes(".//div[@class='date']")
    Returns all the <div> nodes with that class attribute, not only the direct children of the run node but every descendant of it (a deep search).

You will have to pick option 2 or option 3, depending on which one meets your requirements.
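
To see the three options side by side, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced and uses made-up markup that only borrows the class names from the question; the `details` wrapper and the sample values are hypothetical, not the real Asics page.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical markup; only the class names come from the question.
var html = @"
<div id='scheduleTable'>
  <div class='pTdBox'>
    <div class='date'>Mon 1 Jan</div>
    <div class='details'><span class='distance'>5 km</span></div>
  </div>
  <div class='pTdBox'>
    <div class='date'>Wed 3 Jan</div>
    <div class='details'><span class='distance'>8 km</span></div>
  </div>
</div>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

var absoluteDates = new List<string>(); // option 1: "//..."
var childDates = new List<string>();    // option 2: "./..."
var deepDistances = new List<string>(); // option 3: ".//..."
var spanFoundAsChild = false;

// ".//" keeps the search for runs inside the schedule table as well.
var table = doc.DocumentNode.SelectSingleNode("//div[@id='scheduleTable']");
foreach (var run in table.SelectNodes(".//div[@class='pTdBox']"))
{
    // Option 1: starts at the document root, so every run yields the first date on the page.
    absoluteDates.Add(run.SelectSingleNode("//div[@class='date']").InnerText);

    // Option 2: direct children of this run only.
    childDates.Add(run.SelectSingleNode("./div[@class='date']").InnerText);

    // The span is a grandchild here, so a child-only search returns null.
    if (run.SelectSingleNode("./span[@class='distance']") != null)
    {
        spanFoundAsChild = true;
    }

    // Option 3: any descendant of this run.
    deepDistances.Add(run.SelectSingleNode(".//span[@class='distance']").InnerText);
}

Console.WriteLine("Option 1: " + string.Join(" | ", absoluteDates));
Console.WriteLine("Option 2: " + string.Join(" | ", childDates));
Console.WriteLine("Option 3: " + string.Join(" | ", deepDistances));
Console.WriteLine("span found as direct child: " + spanFoundAsChild);
```

With this sample markup, option 1 repeats the first date for both runs, while options 2 and 3 return values that belong to the run being processed. SelectSingleNode returns null when nothing matches, so real scraping code should check for null before reading InnerText.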

59
8/4/2012 12:59:58 PM

Popular Answer

In XPath, a leading // selects matching nodes at any depth starting from the document root, not from the current node, so it also matches nodes outside the run. Therefore, you need a more restrictive XPath expression. If you post the actual HTML and describe what you are searching for, we can help you look into it further.

About the error you get:

.div[@class='date'] is incorrect because the . is stuck directly to div. You could use div[@class='date'] or ./div[@class='date'], which I think are equivalent. This is because . is an XPath abbreviation for the self axis, which means "the current node."
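
As a quick check of the two valid spellings, the following sketch (assuming the HtmlAgilityPack NuGet package and a made-up one-run fragment) runs both expressions against the same run node and compares what they return.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical fragment with a single run.
var doc = new HtmlDocument();
doc.LoadHtml("<div class='pTdBox'><div class='date'>Mon 1 Jan</div></div>");
var run = doc.DocumentNode.SelectSingleNode("//div[@class='pTdBox']");

// No prefix: the step is evaluated relative to the current node (child axis).
var implicitChild = run.SelectSingleNode("div[@class='date']");

// "./" spells out "start at the current node, then take the child step".
var explicitChild = run.SelectSingleNode("./div[@class='date']");

var sameText = implicitChild.InnerText == explicitChild.InnerText;
Console.WriteLine("Both forms return the same text: " + sameText);
```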



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow