HTML XPath: searching by class name
Tags: c#, html, html-agility-pack, xpath


I have a problem with XPath in C#.
I want to find all elements with the structure below.
I have 10 links, all of which have this structure:

<div class="PartialSearchResults-item" data-zen="true">
  <div class="PartialSearchResults-item-title">
    <a class="PartialSearchResults-item-title-link result-link" target="_blank" href=''> Google</a>
  </div>
  <p class="PartialSearchResults-item-url"></p>
  <p class="PartialSearchResults-item-abstract">Search the world.</p>
</div>

For example, with this sample I want to get "Google", "" and "Search the world."

var titles = hd.DocumentNode.SelectNodes("//div[contains(@class, 'PartialSearchResults-item')]");
string link;
foreach (HtmlNode node in titles)
{
    string description = node.SelectSingleNode(".//*[contains(@class,'PartialSearchResults-item-abstract')]").InnerText;

    link = node.SelectSingleNode(".//*[contains(@class,'PartialSearchResults-item-url')]").InnerText;

    string title = node.SelectSingleNode(".//a[contains(@class,'PartialSearchResults-item-title-link result-link')]").InnerText;
}

But I get a NullReferenceException.

5/8/2017 4:34:48 AM

Accepted Answer

The problem is in the query that selects the titles. You are looking for div elements whose class attribute contains PartialSearchResults-item, which matches the root node of each item. But other nodes satisfy that query too: the div with class PartialSearchResults-item-title also contains that substring, so it is selected as well.

After selecting these two divs, you iterate over them and try to read some child nodes. The first iteration works fine, because you have the right node. In the second iteration the node is the div with class PartialSearchResults-item-title, which contains only a single a element. The description query therefore finds nothing, SelectSingleNode returns null, and reading the InnerText property of that null object throws a NullReferenceException:

string description = node.SelectSingleNode(".//*[contains(@class,'PartialSearchResults-item-abstract')]").InnerText;
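To make the failure mode visible rather than fatal, the lookup can be wrapped in a small null-safe helper. This is only a sketch: the class name NodeText and the method TextOrNull are hypothetical names made up for illustration, and it assumes C# 6 or later for the null-conditional operator. It does not fix the over-broad query, it only stops the loop from throwing on nodes that lack the expected child.

```csharp
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
static class NodeText
{
    // Returns the trimmed InnerText of the first descendant matching xpath,
    // or null when nothing matches (SelectSingleNode returns null in that case).
    public static string TextOrNull(HtmlNode node, string xpath)
    {
        return node.SelectSingleNode(xpath)?.InnerText.Trim();
    }
}
```

Inside the loop, a call such as NodeText.TextOrNull(node, ".//*[contains(@class,'PartialSearchResults-item-abstract')]") would then return null for the title div, which the caller can check and skip.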

I would suggest not using contains here. In your case the root node has exactly one class, PartialSearchResults-item, so you can query it like this:

var titles = hd.DocumentNode.SelectNodes("//div[@class='PartialSearchResults-item']");
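
Putting the exact-match query together with the question's loop might look like the following sketch. The sample markup is modeled on the question, with closing tags assumed, and the variable names are carried over or chosen for illustration. The null check on the result of SelectNodes is there because HtmlAgilityPack returns null, not an empty collection, when no node matches.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Sample markup modeled on the question; the closing tags are an assumption.
        const string html = @"
<div class=""PartialSearchResults-item"" data-zen=""true"">
  <div class=""PartialSearchResults-item-title"">
    <a class=""PartialSearchResults-item-title-link result-link"" target=""_blank"" href="""">Google</a>
  </div>
  <p class=""PartialSearchResults-item-url""></p>
  <p class=""PartialSearchResults-item-abstract"">Search the world.</p>
</div>";

        var hd = new HtmlDocument();
        hd.LoadHtml(html);

        // Exact match on the class attribute: only the root div of each result qualifies,
        // so the nested PartialSearchResults-item-title div is no longer selected.
        var items = hd.DocumentNode.SelectNodes("//div[@class='PartialSearchResults-item']");
        if (items == null)
        {
            return; // nothing matched
        }

        foreach (HtmlNode node in items)
        {
            // The link carries two classes, so a substring match on one of them is used here.
            string title = node.SelectSingleNode(".//a[contains(@class,'PartialSearchResults-item-title-link')]")?.InnerText.Trim();
            string link = node.SelectSingleNode(".//p[@class='PartialSearchResults-item-url']")?.InnerText.Trim();
            string description = node.SelectSingleNode(".//p[@class='PartialSearchResults-item-abstract']")?.InnerText.Trim();

            Console.WriteLine($"{title} | {link} | {description}");
        }
    }
}
```

If the root div might carry additional classes on some pages, an exact attribute comparison would miss it. In that case the usual XPath 1.0 idiom for matching a whole class token is //div[contains(concat(' ', normalize-space(@class), ' '), ' PartialSearchResults-item ')], which still excludes PartialSearchResults-item-title.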
5/7/2017 5:27:26 PM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow