HtmlAgilityPack, using XPath contains method and predicates

contains html-agility-pack xpath



I'm using HTML Agility Pack and need to check whether a class attribute contains a certain word. The page currently looks like this:

<div class="yom-mod yom-art-content "><div class="bd">
<p class="first"> ....................

What I'm doing is

HtmlDocument doc2 = ...;
List<string> paragraphs = doc2.DocumentNode.SelectNodes("//div[@class = 'yom-mod yom-art-content ']//p").Select(paragraphNode => paragraphNode.InnerHtml).ToList();

But what I really need is something along these lines:

List<string> paragraphs = doc2.DocumentNode.SelectNodes("//div[contains(@class, 'yom-art-content']//p").Select(paragraphNode => paragraphNode.InnerHtml).ToList();

But that doesn't work. Could you help me?

2/4/2013 7:12:00 PM

Accepted Answer

The problem is most likely the missing closing parenthesis of the contains() function. Compare your expression (first line) with the corrected one (second line):

//div[contains(@class, 'yom-art-content']//p
//div[contains(@class, 'yom-art-content')]//p

List<string> paragraphs = 
        doc2.DocumentNode.SelectNodes("//div[contains(@class, 'yom-art-content')]//p")
            .Select(paragraphNode => paragraphNode.InnerHtml).ToList();
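
Two things may still be worth guarding against with the corrected expression: contains(@class, ...) is a substring test, so it would also match a class such as 'yom-art-content-old', and SelectNodes returns null rather than an empty collection when nothing matches. The following is a minimal sketch of one way to handle both, not the only approach. The sample markup is modeled on the question, and the paragraph text in it is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Sample markup modeled on the question; the paragraph text is hypothetical.
        const string html =
            "<div class=\"yom-mod yom-art-content \"><div class=\"bd\">" +
            "<p class=\"first\">First paragraph</p><p>Second paragraph</p>" +
            "</div></div>";

        var doc2 = new HtmlDocument();
        doc2.LoadHtml(html);

        // Pad the normalized class list with spaces so that only the whole
        // token 'yom-art-content' matches, not a longer class containing it.
        const string xpath =
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' yom-art-content ')]//p";

        // SelectNodes returns null when no node matches, so check before using LINQ.
        HtmlNodeCollection nodes = doc2.DocumentNode.SelectNodes(xpath);
        List<string> paragraphs = nodes == null
            ? new List<string>()
            : nodes.Select(paragraphNode => paragraphNode.InnerHtml).ToList();

        foreach (string paragraph in paragraphs)
        {
            Console.WriteLine(paragraph);
        }
    }
}
```

The concat/normalize-space pattern uses only XPath 1.0 functions, so it should work with the XPath support that HAP exposes.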

As a general request: when you say something "didn't work," please explain what actually happened. I expect you're getting an error message, and that message would help in identifying the problem.

12/24/2013 2:59:03 PM

Popular Answer

Instead of doing this with HAP, consider using CsQuery, which offers jQuery-style selectors.

It seems well suited to what you are attempting.

CsQuery is a jQuery port for .NET 4. It implements all CSS2 & CSS3 selectors, all the DOM manipulation methods of jQuery, and some of the utility methods. The majority of the jQuery test suite (as of 1.6.2) has been ported to C#.
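
For comparison, here is a rough sketch of how the same query might look with CsQuery. It assumes the CsQuery NuGet package and its CQ.Create, selector indexer, and InnerHTML members as I understand them; the sample markup and paragraph text are made up for illustration, so treat this as a starting point rather than a verified implementation.

```csharp
using System;
using System.Collections.Generic;
using CsQuery;

class Program
{
    static void Main()
    {
        // Sample markup modeled on the question; the paragraph text is hypothetical.
        const string html =
            "<div class=\"yom-mod yom-art-content \"><div class=\"bd\">" +
            "<p class=\"first\">First paragraph</p><p>Second paragraph</p>" +
            "</div></div>";

        // Parse the markup into a CsQuery document.
        CQ dom = CQ.Create(html);

        // A CSS class selector matches the whole class token,
        // so no substring workaround is needed here.
        var paragraphs = new List<string>();
        foreach (IDomObject paragraph in dom["div.yom-art-content p"])
        {
            paragraphs.Add(paragraph.InnerHTML);
        }

        foreach (string paragraph in paragraphs)
        {
            Console.WriteLine(paragraph);
        }
    }
}
```

Unlike SelectNodes in HAP, a selector that matches nothing should give an empty selection here, so the loop simply does not run.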

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow