I asked this question in a CodePlex discussion, but I hope to get a quicker answer here on Stack Overflow.

I use HTML Agility Pack to parse HTML in C#. I have the following HTML structure:

   <p class="paragraph">text</p>
   <p class="paragraph">text</p>
   <p class="specific">text</p>
   <p class="paragraph">text</p>
   <p class="paragraph">text</p>

And I need to get all p elements with class "paragraph" that exist after the p element with class "specific".

Is there a way to do that?


7/4/2010 9:49:02 AM

Accepted Answer

Using .Class as in Mark's example (if that doesn't exist, substitute whatever is appropriate, e.g. GetAttributeValue("class", "") on a HAP node):

Use SkipWhile

e.g. in LINQPad you get 5,6,7 from:

int[] a = { 6, 5, 6, 7 };
a.SkipWhile( x => x != 5 ).Dump();
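To see why the snippets that follow add .Skip(1), here is a small console sketch over the same array. SkipWhile stops skipping at the first element that fails the condition, so the marker element itself is still in the result; .Skip(1) then drops it. The names SkipWhileDemo, FromMarker and AfterMarker are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SkipWhileDemo
{
    // Everything from the first occurrence of marker onwards (marker included).
    public static IEnumerable<int> FromMarker(IEnumerable<int> source, int marker) =>
        source.SkipWhile(x => x != marker);

    // Everything strictly after the first occurrence of marker.
    public static IEnumerable<int> AfterMarker(IEnumerable<int> source, int marker) =>
        source.SkipWhile(x => x != marker).Skip(1);

    static void Main()
    {
        int[] a = { 6, 5, 6, 7 };
        Console.WriteLine(string.Join(",", FromMarker(a, 5)));   // 5,6,7
        Console.WriteLine(string.Join(",", AfterMarker(a, 5)));  // 6,7
    }
}
```

The second form is the one that matches the question: the element with class "specific" plays the role of the 5, and only what follows it is wanted.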

So depending on the type SelectNodes returns, either:

.SelectNodes( "/p" ).SkipWhile( p => p.Class != "specific" ).Skip(1)

or:

.SelectNodes( "/p" ).Cast<XX>().SkipWhile( p => p.Class != "specific" ).Skip(1)

(or, ugly version)

.SelectNodes( "/p" ).SkipWhile( p => ((XX)p).Class != "specific" ).Skip(1)

(or in some cases - not if your expression is already filtering appropriately)

.SelectNodes( "/p" ).OfType<XX>().SkipWhile( p => p.Class != "specific" ).Skip(1)
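For reference, a minimal sketch of how the placeholders above might be filled in against HAP itself. It assumes the HtmlAgilityPack package is referenced, that SelectNodes returns an HtmlNodeCollection (which enumerates as HtmlNode, so no cast is needed), and that the class is read with GetAttributeValue("class", "") in place of .Class. The names HapSketch, ClassOf and ParagraphsAfter are hypothetical, and "//p" is used instead of "/p" so the paragraphs are found at any depth.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class HapSketch
{
    // Reads the class attribute, or "" when the node has none.
    static string ClassOf(HtmlNode node) => node.GetAttributeValue("class", "");

    // Returns the p.paragraph nodes that come after the first p carrying markerClass.
    public static List<HtmlNode> ParagraphsAfter(HtmlDocument doc, string markerClass)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var all = doc.DocumentNode.SelectNodes("//p");
        if (all == null) return new List<HtmlNode>();

        return all
            .SkipWhile(p => ClassOf(p) != markerClass)  // drop everything before the marker
            .Skip(1)                                     // drop the marker itself
            .Where(p => ClassOf(p) == "paragraph")       // keep only the wanted paragraphs
            .ToList();
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<p class=\"paragraph\">a</p><p class=\"paragraph\">b</p>" +
            "<p class=\"specific\">c</p>" +
            "<p class=\"paragraph\">d</p><p class=\"paragraph\">e</p>");

        // Prints the text of each paragraph found after the marker.
        foreach (var p in ParagraphsAfter(doc, "specific"))
            Console.WriteLine(p.InnerText);
    }
}
```

The trailing Where is only needed if other kinds of p elements can follow the marker; with the markup from the question it filters nothing out.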

EDIT: I'd probably create an extension method:

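static class HapExtensions {
    // Skips everything up to and including the first element for which predicate is true.
    public static IEnumerable<T> SkipUntilAfter<T>( this IEnumerable<T> sequence, Func<T, bool> predicate ) {
        return sequence.SkipWhile( x => !predicate( x ) ).Skip(1);
    }
}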

Anyone care to search up prior art for this? Any good name suggestions?

12/14/2009 11:52:45 AM

Popular Answer

Try this

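bool latterDayParagraphs = false;
List<HtmlNode> nodes = new List<HtmlNode>();   // HtmlNode is HAP's node type
foreach(var pElement in doc.DocumentNode.SelectNodes("/p")) {
   // .Class is a placeholder; with HAP, read pElement.GetAttributeValue("class", "")
   if(pElement.Class != "paragraph")
      latterDayParagraphs = true;   // reached the "specific" marker
   else if(latterDayParagraphs)
      nodes.Add(pElement);          // assumed completion: keep paragraphs after the marker
}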

