HTML Agility Pack: get the inner text of the tds in the trs that follow a tr

c# html-agility-pack xpath

Question

I'm having some trouble with HTML Agility Pack. I have randomly generated trs with class="related-news", repeated in one or more tables. They don't necessarily follow each other, and most of the time there are trs with no class between them. What I'm trying to do is get the inner text of the tds in those class-less trs and put it in an array, together with the text of the tr class="related-news" directly above them.

This is the HTML:

<tr class="related-news">
   <td>some text</td>
   <td>some text</td>
   <td>some text</td>
</tr>
<tr class="">
   <td>some text</td>
   <td>some text</td>
   <td>some text</td>
</tr>
<tr class="">
   <td>some text</td>
   <td>some text</td>
   <td>some text</td>
</tr>
<tr class="">
   <td>some text</td>
   <td>some text</td>
   <td>some text</td>
</tr>
<tr class="related-news">
   <td>some text</td>
   <td>some text</td>
   <td>some text</td>
</tr>
<tr class="">
   <td>some text</td>
   <td>some text</td>
   <td>some text</td>
</tr>
<tr class="related-news">
   <td>some text</td>
   <td>some text</td>
   <td>some text</td>
</tr>

There is no way to know how many class-less trs will be generated between the trs with class "related-news". I need to get the inner text of all the tds, and that part is no problem. The problem is separating them into a new group every time I reach a tr with class "related-news".

Something like this:

If this tr has class "related-news", get the inner text of its tds and of the following trs for as long as they don't have class "related-news". When a tr with class "related-news" is reached, create a new array and continue.

Is this even possible with HTML Agility Pack?

I can get the inner text of the tds in the "related-news" rows with this code:

HtmlNodeCollection nodes2 = doc.DocumentNode.SelectNodes("//tr[@class='related-news']/td");
foreach (HtmlNode node in nodes2)
{
    string Text = node.InnerText;
}

I don't know how to continue from here or how to add that condition.
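If you would rather express the condition in XPath itself, one option (an editorial sketch, not from the original thread) is to number each "related-news" row by counting the "related-news" rows before it, then select the following class-less sibling rows whose own count of preceding "related-news" rows equals that number. Because it uses sibling axes, each table is handled separately. To keep the sketch runnable without NuGet packages it uses System.Xml's XmlDocument, which assumes the markup is well-formed XML; HtmlAgilityPack's SelectNodes is built on .NET's XPath 1.0 engine, so the same expression strings should work there too, but treat that as an assumption to verify. The names RelatedNewsXPath and Group are hypothetical.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class RelatedNewsXPath
{
    // Hypothetical helper: returns one string[] per "related-news" row, holding the
    // trimmed text of its tds followed by the tds of the class-less rows after it,
    // up to the next "related-news" row in the same table.
    public static List<string[]> Group(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup); // assumption: well-formed markup (HtmlAgilityPack is more lenient)

        var groups = new List<string[]>();
        foreach (XmlNode header in doc.SelectNodes("//tr[@class='related-news']"))
        {
            // 1-based position of this header among the "related-news" rows of its table.
            int k = header.SelectNodes("preceding-sibling::tr[@class='related-news']").Count + 1;

            // Later rows without the class whose nearest preceding header is this one.
            string followersXPath =
                "following-sibling::tr[not(@class='related-news')]" +
                $"[count(preceding-sibling::tr[@class='related-news']) = {k}]";

            var cells = new List<string>();
            foreach (XmlNode td in header.SelectNodes("td"))
                cells.Add(td.InnerText.Trim());
            foreach (XmlNode row in header.SelectNodes(followersXPath))
                foreach (XmlNode td in row.SelectNodes("td"))
                    cells.Add(td.InnerText.Trim());

            groups.Add(cells.ToArray());
        }
        return groups;
    }
}
```

This runs one query per header, so on very large tables a single loop over all rows is cheaper; the XPath form is mainly useful when you want the grouping rule stated in one expression.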

Popular Answer

This is just a manifestation of a pretty common operation: going through a sequential list and combining things.

The basic idea would be to get all of the <tr> nodes, not just the "related-news" nodes. Then, you go through the list and group them. The pseudo-code below shows how it's done.

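using System.Collections.Generic;
using System.Text;
using HtmlAgilityPack;

// Select every row, not just the "related-news" ones.
// SelectNodes returns null when nothing matches.
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//tr");

List<string> TextLines = new List<string>();
StringBuilder sb = new StringBuilder();
if (nodes != null)
{
    foreach (HtmlNode node in nodes)
    {
        if (node.GetAttributeValue("class", "") == "related-news")
        {
            // we've found a new "related-news" node.
            // add the previous stuff to the list
            if (sb.Length > 0)
                TextLines.Add(sb.ToString());
            sb = new StringBuilder(node.InnerText);
        }
        else
        {
            sb.Append(node.InnerText);
        }
    }
}
// and don't forget the last one
if (sb.Length > 0)
    TextLines.Add(sb.ToString());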

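Note that the code only distinguishes <tr> tags that have class "related-news" from those that don't. If there are other classes you want to group differently, you'd have to modify the code. Also, because //tr returns the rows of every table in document order, a group is not closed at the end of a table: class-less rows at the top of the next table are appended to the last group of the previous one. If that matters, run the loop once per table.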
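One way to make that kind of modification cheap is to pull the "start a new group here" decision out into a predicate. The generic helper below, SplitBefore (a hypothetical name, not part of HtmlAgilityPack or LINQ), walks any sequence in order and opens a new group each time the predicate returns true. Items that appear before the first match form their own leading group rather than being dropped. In this sketch, value tuples of (class, text) stand in for rows so it runs without the library.

```csharp
using System;
using System.Collections.Generic;

public static class Grouping
{
    // Hypothetical helper: splits `items` into consecutive groups, starting a new
    // group at every item for which `startsGroup` returns true.
    public static List<List<T>> SplitBefore<T>(IEnumerable<T> items, Func<T, bool> startsGroup)
    {
        var groups = new List<List<T>>();
        foreach (T item in items)
        {
            // Start a new group at each header, and also for the very first item
            // so that rows before the first header are kept.
            if (groups.Count == 0 || startsGroup(item))
                groups.Add(new List<T>());
            groups[groups.Count - 1].Add(item);
        }
        return groups;
    }
}
```

With HtmlAgilityPack, the input could be the (null-checked) result of doc.DocumentNode.SelectNodes("//tr") with a predicate such as n => n.GetAttributeValue("class", "") == "related-news", and each group's tds read with row.SelectNodes("td"). Treating another class as a header then changes only the predicate, and grouping per table changes only the input sequence; the loop itself stays the same.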




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow