XPath statement to find nearest preceding sibling

c# html-agility-pack xpath

Question

I'm using the HTMLAgilityPack in a C# WPF application to loop through some anchor tags in a local HTML page and extract the href attribute. This works great, but I then need to find the title the anchor sits under within the HTML document (which is also an anchor tag). This should be easy enough to do with XPath, but I just can't seem to get a statement that works for all scenarios.

Here's a sample of my HTML (which I have no control over):

<html>
    <body>
        <table>
            <tr>
                <td><div><a href="#maintitle" class="title">maintitle</a></div></td>
            </tr>
            <tr>
                <td><div><a href="#subtitle1" class="subtitle">subtitle1</a></div></td>
            </tr>
            <tr>
                <td><div><a href="link1.pdf">link1</a></div></td>
            </tr>
            <tr>
                <td><div><a href="link2.pdf">link2</a></div></td>
            </tr>
            <tr>
                <td><div><a href="link3.pdf">link3</a></div></td>
            </tr>
            <tr>
                <td><div><a href="#subtitle2" class="subtitle">subtitle2</a></div></td>
            </tr>
            <tr>
                <td><div><a href="link4.pdf">link4</a></div></td>
            </tr>
            <tr>
                <td><div><a href="link5.pdf">link5</a></div></td>
            </tr>
        </table>
    </body>
</html>

After finding link1, I then want to find subtitle1. Likewise for link2 and link3. But for link4 and link5, I want to find subtitle2. I'm using this XPath statement (the first section is there just to simulate the selection of an anchor tag, which I've been using with an online XPath evaluator https://www.freeformatter.com/xpath-tester.html):

//a[@href='link4.pdf']/ancestor::tr/preceding-sibling::tr//a[@class='subtitle']

This works for link1 to link3, but for link4 and link5 it returns both subtitle1 and subtitle2. Adding [1] to preceding-sibling::tr fixes it for link4, but breaks it for link2, link3 and link5:

//a[@href='link4.pdf']/ancestor::tr/preceding-sibling::tr[1]//a[@class='subtitle']

I've also tried adding last() to preceding-sibling::tr, but this results in nothing being found for any of the links:

//a[@href='link4.pdf']/ancestor::tr/preceding-sibling::tr[last()]//a[@class='subtitle']

I'm sure there's a simple solution, but I'm by no means competent with XPath so I'm struggling. How do I get my original XPath statement to return the closest sibling?

Accepted Answer

Locator to get the subtitle row by link text (here 'link4'):

(//a[text()='link4']/preceding::tr[.//a[@class='subtitle']])[last()]

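Logic:

//a[text()='link4'] - select the anchor by its link text

//a[text()='link4']/preceding::tr - select every tr that comes before the anchor in document order (the preceding axis excludes ancestors, so the anchor's own row is not included)

[.//a[@class='subtitle']] - keep only the rows that contain an a with class 'subtitle'

(someLocator)[last()] - take the last node of the whole result in document order; in our case that is the closest row above the link that contains an a with class 'subtitle'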
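To check the locator in code, here is a minimal C# sketch. It is an illustration only: it uses System.Xml's XmlDocument instead of HtmlAgilityPack (the sample markup happens to be well-formed XML), and the class name, method name and trimmed Html constant are made up for the example. The same XPath string can presumably be passed to HtmlAgilityPack's SelectSingleNode, but that is an assumption and has not been verified here. The expression has //a[@class='subtitle'] appended so that the result is the subtitle anchor rather than its row.

```csharp
using System;
using System.Xml;

public static class NearestSubtitleDemo
{
    // Trimmed copy of the sample table from the question (hypothetical constant).
    // It is well-formed, so XmlDocument can load it directly.
    const string Html = @"<html><body><table>
<tr><td><div><a href='#maintitle' class='title'>maintitle</a></div></td></tr>
<tr><td><div><a href='#subtitle1' class='subtitle'>subtitle1</a></div></td></tr>
<tr><td><div><a href='link1.pdf'>link1</a></div></td></tr>
<tr><td><div><a href='link2.pdf'>link2</a></div></td></tr>
<tr><td><div><a href='link3.pdf'>link3</a></div></td></tr>
<tr><td><div><a href='#subtitle2' class='subtitle'>subtitle2</a></div></td></tr>
<tr><td><div><a href='link4.pdf'>link4</a></div></td></tr>
<tr><td><div><a href='link5.pdf'>link5</a></div></td></tr>
</table></body></html>";

    // Returns the text of the closest subtitle anchor above the link
    // with the given text, or null when there is none.
    public static string FindSubtitle(string linkText)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Html);

        // The parentheses make [last()] apply to the whole node-set in
        // document order, so it picks the nearest matching row above the link.
        string xpath =
            "(//a[text()='" + linkText + "']" +
            "/preceding::tr[.//a[@class='subtitle']])[last()]" +
            "//a[@class='subtitle']";

        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }
}
```

On the sample table, FindSubtitle returns subtitle1 for link1 to link3 and subtitle2 for link4 and link5.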

Another option: start from the tr that contains the link instead of from the a element, and return the subtitle anchor directly:

(//tr[.//a[text()='link4']]/preceding-sibling::tr//a[contains(@class,'subtitle')])[last()]

Hopefully this helps anybody trying to understand the logic of building locators.


Popular Answer

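Try using this XPath:

//a[@href='<your_input>']/ancestor::tr/preceding-sibling::tr[.//a[@class='subtitle']][1]//a[@class='subtitle']

where <your_input> could be link1.pdf to link5.pdf. The ancestor::tr step moves from the anchor up to its row (the a element itself has no tr siblings). Because preceding-sibling is a reverse axis, [1] applied after the subtitle filter picks the nearest matching row above the link.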
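A minimal C# sketch of a per-href lookup along these lines: it first climbs from the anchor to its row with ancestor::tr (the a element has no tr siblings of its own) and then takes the nearest preceding sibling row that holds a subtitle. As an assumption for the example, it uses System.Xml's XmlDocument rather than HtmlAgilityPack, since the sample markup is well-formed; the class name, method name and Html constant are hypothetical. HtmlAgilityPack's SelectSingleNode takes an XPath string too, so the expression is expected to carry over, but treat that as untested.

```csharp
using System;
using System.Xml;

public static class PrecedingSiblingDemo
{
    // Trimmed copy of the sample table from the question (hypothetical constant).
    const string Html = @"<html><body><table>
<tr><td><div><a href='#maintitle' class='title'>maintitle</a></div></td></tr>
<tr><td><div><a href='#subtitle1' class='subtitle'>subtitle1</a></div></td></tr>
<tr><td><div><a href='link1.pdf'>link1</a></div></td></tr>
<tr><td><div><a href='link2.pdf'>link2</a></div></td></tr>
<tr><td><div><a href='link3.pdf'>link3</a></div></td></tr>
<tr><td><div><a href='#subtitle2' class='subtitle'>subtitle2</a></div></td></tr>
<tr><td><div><a href='link4.pdf'>link4</a></div></td></tr>
<tr><td><div><a href='link5.pdf'>link5</a></div></td></tr>
</table></body></html>";

    // Returns the text of the closest subtitle anchor above the link
    // with the given href, or null when there is none.
    public static string FindSubtitleByHref(string href)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Html);

        // preceding-sibling is a reverse axis: after filtering to rows that
        // contain a subtitle, position 1 is the row nearest to the link's row.
        string xpath =
            "//a[@href='" + href + "']/ancestor::tr" +
            "/preceding-sibling::tr[.//a[@class='subtitle']][1]" +
            "//a[@class='subtitle']";

        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }
}
```

On the sample table, FindSubtitleByHref("link2.pdf") gives subtitle1 and FindSubtitleByHref("link5.pdf") gives subtitle2. The order of the two predicates matters: filtering for subtitle rows first and applying [1] second is what avoids the problem from the question, where [1] alone selected the immediately preceding row whether or not it was a subtitle.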



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow