How to find an element (a) with a certain attribute value (href) and read the neighboring table cells using XPath/HtmlAgilityPack?

c# html html-agility-pack visual-studio xpath

Question

I'm really stuck: I can't work out how to do what the title describes. I've read plenty of similar examples, but none of them quite fits my case. Suppose I have the following markup:

<table><tr>
<td><a href="url-a">text A</a></td><td><a>id A</a></td><td><a>img A</a></td>
<td><a href="url-b">text B</a></td><td><a>id B</a></td><td><a>img B</a></td>
<td><a href="url-c">text C</a></td><td><a>id C</a></td><td><a>img C</a></td>
</tr></table>

I already know part of url-a. What I want to retrieve is id A and img A. I'm trying to use XPath to "find" the matching line, but I can't get it to work. It's also possible that the entry isn't present at all. This is my most recent attempt (I've spent more than three hours on this, trying several approaches):

if (htmlDoc.DocumentNode.SelectSingleNode(@"/a[contains(@href, 'part-url-a')]") != null)
    string ida = htmlDoc.DocumentNode.SelectSingleNode(@"/a[contains(@href, 'part-url-a')]/following-sibling::a").InnerText;

This seems to be completely wrong, so I'd be grateful for any help. I'd also appreciate a pointer to a website that explains XPath notation and syntax in depth, with examples similar to this one. Books are welcome too.

PS: I'm aware that I could do this without XPath, using Regex or a plain StreamReader in C# and checking whether each line contains the information I need, but a) that approach is too brittle for my needs because the markup may contain arbitrary line breaks, and b) I really want to stick to XPath for everything in this project.

Thanks for your assistance in advance!

1
6
9/3/2011 8:36:22 PM

Accepted Answer

Use the following XPath expression:

   /*/tr/td[a[@href='url-a']]
                /following-sibling::td[1]
                     /a/text()

When evaluated against the provided XML document (corrected to be well-formed):

<table><tr>
<td><a href="url-a">text A</a></td><td><a>id A</a></td><td><a>img A</a></td>
<td><a href="url-b">text B</a></td><td><a>id B</a></td><td><a>img B</a></td>
<td><a href="url-c">text C</a></td><td><a>id C</a></td><td><a>img C</a></td>
</tr></table>

the wanted text node is selected:

id A

Similarly, this XPath expression:

   /*/tr/td[a[@href='url-a']]
                /following-sibling::td[2]
                     /a/text()

when evaluated against the same XML document (above), selects the second wanted text node:

img A
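
For use from C#, the two expressions above can be folded into one parameterized lookup. The following is only a minimal sketch: it uses the XPath extension methods from System.Xml.Linq instead of HtmlAgilityPack so that it runs without extra packages, it assumes the markup is well-formed XML, and the names NeighborCells and FollowingCellText are hypothetical. It returns null when nothing matches, which covers the case where the data is absent.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

static class NeighborCells
{
    // Hypothetical helper: returns the text of the <a> inside the n-th <td>
    // that follows the cell whose <a> has the given href, or null when
    // either the href or the neighboring cell is not found.
    public static string FollowingCellText(XDocument doc, string href, int n)
    {
        // Assumption: href contains no single quote, because it is
        // interpolated directly into the XPath string literal.
        string xpath =
            $"/*/tr/td[a[@href='{href}']]/following-sibling::td[{n}]/a";

        // XPathSelectElement returns null when the expression selects nothing.
        XElement anchor = doc.XPathSelectElement(xpath);
        return anchor?.Value;
    }
}
```

Called as NeighborCells.FollowingCellText(XDocument.Parse(markup), "url-a", 1), this should give the text of the first neighboring cell, and with 2 the second. If only part of the URL is known, as in the question, the predicate @href='...' can be swapped for contains(@href, '...'). With HtmlAgilityPack the same expression could be passed to DocumentNode.SelectSingleNode, which also returns null when nothing matches; in a full HTML page the leading /*/tr would probably have to become //tr.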

XSLT-based verification:

When this transformation is applied to the XML document above:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
   "/*/tr/td[a[@href='url-a']]
                /following-sibling::td[1]
                     /a/text()"/>

  <xsl:text>&#10;</xsl:text>
  <xsl:copy-of select=
   "/*/tr/td[a[@href='url-a']]
                /following-sibling::td[2]
                     /a/text()"/>
 </xsl:template>
</xsl:stylesheet>

the wanted results are produced:

id A
img A
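
The same verification can be run from C# without a separate XSLT tool. This is a hedged sketch using the built-in XslCompiledTransform class; the names XsltCheck and Run are hypothetical, and both the XML document and the stylesheet are assumed to be passed in as strings.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class XsltCheck
{
    // Hypothetical helper: applies the stylesheet text to the XML text
    // and returns whatever the transformation writes as a string.
    public static string Run(string xml, string xslt)
    {
        var transform = new XslCompiledTransform();

        // Compile the stylesheet from its string form.
        using (var styleReader = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(styleReader);
        }

        // Transform the input and capture the output in memory.
        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Passing the XML document and the stylesheet shown above to XsltCheck.Run should return a string containing the two selected text values.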
8
9/3/2011 7:49:02 PM

Popular Answer

Your HTML is badly broken, with mismatched closing td tags. Please fix them. Markup like this just makes a bad impression.

That said, HTML Agility Pack can handle pretty much whatever garbage you throw at it, so here's how to parse the markup you have and find the id and img values given the href:

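using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.Load("test.html");

        // Find the first <a> anywhere in the document whose href contains 'url-a'.
        var anchor = doc.DocumentNode.SelectSingleNode("//a[contains(@href, 'url-a')]");
        if (anchor != null)
        {
            // Starting from the anchor's <td>, take the first <a> found
            // in a following sibling <td>: this holds the id.
            var id = anchor.ParentNode.SelectSingleNode("following-sibling::td/a");
            if (id != null)
            {
                Console.WriteLine(id.InnerHtml);

                // Repeat from the id's <td> to reach the img cell.
                var img = id.ParentNode.SelectSingleNode("following-sibling::td/a");
                if (img != null)
                {
                    Console.WriteLine(img.InnerHtml);
                }
            }
        }
    }
}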


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow