Getting attribute of element in XPath

html html-agility-pack xpath


I want to learn web scraping, so I started practicing. I am trying to get the data-ad-id attribute from HTML using XPath.

The HTML structure looks like this:

<body id="z1234">
    <div class="viewport">
        <div class="g-row">
            <div class="g-col-9">
                <div class="cBox cBox--content cBox--resultList">
                    <div class="cBox-body cBox-body--resultitem dealerAd rbt-reg rbt-no-top"><a class="link--muted no--text--decoration result-item" href="url" data-ad-id="248059713"></a>

The XPath for <a class="link--muted no--text--decoration result-item"> is //*[@id="z1234"]/div[3]/div[4]/div[2]/div[1]/div[11]/a. If I choose a different car, only the index of the last div changes.

Based on this, I wrote the following C# code:

var url = "";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader sr = new StreamReader(response.GetResponseStream());
string sourceCode = sr.ReadToEnd();

HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();

var rows = document.DocumentNode.SelectNodes("//*[@id='z1234']/div[3]/div[4]/div[2]/div[1]/div[11]");

foreach (var row in rows)
{
    var id = row.SelectSingleNode("a[@data-ad-id]").InnerText;
    Console.WriteLine("id:" + id);
}

I cannot get anything from this node; it is null. How can I get data-ad-id?

EDIT: I changed my C# code:

var rows = document.DocumentNode.SelectNodes("//a[@data-ad-id]")[0];
var id = rows.Attributes["data-ad-id"].Value;

Now I can get data-ad-id.
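
The edit above reads only the first match. As a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced, the same attribute-based XPath can be applied to every matching link. The markup string and the second id value are hypothetical sample data, not taken from the real page. Note that SelectNodes returns null rather than an empty collection when nothing matches, so a null check is worth having:

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup, trimmed from the structure in the question.
        var html = "<div class='cBox-body'>" +
                   "<a class='result-item' href='url' data-ad-id='248059713'></a>" +
                   "<a class='result-item' href='url2' data-ad-id='248059714'></a>" +
                   "</div>";

        var document = new HtmlDocument();
        document.LoadHtml(html); // parse the markup before running any XPath query

        // Match every <a> that carries a data-ad-id attribute, wherever it sits.
        var anchors = document.DocumentNode.SelectNodes("//a[@data-ad-id]");
        if (anchors == null)
        {
            // SelectNodes yields null when there is no match.
            Console.WriteLine("No matching elements found.");
            return;
        }

        foreach (var anchor in anchors)
        {
            // GetAttributeValue falls back to the given default if the attribute is absent.
            var id = anchor.GetAttributeValue("data-ad-id", "");
            Console.WriteLine("id: " + id);
        }
    }
}
```

Selecting by attribute instead of by a positional path such as div[3]/div[4] should also be less sensitive to layout changes on the page.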

7/28/2017 3:59:58 PM

Accepted Answer

Looking at the site's markup, that <a> tag has no inner text. It contains only DIV and IMG tags.

You will need to fetch data-ad-id from the element's attributes rather than from InnerText.
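
A minimal sketch of the difference between the inner text and the attribute value, assuming the HtmlAgilityPack NuGet package is referenced. The markup string is a hypothetical stand-in for the real page, with a DIV and an IMG nested inside the link as described above:

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical stand-in for the real link: child tags only, no text content.
        var html = "<a class=\"result-item\" href=\"url\" data-ad-id=\"248059713\">" +
                   "<div><img src=\"x.png\"></div></a>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        var link = document.DocumentNode.SelectSingleNode("//a[@data-ad-id]");
        if (link == null)
        {
            Console.WriteLine("Link not found.");
            return;
        }

        // InnerText only concatenates text nodes, so it is expected to be empty here.
        Console.WriteLine("InnerText: '" + link.InnerText + "'");

        // The id is stored on the element itself, so it is read from its attributes.
        Console.WriteLine("data-ad-id: " + link.GetAttributeValue("data-ad-id", "not found"));
    }
}
```

link.Attributes["data-ad-id"].Value reads the same value, but the indexer returns null when the attribute is missing, so GetAttributeValue with a default is the more forgiving choice.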

7/28/2017 7:00:22 AM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow