Getting the value of an 'href' inside a div with HtmlAgilityPack in C#

c# href html-agility-pack xpath

Question

I am trying to grab the value of an "href" attribute. The HTML looks something like this:

          <div class="s_newsbox" style="font-size:12px; vertical-align:middle; overflow: hidden; float:left; margin:10px; margin-bottom:15px; height: 270px; width:280px; border-radius:6px; position:relative; text-align:center; padding:0px">
            <div style="background-color:#292929; background-color:rgba(0,0,0,0.8); padding:5px; padding-left:2px; padding-right:10px; width:100%; position:absolute; top:0; left:0;"><b>Samsung nx30 + zoom kit 18/55</b>
            </div>
            <a href="vendo.php?t=1395911">
              <img style="width:100%; height:100%" src="http://img1.juzaphoto.com/shared_files/uploads_mercatino/sell_1395911_small.jpg" alt="">
              <br></a>
            <div style="line-height:150%; background-color:#292929; background-color:rgba(0,0,0,0.8); padding:5px; position:absolute; bottom:0; left:0; margin-left:auto; width:100%; text-align:left">Venditore: 
              <a href="me.php?l=it&amp;p=45923"><b>Pierobob</b></a>  
              <br> Prezzo: <b>350 &euro;</b>  
              <br> Zona: <b>Bologna</b>  
              <br> 
              <a href="vendo.php?t=1395911">Leggi annuncio</a> (8 visite)
              <br>
            </div>
          </div>

What I am trying to do is this:

           var list = page.DocumentNode.SelectNodes("//div[@class='s_newsbox']");
           foreach (var obj in list)
           {
               var url = obj.SelectSingleNode(".//a").Attributes["href"].Value;
           }

I want to get the value 'vendo.php?t=1395911', but instead I get the href of a different link elsewhere on the page, one that is not inside a div with the class 's_newsbox'.

What am I doing wrong?

Thank you!

Accepted Answer

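You can narrow the selection with a more precise XPath expression, as long as you don't need any of the other nodes inside the s_newsbox div. The expression below matches only the <a> elements that are direct children of that div and have a non-empty href, so each matched node is already the link you want: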

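       var list = page.DocumentNode.SelectNodes("//div[@class='s_newsbox']/a[string-length(@href)>0]");
       if (list != null) // by default SelectNodes returns null when nothing matches
       {
           foreach (var obj in list)
           {
               // obj is already the <a> node, so read its href directly
               var url = obj.GetAttributeValue("href", string.Empty);
           }
       }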
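Below is a self-contained sketch of the same approach, assuming the HtmlAgilityPack NuGet package is referenced. It loads the HTML from a string and shows both the direct XPath from the answer and a per-box variant for the case where you do need other nodes inside each s_newsbox div. The class and method names (NewsboxLinks, GetListingLinks, GetListings) are hypothetical, and the seller path ./div/a/b is an assumption based only on the markup shown in the question.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper class for illustration; not part of HtmlAgilityPack.
public static class NewsboxLinks
{
    // Answer's approach: select <a> elements that are direct children of a div
    // whose class attribute is exactly "s_newsbox" and that have a non-empty href.
    public static List<string> GetListingLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='s_newsbox']/a[string-length(@href)>0]");
        if (nodes == null) // by default SelectNodes returns null, not an empty collection
        {
            return links;
        }

        foreach (var a in nodes)
        {
            // Each matched node is already the <a>, so read its href directly.
            links.Add(a.GetAttributeValue("href", string.Empty));
        }
        return links;
    }

    // Variant for when other nodes in the box are needed: select each box, then
    // query it with paths starting with "./" so they are evaluated relative to that box.
    public static List<(string Href, string Seller)> GetListings(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<(string Href, string Seller)>();
        var boxes = doc.DocumentNode.SelectNodes("//div[@class='s_newsbox']");
        if (boxes == null)
        {
            return result;
        }

        foreach (var box in boxes)
        {
            var link = box.SelectSingleNode("./a[@href]");   // listing link, a direct child of the box
            var seller = box.SelectSingleNode("./div/a/b");  // assumed seller markup: <div>...<a><b>name</b></a>
            if (link == null)
            {
                continue;
            }
            result.Add((link.GetAttributeValue("href", string.Empty),
                        seller?.InnerText.Trim() ?? string.Empty));
        }
        return result;
    }
}
```

On the question's markup, GetListingLinks should return a single entry, vendo.php?t=1395911, because the other two links in the box (the seller link and "Leggi annuncio") sit inside the inner div rather than directly under s_newsbox. Starting the per-box paths with "./" keeps each query scoped to the current box instead of the whole document.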



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow