Get values of HTML tags with HtmlAgilityPack

c# css html html-agility-pack

Question

I have a lot of HTML blocks with the following structure, and from each one I need:

  1. The value of the src attribute of the first img
  2. The date value
  3. The value of the src attribute of the second img
  4. The details text

I have marked each one with its number in the code below.

Finally, I want to put all these values in an XML file. Could you please help me get these values with HtmlAgilityPack?

Thanks in advance.

<div class="promotion">
  <div class="logo">
    <img src='http://www.example.com/D.jpg' **(1)**>
  </div>
  <div class="details">
    <p class="date"> 2015/12/12 **(2)** </p>
    <p>
      <img src="http://www.example.com/DDD.jpg" **(3)** alt="" />
      <h3> Some Details **(4)** </h3>
    </p>
  </div>
</div>

Accepted Answer

If your HTML is structured as in your question, you can use XPath to retrieve each value, in order, as follows.

First, load the HTML into an HtmlDocument. For example, to test with the HTML from your question:

// The **(n)** markers from the question are left out: they are annotations, not HTML,
// and would otherwise end up in the extracted values.
var html = @"<div class='promotion'>
               <div class='logo'>
                 <img src='http://www.example.com/D.jpg'>
               </div>
               <div class='details'>
                 <p class='date'> 2015/12/12 </p>
                 <p>
                   <img src='http://www.example.com/DDD.jpg' alt='' />
                   <h3> Some Details </h3>
                 </p>
               </div>
             </div>";

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

  1. For the first image:

    var value = doc.DocumentNode.SelectSingleNode("//div[@class='logo']/img").Attributes["src"].Value;

  2. For the date (Trim() removes the whitespace around the text):

    var value = doc.DocumentNode.SelectSingleNode("//p[@class='date']").InnerText.Trim();

  3. For the second image:

    var value = doc.DocumentNode.SelectSingleNode("//div[@class='details']/p[2]/img").Attributes["src"].Value;

  4. And for the details:

    var value = doc.DocumentNode.SelectSingleNode("//div[@class='details']/p[2]/h3").InnerText.Trim();
    
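Since the question mentions many such blocks and an XML file as the end goal, here is a minimal sketch that puts the four queries together, assuming the HtmlAgilityPack NuGet package plus .NET's built-in System.Xml.Linq. It reuses the XPath expressions above but makes them relative ("./" and ".//"), so each query stays inside one promotion div instead of searching the whole document, and it returns an empty string rather than throwing a NullReferenceException when a node is missing. The class name PromotionExtractor and the XML element names (promotions, promotion, logo, date, image, details) are hypothetical choices, not anything either library requires.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper: extracts the four values from every promotion block into XML.
public static class PromotionExtractor
{
    // Trimmed attribute value, or "" if the node or the attribute is missing.
    static string Attr(HtmlNode block, string xpath, string name) =>
        block.SelectSingleNode(xpath)?.GetAttributeValue(name, "").Trim() ?? "";

    // Entity-decoded, trimmed inner text, or "" if the node is missing.
    static string Text(HtmlNode block, string xpath) =>
        HtmlEntity.DeEntitize(block.SelectSingleNode(xpath)?.InnerText ?? "").Trim();

    public static XDocument ToXml(HtmlDocument doc)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches,
        // unless HtmlDocument.OptionEmptyCollection is set.
        IEnumerable<HtmlNode> blocks =
            doc.DocumentNode.SelectNodes("//div[@class='promotion']")
            ?? Enumerable.Empty<HtmlNode>();

        // Element names below are illustrative; rename them to fit your XML format.
        var root = new XElement("promotions",
            blocks.Select(block => new XElement("promotion",
                // "./..." is relative to the current block; "//..." would search the whole page.
                new XElement("logo", Attr(block, "./div[@class='logo']/img", "src")),
                new XElement("date", Text(block, ".//p[@class='date']")),
                new XElement("image", Attr(block, "./div[@class='details']/p[2]/img", "src")),
                new XElement("details", Text(block, "./div[@class='details']/p[2]/h3")))));

        return new XDocument(root);
    }
}
```

To use it, load the page with doc.Load(path) (or doc.LoadHtml(html) as above) and call PromotionExtractor.ToXml(doc).Save("promotions.xml"); the output file name is only an example.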

I hope this helps.




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow