HtmlAgilityPack: parsing links and inner text

c# html-agility-pack

Question

I'm new to the HTML Agility Pack and am trying to find out how to retrieve the links from markup like this:

<div class="std"><div style="border-right: 1px solid #CCCCCC; float: left; height: 590px; width: 190px;"><div style="background-color: #eae3db; padding: 8px 0 8px  20px; font-weight: bold; font-size: 13px;">test</div>
    <div>
    <div style="font-weight: bold; margin: 5px 0 -6px;">FEATURED</div>
    <span class="widget widget-category-link"><a href="http://www.href1.com"><span>cat1</span></a></span>
     <span class="widget widget-category-link"><a href="http://www.href1.com"><span>cat2</span></a></span>
</div></div>

I have not written any C# code yet, but could someone advise which tags to target to get the links and their inner content, given that none of the elements has an HTML id? Thanks.

1
1
3/18/2013 2:38:50 PM

Popular Answer

If you are familiar with XPath, you can navigate the HTML elements and attributes to get whatever you need. For the example above, the following code retrieves each href along with the link's inner text:

 // Requires: using HtmlAgilityPack;
 const string xpath = "/div//span/a";

 // WebPage below is a string that contains the text of your example
 HtmlNode html = HtmlNode.CreateNode(WebPage);
 // The following gives you a node collection of your two <a> elements
 // (SelectNodes returns null when nothing matches)
 HtmlNodeCollection items = html.SelectNodes(xpath);
 if (items != null)
 {
     foreach (HtmlNode a in items)
     {
         if (a.Attributes.Contains("href"))
         {
             // Get your values here: the link target and the visible text
             string href = a.Attributes["href"].Value;
             string text = a.InnerText;
         }
     }
 }

Note: I haven't tested this code.
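
Since the markup has no ids, another option is to match on the class attribute of the wrapping span instead of relying on the position of the outer div. The following is only a sketch under assumptions: `LinkExtractor` and `ExtractLinks` are hypothetical names made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced and that the links you want always sit inside a span whose class contains `widget-category-link`.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class LinkExtractor
{
    // Hypothetical helper: returns (href, text) for every <a href="..."> that is
    // a direct child of a <span> whose class attribute contains
    // "widget-category-link".
    public static List<(string Href, string Text)> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string Href, string Text)>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes(
            "//span[contains(@class, 'widget-category-link')]/a[@href]");
        if (anchors == null)
        {
            return results;
        }

        foreach (HtmlNode a in anchors)
        {
            // GetAttributeValue falls back to the default if the attribute is missing.
            string href = a.GetAttributeValue("href", string.Empty);
            // InnerText concatenates the text of all descendants (here the inner <span>).
            string text = HtmlEntity.DeEntitize(a.InnerText).Trim();
            results.Add((href, text));
        }

        return results;
    }
}
```

Called with the HTML from the question, this should give two entries, one with the text cat1 and one with cat2. Loading the string through `HtmlDocument.LoadHtml` rather than `HtmlNode.CreateNode` means the XPath does not depend on the string starting with the outer div.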

1
3/19/2013 8:46:12 PM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow