With Html Agility Pack, how can I retrieve the title and href value of a link separately?

.net c# html-agility-pack


I'm attempting to download a website with a table on it.

<table id="content-table">
      <th id="name">Name</th>
      <th id="link">link</th>

    <tr class="tt_row">

      <td class="ttr_name">
       <a title="name_of_the_movie" href="#"><b>name_of_the_movie</b></a>
       <span class="pre">message</span>
      </td>

      <td class="td_dl">
        <a href="download_link"><img alt="Download" src="#"></a>
      </td>

    </tr>

    <tr class="tt_row"> .... </tr>
    <tr class="tt_row"> .... </tr>
</table>

I want to get the movie's name from the td with class="ttr_name" and the download link from the td with class="td_dl".

I used this code to loop over the table rows:

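HtmlAgilityPack.HtmlDocument hDocument = new HtmlAgilityPack.HtmlDocument();
// Loading the page into hDocument is omitted here.
HtmlNode table = hDocument.DocumentNode.SelectSingleNode("//table");

// ".//tr" stays inside this table; "//tr" would search the whole document.
foreach (var row in table.SelectNodes(".//tr"))
{
  // XPath positions are 1-based: td[1] is the first cell, td[2] the second.
  HtmlNode nameNode = row.SelectSingleNode("td[1]");
  HtmlNode linkNode = row.SelectSingleNode("td[2]");
}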

I'm now unsure how to inspect nameNode and linkNode and retrieve the information they contain.

I would appreciate any help.


2/20/2012 8:36:10 AM

Accepted Answer

Although I'm unable to test it at the moment, it should read something along the lines of:

    string name = nameNode.Element("a").Element("b").InnerText;
    string url = linkNode.Element("a").GetAttributeValue("href", "unknown");
2/20/2012 9:03:53 AM
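
Putting the question's loop and the accepted answer together, a minimal sketch might look like the following. It assumes the rows carry class="tt_row" and the cells carry the ttr_name and td_dl classes shown in the question; the class name MovieTableExample, the method ExtractMovies, and the sample markup (with closing tags added) are hypothetical names chosen for illustration. It reads the title attribute of the first link rather than the inner text of the b element, falling back to the link text if the attribute is missing.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class MovieTableExample
{
    // Sample markup modelled on the question's table (assumption: closing tags added).
    public const string SampleHtml = @"
<table id=""content-table"">
  <tr class=""tt_row"">
    <td class=""ttr_name""><a title=""name_of_the_movie"" href=""#""><b>name_of_the_movie</b></a></td>
    <td class=""td_dl""><a href=""download_link""><img alt=""Download"" src=""#""></a></td>
  </tr>
</table>";

    // Hypothetical helper: returns (movie name, download link) pairs, one per row.
    public static List<KeyValuePair<string, string>> ExtractMovies(string html)
    {
        var results = new List<KeyValuePair<string, string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes can return null instead of an empty collection when nothing matches.
        var rows = doc.DocumentNode.SelectNodes(
            "//table[@id='content-table']//tr[@class='tt_row']");
        if (rows == null) return results;

        foreach (HtmlNode row in rows)
        {
            // Select the cells by class rather than by position.
            HtmlNode nameLink = row.SelectSingleNode("./td[@class='ttr_name']/a");
            HtmlNode downloadLink = row.SelectSingleNode("./td[@class='td_dl']/a");
            if (nameLink == null || downloadLink == null) continue; // skip rows without both links

            // Prefer the title attribute; fall back to the link's visible text.
            string name = nameLink.GetAttributeValue("title", nameLink.InnerText.Trim());
            string url = downloadLink.GetAttributeValue("href", "unknown");
            results.Add(new KeyValuePair<string, string>(name, url));
        }
        return results;
    }

    public static void Main()
    {
        foreach (var movie in ExtractMovies(SampleHtml))
            Console.WriteLine(movie.Key + " -> " + movie.Value);
    }
}
```

Selecting by class avoids depending on column order, and the null checks keep a header row or an incomplete row from throwing a NullReferenceException.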

Popular Answer

    public const string UrlExtractor = @"(?: href\s*=)(?:[\s""']*)(?!#|mailto|location.|javascript|.*css|.*this\.)(?<url>.*?)(?:[\s>""'])";

    // Returns only the first match; call NextMatch() on the result to step through the rest.
    public static Match GetMatchRegEx(string text)
    {
        return new Regex(UrlExtractor, RegexOptions.IgnoreCase).Match(text);
    }

This shows how to obtain every href URL. I use this regex in one of my projects, but you can adapt it to your requirements and extend it to match the title as well. I think matching them in bulk is more practical.
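
As a rough sketch of the bulk matching described above, the pattern can be passed to Regex.Matches and the named group url read from each match. The class HrefRegexExample, the helper GetAllUrls, and the sample string are hypothetical and not part of the original answer. One caveat under these assumptions: for a placeholder link such as href="#" the pattern as written can yield an empty capture, so the sketch skips empty values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefRegexExample
{
    // Pattern copied unchanged from the answer above.
    public const string UrlExtractor = @"(?: href\s*=)(?:[\s""']*)(?!#|mailto|location.|javascript|.*css|.*this\.)(?<url>.*?)(?:[\s>""'])";

    // Hypothetical helper: collects every non-empty href value found in the text.
    public static List<string> GetAllUrls(string text)
    {
        var urls = new List<string>();
        foreach (Match match in Regex.Matches(text, UrlExtractor, RegexOptions.IgnoreCase))
        {
            string url = match.Groups["url"].Value;
            if (url.Length > 0) urls.Add(url); // drop empty captures (e.g. from href="#")
        }
        return urls;
    }

    public static void Main()
    {
        // Two links modelled on the question's markup.
        string html =
            "<a title=\"name_of_the_movie\" href=\"#\"><b>name_of_the_movie</b></a>\n" +
            "<a href=\"download_link\"><img alt=\"Download\" src=\"#\"></a>";

        foreach (string url in GetAllUrls(html))
            Console.WriteLine(url);
    }
}
```

Unlike the node-based approach, this does not know which table cell a link came from, so pairing each name with its download link would need extra work.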

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow