HTML Agility Pack for YouTube in C#

c# html html-agility-pack html-parsing

Question

I'm attempting to get every video identifier from the YouTube search results page.

This markup appears for each result:

<a href="/watch?v=aYIC-ebAD3o" class="ux-thumb-wrap result-item-thumb">
  <span class="video-thumb ux-thumb-128 ">
    <span class="clip">
      <img onload="tn_load(5)" alt="Thumbnail" src="//i2.ytimg.com/vi/aYIC-ebAD3o/default.jpg" >
    </span>
  </span>
  <span class="video-time">4:16</span>
  <span dir="ltr" class="yt-uix-button-group addto-container short video-actions" data-video-ids="aYIC-ebAD3o" data-feature="thumbnail">
    <button type="button" class="start master-sprite  yt-uix-button yt-uix-button-short yt-uix-tooltip" onclick=";return false;" title="" data-button-action="yt.www.addtomenu.add" role="button" aria-pressed="false">
      <img class="yt-uix-button-icon yt-uix-button-icon-addto" src="//s.ytimg.com/yt/img/pixel-vfl3z5WfW.gif" alt="">
        <span class="yt-uix-button-content">
          <span class="addto-label">Add to</span>
        </span>
    </button>
    <button type="button" class="end  yt-uix-button yt-uix-button-short yt-uix-tooltip yt-uix-button-empty" onclick=";return false;" title="" data-button-menu-id="shared-addto-menu" data-button-action="yt.www.addtomenu.load" role="button" aria-pressed="false">
      <img class="yt-uix-button-arrow" src="//s.ytimg.com/yt/img/pixel-vfl3z5WfW.gif" alt="">
    </button>
  </span>
  <span class="video-in-quicklist">Added to queue    </span>
</a>
<div class="result-item-main-content"> 

Specifically, I'm trying to extract the "data-video-ids" class data. What's the best way to do this with the HTML Agility Pack?

I've tried this:

foreach(HtmlNode node in doc.DocumentNode.
    SelectNodes("//span[@class='data-video-ids']"))
{
    string text = node.InnerText;
    lblTest2.Text += text + Environment.NewLine;
}

Any thoughts?

3/15/2011 8:20:56 PM

Accepted Answer

"data-video-ids" is an attribute, not a class, so filtering on @class matches nothing. Try the following expression in SelectNodes instead:

"//span[@data-video-ids]"

HTMLAgilityPack does not support selecting attributes directly, so you must first select the element and then read the attribute from it. You could extract the attribute value like this:

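// SelectNodes returns null (not an empty collection) when nothing matches.
var nodes = doc.DocumentNode.SelectNodes("//span[@data-video-ids]");
if (nodes != null)
{
    foreach (HtmlNode node in nodes)
    {
        // The XPath already requires the attribute; this check is defensive.
        var videoIds = node.Attributes["data-video-ids"];
        if (videoIds == null) continue;

        // Append each ID on its own line.
        lblTest2.Text += videoIds.Value + Environment.NewLine;
    }
}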
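
As a self-contained sketch of the same idea, the helper below wraps the extraction in a method that returns the IDs as a list. The names VideoIdExtractor and ExtractVideoIds are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced. It reads the value with GetAttributeValue and a default instead of indexing Attributes, and it returns an empty list when no span matches:

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper class, not part of HtmlAgilityPack.
public static class VideoIdExtractor
{
    // Returns the data-video-ids value of every <span> that carries the attribute.
    public static List<string> ExtractVideoIds(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var ids = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//span[@data-video-ids]");
        if (nodes == null)
        {
            return ids;
        }

        foreach (HtmlNode node in nodes)
        {
            // GetAttributeValue falls back to the supplied default
            // if the attribute is missing.
            string value = node.GetAttributeValue("data-video-ids", string.Empty);
            if (value.Length > 0)
            {
                ids.Add(value);
            }
        }

        return ids;
    }
}
```

Returning a list keeps the parsing separate from the UI, so the caller decides whether to write the IDs to a label such as lblTest2 or use them elsewhere.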
3/15/2011 8:29:08 PM

Popular Answer

I believe using one of YouTube's APIs will benefit you in the long term.

I would only scrape pages with web requests and HTMLAgilityPack as a last resort, when no API is available. The main reason is that your code will break whenever YouTube changes its page markup. Public APIs are usually designed to stay backwards compatible, so a program built on one should generally keep working far longer.

Here is some example code using the YouTube API:

// Assumes request is an already-configured YouTubeRequest and that
// printVideoFeed is a helper defined elsewhere; neither is shown here.
YouTubeQuery query = new YouTubeQuery(YouTubeQuery.DefaultVideoUri);

// Order results by view count (most viewed first).
query.OrderBy = "viewCount";

// Search for "puppy" and include restricted content in the results.
// query.SafeSearch could also be set to YouTubeQuery.SafeSearchValues.Moderate.
query.Query = "puppy";
query.SafeSearch = YouTubeQuery.SafeSearchValues.None;

Feed<Video> videoFeed = request.Get<Video>(query);

printVideoFeed(videoFeed);

Looks easy, doesn't it?



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow