How do I get the direct URL of an .mp4 video using C#?

c# html-agility-pack video web-scraping windows-phone

Question

I need some sort of algorithm that extracts the video link from an mp4engine page.

Here is an example of a page I want to scrape.

Desired output in this case would be: http://mp4engine.com:182/d/a2chmyndcqqgkpskitclvbgu5pgwxve2vmlrdsctpwbte2flb4i4hrz6/.hack_Roots (Dub) Episode 001-360p.mp4

I tried to use HtmlAgilityPack to get the player code, but it is packed with the "p,a,c,k,e,d" packer, and I'm unable to execute it inside my C# Windows Phone 8.1 project. I thought about using the Jurassic package to execute the JS, but it doesn't seem to work with Windows Phone 8.1.

Here is the script I get using HAP:

<script type='text/javascript'>eval(function(p,a,c,k,e,d){while(c--)if(k[c])p=p.replace(new RegExp('\\b'+c.toString(a)+'\\b','g'),k[c]);return p}('15("14").13({f:"0://2.1:e/d/c/.b (a) 9 8- 7.6",12:"0://2.1/4/h.g",11:"0://2.1/i/10/z.y",x:"w",v:u,t:s,5:"0",r:"0://2.1/4/q /p",o:[{3:"n",m:"0://2.1/4/h.g"},{3:"l",k:{f:\'0://2.1:e/d/c/.b (a) 9 8- 7.6\',\'5\':\'0\'}},{3:"j"}],});',36,42,'http|com|mp4engine|type|player|provider|mp4|360p|001|Episode|Dub|hack_Roots|a2chmyndcqqgkpskitclvbgu5pgwxve2vmlrdsctpwbte2flb4i4hrz6||182|file|swf|jw6||download|config|html5|src|flash|modes|six|skins|skin|420|height|722|width|1484|duration|jpg|hahgl235zwv2|00000|image|flashplayer|setup|flvplayer|jwplayer'.split('|')))
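The script has the layout of Dean Edwards' "p,a,c,k,e,d" packer: the encoded code, a radix, a keyword count and a '|'-separated keyword list are passed as the last arguments, and those are the values the packer function receives as p, a, c and k. A minimal sketch of pulling the four arguments out with a regular expression follows. It assumes the single-quoted layout shown above and only undoes the \\ and \' escapes; it is written as a .NET console program with C# 9 top-level statements for brevity, and ParsePacked and Unescape are hypothetical helper names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

// Pulls the arguments out of a "p,a,c,k,e,d"-style packed call:
//   }('payload', radix, count, 'word0|word1|...'.split('|'))
// Returns null when the script does not have that shape.
(string Payload, int Radix, int Count, string[] Keywords)? ParsePacked(string script)
{
    // Each single-quoted string may contain backslash escapes such as \'.
    Match m = Regex.Match(script,
        @"\}\s*\('((?:\\.|[^'\\])*)',\s*(\d+),\s*(\d+),\s*'((?:\\.|[^'\\])*)'\.split\('\|'\)",
        RegexOptions.Singleline);
    if (!m.Success) return null;

    return (Unescape(m.Groups[1].Value),
            int.Parse(m.Groups[2].Value),
            int.Parse(m.Groups[3].Value),
            // Split keeps empty entries, which matters: positions in the list are the indices.
            Unescape(m.Groups[4].Value).Split('|'));
}

// Undoes the JavaScript string escaping the packer applies (only \\ and \' are handled).
string Unescape(string s) => Regex.Replace(s, @"\\([\\'])", "$1");

// Shortened, hypothetical script with the same shape as the one in the question.
string sample = "eval(function(p,a,c,k,e,d){return p}('0(\\'1 2\\')',36,3,'alert|hello|world'.split('|')))";
var parsed = ParsePacked(sample);
Console.WriteLine(parsed?.Payload); // prints 0('1 2')
Console.WriteLine(parsed?.Radix);   // prints 36
```

The script text itself can come from HtmlAgilityPack (the inner text of the matching script node) or from the raw HTML string, since the regex only looks for the packer's argument list.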

I have also tried using the built-in WebView control:

WebView wv = new WebView();
//... navigation to string and all that
var res = await wv.InvokeScriptAsync("eval", null);

Unfortunately, the call returns an empty string (res = "").

I have also searched for a base64 string that I could decode, but the page doesn't seem to contain one.

What can I do to get the video URL?

Popular Answer

HtmlAgilityPack only parses the static HTML; you need to execute the dynamic content (JavaScript) to get the data.

You have three options:

1 - Implement a JavaScript beautifier/unpacker in your C# code (for an example, see http://jsbeautifier.org/). In your specific case this works, because the video URL is contained in the packed script itself, but that is not common.

2 - Use the .NET WebBrowser control to load the page and execute the JavaScript, then scrape the data. In this case your application must be a Windows Forms application.

3 - Use a headless browser to load the page and execute the JavaScript, then scrape the data. You could use the well-known PhantomJS. For an example, see: C# example of using PhantomJS webdriver ExecutePhantomJS to filter out images
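For option 1, the packer in the question only substitutes words, so it can be reversed in plain C# without any JavaScript engine. The sketch below assumes the payload, radix (36) and keyword list have already been extracted from the script; it decodes each word token from the radix back into a keyword index, keeps tokens whose keyword is empty (the packer's if(k[c]) check), and then reads the file:"..." property with a regex. It replaces all tokens in one pass, a common approach in standalone unpackers, rather than looping the way the packer function does; that is assumed to be equivalent for typical packed output. The keyword list is copied from the question and the payload is a shortened, adapted excerpt of it; DecodeToken, Unpack and ExtractFileUrl are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Converts one packed token (e.g. "15" in base 36) back to a keyword index.
// Assumed digit alphabet: 0-9, a-z, then A-Z, which covers radix 36 (as in the question)
// and the radix-62 variant of the packer. Returns -1 if the token cannot be an index.
int DecodeToken(string token, int radix, int limit)
{
    long value = 0;
    foreach (char ch in token)
    {
        int digit = ch >= '0' && ch <= '9' ? ch - '0'
                  : ch >= 'a' && ch <= 'z' ? ch - 'a' + 10
                  : ch >= 'A' && ch <= 'Z' ? ch - 'A' + 36
                  : -1;
        if (digit < 0 || digit >= radix) return -1;
        value = value * radix + digit;
        if (value >= limit) return -1; // beyond the end of the keyword list
    }
    return (int)value;
}

// Replaces every word token in the payload with its keyword in a single pass.
// An empty keyword means the token stands for itself, so it is left unchanged.
string Unpack(string payload, int radix, string[] keywords) =>
    Regex.Replace(payload, @"\b\w+\b", match =>
    {
        int index = DecodeToken(match.Value, radix, keywords.Length);
        return index >= 0 && keywords[index].Length > 0 ? keywords[index] : match.Value;
    });

// Reads the first file:"..." property of the unpacked player setup call.
string? ExtractFileUrl(string js)
{
    Match m = Regex.Match(js, "file:\"([^\"]+)\"");
    return m.Success ? m.Groups[1].Value : null;
}

string[] questionKeywords = ("http|com|mp4engine|type|player|provider|mp4|360p|001|Episode|Dub|hack_Roots|" +
    "a2chmyndcqqgkpskitclvbgu5pgwxve2vmlrdsctpwbte2flb4i4hrz6||182|file|swf|jw6||download|config|" +
    "html5|src|flash|modes|six|skins|skin|420|height|722|width|1484|duration|jpg|hahgl235zwv2|" +
    "00000|image|flashplayer|setup|flvplayer|jwplayer").Split('|');
string samplePayload = "15(\"14\").13({f:\"0://2.1:e/d/c/.b (a) 9 8-7.6\"})";

string unpacked = Unpack(samplePayload, 36, questionKeywords);
Console.WriteLine(unpacked);
Console.WriteLine(ExtractFileUrl(unpacked));
```

Run as-is, the first line printed is the jwplayer("flvplayer").setup(...) call and the second is the URL from the question's desired output. For a Windows Phone 8.1 project, the three helpers would need to become ordinary methods in a class; they only rely on System.Text.RegularExpressions, which should be available there.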



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why