In this post I detail a method for downloading the FLV file associated with a YouTube video. This is a basic guide: it doesn't go into detail about the different quality levels, how to determine the file size, or how to do everything on the client side. But I have implemented it, and it works.
Here are the steps in the process.
1. Get the HTML of youtube.com/watch?v=asdlhkhasd
2. Run all the JavaScript on it to produce the final HTML that the browser uses
3. Find the element with the id "movie_player" and get the content of its flashVars attribute
4. URL-decode that twice
5. Find the text between "fmt_stream_map=X|" and the next "|", where X is (as far as I have seen) 1 to 2 numeric digits
6. That's the link to your .flv file
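Steps 4 and 5 are plain string handling, so they can be sketched on their own. The following is a minimal JavaScript sketch, not the exact code from the script: `extractFlvUrl` is a hypothetical helper name, the sample flashVars value is made up, and it assumes the value really is URL-encoded twice and contains no stray `%` characters (otherwise `decodeURIComponent` throws).

```javascript
// Hypothetical helper: pull the first stream URL out of a flashVars string.
function extractFlvUrl(flashVars) {
  // Step 4: URL-decode twice.
  var decoded = decodeURIComponent(decodeURIComponent(flashVars));
  // Step 5: capture the text between "fmt_stream_map=X|" and the next "|",
  // where X is one or two digits.
  var match = /fmt_stream_map=\d{1,2}\|([^|]*)/.exec(decoded);
  return match ? match[1] : null;
}

// Made-up sample data, encoded twice to mimic the flashVars attribute.
var streamMap = "34|http://example.com/video.flv?id=1|35|http://example.com/other.flv";
var sample = "length_seconds=120&fmt_stream_map=" +
  encodeURIComponent(encodeURIComponent(streamMap));

console.log(extractFlvUrl(sample));
```

Returning `null` when nothing matches lets the caller decide whether to retry or give up, instead of failing on a missing match.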
Here's how I implemented it. This script assumes that you have a directory named files next to your PHP script that PHP can write to.
<?php
$httpHost = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : (isset($_SERVER['SERVER_NAME']) ? $_SERVER['SERVER_NAME'] : 'localhost');
$scriptUrl = 'http' . ((isset($_ENV['HTTPS']) && $_ENV['HTTPS'] == 'on') || $_SERVER['SERVER_PORT'] == 443 ? 's' : '') . '://' . $httpHost . ($_SERVER['SERVER_PORT'] != 80 && $_SERVER['SERVER_PORT'] != 443 ? ':' . $_SERVER['SERVER_PORT'] : '') . $_SERVER['PHP_SELF'];
$scriptBase = substr($scriptUrl, 0, strrpos($scriptUrl, '/')+1);
$inputWidth = "500px";
// Placeholder page title for the <title> tag in the HTML below.
$title = "FLV scraper";
if(isset($_GET['ajaxFileProg'])) {
// Only report sizes for files inside the files directory.
$progFile = 'files/' . basename($_GET['ajaxFileProg']);
if(file_exists($progFile)) {
echo filesize($progFile);
} else {
echo "0";
}
exit(0);
}
if(isset($_GET['refreshDownloadAndCache'])) {
$srcFlv = urlencode($_GET['refreshDownloadAndCache']);
$getFName = trim(urlencode(trim($_GET['filename'])));
$getFName1 = $_GET['filename'];
$getFName1 = str_replace("\n", "", $getFName1);
$getFName = str_replace("\n", "", $getFName);
echo <<<END
<html><head><script>
parent.document.getElementById("scrapingFrame").src="?downloadAndCache=$srcFlv&filename=$getFName";
</script></head><body></body></html>
END;
exit(0);
}
if(isset($_GET['downloadAndCache'])) {
$srcFlv = $_GET['downloadAndCache'];
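// basename() keeps the download inside the files directory.
$getFName = basename(str_replace("\n", "", trim($_GET['filename'])));
$dest = "files/" . $getFName;
// Only fetch remote http(s) sources, never local paths.
if(preg_match('#^https?://#i', $srcFlv)) {
copy($srcFlv, $dest);
}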
echo <<<END
<html><head><script>
parent.document.getElementById("scrapingFrame").src="";
parent.endScrapeDownloadMonitor("$getFName");
</script></head><body>bing!</body></html>
END;
exit(0);
}
if(isset($_GET['scrapeURL'])) {
$url = $_GET['scrapeURL'];
$res = file_get_contents($url);
if(preg_match("/youtube\\.com/i", $url)) {
$res = preg_replace("/<img\\b[^>]*>(.*?)<\\/img>/i", "", $res);
$youtubeScrapeScript = <<<END
<script>
window.onload = Start;
var deltaTime = 0.5;
var done = false;
var moviePlayer;
function Start () {
UpdateLoop();
}
function UpdateLoop() {
Update();
// Pass the function itself: setTimeout with a string will not work in a Greasemonkey environment.
setTimeout(UpdateLoop, deltaTime * 1000);
}
function Update () {
if(!done) {
if(!moviePlayer) moviePlayer = document.getElementById("movie_player");
if(moviePlayer) {
var vars = unescape(unescape(moviePlayer.getAttribute("flashVars")));
var regex = /fmt_stream_map=[0-9|]{1,3}(http[^\|]*)/i;
var mymatch = regex.exec(vars);
var regex1 = /&length_seconds=([0-9]{0,5})&/i;
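var mymatch1 = regex1.exec(vars);
// flashVars may not be populated yet; try again on the next tick.
if(!mymatch || !mymatch1) return;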
var filename1 = escape(trim(document.getElementById("eow-title").innerHTML) + ".flv");
parent.beginScrapeDownloadMonitor(escape("files/") + filename1, parseInt(mymatch1[1])*50000);
parent.document.getElementById("scrapingFrame").src = "?refreshDownloadAndCache=" + escape(mymatch[1])
+ "&filename=" + filename1;
done = true;
}
}
}
function trim(stringToTrim) {
return stringToTrim.replace(/^\s+|\s+$/g,"");
}
</script>
<script>
END;
$res = preg_replace("/<script>/i", $youtubeScrapeScript, $res, 1);
echo $res;
exit(0);
}
}
echo <<<END
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>$title</title>
<style>
a:link {color:#adc9b1}
a:visited {color:#adc9b1}
a:active {color:#adc9b1}
a:hover {color:#adc9b1}
body{margin:0px; background: #000000; color:#ffffff; font: 11px bitstream vera sans, helvetica, verdana}
.inputClass { border: 1px solid #778a7c;background: #112126; color: #adc9b1; }
</style>
<script type="text/javascript">
window.onload = Start;
var deltaTime1 = 0.5;
var nocache = 0;
var xmlHttp;
var result = "";
var filename = "";
var filesize = 0;
var scrapeDownloadMonitor = false;
function Start () {
UpdateLoop();
}
function startScraping () {
var scrapeURL = document.getElementById("scrapeURL").value;
var filename1 = scrapeURL.substring(scrapeURL.lastIndexOf("/")+1, scrapeURL.length);
var arg = scrapeURL.indexOf("youtube.com") != -1 ? "scrapeURL" : "filename="+encodeURIComponent(filename1)+"&refreshDownloadAndCache";
document.getElementById("scrapingFrame").src="?"+arg+"=" + encodeURIComponent(scrapeURL);
document.getElementById("line1").innerHTML = "<center><img style=\" display:inline; font-weight:bold;\" border=\"0\" src=\"loader.gif\"></img><div style=\"position:relative; display:inline; top:-4px;\"> scraping...</div></center>";
}
function beginScrapeDownloadMonitor (fname, fsize) {
filename = fname;
filesize = fsize;
scrapeDownloadMonitor = true;
}
function endScrapeDownloadMonitor (msg1) {
scrapeDownloadMonitor = false;
parent.document.getElementById("line1").innerHTML = "Download: <a href=\"{$scriptBase}files/"+msg1+"\">" + msg1 + "</a>";
}
function UpdateLoop() {
if(scrapeDownloadMonitor) Update();
setTimeout(UpdateLoop, deltaTime1 * 1000);
}
function Update () {
nocache += 1;
//parent.document.getElementById("line1").innerHTML = "?ajaxFileProg=" + filename + "&nocache="+nocache;
AjaxSend("?ajaxFileProg=" + filename + "&nocache="+nocache, "DownloadRecieveData");
}
function DownloadRecieveData () {
if(result.length > 0) {
var pixel = ((result/filesize)*100)+2;
//if(pixel > 100) pixel = 100;
parent.document.getElementById("line1").innerHTML = "<img src=\"bar.png\" height=\"15px\" width=\"" + parseInt(pixel) + "%\"></img>";// "result:" + result + " filse: " + filesize + " pixel: " + pixel;//
}
}
function AjaxSend(getFlags, callback)
{
var globalContext = (function(){return this;})();
xmlHttp = null;
try
{
// Firefox, Opera 8.0+, Safari
xmlHttp=new XMLHttpRequest();
}
catch (e)
{
// Internet Explorer
try
{
xmlHttp=new ActiveXObject("Msxml2.XMLHTTP");
}
catch (e)
{
try
{
xmlHttp=new ActiveXObject("Microsoft.XMLHTTP");
}
catch (e)
{
return false;
}
}
}
xmlHttp.onreadystatechange=function()
{
if(xmlHttp.readyState==4)
{
result = xmlHttp.responseText;
globalContext[callback]();
}
}
xmlHttp.open("GET",getFlags,true);
xmlHttp.send(null);
}
</script>
</head>
<body>
<iframe id="scrapingFrame" style="float:right; display:none;" src=""></iframe>
<div class="inputClass" style="position:relative; top:10px; padding:3px; margin:3px; width: $inputWidth;" id="line1">
<b>URL: </b><input style="display:inline; padding:2px; width:435px;" class="inputClass" type="text" id="scrapeURL" size="100" value="" />
<input style="display:inline; padding:2px; font-size: 1.2em;" type="button" value="Scrape" onclick="startScraping()" /><br>
</div>
</body>
</html>
END;
?>
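
The progress bar in the script is only an estimate: it guesses the final file size as 50000 bytes per second of video (taken from `length_seconds`) and compares that with the size of the partial file on disk. A minimal sketch of that arithmetic follows. The function names are hypothetical, the script's extra +2 offset is left out, and the clamp to the 0 to 100 range is an addition that the script itself leaves commented out.

```javascript
// Rough bytes-per-second figure used by the script to guess the file size.
var BYTES_PER_SECOND = 50000;

// Hypothetical helper: estimated size in bytes for a video of the given length.
function estimateSize(lengthSeconds) {
  return lengthSeconds * BYTES_PER_SECOND;
}

// Hypothetical helper: whole-number percentage, clamped to 0..100.
function progressPercent(bytesSoFar, estimatedSize) {
  if (!(estimatedSize > 0)) return 0; // also covers NaN and negative estimates
  var pct = Math.floor((bytesSoFar / estimatedSize) * 100);
  return Math.max(0, Math.min(100, pct));
}

// A 120-second video with 3,000,000 bytes downloaded so far.
console.log(progressPercent(3000000, estimateSize(120)));
```

Because the size is a guess, the real download can overshoot it, which is why clamping the percentage keeps the bar from growing past its container.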



13 comments:
Hi, I just made a post somewhat related to yours. In mine I use pure php to download the flv video the same way the youtube flash client does. I thought it might be useful to someone:
http://1chris.com/post/39/using-php-to-download-youtube-flvs/
I would also like to implement this, using your blog as a reference. I hope scraping the videos will be easy if I follow this PHP code.