Recursive Scan directory

barryjarvis
User
Posts: 16
Joined: Tue May 07, 2013 6:21 pm
Are you a spambot?: no

Post by barryjarvis » Sat Mar 22, 2014 9:34 pm

Hi,

I've created a simple script (based on some of the demos you supply with getID3) that scans a dir and caches the results to my MySQL DB.

I'm now looking to modify that script so it recursively scans a directory instead of only the specific dir I name.

So currently I have:

Code: Select all

$arg = $_SERVER["argv"][1];

$getID3 = new getID3_cached_mysql(<db_info_here>);
$getID3->encoding = 'UTF-8';
$getID3->option_save_attachments = false;

// Scan the mixtape DIR
$DirectoryToScan = $arg;
$dir = opendir($DirectoryToScan);

// Loop through tracks to get ID3 info
while (($file = readdir($dir)) !== false) {
	$FullFileName = realpath($DirectoryToScan.'/'.$file);
	// Test the entry name, not the resolved path, for a leading dot
	if ((substr($file, 0, 1) != '.') && is_file($FullFileName)) {
		set_time_limit(30);

		// Analyze
		$ThisFileInfo = $getID3->analyze($FullFileName);

		getid3_lib::CopyTagsToComments($ThisFileInfo);
	}
}
closedir($dir);

Unfortunately my PHP isn't too great, so I'm wondering if you have a simple way to turn this script into a recursive scan?
I only need to go 1 dir deep within the base dir I specify, like so:

> Top dir
>> sub-folder <-- scan this
>>> sub-folder <-- don't scan this
>> sub-folder <-- scan this
>>> sub-folder <-- don't scan this

hopefully you can help :)

James Heinrich
getID3() v1 developer
Posts: 1432
Joined: Fri May 04, 2001 4:00 pm
Are you a spambot?: no
Location: Northern Ontario, Canada

Re: Recursive Scan directory

Post by James Heinrich » Sun Mar 23, 2014 12:41 am

The basic approach to recursive scanning is to write a function that calls itself to process each subdirectory it finds.

Code: Select all

function recursiveScan($dirname) {
	// keep the handle returned by opendir(); readdir() needs it
	if ($dir = opendir($dirname)) {
		while (($file = readdir($dir)) !== false) {
			$subdirname = $dirname.DIRECTORY_SEPARATOR.$file;
			if (substr($file, 0, 1) == '.') {
				// skip any file or directory beginning with . such as . (current directory) and .. (parent directory) and typically hidden files like .htaccess
			} elseif (is_dir($subdirname)) {
				// recursive call to this function, but on the subdirectory
				recursiveScan($subdirname);
			} elseif (is_file($subdirname)) {
				// do what you actually want to do to the files here, call getID3 or whatever
			}
		}
		closedir($dir);
	}
}

Single-level recursion is actually harder than full recursion. I'll leave that to you as an exercise for the reader. Let me know if you can't figure it out.

Post by barryjarvis » Mon Mar 24, 2014 10:16 pm

James,

Thanks for the reply - you really do provide awesome support!

Like I said, my PHP is limited, so whilst I understand what your script does, I have a few questions:

1)

Code: Select all

$subdirname = $dirname.DIRECTORY_SEPARATOR.$filename;
Where does $filename come from? I can't see it set anywhere?

2)

Code: Select all

if (substr($file, 0, 1) == '.') {
           // skip any file or directory beginning with . such as . (current directory) and .. (parent directory) and typically hidden files like .htaccess
How do I skip those files? Or does leaving this if statement empty already do that?

3) Single-level recursion has me beat... After a couple of hours googling, I can't come up with anything that makes sense to me.
Your method of recursion makes complete sense, but it seems everyone has a different way of doing things, which is confusing the hell out of me!

Post by James Heinrich » Mon Mar 24, 2014 10:40 pm

1) Sorry, my typo. Where I wrote $filename I meant $file, the variable assigned on the line above in the while (readdir) loop.

2) That empty if clause does indeed skip the unwanted leading-dot files. The comment is just there to inform you why there's no actual code in that block. You could also put a continue statement as I have below to make it more clear.

3) If you really just need one-level-below current you could be tempted to skip recursion entirely and do the quick and dirty method of: if is_file then scan file, if is_dir then readdir and scan the files inside. However, a more "elegant" approach would be to use the recursion but have a limit parameter. If for some reason you then needed to recurse to 2 or 5 or 0 levels then it's trivial to change the behaviour. I would probably come up with something like this:

Code: Select all

function recursiveScan($dirname, $maxRecursion=false) {
	// $maxRecursion parameter can be FALSE for no limit on depth recursed, or an integer which signifies the number of levels to descend into.
	// keep the handle returned by opendir(); readdir() needs it
	if ($dir = opendir($dirname)) {
		while (($file = readdir($dir)) !== false) {
			$subdirname = $dirname.DIRECTORY_SEPARATOR.$file;
			if (substr($file, 0, 1) == '.') {
				// skip any file or directory beginning with . such as . (current directory) and .. (parent directory) and typically hidden files like .htaccess
				continue;
			} elseif (is_dir($subdirname)) {
				// recursive call to this function, but on the subdirectory
				if ($maxRecursion === false) {
					// unlimited recursion
					recursiveScan($subdirname, false);
				} elseif ($maxRecursion > 0) {
					recursiveScan($subdirname, $maxRecursion - 1);
				}
			} elseif (is_file($subdirname)) {
				// do what you actually want to do to the files here, call getID3 or whatever
			}
		}
		closedir($dir);
	}
}

So you can call the function the same way as before; just add a "1" as the second parameter to limit the recursion to the first level of subdirectories it finds. If you pass 2 it will go 2 levels down, and so on. If you leave off the second parameter, or explicitly pass FALSE, it behaves as before and descends into every directory it finds.

recursiveScan($dirname); // scans all subdirs
recursiveScan($dirname, false); // scans all subdirs
recursiveScan($dirname, 1); // scans 1 level of subdirs
recursiveScan($dirname, 2); // scans 2 levels of subdirs
recursiveScan($dirname, 0); // scans no subdirs

Be careful: in PHP zero and false are loosely equal (0 == false, "" == false, null == false), but they are not identical, so passing 0 and passing false as the second parameter give different behaviour.
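
To make the difference between 0 and false concrete, here is a minimal sketch of the same depth-limited traversal that returns the files it finds instead of analyzing them, which makes the effect of the limit easy to inspect. The function name collectFiles() is made up for this illustration and is not part of getID3.

```php
<?php
// Hypothetical helper (not part of getID3): returns the files found under
// $dirname, descending at most $maxRecursion levels (false = no limit).
function collectFiles($dirname, $maxRecursion = false) {
	$files = array();
	if (!is_dir($dirname) || !($dir = opendir($dirname))) {
		return $files; // missing or unreadable directory: nothing to report
	}
	while (($file = readdir($dir)) !== false) {
		if (substr($file, 0, 1) == '.') {
			continue; // skips ".", ".." and hidden entries
		}
		$path = $dirname.DIRECTORY_SEPARATOR.$file;
		if (is_dir($path)) {
			if ($maxRecursion === false) {
				// strict comparison: only a real FALSE means "no limit"
				$files = array_merge($files, collectFiles($path, false));
			} elseif ($maxRecursion > 0) {
				$files = array_merge($files, collectFiles($path, $maxRecursion - 1));
			}
			// $maxRecursion === 0 falls through: subdirectory is ignored
		} elseif (is_file($path)) {
			$files[] = $path;
		}
	}
	closedir($dir);
	sort($files);
	return $files;
}
```

With a base directory holding one file, one subdirectory with a file, and one sub-subdirectory with a file, collectFiles($base, 0) would list only the top-level file, collectFiles($base, 1) the first two, and collectFiles($base, false) all three. A loose check such as ($maxRecursion == false) would treat 0 as "no limit", which is why the code uses ===.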

Post by barryjarvis » Wed Mar 26, 2014 7:36 pm

Thanks James.

So in

Code: Select all

} elseif (is_file($subdirname)) {
            // do what you actually want to do to the files here, call getID3 or whatever
         }
If I were to simply change that to:

Code: Select all

} elseif (is_file($subdirname)) {
            $ThisFileInfo = $getID3->analyze($subdirname);
         }
Is it that simple?

Once again, thanks for your help so far.

Post by James Heinrich » Wed Mar 26, 2014 8:09 pm

barryjarvis wrote: Is it that simple?
Yes, but you would of course want to also do something with $ThisFileInfo once you've got the data -- store it in a database or output to the screen, etc.
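
As a rough illustration of "doing something" with the result, the sketch below builds a one-line summary from the array. The helper name summarizeTrack() is hypothetical, and the key names it reads ('comments', 'playtime_string', 'error') are assumptions based on the layout used in getID3's demo scripts after CopyTagsToComments() has run; check them against your own output (for example with print_r) before relying on them.

```php
<?php
// Hypothetical helper (not part of getID3): turns an analyze() result into
// a short human-readable line. Key names are assumed, see the note above.
function summarizeTrack(array $info) {
	if (!empty($info['error'])) {
		// assumed: 'error' holds a list of error messages
		return 'ERROR: '.implode('; ', (array) $info['error']);
	}
	// assumed: each entry under 'comments' is a list of values, so take the first
	$artist = isset($info['comments']['artist'][0]) ? $info['comments']['artist'][0] : 'Unknown artist';
	$title  = isset($info['comments']['title'][0])  ? $info['comments']['title'][0]  : 'Unknown title';
	$length = isset($info['playtime_string']) ? $info['playtime_string'] : '?:??';
	return $artist.' - '.$title.' ('.$length.')';
}
```

Inside the is_file branch you could then pass $ThisFileInfo to it and write the returned string to your debug log, or insert the individual fields into your own table instead.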

Post by barryjarvis » Thu Mar 27, 2014 8:00 pm

Thanks James,

I've put it together in a test script, but am hitting some issues.

I'm getting the below in my error log.

[27-Mar-2014 19:56:33] PHP Notice: Undefined variable: getID3 in /export/getID3/test.php on line 51
[27-Mar-2014 19:56:33] PHP Fatal error: Call to a member function analyze() on a non-object in /export/getID3/test.php on line 51

Any idea? I've played around with things, but can't seem to find why this is happening?

Here's my full script:

Code: Select all

#!/usr/bin/php
<?php 
error_reporting(E_ALL);
ini_set("log_errors", 1);
ini_set("error_log", "/export/getID3/errors.log");
error_log("Error --->>");

// Include the GetID3 lib's
require_once('getid3/getid3.php');
require_once('getid3/extension.cache.mysql.php');

// Initialise GetID3
$getID3 = new getID3_cached_mysql(<db info here>);
$getID3->encoding = 'UTF-8';
$getID3->option_save_attachments = false;

// Function to write to debug log
function logfile($filename, $msg)
   { 
   date_default_timezone_set('Europe/London');
   // open file
   $fd = fopen($filename, "a");
   // append date/time to message
   $str = "[" . date('l jS \of F Y h:i:s A') . "] " . $msg; 
   // write string
   fwrite($fd, $str . "\n");
   // close file
   fclose($fd);
   }

// Function to recursive scan (to predefined directory level) for cache all script
function recursiveScan($dirname, $maxRecursion=FALSE) {
   // $maxRecursion parameter can be FALSE for no limit on depth recursed, or an integer which signifies the number of levels to descend into.
   if ($dir = opendir($dirname)) {
      while (($file = readdir($dir)) !== false) {
         $subdirname = $dirname.DIRECTORY_SEPARATOR.$file;
         if (substr($file, 0, 1) == '.') {
            // skip any file or directory beginning with . such as . (current directory) and .. (parent directory) and typically hidden files like .htaccess
            continue;
         } elseif (is_dir($subdirname)) {
            // recursive call to this function, but on the subdirectory
            if ($maxRecursion === false) {
               // unlimited recursion
               recursiveScan($subdirname, false);
            } elseif ($maxRecursion > 0) {
               recursiveScan($subdirname, $maxRecursion - 1);
            }
         } elseif (is_file($subdirname)) {
         	
            // Analyze
			$ThisFileInfo = $getID3->analyze($subdirname);
			logfile("/export/getID3/debug.log", "Analyzing Track: ".$subdirname);
		
			getid3_lib::CopyTagsToComments($ThisFileInfo);
         }
      }
   }
}

	logfile("/export/getID3/debug.log", "*******START CACHING ALL MIXTAPES*******");
		$dirname = '111';
		// scans 1 level of subdirs
		recursiveScan($dirname, 1); 
	logfile("/export/getID3/debug.log", "*******END CACHING ALL MIXTAPES*******");
	
?>

Post by James Heinrich » Thu Mar 27, 2014 11:17 pm

You're defining $getID3 outside the function but using it inside the function. You can either define $getID3 inside the function (but then a new instance would be created every time the function is called, i.e. once for every directory scanned), or declare $getID3 as a global variable in the function, which lets you use it the way you already have it:

Code: Select all

function recursiveScan($dirname, $maxRecursion=FALSE) {
  global $getID3;
  // rest of your function
}

http://www.php.net/language.variables.scope.php
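
A third option, not mentioned above, is to pass the object into the function as a parameter, which avoids the global entirely. The sketch below shows the idea with a stand-in class so it runs without getID3 installed; FakeAnalyzer and scanWithAnalyzer() are names invented for this example, and with the real library you would pass your $getID3 object instead.

```php
<?php
// Stand-in for the real getID3 object so the sketch runs on its own;
// it only mimics the analyze($filename) call used in this thread.
class FakeAnalyzer {
	public $seen = array();
	public function analyze($filename) {
		$this->seen[] = $filename;
		return array('filenamepath' => $filename);
	}
}

// Same depth-limited traversal, but the analyzer is passed in explicitly
// and handed down on every recursive call, so no global is needed.
function scanWithAnalyzer($analyzer, $dirname, $maxRecursion = false) {
	if (!is_dir($dirname) || !($dir = opendir($dirname))) {
		return;
	}
	while (($file = readdir($dir)) !== false) {
		if (substr($file, 0, 1) == '.') {
			continue;
		}
		$path = $dirname.DIRECTORY_SEPARATOR.$file;
		if (is_dir($path)) {
			if ($maxRecursion === false) {
				scanWithAnalyzer($analyzer, $path, false);
			} elseif ($maxRecursion > 0) {
				scanWithAnalyzer($analyzer, $path, $maxRecursion - 1);
			}
		} elseif (is_file($path)) {
			$analyzer->analyze($path);
		}
	}
	closedir($dir);
}
```

Objects are passed by handle in PHP 5 and later, so every level of the recursion works with the same instance and nothing is re-created per directory.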
