Flea's Forums

Performing a recursive directory search in PHP.
Because I'm an object-oriented kind of guy, I'll show you not only how to make a recursive directory search, but how to make an object that does it for you upon creation. This tutorial requires PHP 5 (the class is written in PHP 4 style, but scandir() only exists from PHP 5 on) and that is all; surprised?

All right, so let's create an initial framework. My explanations will be in the comments, which describe how each member is used (or recommend that it not be used outside the script), and then we'll get into the guts of the script.
PHP Code:
<?php

  // This is how you start a class.  You can say class x extends y, but we're not doing that yet.
  class RecursiveSearch {

    // Now, you can declare your variables here, but you can only give them constant values here.
    // The reason for this is that function calls are not allowed in a declaration.  That's
    // right.  So if I wrote:
    //  var $abc = dirname(__FILE__);
    // It would be a parse error.  Anything computed gets set in the constructor instead.
    // Let us begin in earnest.

    // This is the root directory of our search, it should always be an absolute path, and not a
    // relative path.
    var $basedir = '';

    // This will be an array, when our object is created, of all results in a path relative to
    // the base dir.
    var $files;

    // This is an array of our directories, just in case someone happens to want those sometime. ;-)
    var $folders;

    // Now, this is different.  I use a double underscore for internal variables and this is a
    // callback function for any action the user may want to execute on a per-file basis.
    // The function will be called like this: call_user_func_array($__filefunc, array($file))
    // A function prototype should be: find_file($file){}
    var $__filefunc;

    // This is our framework constructor.  All we need passed to it is the root directory to search,
    // and an optional per-file callback.  Because we don't want to supply the latter every time
    // we'll make it default to ''.
    function RecursiveSearch($root, $callback = '')
    {
    }

    // This is our prototype internal search function.  You shouldn't call this on your own.  The
    // dir string that gets passed is what lets us go recursive in our search.  Only there are two
    // problems which I will outline later.
    function __search($dir = '')
    {
    }

    // I won't explain why this is here yet, but it is crucial if you don't like infinite loops.
    function __isdot($s)
    {
    }

  }

?>

Now that means that you can simply call your class like this:
PHP Code:
$search = new RecursiveSearch(dirname(__FILE__)."/inc/plugins");

  $output = "<b>Search Results:</b><ul>\n";
  foreach($search->files as $file)
  {
    // The .= operator is the same as: $var = $var."Hi!";
    $output .= "<li>$file</li>\n";
  }
  $output .= "</ul>";

  echo $output;
In fact, we'll come back to this a little later and show how you can do all sorts of things to make yourself acquainted with its usage.

Now, back to the object on a per-function basis, and making it function like a real wonder.

Our constructor really needs some serious work; without initializing variables and starting our search, there's just no way for our object to work. So step one is to initialize the variables. Remember, a class/object variable is referenced inside the object's code as $this->varname, so all initialization must use this convention, and all internal function calls too.
PHP Code:
    // This is our framework constructor.  All we need passed to it is the root directory to
    // search, and an optional per-file callback.  Because we don't want to supply the latter
    // every time we'll make it default to ''.
    function RecursiveSearch($root, $callback = '')
    {
      $this->__filefunc = $callback; // We want this assigned even if blank.  More later!
      $this->basedir    = $root;
      $this->files      = array();
      $this->folders    = array();

      // This is how hard it is to initialize the object. Wow huh? In fact, more on this later.
      $this->__search();
    }

Now, before we tackle the __search function, let's fix up the __isdot function because it is very important.
PHP Code:
    // I won't explain why this is here yet, but it is crucial if you don't like infinite loops.
    function __isdot($s)
    {
      return ($s == '.' || $s == '..');
    }
Yes, that's it. You might be thinking that I'm nuts, but stop and think for a minute, okay? What are the first two results always found on both Windows and Linux computers? Allow me to show you:
Quote:
Microsoft Windows XP [Version 5.1.2600]
© Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\{...}>dir
Volume in drive C has no label.
Volume Serial Number is {...}

Directory of C:\Documents and Settings\{...}

04/05/2007 10:55 PM <DIR> .
04/05/2007 10:55 PM <DIR> ..
02/21/2007 03:26 PM <DIR> .borland
01/15/2007 04:12 PM <DIR> .musikproject
02/26/2007 02:47 PM <DIR> .scribus
01/29/2007 02:55 PM <DIR> AbiSuite
04/05/2007 03:10 PM <DIR> Desktop
04/05/2007 09:04 PM <DIR> My Documents
04/05/2007 10:55 PM 4,980,736 NTUSER.DAT
02/02/2007 02:27 PM 127 quotes.txt
09/06/2006 12:40 AM 7,052 reg736.txt
01/16/2007 05:15 PM <DIR> Start Menu
03/06/2007 07:05 PM <DIR> WINDOWS
3 File(s) 4,987,915 bytes
10 Dir(s) 127,047,512,064 bytes free

C:\Documents and Settings\{...}>
Sometimes people ignore these conventions, but . is the current directory itself, and .. is its parent, the directory one level above your current location. So if we recursively search without checking for these, we get one of two problems:
1. We search the same directory infinitely.
2. We search both downwards and upwards through the entire computer's structure.

Yikes! Think about it for a second. You are searching in C:\Temp\htdocs, so the first result is the . entry. Well, let's say you skip it and go on to the second, the .. entry? Because of how we will prioritize things, you will search C:\Temp long before you ever so much as touch a given file in C:\Temp\htdocs! And, worse still, you would then search not only the root directory/drive, but all of its pathways in all directions.

Ugly? Yeah. That's why if your script ever locks up or starts turning out pages upon pages of irrelevant results your first check should be if you called __isdot(), and then if you handled it right.
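If you want to see the dot entries for yourself, here is a small sketch. It assumes PHP 5 or later and a writable temp directory; the directory and file names are made up for the demonstration and are not part of the tutorial's class.

```php
<?php
// Build a throwaway directory holding a single file (names are hypothetical).
$dir = sys_get_temp_dir() . '/dots_' . uniqid();
mkdir($dir);
touch("$dir/file.txt");

// scandir() lists everything in the directory, including the two dot entries.
$entries = scandir($dir);
print_r($entries);

// The same test __isdot() performs: keep only the real entries.
$real = array();
foreach ($entries as $entry) {
    if ($entry == '.' || $entry == '..') {
        continue; // skip "this directory" and "parent directory"
    }
    $real[] = $entry;
}
print_r($real);
```

Without the continue, a recursive walk would follow . back into the same directory and .. up into the parent.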

Enough doom and gloom, let's move on to the real meat and potatoes of the search: __search($dir)!

PHP Code:
    // This is our prototype internal search function.  You shouldn't call this on your own.  The
    // dir string that gets passed is what lets us go recursive in our search.  Only there are two
    // problems which I will outline later.
    function __search($dir = '')
    {
      // This is the ternary operator: condition ? value-if-true : value-if-false.  It is the same
      // as an if/else on $dir == ''.  This tutorial is to explain classes in a basic sense, and
      // how to do a recursive search in a stable one. ;) Not elementary PHP coding.
      $path = $dir == '' ? $this->basedir : "{$this->basedir}/$dir";

      foreach(scandir($path) as $found)
      {
        // Now, this is extremely critical, as the __isdot call must be before everything else, or
        // else it *will* register as a valid directory to be searched!
        if(!$this->__isdot($found))
        {
          $absolute = "$path/$found";
          $relative = $dir == '' ? $found : "$dir/$found";

          // We prioritize folders first, as this script dives to the deepest depth and then works
          // outwards.  It's an effective mechanism to ensure that you do end up getting the
          // results in a rather efficient manner.
          if(is_dir($absolute))
          {
            $this->folders[] = $relative; // Store the result... again, with relative pathing.

            // And this is how you search recursively. :D Just call it with the relative path, and
            // you're good to go!
            $this->__search($relative);
          }elseif(is_file($absolute)){
            $this->files[] = $relative;

            // And this is how we add a callback hook, so that if there is a function to call
            // whenever a file is found this is it!  Pretty effective and very easy to handle
            // I must say.  Note that call_user_func_array wants its arguments in an array.
            if($this->__filefunc != '')
              call_user_func_array($this->__filefunc, array($relative));
          }
        }
      }
    }

Done. But there's gotta be more to it! Really? Recursive searches are deceptively simple, and before you fool yourself make sure that you know, and I mean know, that there really isn't more to it. Sure, you could have a count property and such, but it really isn't essential:
PHP Code:
$this->count = count($this->files) + count($this->folders);
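For comparison only, and not the class built in this tutorial: PHP's SPL iterators can do the same walk without hand-written recursion. This is a rough sketch assuming PHP 5.3 or later (for FilesystemIterator::SKIP_DOTS); the throwaway directory tree and the variable names are made up for the demonstration.

```php
<?php
// Throwaway tree so the listing is predictable (names are hypothetical).
$root = sys_get_temp_dir() . '/spl_' . uniqid();
mkdir("$root/sub", 0777, true);
touch("$root/a.txt");
touch("$root/sub/b.txt");

$files   = array();
$folders = array();

// SKIP_DOTS does the job of __isdot(); SELF_FIRST makes the walker
// yield directories as well as the files inside them.
$walker = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::SELF_FIRST
);

foreach ($walker as $item) {
    // getSubPathname() is the path relative to $root.
    $relative = $walker->getSubPathname();
    if ($item->isDir()) {
        $folders[] = $relative;
    } else {
        $files[] = $relative;
    }
}

// Iteration order follows the filesystem, so sort for a stable listing.
sort($files);
sort($folders);
print_r($files);
print_r($folders);
```

The hand-rolled class is still the better way to learn how the recursion works; the iterators just hide the same steps.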

Our final source code:
PHP Code:
<?php

  // This is how you start a class.  You can say class x extends y, but we're not doing that yet.
  class RecursiveSearch {

    // Now, you can declare your variables here, but you can only give them constant values here.
    // The reason for this is that function calls are not allowed in a declaration.  That's
    // right.  So if I wrote:
    //  var $abc = dirname(__FILE__);
    // It would be a parse error.  Anything computed gets set in the constructor instead.
    // Let us begin in earnest.

    // This is the root directory of our search, it should always be an absolute path, and not a relative path.
    var $basedir = '';

    // This will be an array, when our object is created, of all results in a path relative to the base dir.
    var $files;

    // This is an array of our directories, just in case someone happens to want those sometime. ;-)
    var $folders;
    var $count; // Just for kicks.

    // Now, this is different.  I use a double underscore for internal variables and this is a callback
    // function for any action the user may want to execute on a per-file basis.
    // The function will be called like this: call_user_func_array($__filefunc, array($file))
    // A function prototype should be: find_file($file){}
    var $__filefunc;

    // This is our framework constructor.  All we need passed to it is the root directory to search, and an optional
    // per-file callback.  Because we don't want to supply the latter every time we'll make it default to ''.
    function RecursiveSearch($root, $callback = '')
    {
      $this->__filefunc = $callback; // We want this assigned even if blank.  More later!
      $this->basedir    = $root;
      $this->files      = array();
      $this->folders    = array();

      $this->__search(); // This is how hard it is to initialize the object.  Wow huh?  In fact, more on this later.
      // The following line is not executed until after the entire search is finished.
      $this->count = count($this->files) + count($this->folders);
    }

    // This is our prototype internal search function.  You shouldn't call this on your own.  The dir string that gets
    // passed is what lets us go recursive in our search.  Only there are two problems which I will outline later.
    function __search($dir = '')
    {
      // This is the ternary operator: condition ? value-if-true : value-if-false.  It is the same
      // as an if/else on $dir == ''.  This tutorial is to explain classes in a basic sense, and
      // how to do a recursive search in a stable one. ;) Not elementary PHP coding.
      $path = $dir == '' ? $this->basedir : "{$this->basedir}/$dir";

      foreach(scandir($path) as $found)
      {
        // Now, this is extremely critical, as the __isdot call must be before everything else, or it *will*
        // register as a valid directory to be searched!
        if(!$this->__isdot($found))
        {
          $absolute = "$path/$found";
          $relative = $dir == '' ? $found : "$dir/$found";

          // We prioritize folders first, as this script dives to the deepest depth and then works outwards.  It's an
          // effective mechanism to ensure that you do end up getting the results in a rather efficient manner.
          if(is_dir($absolute))
          {
            $this->folders[] = $relative; // Store the result... again, with relative pathing.

            // And this is how you search recursively. :D Just call it with the relative path, and you're good to go!
            $this->__search($relative);
          }elseif(is_file($absolute)){
            $this->files[] = $relative;

            // And this is how we add a callback hook, so that if there is a function to call whenever a file is found
            // this is it!  Pretty effective and very easy to handle I must say.  Note that call_user_func_array
            // wants its arguments in an array.
            if($this->__filefunc != '')
              call_user_func_array($this->__filefunc, array($relative));
          }
        }
      }
    }

    function __isdot($s)
    {
      return ($s == '.' || $s == '..');
    }
  }

  /*
    The following is a test script; be careful with how you mangle it. :P
  */

  echo "<b>Files:</b>\n<ul>\n";
  $search = new RecursiveSearch(dirname(__FILE__), create_function('$found', 'echo "<li>$found</li>\n";'));
  echo "</ul>\n",
       "<table border='0'>\n",
       "<tr><td><i>Number of Files:</i></td><td>".count($search->files)."</td></tr>\n",
       "<tr><td><i>Total Results:</i></td><td>{$search->count}</td></tr>\n",
       "</table>";

?>

That's all folks! And yes, you can make an anonymous function as your callback, but sometimes that isn't what you want. You might want to reference a global object or do something that is messy in a string. In any case, it works for me ... just drop it into your own webserver's root directory (and delete it immediately afterwards, since you don't want folks knowing the whole layout of your files) and try it for yourself. :D
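A note for anyone on a newer PHP: create_function() and constructors named after the class were both removed in PHP 8, so the listing above only runs as-is on older versions. What follows is a rough sketch of the same idea with __construct and a closure, assuming PHP 5.3 or later. The class name ModernRecursiveSearch, the private method name, and the throwaway directory tree are all made up for this example; it is one possible adaptation, not the tutorial's own code.

```php
<?php
// Hypothetical modern variant of the tutorial's class (name and layout are assumptions).
class ModernRecursiveSearch
{
    public $basedir;
    public $files = array();
    public $folders = array();
    private $fileCallback;

    public function __construct($root, $callback = null)
    {
        $this->basedir = $root;
        $this->fileCallback = $callback;
        $this->search();
    }

    private function search($dir = '')
    {
        $path = $dir === '' ? $this->basedir : "{$this->basedir}/$dir";
        foreach (scandir($path) as $found) {
            if ($found === '.' || $found === '..') {
                continue; // same job as __isdot()
            }
            $absolute = "$path/$found";
            $relative = $dir === '' ? $found : "$dir/$found";
            if (is_dir($absolute)) {
                $this->folders[] = $relative;
                $this->search($relative);
            } elseif (is_file($absolute)) {
                $this->files[] = $relative;
                // Only call the hook if one was actually supplied.
                if (is_callable($this->fileCallback)) {
                    call_user_func($this->fileCallback, $relative);
                }
            }
        }
    }
}

// Usage: build a throwaway tree so the sketch does not list real files.
$root = sys_get_temp_dir() . '/rsearch_' . uniqid();
mkdir("$root/sub", 0777, true);
touch("$root/a.txt");
touch("$root/sub/b.txt");

// A closure can capture outside variables with use, which is the
// "messy in a string" case that create_function() handled badly.
$seen = array();
$search = new ModernRecursiveSearch($root, function ($found) use (&$seen) {
    $seen[] = $found;
});
print_r($seen);
```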
I've tried this class and it works pretty well.
The only problem is that I'm not enough of a programmer to know how it really works, and I have a slight problem to fix.

I've managed to get the class working on a folder structure containing over 200,000 entries.
They all go into the arrays via the scandir function.

Basically, my script then reads the array returned from this class and performs another search in it, looking for matches of the search word entered on the webpage.

If I were able to perform that search inside the class, it would only fill the array with the scandir results matching my search word.
That would make it more efficient and spare me from time-out problems.

How would I integrate this into this class properly?

Erik
Hello Erik, thanks for the question. Sorry for the delay in responding, but as you can tell it isn't all that active around here.

My recommendation to you is that you derive a new class based off mine, and then override the __search() function. I would suggest that you add a keyword parameter to the constructor, stored as a property, and then change the part of the __search() function that adds files to the results, like so:
Code:
<?php

  class KeywordSearch extends RecursiveSearch {
    var $keyword;

    function KeywordSearch($root,$keyword,$callback = '')
    {
      $this->keyword = $keyword;
      // Call our parent constructor:
      parent::RecursiveSearch($root,$callback);
    }
    
    function __search($dir = '')
    {
      $path = $dir == '' ? $this->basedir : "{$this->basedir}/$dir";
      
      foreach(scandir($path) as $found)
      {
        if(!$this->__isdot($found))
        {
          $absolute = "$path/$found";
          $relative = $dir == '' ? $found : "$dir/$found";
          if(is_dir($absolute))
          {
            $this->folders[] = $relative; // Store the result... again, with relative pathing.

            $this->__search($relative);
          // We want to ensure that not only is our given item a file, but
          // that it contains the keyword as well.  Strstr works well for this,
          // as long as the result is compared strictly against false.
          }elseif(is_file($absolute) && (strstr($absolute, $this->keyword) !== false)){
            $this->files[] = $relative;

            if($this->__filefunc != '')
              call_user_func_array($this->__filefunc, array($relative));
          }
        }
      }
    }
  }
  
?>

You can paste this code below the RecursiveSearch class and use it directly. Unless I've made a typo, this should work though I haven't tested it.
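One thing to watch: the check above runs against $absolute, so a keyword that happens to occur in the base directory would match every file. If only the file name should count, a helper along these lines could be used in the elseif instead. This is a sketch; the function name name_contains is made up and is not part of either class.

```php
<?php
// Hypothetical helper, not part of the posted classes: match the keyword
// against the file name only, ignoring the directories in front of it.
function name_contains($relative, $keyword)
{
    // strpos() returns 0 for a match at the very start of the name,
    // so the result must be compared with !== false.
    return strpos(basename($relative), $keyword) !== false;
}

var_dump(name_contains('docs/report.txt', 'report'));  // bool(true)
var_dump(name_contains('report/notes.txt', 'report')); // bool(false)
```

Whether the directories should count toward a match depends on what the search page is meant to find, so treat this as one option rather than the fix.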

Let me know of any trouble. :)
Awesome!
Now I see what inheriting means when using classes!

Thanks for the reply.
You're very welcome. :) Inheritance is an extremely powerful side of programming, isn't it?
... What?

Now I feel stupid.....
You're not stupid, you just don't know anything about programming is all. ;) This is a tutorial on how you search directories recursively, meaning that you search subdirectories too, and it presumes prior programming knowledge in PHP.