Tuesday, July 14, 2009

Beginning PHP and Oracle From Novice to Professional by W. Jason Gilmore and Bob Bryla Chapter 10

It’s quite rare to write an application that is entirely self-sufficient, that is, a program that does not rely on at least some level of interaction with external resources such as the underlying file system, the operating system, and even other programming languages. The reason is simple: as languages, file systems, and operating systems mature, the opportunities for creating more efficient, scalable, and timely applications increase greatly, because the developer can integrate the tried-and-true features of each component into a single product. Of course, the trick is to choose a language that offers a convenient and efficient means for doing so. Fortunately, PHP satisfies both conditions quite nicely, offering the programmer a wide array of tools not only for handling file system input and output, but also for executing programs at the shell level. This chapter serves as an introduction to these features, describing how to work with the following:

• Files and directories: You’ll learn how to perform file system forensics, revealing details such as file and directory size and location, modification and access times, and more.
• File I/O: You’ll learn how to interact with data files, which will let you perform a variety of practical tasks, including creating, deleting, reading, and writing files.
• Directory contents: You’ll learn how to easily retrieve directory contents.

• Shell commands: You can take advantage of operating system and other language-level functionality from within a PHP application through a number of built-in functions and mechanisms.
• Sanitizing input: Although Chapter 21 goes into this topic in further detail, this chapter demonstrates some of PHP’s input sanitization capabilities, showing you how to prevent users from passing data that could potentially cause harm to your data and operating system.

■Note PHP is particularly adept at working with the underlying file system, so much so that it is gaining popularity
as a command-line interpreter, a capability introduced in version 4.2.0. This topic is beyond the scope of this book, but you can find additional information in the PHP manual.

Learning About Files and Directories

Organizing related data into entities commonly referred to as files and directories has long been a core concept in the computing environment. For this reason, programmers need to have a means for obtaining important details about files and directories, such as location, size, last modification time, last access time, and other defining information. This section introduces many of PHP’s built-in functions for obtaining these important details.

Parsing Directory Paths

It’s often useful to parse directory paths for various attributes such as the trailing extension, directory component, and base name. Several functions are available for performing such tasks, all of which are introduced in this section.

Retrieving a Path’s Filename

The basename() function returns the filename component of a path. Its prototype follows:

string basename(string path [, string suffix])

If the optional suffix parameter is supplied and the filename ends with that suffix, the suffix is stripped from the returned value. An example follows:

<?php
$path = "/home/www/data/users.txt";
printf("Filename: %s <br />", basename($path));
printf("Filename sans extension: %s <br />", basename($path, ".txt"));
?>

Executing this example produces the following:

Filename: users.txt
Filename sans extension: users

Retrieving a Path’s Directory

The dirname() function is essentially the counterpart to basename(), providing the directory component of a path. Its prototype follows:

string dirname(string path)

The following code will retrieve the path leading up to the file name users.txt:

<?php
$path = "/home/www/data/users.txt";
printf("Directory path: %s", dirname($path));
?>

This returns the following:

Directory path: /home/www/data

Learning More About a Path

The pathinfo() function creates an associative array containing three components of a path, namely the directory name, the base name, and the extension. Its prototype follows:

array pathinfo(string path)

Consider the following path:

/home/www/htdocs/book/chapter10/index.html

As is relevant to pathinfo(), this path contains three components:

• Directory name: /home/www/htdocs/book/chapter10

• Base name: index.html

• File extension: html

Therefore, you can use pathinfo() like this to retrieve this information:

<?php
$pathinfo = pathinfo("/home/www/htdocs/book/chapter10/index.html");
printf("Dir name: %s <br />", $pathinfo['dirname']);
printf("Base name: %s <br />", $pathinfo['basename']);
printf("Extension: %s <br />", $pathinfo['extension']);
?>

This returns the following:

Dir name: /home/www/htdocs/book/chapter10
Base name: index.html
Extension: html
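
When only one component is needed, pathinfo() also accepts an optional second argument that selects it. The following is a minimal sketch rather than part of the original listing; it assumes PHP 5.2 or later for the PATHINFO_FILENAME constant and reuses the illustrative path from above:

```php
<?php
$path = "/home/www/htdocs/book/chapter10/index.html";

// Ask pathinfo() for a single component instead of the whole array.
$extension = pathinfo($path, PATHINFO_EXTENSION);
$filename  = pathinfo($path, PATHINFO_FILENAME);   // base name without the extension
$directory = pathinfo($path, PATHINFO_DIRNAME);

printf("Extension: %s <br />", $extension);
printf("Name without extension: %s <br />", $filename);
printf("Directory: %s <br />", $directory);
?>
```

Because pathinfo() only inspects the string, the path does not need to exist on disk for this to work.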

Identifying the Absolute Path

The realpath() function converts all symbolic links and relative path references located in path to their absolute counterparts. Its prototype follows:

string realpath(string path)

For example, suppose your directory structure assumes the following path:

/home/www/htdocs/book/images/

You can use realpath() to resolve any local path references:

<?php
$imgPath = "../../images/cover.gif";
$absolutePath = realpath($imgPath);
// Returns /home/www/htdocs/book/images/cover.gif when the script runs
// from a directory two levels below /home/www/htdocs/book/
?>
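
Keep in mind that realpath() returns FALSE when the path cannot be resolved, for instance because the file does not exist. A small sketch of checking for that case follows; resolve_path() is a hypothetical helper name, and the missing file name is assumed not to exist in the current directory:

```php
<?php
// Hypothetical helper: return the absolute path, or null if it can't be resolved.
function resolve_path($path) {
    $resolved = realpath($path);
    // realpath() returns FALSE on failure, so compare strictly.
    return ($resolved === false) ? null : $resolved;
}

$here    = resolve_path(".");                     // the current working directory
$missing = resolve_path("./no-such-file-here");   // assumed not to exist

printf("Current directory: %s <br />", $here);
printf("Missing file resolved: %s <br />", $missing === null ? "no" : "yes");
?>
```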

Calculating File, Directory, and Disk Sizes

Calculating file, directory, and disk sizes is a common task in all sorts of applications. This section introduces a number of standard PHP functions suited to this task.

Determining a File’s Size

The filesize() function returns the size, in bytes, of a specified file. Its prototype follows:

int filesize(string filename)

An example follows:

<?php
$file = "/www/htdocs/book/chapter1.pdf";
$bytes = filesize($file);
$kilobytes = round($bytes/1024, 2);
printf("File %s is $bytes bytes, or %.2f kilobytes", basename($file), $kilobytes);
?>

This returns the following:

File chapter1.pdf is 91815 bytes, or 89.66 kilobytes
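
If the same conversion is needed in several places, it can be wrapped in a small function. The sketch below is one possible approach; format_bytes() is a hypothetical name, and the units are 1,024-based to match the division used above:

```php
<?php
// Hypothetical helper: format a byte count using 1,024-based units.
function format_bytes($bytes) {
    $units = array("bytes", "KB", "MB", "GB", "TB");
    $i = 0;

    // Divide by 1,024 until the value fits the current unit.
    while ($bytes >= 1024 && $i < count($units) - 1) {
        $bytes /= 1024;
        $i++;
    }

    // Whole numbers for plain bytes, two decimals for larger units.
    return ($i === 0)
        ? sprintf("%d %s", $bytes, $units[$i])
        : sprintf("%.2f %s", $bytes, $units[$i]);
}

echo format_bytes(91815);   // 89.66 KB
?>
```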

Calculating a Disk’s Free Space

The function disk_free_space() returns the available space, in bytes, allocated to the disk partition housing a specified directory. Its prototype follows:

float disk_free_space(string directory)

An example follows:

<?php
$drive = "/usr";
printf("Remaining MB on %s: %.2f", $drive, round((disk_free_space($drive) / 1048576), 2));
?>

This returns the following:

Remaining MB on /usr: 2141.29

Note that the returned number is in megabytes (MB) because the value returned from disk_free_space() is divided by 1,048,576, the number of bytes in 1MB.

Calculating Total Disk Size

The disk_total_space() function returns the total size, in bytes, consumed by the disk partition housing a specified directory. Its prototype follows:

float disk_total_space(string directory)

If you use this function in conjunction with disk_free_space(), it’s easy to offer useful space allocation statistics:

<?php

$partition = "/usr";

// Determine total partition space
$totalSpace = disk_total_space($partition) / 1048576;

// Determine used partition space
$usedSpace = $totalSpace - disk_free_space($partition) / 1048576;

printf("Partition: %s (Allocated: %.2f MB. Used: %.2f MB.)",
$partition, $totalSpace, $usedSpace);
?>

This returns the following:

Partition: /usr (Allocated: 36716.00 MB. Used: 32327.61 MB.)

Retrieving a Directory Size

PHP doesn’t currently offer a standard function for retrieving the total size of a directory, a task more often required than retrieving total disk space (see disk_total_space() in the previous section). And although you could make a system-level call to du using exec() or system() (both of which are introduced in the later section “PHP’s Program Execution Functions”), such functions are often disabled for security reasons. The alternative solution is to write a custom PHP function that is capable of carrying out this task. A recursive function seems particularly well-suited for this task. One possible variation is offered in Listing 10-1.

■Note The du command will summarize disk usage of a file or a directory. See the appropriate man page for
usage information.

Listing 10-1. Determining the Size of a Directory’s Contents

<?php
function directory_size($directory) {

    $directorySize = 0;

    // Open the directory and read its contents.
    if ($dh = @opendir($directory)) {

        // Iterate through each directory entry. Compare against FALSE
        // explicitly so an entry named "0" doesn't end the loop early.
        while (($filename = readdir($dh)) !== false) {

            // Filter out some of the unwanted directory entries.
            if ($filename != "." && $filename != "..") {

                // File, so determine size and add to total.
                if (is_file($directory."/".$filename))
                    $directorySize += filesize($directory."/".$filename);

                // New directory, so initiate recursion.
                if (is_dir($directory."/".$filename))
                    $directorySize += directory_size($directory."/".$filename);
            }
        }

        closedir($dh);
    }

    return $directorySize;

} #end directory_size()

$directory = "/usr/book/chapter10/";
$totalSize = round((directory_size($directory) / 1048576), 2);
printf("Directory %s: %.2f MB", $directory, $totalSize);

?>

Executing this script will produce output similar to the following:

Directory /usr/book/chapter10/: 2.12 MB
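
The same total can also be computed without explicit recursion by using the SPL directory iterators. This is an alternative sketch, not the listing’s method; it assumes PHP 5.3 or later (for FilesystemIterator), directory_size_spl() is a hypothetical name, and the demonstration directory is created under the system temporary directory purely for illustration:

```php
<?php
// Sketch: sum the sizes of all regular files below $directory.
function directory_size_spl($directory) {
    $total = 0;
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($directory, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $entry) {
        if ($entry->isFile()) {
            $total += $entry->getSize();
        }
    }
    return $total;
}

// Build a throwaway directory tree so the sketch is self-contained.
$demo = sys_get_temp_dir() . "/dirsize_demo_" . uniqid();
mkdir($demo . "/sub", 0777, true);
file_put_contents($demo . "/a.txt", str_repeat("x", 100));
file_put_contents($demo . "/sub/b.txt", str_repeat("y", 250));

$demoSize = directory_size_spl($demo);
printf("Directory %s: %d bytes", $demo, $demoSize);

// Clean up the demonstration files.
unlink($demo . "/sub/b.txt");
unlink($demo . "/a.txt");
rmdir($demo . "/sub");
rmdir($demo);
?>
```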

Determining Access and Modification Times

The ability to determine a file’s last access and modification time plays an important role in many administrative tasks, especially in Web applications that involve network or CPU-intensive update operations. PHP offers three functions for determining a file’s last access, last inode change, and last modification times, all of which are introduced in this section.

Determining a File’s Last Access Time

The fileatime() function returns a file’s last access time in Unix timestamp format, or FALSE on error. Its prototype follows:

int fileatime(string filename)

An example follows:

<?php
$file = "/usr/local/apache2/htdocs/book/chapter10/stat.php";
printf("File last accessed: %s", date("m-d-y g:i:sa", fileatime($file)));
?>

This returns the following:

File last accessed: 06-09-03 1:26:14pm

Determining a File’s Last Changed Time

The filectime() function returns a file’s last changed time in Unix timestamp format, or FALSE on error. Its prototype follows:

int filectime(string filename)

An example follows:

<?php
$file = "/usr/local/apache2/htdocs/book/chapter10/stat.php";
printf("File inode last changed: %s", date("m-d-y g:i:sa", filectime($file)));
?>

This returns the following:

File inode last changed: 06-09-03 1:26:14pm

■Note The last changed time differs from the last modified time in that the last changed time refers to any change in the file’s inode data, including changes to permissions, owner, group, or other inode-specific information, whereas the last modified time refers to changes to the file’s content.

Determining a File’s Last Modified Time

The filemtime() function returns a file’s last modification time in Unix timestamp format, or FALSE on error. Its prototype follows:

int filemtime(string filename)

The following code demonstrates how to place a “last modified” timestamp on a Web page:

<?php
$file = "/usr/local/apache2/htdocs/book/chapter10/stat.php";
echo "File last updated: ".date("m-d-y g:i:sa", filemtime($file));
?>

This returns the following:

File last updated: 06-09-03 1:26:14pm
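
To see the three timestamps side by side, the following sketch creates a temporary file, backdates its access and modification times with touch(), and then reads all three values. The temporary file and the one-hour offset are illustrative assumptions. PHP caches the results of these functions per file, so clearstatcache() is called before reading them:

```php
<?php
// Create a throwaway file to inspect.
$file = tempnam(sys_get_temp_dir(), "stat");
file_put_contents($file, "demo");

// Backdate the modification and access times by one hour.
$hourAgo = time() - 3600;
touch($file, $hourAgo, $hourAgo);

// Discard any cached stat data so the new values are visible.
clearstatcache();

$accessed = fileatime($file);
$changed  = filectime($file);   // updated by touch(), since the inode changed
$modified = filemtime($file);

printf("Accessed: %s <br />", date("m-d-y g:i:sa", $accessed));
printf("Inode changed: %s <br />", date("m-d-y g:i:sa", $changed));
printf("Modified: %s <br />", date("m-d-y g:i:sa", $modified));

unlink($file);
?>
```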

Working with Files

Web applications are rarely 100 percent self-contained; that is, most rely on some sort of external data source to do anything interesting. Two prime examples of such data sources are files and databases. In this section you’ll learn how to interact with files by way of an introduction to PHP’s numerous standard file-related functions. But first it’s worth introducing a few basic concepts pertinent to this topic.

The Concept of a Resource

The term resource is commonly used to refer to any entity from which an input or output stream can be initiated. Standard input or output, files, and network sockets are all examples of resources. Therefore you’ll often see many of the functions introduced in this section discussed in the context of resource handling, rather than file handling, per se, because all are capable of working with resources such as the aforementioned. However, because their use in conjunction with files is the most common application, the discussion will primarily be limited to that purpose, although the terms resource and file may be used interchangeably throughout.

Recognizing Newline Characters

The newline character, which is represented by the \n character sequence (\r\n on Windows), represents the end of a line within a file. Keep this in mind when you need to input or output information one line at a time. Several functions introduced throughout the remainder of this chapter offer functionality tailored to working with the newline character. Some of these functions include file(), fgetcsv(), and fgets().

Recognizing the End-of-File Character

Programs require a standardized means for discerning when the end of a file has been reached. This standard is commonly referred to as the end-of-file, or EOF, character. This is such an important concept that almost every mainstream programming language offers a built-in function for verifying whether the parser has arrived at the EOF. In the case of PHP, this function is feof(). The feof() function determines whether a resource’s EOF has been reached. It is used quite commonly in file I/O operations. Its prototype follows:

bool feof(resource handle)

An example follows:

<?php
// Open a text file for reading purposes
$fh = fopen("/home/www/data/users.txt", "rt");

// While the end-of-file hasn't been reached, retrieve the next line
while (!feof($fh)) echo fgets($fh);

// Close the file
fclose($fh);
?>
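
A loop driven by feof() never terminates if fopen() fails, because feof() then has no valid handle to test. One common alternative, sketched here under the assumption of a small sample file written to the system temporary directory, is to check the handle first and let the return value of fgets() end the loop:

```php
<?php
// Create a small sample file so the sketch is self-contained.
$file = tempnam(sys_get_temp_dir(), "users");
file_put_contents($file, "Ale ale@example.com\nNicole nicole@example.com\n");

$lines = array();

$fh = fopen($file, "r");
if ($fh === false) {
    die("Unable to open $file");
}

// fgets() returns FALSE once the EOF is reached, which ends the loop.
while (($line = fgets($fh)) !== false) {
    $lines[] = rtrim($line, "\r\n");
}

fclose($fh);
unlink($file);

print_r($lines);
?>
```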

Opening and Closing a File

Typically you’ll need to create what’s known as a handle before you can do anything with a file’s contents. Likewise, once you’ve finished working with that resource, you should destroy the handle. Two standard functions are available for such tasks, both of which are introduced in this section.

Opening a File

The fopen() function binds a file to a handle. Once bound, the script can interact with this file via the handle. Its prototype follows:

resource fopen(string resource, string mode [, int use_include_path
[, resource zcontext]])

While fopen() is most commonly used to open files for reading and manipulation, it’s also capable of opening resources via a number of protocols, including HTTP, HTTPS, and FTP, a concept discussed in Chapter 16.

The mode, assigned at the time a resource is opened, determines the level of access available to that resource. The various modes are defined in Table 10-1.

If the resource is found on the local file system, PHP expects it to be available by the path prefacing it. Alternatively, you can assign fopen()’s use_include_path parameter the value of 1, which will cause PHP to look for the resource within the paths specified by the include_path configuration directive.

Table 10-1. File Modes

Mode  Description
r     Read-only. The file pointer is placed at the beginning of the file.
r+    Read and write. The file pointer is placed at the beginning of the file.
w     Write only. Before writing, delete the file contents and return the file pointer to the beginning of the file. If the file does not exist, attempt to create it.
w+    Read and write. Before reading or writing, delete the file contents and return the file pointer to the beginning of the file. If the file does not exist, attempt to create it.
a     Write only. The file pointer is placed at the end of the file. If the file does not exist, attempt to create it. This mode is better known as Append.
a+    Read and write. The file pointer is placed at the end of the file. If the file does not exist, attempt to create it. This process is known as appending to the file.
b     Open the file in binary mode.
t     Open the file in text mode.

The final parameter, zcontext, is used for setting configuration parameters specific to the file or stream and for sharing file- or stream-specific information across multiple fopen() requests. This topic is discussed in further detail in Chapter 16.

Let’s consider a few examples. The first opens a read-only handle to a text file residing on the local server:

$fh = fopen("/usr/local/apache/data/users.txt","rt");

The next example demonstrates opening a write handle to an HTML document:

$fh = fopen("/usr/local/apache/data/docs/summary.html","w");

The next example refers to the same HTML document, except this time PHP will search for the file in the paths specified by the include_path directive (presuming the summary.html document resides in the location specified in the previous example, include_path will need to include the path /usr/local/apache/data/docs/):

$fh = fopen("summary.html","w", 1);

The final example opens a read-only stream to a remote index.html file:

$fh = fopen("http://www.example.com/", "r");

Of course, keep in mind fopen() only readies the resource for an impending operation. Other than establishing the handle, it does nothing; you’ll need to use other functions to actually perform the read and write operations. These functions are introduced in the sections that follow.
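
To make the difference between the w and a modes concrete, the following sketch writes to a temporary file under each mode and inspects the result. The temporary file is an assumption made so the example can run anywhere. It also shows that fopen() returns FALSE when a file cannot be opened, which is worth checking before using the handle:

```php
<?php
$file = tempnam(sys_get_temp_dir(), "mode");

// "w" truncates the file (creating it if necessary) before writing.
$fh = fopen($file, "w");
fwrite($fh, "first\n");
fclose($fh);

// "a" leaves the existing contents alone and writes at the end.
$fh = fopen($file, "a");
fwrite($fh, "second\n");
fclose($fh);
$afterAppend = file_get_contents($file);     // both lines are present

// Opening with "w" again discards both lines.
$fh = fopen($file, "w");
fwrite($fh, "third\n");
fclose($fh);
$afterTruncate = file_get_contents($file);   // only the new line remains

unlink($file);

// The file is gone now, so "r" fails. The @ suppresses the warning
// that fopen() would otherwise raise.
$missing = @fopen($file, "r");
if ($missing === false) {
    echo "Could not open $file for reading";
}
?>
```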

Closing a File

Good programming practice dictates that you should destroy pointers to any resources once you’re finished with them. The fclose() function handles this for you, closing the previously opened file pointer specified by a file handle, returning TRUE on success and FALSE otherwise. Its prototype follows:

boolean fclose(resource filehandle)

The filehandle must be an existing file pointer opened using fopen() or fsockopen().

Reading from a File

PHP offers numerous methods for reading data from a file, ranging from reading in just one character at a time to reading in the entire file with a single operation. Many of the most useful functions are introduced in this section.

Reading a File into an Array

The file() function is capable of reading a file into an array, separating each element by the newline character, with the newline still attached to the end of each element. Its prototype follows:

array file(string filename [, int use_include_path [, resource context]])

Although simplistic, the importance of this function can’t be overstated, and therefore it warrants a simple demonstration. Consider the following sample text file named users.txt:

Ale ale@example.com
Nicole nicole@example.com
Laura laura@example.com

The following script reads in users.txt, parses each line, and converts the data into a convenient Web-based format. Notice that file() is unusual among the read/write functions in that you don’t have to establish a file handle before reading:

<?php

// Read the file into an array
$users = file("users.txt");

// Cycle through the array
foreach ($users as $user) {

    // Parse the line, retrieving the name and e-mail address
    list($name, $email) = explode(" ", $user);

    // Remove newline from $email
    $email = trim($email);

    // Output the formatted name and e-mail address
    echo "<a href=\"mailto:$email\">$name</a> <br /> ";

}

?>

This script produces the following HTML output:

<a href="mailto:ale@example.com">Ale</a> <br />
<a href="mailto:nicole@example.com">Nicole</a> <br />
<a href="mailto:laura@example.com">Laura</a> <br />

Like fopen(), you can tell file() to search through the paths specified in the include_path configuration parameter by setting use_include_path to 1. The context parameter refers to a stream context. You’ll learn more about this topic in Chapter 16.
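
In PHP 5 and later, the second argument to file() is treated as a bit mask of flags, which can also strip the trailing newlines and skip blank lines for you. The sketch below assumes a sample file written to the system temporary directory, with a blank line added to show the effect:

```php
<?php
// Write a sample users file that includes a blank line.
$file = tempnam(sys_get_temp_dir(), "users");
file_put_contents(
    $file,
    "Ale ale@example.com\nNicole nicole@example.com\n\nLaura laura@example.com\n"
);

// Drop the trailing newline from each element and skip empty lines.
$users = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$links = array();
foreach ($users as $user) {
    // No trim() needed, because the newlines are already gone.
    list($name, $email) = explode(" ", $user, 2);
    $links[] = sprintf('<a href="mailto:%s">%s</a>', $email, $name);
}

unlink($file);

echo implode(" <br />\n", $links);
?>
```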

Reading File Contents into a String Variable

The file_get_contents() function reads the contents of a file into a string. Its prototype follows:

string file_get_contents(string filename [, int use_include_path
[, resource context]])

By revising the script from the preceding section to use this function instead of file(), you get the following code:

<?php

// Read the file into a string variable
$userfile = file_get_contents("users.txt");

// Place each line of $userfile into array (trimmed first so that a
// trailing newline doesn't produce an empty final element)
$users = explode("\n", trim($userfile));

// Cycle through the array
foreach ($users as $user) {

    // Parse the line, retrieving the name and e-mail address
    list($name, $email) = explode(" ", trim($user));

    // Output the formatted name and e-mail address
    echo "<a href=\"mailto:$email\">$name</a> <br />";

}

?>

The use_include_path and context parameters operate in a manner identical to those defined
in the preceding section.

Reading a CSV File into an Array

The convenient fgetcsv() function parses each line of a file marked up in CSV format. Its prototype follows:

array fgetcsv(resource handle [, int length [, string delimiter
[, string enclosure]]])

Each call reads one line and returns its fields as an array. The length parameter must be greater than the longest line in the file; otherwise a longer line is split once length characters have been read. As of PHP 5, omitting length or setting it to 0 will result in an unlimited line length; however, since this degrades performance, it is a good idea to choose a number that will certainly surpass the longest line in the file. The optional delimiter parameter (by default set to a comma) identifies the character used to delimit each field. The optional enclosure parameter (by default set to a double quote) identifies a character used to enclose field values, which is useful when the assigned delimiter value might also appear within the field value, albeit under a different context.

■Note Comma-separated value (CSV) files are commonly used when exchanging data between applications. Microsoft Excel and Access, MySQL, Oracle, and PostgreSQL are just a few of the applications and databases capable of both importing and exporting CSV data. Additionally, languages such as Perl, Python, and PHP are particularly efficient at parsing delimited data.

Consider a scenario in which weekly newsletter subscriber data is cached to a file for perusal by the marketing staff. This file might look like this:

Jason Gilmore,jason@example.com,614-555-1234
Bob Newhart,bob@example.com,510-555-9999
Carlene Ribhurt,carlene@example.com,216-555-0987

Always eager to barrage the IT department with dubious requests, the marketing staff asks that the information also be made available for viewing on the Web. Thankfully, this is easily accomplished with fgetcsv(). The following example parses the file:

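<?php

// Open the subscribers data file
$fh = fopen("/home/www/data/subscribers.csv", "r");

// Read one line per iteration until fgetcsv() returns FALSE at the EOF
while (($row = fgetcsv($fh, 1024, ",")) !== false) {

    // Skip blank or incomplete lines
    if (count($row) < 3) continue;

    // Break each line of the file into three parts
    list($name, $email, $phone) = $row;

    // Output the data in HTML format
    printf("<p>%s (%s) Tel. %s</p>", $name, $email, $phone);
}

// Close the file
fclose($fh);

?>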

Note that you don’t have to use fgetcsv() to parse such files; for simple data without enclosed fields, the file(), explode(), and list() functions accomplish the job quite nicely. Reconsider the preceding example:

<?php

// Read the file into an array
$users = file("/home/www/data/subscribers.csv");

foreach ($users as $user) {

    // Break each line of the file into three parts
    // (trim() removes the trailing newline from the last field)
    list($name, $email, $phone) = explode(",", trim($user));

    // Output the data in HTML format
    printf("<p>%s (%s) Tel. %s</p>", $name, $email, $phone);

}

?>
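
The two approaches diverge as soon as a field contains the delimiter. The sketch below is an illustration built on assumed sample data, written to a temporary file, in which one name is enclosed in double quotes and contains a comma. fgetcsv() honors the enclosure, whereas explode() splits the name in two. The escape character is passed explicitly, which assumes PHP 5.3 or later:

```php
<?php
// Sample data: the first name field contains a comma inside quotes.
$firstLine = '"Gilmore, Jason",jason@example.com,614-555-1234';
$file = tempnam(sys_get_temp_dir(), "csv");
file_put_contents($file, $firstLine . "\nBob Newhart,bob@example.com,510-555-9999\n");

$rows = array();
$fh = fopen($file, "r");

// Arguments: handle, length, delimiter, enclosure, escape character.
while (($row = fgetcsv($fh, 1024, ",", '"', "\\")) !== false) {
    $rows[] = $row;
}

fclose($fh);
unlink($file);

// A plain explode() ignores the enclosure and yields four pieces.
$naive = explode(",", $firstLine);

printf("fgetcsv() fields: %d, explode() fields: %d <br />",
    count($rows[0]), count($naive));
printf("Name as parsed by fgetcsv(): %s", $rows[0][0]);
?>
```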

Reading a Specific Number of Characters

The fgets() function returns a certain number of characters read in through the opened resource handle, or everything it has read up to the point when a newline or an EOF character is encountered. Its prototype follows:

string fgets(resource handle [, int length])

If the optional length parameter is supplied, reading stops after length - 1 bytes. If it is omitted, fgets() reads to the end of the line (versions prior to PHP 4.3.0 assumed a length of 1,024). In most situations, this means that fgets() will encounter a newline character first, thereby returning the next line with each successive call. An example follows:

<?php
// Open a handle to users.txt
$fh = fopen("/home/www/data/users.txt", "rt");

// While the EOF isn't reached, read in another line and output it
while (!feof($fh)) echo fgets($fh);

// Close the handle
fclose($fh);
?>

Stripping Tags from Input

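The fgetss() function operates similarly to fgets(), except that it also strips any HTML and PHP tags from the input. (Note that fgetss() was deprecated in PHP 7.3 and removed in PHP 8.0; on those versions, combine fgets() with strip_tags() instead.) Its prototype follows: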

string fgetss(resource handle, int length [, string allowable_tags])

If you’d like certain tags to be left in place rather than stripped, include them in the allowable_tags parameter. As an example, consider a scenario in which contributors are expected to submit their work in HTML format using a specified subset of HTML tags. Of course, the authors don’t always follow instructions, so the file must be filtered for tag misuse before it can be published. With fgetss(), this is trivial:

<?php

// Build list of acceptable tags
$tags = "<h2><h3><p><b><a><img>";

// Open the article, and read its contents.
$fh = fopen("article.html", "rt");

$article = "";
while (!feof($fh)) {
    $article .= fgetss($fh, 1024, $tags);
}

// Close the handle
fclose($fh);

// Open the file up in write mode and output its contents.
$fh = fopen("article.html", "wt");
fwrite($fh, $article);

// Close the handle
fclose($fh);

?>

■Tip If you want to remove HTML tags from user input submitted via a form, check out the strip_tags() function,
introduced in Chapter 9.

Reading a File One Character at a Time

The fgetc() function reads a single character from the open resource stream specified by handle. If the EOF is encountered, a value of FALSE is returned. Its prototype follows:

string fgetc(resource handle)

Ignoring Newline Characters

The fread() function reads length characters from the resource specified by handle. Reading stops when the EOF is reached or when length characters have been read. Its prototype follows:

string fread(resource handle, int length)

Note that unlike other read functions, newline characters are irrelevant when using fread(); therefore, it’s often convenient to read the entire file in at once using filesize() to determine the number of characters that should be read in:

<?php

$file = "/home/www/data/users.txt";

// Open the file for reading
$fh = fopen($file, "rt");

// Read in the entire file
$userdata = fread($fh, filesize($file));

// Close the file handle
fclose($fh);

?>

The variable $userdata now contains the contents of the users.txt file.

Reading in an Entire File

The readfile() function reads an entire file specified by filename and immediately outputs it to the output buffer, returning the number of bytes read. Its prototype follows:

int readfile(string filename [, int use_include_path])

Enabling the optional use_include_path parameter tells PHP to search the paths specified by the include_path configuration parameter. This function is useful if you’re interested in simply dumping an entire file to the browser:

<?php

$file = "/home/www/articles/gilmore.html";

// Output the article to the browser.
$bytes = readfile($file);

?>

Like many of PHP’s other file I/O functions, readfile() can open remote files via their URL if the allow_url_fopen configuration directive is enabled.

Reading a File According to a Predefined Format

The fscanf() function offers a convenient means for parsing a resource in accordance with a predefined format. Its prototype follows:

mixed fscanf(resource handle, string format [, string var1])

For example, suppose you want to parse the following file, socsecurity.txt, consisting of Social Security numbers (SSNs):

123-45-6789
234-56-7890
345-67-8901

The following example parses the socsecurity.txt file:

<?php

$fh = fopen("socsecurity.txt", "r");

// Parse each SSN in accordance with integer-integer-integer format
while ($user = fscanf($fh, "%d-%d-%d")) {

    // Assign each SSN part to an appropriate variable
    list ($part1, $part2, $part3) = $user;
    printf("Part 1: %d Part 2: %d Part 3: %d <br />", $part1, $part2, $part3);
}

fclose($fh);

?>

With each iteration, the variables $part1, $part2, and $part3 are assigned the three components
of each SSN, respectively, and output to the browser.

Writing a String to a File

The fwrite() function outputs the contents of a string variable to the specified resource. Its prototype follows:

int fwrite(resource handle, string string [, int length])

If the optional length parameter is provided, fwrite() will stop writing when length characters have been written. Otherwise, writing will stop when the end of the string is found. Consider this example:

<?php

// Data we'd like to write to the subscribers.txt file
$subscriberInfo = "Jason Gilmore|jason@example.com";

// Open subscribers.txt for writing
$fh = fopen("/home/www/data/subscribers.txt", "at");

// Write the data
fwrite($fh, $subscriberInfo);

// Close the handle
fclose($fh);

?>

■Tip If the optional length parameter is supplied to fwrite(), the magic_quotes_runtime configuration parameter will be disregarded and no slashes will be stripped from the string. See Chapters 2 and 9 for more information about this parameter, which was removed in PHP 5.4.

Moving the File Pointer

It’s often useful to jump around within a file, reading from and writing to various locations. Several
PHP functions are available for doing just this.

Moving the File Pointer to a Specific Offset

The fseek() function moves the pointer to the location specified by a provided offset value. Its prototype follows:

int fseek(resource handle, int offset [, int whence])

If the optional parameter whence is omitted, the position is set offset bytes from the beginning of the file. Otherwise, whence can be set to one of three possible values, which affect the pointer’s position:

• SEEK_CUR: Sets the pointer position to the current position plus offset bytes.
• SEEK_END: Sets the pointer position to the EOF plus offset bytes. To land inside the file, offset must be set to a negative value.
• SEEK_SET: Sets the pointer position to offset bytes. This has the same effect as omitting whence.

Retrieving the Current Pointer Offset

The ftell() function retrieves the current position of the file pointer’s offset within the resource. Its prototype follows:

int ftell(resource handle)

Moving the File Pointer Back to the Beginning of the File

The rewind() function moves the file pointer back to the beginning of the resource. Its prototype follows:

int rewind(resource handle)
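
The three pointer functions are easiest to follow when used together. The following sketch assumes a ten-byte temporary file containing the digits 0 through 9, so that each offset can be checked against the character found there:

```php
<?php
// A ten-byte sample file: the character at offset n is the digit n.
$file = tempnam(sys_get_temp_dir(), "seek");
file_put_contents($file, "0123456789");

$fh = fopen($file, "r");

// Omitting whence is the same as SEEK_SET: 3 bytes from the start.
fseek($fh, 3);
$afterSet = ftell($fh);
$char = fgetc($fh);            // reads "3" and advances the pointer to 4

// SEEK_CUR is relative to the current position: 4 + 2.
fseek($fh, 2, SEEK_CUR);
$afterCur = ftell($fh);

// SEEK_END with a negative offset: 1 byte before the EOF.
fseek($fh, -1, SEEK_END);
$last = fgetc($fh);            // reads "9"

// Return to the beginning of the file.
rewind($fh);
$afterRewind = ftell($fh);

fclose($fh);
unlink($file);

printf("Offsets: %d, %d, %d. Characters read: %s and %s",
    $afterSet, $afterCur, $afterRewind, $char, $last);
?>
```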

Reading Directory Contents

The process required for reading a directory’s contents is quite similar to that involved in reading a file. This section introduces the functions available for this task and also introduces a function new to PHP 5 that reads a directory’s contents into an array.

Opening a Directory Handle

Just as fopen() opens a file pointer to a given file, opendir() opens a directory stream specified by a path. Its prototype follows:

resource opendir(string path)

Closing a Directory Handle

The closedir() function closes the directory stream. Its prototype follows:

void closedir(resource directory_handle)

Parsing Directory Contents

The readdir() function returns the next entry in the directory each time it is called, and FALSE once no entries remain. Its prototype follows:

string readdir(resource directory_handle)

Among other things, you can use this function to list all files and child directories in a given directory:

<?php
$dh = opendir('/usr/local/apache2/htdocs/');
// Compare against FALSE explicitly so an entry named "0" doesn't end the loop
while (($file = readdir($dh)) !== false)
    echo "$file <br />";
closedir($dh);
?>

Sample output follows:

.
..
articles
images
news
test.php

Note that readdir() also returns the . and .. entries common to a typical Unix directory listing. You can easily filter these out with an if statement:

if($file != "." AND $file != "..")...

Reading a Directory into an Array

The scandir() function, introduced in PHP 5, returns an array consisting of files and directories found in directory, or returns FALSE on error. Its prototype follows:

array scandir(string directory [,int sorting_order [, resource context]])

Setting the optional sorting_order parameter to 1 sorts the contents in descending order, overriding the default of ascending order. Running scandir() against the directory used in the previous section

<?php
print_r(scandir("/usr/local/apache2/htdocs"));
?>

returns the following:

Array ( [0] => . [1] => .. [2] => articles [3] => images
[4] => news [5] => test.php )

The context parameter refers to a stream context. You’ll learn more about this topic in Chapter 16.
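
The following sketch shows the sorting_order parameter and the FALSE check in practice. It assumes the system temporary directory as a stand-in for a real document root, and uses array_diff() to drop the . and .. entries.

```php
<?php
$dir = sys_get_temp_dir();   // stand-in directory for this sketch

// Passing 1 as sorting_order returns the entries in descending order.
$descending = scandir($dir, 1);

// scandir() returns FALSE on error, so check before using the result.
if ($descending !== false) {
    // Drop the . and .. entries; array_values() reindexes from 0.
    $entries = array_values(array_diff($descending, array('.', '..')));
    print_r($entries);
}
```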

Executing Shell Commands

The ability to interact with the underlying operating system is a crucial feature of any programming language. Although you could conceivably execute any system-level command using a function such as exec() or system(), some of these functions are so commonplace that the PHP developers thought it a good idea to incorporate them directly into the language. Several such functions are introduced in this section.

Removing a Directory

The rmdir() function attempts to remove the specified directory, returning TRUE on success and FALSE otherwise. Its prototype follows:

boolean rmdir(string dirname)

As with many of PHP’s file system functions, permissions must be properly set in order for rmdir() to successfully remove the directory. Because PHP scripts typically execute under the guise of the server daemon process owner, rmdir() will fail unless that user has write permissions to the directory. Also, the directory must be empty.
To remove a nonempty directory, you can either use a function capable of executing a system-level command, such as system() or exec(), or write a recursive function that will remove all file contents before attempting to remove the directory. Note that in either case, the executing user (server daemon process owner) requires write access to the parent of the target directory. Here is an example of the latter approach:

<?php
    function delete_directory($dir)
    {
        if ($dh = opendir($dir))
        {
            // Iterate through directory contents
            while (($file = readdir($dh)) !== false)
            {
                if (($file == ".") || ($file == "..")) continue;
                if (is_dir($dir . '/' . $file))
                    delete_directory($dir . '/' . $file);
                else
                    unlink($dir . '/' . $file);
            }

            closedir($dh);
            rmdir($dir);
        }
    }

    $dir = "/usr/local/apache2/htdocs/book/chapter10/test/";
    delete_directory($dir);
?>
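
For comparison, the former approach (handing the job to a system-level command) might be sketched as follows. This assumes a Unix-like server where the rm command is available; the delete_directory_with_shell() name and the scratch directory are hypothetical. The path is quoted with escapeshellarg(), which is introduced later in this chapter.

```php
<?php
// delete_directory_with_shell() is a hypothetical name for this sketch.
// It assumes a Unix-like system where the rm command is available.
function delete_directory_with_shell($dir)
{
    // Quote the path so spaces or metacharacters are not interpreted by the shell.
    exec("rm -rf " . escapeshellarg($dir), $output, $status);
    return $status === 0;
}

// Build a small nonempty directory in the temporary area, then remove it.
$dir = sys_get_temp_dir() . '/delete_demo_' . uniqid();
mkdir($dir);
touch($dir . '/file.txt');
$removed = delete_directory_with_shell($dir);
```

The recursive PHP version is more portable, while the shell version is shorter but depends on the operating system.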

Renaming a File

The rename() function renames a file, returning TRUE on success and FALSE otherwise. Its prototype follows:

boolean rename(string oldname, string newname)

Because PHP scripts typically execute under the guise of the server daemon process owner, rename() will fail unless that user has write permissions to the directory containing the file.
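
A short sketch of rename() with its return value checked follows. The scratch file and the .renamed suffix are illustrative assumptions only.

```php
<?php
// Scratch file names are illustrative only.
$old = tempnam(sys_get_temp_dir(), 'old_');
$new = $old . '.renamed';

if (rename($old, $new)) {
    echo "Renamed to $new";
} else {
    echo "Could not rename $old";
}

// Confirm the outcome, then clean up the scratch file.
$renamed = file_exists($new) && !file_exists($old);
unlink($new);
```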

Touching a File

The touch() function sets the file filename’s last-modified and last-accessed times, returning TRUE on success or FALSE on error. Its prototype follows:

boolean touch(string filename [, int time [, int atime]])

If time is not provided, the present time (as specified by the server) is used. If the optional atime parameter is provided, the access time will be set to this value; otherwise, like the modification time, it will be set to either time or the present server time.
Note that if filename does not exist, it will be created, assuming that the script’s owner possesses adequate permissions.
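
The sketch below creates a file with touch() and sets both timestamps explicitly. The file name is a hypothetical scratch name, and clearstatcache() is called because PHP caches file status information.

```php
<?php
$file = sys_get_temp_dir() . '/touch_demo_' . uniqid() . '.txt';  // hypothetical name

// The file does not exist yet, so touch() creates it.
// Set the modification time to one hour ago and the access time to now.
$mtime = time() - 3600;
$atime = time();
touch($file, $mtime, $atime);

clearstatcache();                 // discard cached stat results before reading times
$storedMtime = filemtime($file);
$storedAtime = fileatime($file);
unlink($file);
```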

System-Level Program Execution

Truly lazy programmers know how to make the most of their entire server environment when developing applications, which includes exploiting the functionality of the operating system, file system, installed program base, and programming languages whenever necessary. In this section, you’ll learn how PHP can interact with the operating system to call both OS-level programs and third-party installed applications. Done properly, it adds a whole new level of functionality to your PHP programming repertoire. Done poorly, it can be catastrophic not only to your application but also to your server’s data integrity. That said, before delving into this powerful feature, take a moment to consider the topic of sanitizing user input before passing it to the shell level.

Sanitizing the Input

Neglecting to sanitize user input that may subsequently be passed to system-level functions could allow attackers to do massive internal damage to your information store and operating system, deface or delete Web files, and otherwise gain unrestricted access to your server. And that’s only the beginning.

■Note See Chapter 21 for a discussion of secure PHP programming.

As an example of why sanitizing the input is so important, consider a real-world scenario. Suppose that you offer an online service that generates PDFs from an input URL. A great tool for accomplishing just this is the open source program HTMLDOC (http://www.htmldoc.org/), which converts HTML documents to indexed HTML, Adobe PostScript, and PDF files. HTMLDOC can be invoked from the command line, like so:

%>htmldoc --webpage -f webpage.pdf http://www.wjgilmore.com/

This would result in the creation of a PDF named webpage.pdf, which would contain a snapshot of the Web site’s index page. Of course, most users will not have command-line access to your server; therefore, you’ll need to create a much more controlled interface, such as a Web page. Using PHP’s passthru() function (introduced in the later section “PHP’s Program Execution Functions”), you can call HTMLDOC and return the desired PDF, like so:

$document = $_POST['userurl'];
passthru("htmldoc --webpage -f webpage.pdf $document");

What if an enterprising attacker took the liberty of passing through additional input, unrelated to the desired HTML page, entering something like this:

http://www.wjgilmore.com/ ; cd /usr/local/apache/htdocs/; rm -rf *

Most Unix shells would interpret the passthru() request as three separate commands. The first is this:

htmldoc --webpage -f webpage.pdf http://www.wjgilmore.com/

The second command is this:

cd /usr/local/apache/htdocs/

And the final command is this:

rm -rf *

The last two commands are certainly unexpected and could result in the deletion of your entire Web document tree. One way to safeguard against such attempts is to sanitize user input before it is passed to any of PHP’s program execution functions. Two standard functions are conveniently available for doing so: escapeshellarg() and escapeshellcmd(). Each is introduced in this section.

Delimiting Input

The escapeshellarg() function delimits provided arguments with single quotes and prefixes (escapes) quotes found within the input. Its prototype follows:

string escapeshellarg(string arguments)

The effect is that when arguments is passed to a shell command, it will be considered a single argument. This is significant because it lessens the possibility that an attacker could masquerade additional commands as shell command arguments. Therefore, in the previously nightmarish scenario, the entire user input would be enclosed in single quotes, like so:

'http://www.wjgilmore.com/ ; cd /usr/local/apache/htdocs/; rm -rf *'

The result would be that HTMLDOC would simply return an error instead of deleting an entire directory tree because it can’t resolve the URL possessing this syntax.
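
To see the quoting for yourself, you might build the command string without executing it. This is only a sketch: it reuses the malicious input from the scenario above, assumes a Unix-like system (on Windows the quoting differs), and deliberately stops short of calling passthru().

```php
<?php
// The same malicious input as in the scenario above.
$document = "http://www.wjgilmore.com/ ; cd /usr/local/apache/htdocs/; rm -rf *";

// escapeshellarg() wraps the whole string in single quotes, so the shell
// sees one argument rather than three commands.
$command = "htmldoc --webpage -f webpage.pdf " . escapeshellarg($document);
echo $command;
```

Only once the command string looks right would you hand $command to a program execution function.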

Escaping Potentially Dangerous Input

The escapeshellcmd() function operates under the same premise as escapeshellarg(), sanitizing potentially dangerous input by escaping shell metacharacters. Its prototype follows:

string escapeshellcmd(string command)

These characters include the following: # & ; ` | * ? ~ < > ^ ( ) [ ] { } $ \.
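
A brief sketch of escapeshellcmd() follows, assuming a Unix-like system, where each metacharacter is prefixed with a backslash. The input string is a made-up example.

```php
<?php
$input = "report.txt; rm -rf *";   // made-up input containing metacharacters

// Each shell metacharacter is prefixed with a backslash, so the shell
// treats ; and * as literal characters rather than as syntax.
$escaped = escapeshellcmd($input);
echo $escaped;
```

Unlike escapeshellarg(), the result is not wrapped in quotes, so the spaces still separate it into several arguments.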

PHP’s Program Execution Functions

This section introduces several functions (in addition to the backticks execution operator) used to execute system-level programs via a PHP script. Although at first glance they all appear to be operationally identical, each offers its own syntactical nuances.

Executing a System-Level Command

The exec() function is best-suited for executing an operating system–level application intended to continue in the server background. Its prototype follows:

string exec(string command [, array output [, int return_var]])

Although the last line of output will be returned, chances are that you’d like to have all of the output returned for review; you can do this by including the optional parameter output, which will be populated with each line of output upon completion of the command specified by exec(). In addition, you can discover the executed command’s return status by including the optional parameter return_var.
Although we could take the easy way out and demonstrate how exec() can be used to execute an ls command (dir for the Windows folks), returning the directory listing, it’s more informative to offer a somewhat more practical example: how to call a Perl script from PHP. Consider the following Perl script (languages.pl):
#! /usr/bin/perl
my @languages = qw[perl php python java c];
foreach $language (@languages) {
    print $language."<br />";
}

The Perl script is quite simple; no third-party modules are required, so you could test this example with little time investment. If you’re running Linux, chances are very good that you could run this example immediately because Perl is installed on every respectable distribution. If you’re running Windows, check out ActiveState’s (http://www.activestate.com/) ActivePerl distribution.
Like languages.pl, the PHP script shown here isn’t exactly rocket science; it simply calls the Perl script, specifying that the outcome be placed into an array named $results. The contents of $results are then output to the browser:

<?php
$outcome = exec("languages.pl", $results);
foreach ($results as $result) echo $result;
?>

The results are as follows:

perl
php
python
java
c
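
If you also want the command’s return status, a sketch along these lines may help. It uses the echo command purely as a stand-in because it is available on most systems; a status of 0 conventionally indicates success.

```php
<?php
// "echo" stands in for any command; it is used here only because it is
// available on most systems.
$lastLine = exec("echo one && echo two", $output, $status);

// $output holds every line, $lastLine only the final one,
// and $status the command's exit code (0 conventionally means success).
if ($status === 0) {
    foreach ($output as $line) {
        echo $line . "<br />";
    }
}
```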

Retrieving a System Command’s Results

The system() function is useful when you want to output the executed command’s results. Its prototype follows:

string system(string command [, int return_var])

Rather than return output via an optional parameter, as is the case with exec(), system() sends the output directly to the caller and returns only the last line of that output. However, if you would like to review the execution status of the called program, you need to designate a variable using the optional parameter return_var.
For example, suppose you’d like to list all files located within a specific directory:

$mymp3s = system("ls -1 /home/jason/mp3s/");

The following example calls the aforementioned languages.pl script, this time using system():

<?php
    // system() prints the script's output itself and returns only the last line
    $outcome = system("languages.pl", $status);
?>
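
The return_var parameter can be checked in the same way with system(). The following sketch again assumes the widely available echo command as a stand-in for a real program.

```php
<?php
// "echo" is a stand-in command chosen only because it is widely available.
$lastLine = system("echo hello", $status);

// system() has already sent the output; $lastLine repeats only its final line.
if ($status !== 0) {
    echo "Command failed with status $status";
}
```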

Returning Binary Output

The passthru() function is similar in function to exec(), except that it should be used if you’d like to return binary output to the caller. Its prototype follows:

void passthru(string command [, int return_var])

For example, suppose you want to convert GIF images to PNG before displaying them to the browser. You could use the Netpbm graphics package, available at http://netpbm.sourceforge.net/ under the GPL license:

<?php
    header("Content-Type: image/png");
    // Send the converted image straight to the browser
    passthru("giftopnm cover.gif | pnmtopng");
?>

Executing a Shell Command with Backticks

Delimiting a string with backticks signals to PHP that the string should be executed as a shell command, returning any output. Note that backticks are not single quotes but rather are a slanted sibling, commonly sharing a key with the tilde (~) on most U.S. keyboards. An example follows:

<?php
$result = `date`;
printf("<p>The server timestamp is: %s</p>", $result);
?>

This returns something similar to the following:

The server timestamp is: Sun Mar 3 15:32:14 EDT 2007

The backtick operator is operationally identical to the shell_exec() function, introduced next.

An Alternative to Backticks

The shell_exec() function offers a syntactical alternative to backticks, executing a shell command and returning the output. Its prototype follows:

string shell_exec(string command)

Reconsidering the preceding example, this time we’ll use the shell_exec() function instead of backticks:
<?php
$result = shell_exec("date");
printf("<p>The server timestamp is: %s</p>", $result);
?>
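
Because shell_exec() returns NULL when the command fails or produces no output, you may want to test the result before using it. The sketch below assumes the echo command as a stand-in for a real program.

```php
<?php
$result = shell_exec("echo ready");

// shell_exec() returns NULL when the command fails or produces no output,
// so test the result before using it.
if ($result === null || $result === false) {
    $message = "No output";
} else {
    $message = trim($result);   // strip the trailing newline
}
printf("<p>%s</p>", $message);
```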

Summary

Although you can certainly go a very long way using solely PHP to build interesting and powerful Web applications, such capabilities are greatly expanded when functionality is integrated with the underlying platform and other technologies. As applied to this chapter, these technologies include the underlying operating and file systems. You’ll see this theme repeatedly throughout the remainder of this book, as PHP’s ability to interface with a wide variety of technologies such as LDAP, SOAP, and Web Services is introduced.
In the next chapter, you’ll be introduced to the PHP Extension and Application Repository (PEAR) and the online community repository for distributing and sharing code.
