Tuesday, July 14, 2009

Beginning PHP and Oracle From Novice to Professional by W. Jason Gilmore and Bob Bryla Chapter 9

Programmers build applications that are based on established rules regarding the classification, parsing, storage, and display of information, whether that information consists of gourmet recipes,
store sales receipts, poetry, or some other collection of data. This chapter introduces many of the
PHP functions that you’ll undoubtedly use on a regular basis when performing such tasks.
This chapter covers the following topics:

• Regular expressions: A brief introduction to regular expressions touches upon the features and syntax of PHP’s two supported regular expression implementations: POSIX and Perl. Following that is a complete introduction to PHP’s respective function libraries.
• String manipulation: It’s conceivable that throughout your programming career, you’ll somehow be required to modify every possible aspect of a string. Many of the powerful PHP functions that can help you to do so are introduced in this chapter.
• The PEAR Validate_US package: In this and subsequent chapters, various PEAR packages are introduced that are relevant to the respective chapter’s subject matter. This chapter introduces Validate_US, a PEAR package that is useful for validating the syntax for items commonly used in applications of all types, including phone numbers, Social Security numbers (SSNs), ZIP codes, and state abbreviations. (If you’re not familiar with PEAR, it’s introduced in Chapter 11.)

Regular Expressions

Regular expressions provide the foundation for describing or matching data according to defined syntax rules. A regular expression is nothing more than a pattern of characters itself, matched against a certain parcel of text. This sequence may be a pattern with which you are already familiar, such as the word dog, or it may be a pattern with specific meaning in the context of the world of pattern matching, <(?)>.*<\ /.?>, for example.
PHP is bundled with function libraries supporting both the POSIX and Perl regular expression implementations. Each has its own unique style of syntax and is discussed accordingly in later sections. Keep in mind that innumerable tutorials have been written regarding this matter; you can find information on the Web and in various books. Therefore, this chapter provides just a basic introduction to each, leaving it to you to search out further information.
If you are not already familiar with the mechanics of regular expressions, please take some time to read through the short tutorial that makes up the remainder of this section. If you are already a regular expression pro, feel free to skip past the tutorial to the section “PHP’s Regular Expression Functions (POSIX Extended).”

Regular Expression Syntax (POSIX)

The structure of a POSIX regular expression is similar to that of a typical arithmetic expression: various elements (operators) are combined to form a more complex expression. The meaning of the combined regular expression elements is what makes them so powerful. You can locate not only literal expressions, such as a specific word or number, but also a multitude of semantically different
but syntactically similar strings, such as all HTML tags in a file.

■Note POSIX stands for Portable Operating System Interface for Unix, and is representative of a set of standards
originally intended for Unix-based operating systems. POSIX regular expression syntax is an attempt to standardize how regular expressions are implemented in many programming languages.

The simplest regular expression is one that matches a single character, such as g, which would match strings such as gog, haggle, and bag. You could combine several letters together to form larger expressions, such as gan, which logically would match any string containing gan: gang, organize, or Reagan, for example.
You can also test for several different expressions simultaneously by using the pipe (|) character. For example, you could test for php or zend via the regular expression php|zend.
Before getting into PHP’s POSIX-based regular expression functions, let’s review three methods
that POSIX supports for locating different character sequences: brackets, quantifiers, and predefined character ranges.

Brackets

Brackets ([]) are used to represent a list, or range, of characters to be matched. For instance, contrary to the regular expression php, which will locate strings containing the explicit string php, the regular expression [php] will find any string containing the character p or h. Several commonly used character ranges follow:

• [0-9] matches any decimal digit from 0 through 9.

• [a-z] matches any character from lowercase a through lowercase z.

• [A-Z] matches any character from uppercase A through uppercase Z.

• [A-Za-z] matches any letter, whether uppercase A through Z or lowercase a through z.

Of course, the ranges shown here are general; you could also use the range [0-3] to match any decimal digit ranging from 0 through 3, or the range [b-v] to match any lowercase character ranging from b through v. In short, you can specify any ASCII range you wish.

Quantifiers

Sometimes you might want to create regular expressions that look for characters based on their frequency or position. For example, you might want to look for strings containing one or more instances of the letter p, strings containing at least two p’s, or even strings with the letter p as their beginning or terminating character. You can make these demands by inserting special characters into the regular expression. Here are several examples of these characters:

• p+ matches any string containing at least one p.

• p* matches any string containing zero or more p’s.

• p? matches any string containing zero or one p.

• p{2} matches any string containing a sequence of two p’s.

• p{2,3} matches any string containing a sequence of two or three p’s.

• p{2,} matches any string containing a sequence of at least two p’s.

• p$ matches any string with p at the end of it.

Still other flags can be inserted before and within a character sequence:

• ^p matches any string with p at the beginning of it.

• [^a-zA-Z] matches any string containing at least one character outside the ranges a through z and A through Z.

• p.p matches any string containing p, followed by any character, in turn followed by another p.

You can also combine special characters to form more complex expressions. Consider the following examples:

• ^.{2}$ matches any string containing exactly two characters.

• <b>(.*)</b> matches any string enclosed within <b> and </b>.

• p(hp)* matches any string containing a p followed by zero or more instances of the sequence hp.

You may wish to search for these special characters in strings instead of using them in the special context just described. To do so, the characters must be escaped with a backslash (\). For example, if you want to search for a dollar amount, a plausible regular expression would be as follows: ([\$])([0-9]+); that is, a dollar sign followed by one or more digits. Notice the backslash preceding the dollar sign. Potential matches of this regular expression include $42, $560, and $3.
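The quantifier, anchor, and escaping rules above can be tried out directly. The following is only a sketch, not part of the original chapter: it assumes a current PHP release, where the POSIX ereg() family has been removed (as of PHP 7), and therefore runs the same patterns through preg_match(), wrapping each in / delimiters. The subject strings are invented for illustration.

```php
<?php
// Patterns from the lists above, wrapped in / delimiters for preg_match().
// Each pair is (pattern, subject); the subjects are made-up samples.
$checks = array(
    array('/p{2,3}/',         'apple'),          // contains two p's in a row
    array('/^p/',             'apple'),          // does not begin with p
    array('/p$/',             'stop'),           // ends with p
    array('/^.{2}$/',         'ab'),             // exactly two characters
    array('/p(hp)*/',         'phphp'),          // p followed by hp twice
    array('/([\$])([0-9]+)/', 'Only $42 today'), // escaped dollar sign, digits
);

foreach ($checks as $check) {
    list($pattern, $subject) = $check;
    $result = preg_match($pattern, $subject) ? 'match' : 'no match';
    echo "$pattern against \"$subject\": $result\n";
}
```

Only the second pair reports no match, because apple does not begin with p.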

Predefined Character Ranges (Character Classes)

For reasons of convenience, several predefined character ranges, also known as character classes, are available. Character classes specify an entire range of characters—for example, the alphabet or an integer set. Standard classes include the following:

[:alpha:]: Lowercase and uppercase alphabetical characters. This can also be specified as
[A-Za-z].

[:alnum:]: Lowercase and uppercase alphabetical characters and numerical digits. This can also be specified as [A-Za-z0-9].

[:cntrl:]: Control characters such as tab, escape, or backspace.

[:digit:]: Numerical digits 0 through 9. This can also be specified as [0-9].

[:graph:]: Printable characters found in the range of ASCII 33 to 126.

[:lower:]: Lowercase alphabetical characters. This can also be specified as [a-z].

[:punct:]: Punctuation characters, including ~ ` ! @ # $ % ^ & * ( ) - _ + = { } [ ] : ; ' < > , . ? and /.

[:upper:]: Uppercase alphabetical characters. This can also be specified as [A-Z].

[:space:]: Whitespace characters, including the space, horizontal tab, vertical tab, new line, form feed, or carriage return.

[:xdigit:]: Hexadecimal characters. This can also be specified as [a-fA-F0-9].
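As a rough illustration of how these classes are written in practice, note that a class name goes inside an outer pair of brackets, as in [[:digit:]]. The sketch below assumes a current PHP release and uses preg_match(), which also accepts these POSIX class names; the sample inputs are hypothetical.

```php
<?php
// POSIX classes are placed inside a bracket expression: [[:digit:]], not [:digit:].
$samples = array('PHP5', '2009', 'a b'); // hypothetical inputs

foreach ($samples as $sample) {
    $allAlnum  = preg_match('/^[[:alnum:]]+$/', $sample) === 1;
    $allDigits = preg_match('/^[[:digit:]]+$/', $sample) === 1;
    $hasSpace  = preg_match('/[[:space:]]/', $sample) === 1;

    printf(
        "%-5s alnum:%s digits:%s space:%s\n",
        $sample,
        var_export($allAlnum, true),
        var_export($allDigits, true),
        var_export($hasSpace, true)
    );
}
```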

PHP’s Regular Expression Functions (POSIX Extended)

PHP offers seven functions for searching strings using POSIX-style regular expressions: ereg(), ereg_replace(), eregi(), eregi_replace(), split(), spliti(), and sql_regcase(). These functions are discussed in this section.

Performing a Case-Sensitive Search

The ereg() function executes a case-sensitive search of a string for a defined pattern, returning TRUE
if the pattern is found, and FALSE otherwise. Its prototype follows:

boolean ereg(string pattern, string string [, array regs])

Here’s how you could use ereg() to ensure that a username consists solely of lowercase letters:

<?php
$username = "jasoN";
if (ereg("([^a-z])", $username))
    echo "Username must be all lowercase!";
else
    echo "Username is all lowercase!";
?>

In this case, ereg() will return TRUE, causing the error message to output.
The optional input parameter regs contains an array of all matched expressions that are grouped by parentheses in the regular expression. Making use of this array, you could segment a URL into several pieces, as shown here:
<?php
$url = "http://www.apress.com";

// Break $url down into three distinct pieces:
// "http://www", "apress", and "com"
$parts = ereg("^(http://www)\.([[:alnum:]]+)\.([[:alnum:]]+)", $url, $regs);

echo $regs[0]; // outputs the entire string "http://www.apress.com"
echo "<br />";
echo $regs[1]; // outputs "http://www"
echo "<br />";
echo $regs[2]; // outputs "apress"
echo "<br />";
echo $regs[3]; // outputs "com"
?>

This returns the following:

http://www.apress.com
http://www
apress
com
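On PHP 7 and later, where ereg() is no longer available, the same segmentation can be sketched with preg_match(), which fills its third argument in the same way. This is an equivalent written for illustration, not code from the original chapter; the # delimiter is chosen here so that the slashes in the URL need no escaping.

```php
<?php
$url = "http://www.apress.com";

// Same three parenthesized groups as the ereg() example above.
$found = preg_match('#^(http://www)\.([[:alnum:]]+)\.([[:alnum:]]+)#', $url, $regs);

if ($found === 1) {
    echo $regs[0], "\n"; // the entire match: http://www.apress.com
    echo $regs[1], "\n"; // http://www
    echo $regs[2], "\n"; // apress
    echo $regs[3], "\n"; // com
}
```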

Performing a Case-Insensitive Search

The eregi() function searches a string for a defined pattern in a case-insensitive fashion. Its prototype follows:

int eregi(string pattern, string string [, array regs])

This function can be useful when checking the validity of strings, such as passwords. This concept is illustrated in the following example:
<?php
$pswd = "jasonasdf";
if (!eregi("^[a-zA-Z0-9]{8,10}$", $pswd))
echo "Invalid password!";
else
echo "Valid password!";
?>

In this example, the user must provide an alphanumeric password consisting of eight to ten
characters, or else an error message is displayed.

Replacing Text in a Case-Sensitive Fashion

The ereg_replace() function operates much like ereg(), except that its power is extended to finding and replacing a pattern with a replacement string instead of simply locating it. Its prototype follows:

string ereg_replace(string pattern, string replacement, string string)

If no matches are found, the string will remain unchanged. Like ereg(), ereg_replace() is case sensitive. Consider an example:

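<?php
$text = "This is a link to http://www.wjgilmore.com/.";
echo ereg_replace("http://([a-zA-Z0-9./-]+)$", "<a href=\"\\0\">\\0</a>",
                  $text);
?>

Because the character class includes the period, the match runs to the very end of the string, trailing period included. This returns the following:

This is a link to
<a href="http://www.wjgilmore.com/.">http://www.wjgilmore.com/.</a>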

A rather interesting feature of PHP’s string-replacement capability is the ability to back-reference parenthesized substrings. This works much like the optional input parameter regs in the function ereg(), except that the substrings are referenced using backslashes, such as \0, \1, \2, and so on, where \0 refers to the entire match, \1 the first parenthesized substring, and so on. Up to nine back references can be used. This example shows how to replace all references to a URL with a working hyperlink:

$url = "Apress (http://www.apress.com)";
$url = ereg_replace("http://([a-zA-Z0-9./-]+)([a-zA-Z/]+)", "<a href=\"\\0\">\\0</a>", $url);
echo $url;
// Displays Apress (<a href="http://www.apress.com">http://www.apress.com</a>)
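For readers on PHP 7 or later, where ereg_replace() has been removed, the same back-reference idea can be sketched with preg_replace(). The pattern below is a simplified stand-in chosen for this illustration; $0 in the replacement string refers to the entire match.

```php
<?php
$url = "Apress (http://www.apress.com)";

// $0 is the whole match; the closing parenthesis is not in the character
// class, so it stays outside the generated link.
$linked = preg_replace('#http://[a-zA-Z0-9./-]+#', '<a href="$0">$0</a>', $url);

echo $linked;
// Apress (<a href="http://www.apress.com">http://www.apress.com</a>)
```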

■Note Although ereg_replace() works just fine, another predefined function named str_replace() is
actually much faster when complex regular expressions are not required. str_replace() is discussed in the later section “Replacing All Instances of a String with Another String.”

Replacing Text in a Case-Insensitive Fashion

The eregi_replace() function operates exactly like ereg_replace(), except that the search for
pattern in string is not case sensitive. Its prototype follows:

string eregi_replace(string pattern, string replacement, string string)

Splitting a String into Various Elements Based on a Case-Sensitive Pattern

The split() function divides a string into various elements, with the boundaries of each element based on the occurrence of a defined pattern within the string. Its prototype follows:

array split(string pattern, string string [, int limit])

The optional input parameter limit is used to specify the number of elements into which the string should be divided, starting from the left end of the string and working rightward. In cases where the pattern is an alphabetical character, split() is case sensitive. Here’s how you would use split() to break a string into pieces based on occurrences of horizontal tabs and newline characters:

<?php
$text = "this is\tsome text that\nwe might like to parse.";
print_r(split("[\n\t]",$text));
?>

This returns the following:

Array ( [0] => this is [1] => some text that [2] => we might like to parse. )

Splitting a String into Various Elements Based on a Case-Insensitive Pattern

The spliti() function operates exactly in the same manner as its sibling, split(), except that its pattern is treated in a case-insensitive fashion. Its prototype follows:

array spliti(string pattern, string string [, int limit])

Accommodating Products Supporting Solely Case-Sensitive Regular Expressions

The sql_regcase() function converts each character in a string into a bracketed expression containing two characters. If the character is alphabetical, the bracket will contain both forms; otherwise, the original character will be left unchanged. Its prototype follows:

string sql_regcase(string string)

You might use this function as a workaround when using PHP applications to talk to other applications that support only case-sensitive regular expressions. Here’s how you would use sql_regcase() to convert a string:

<?php
$version = "php 4.0";
echo sql_regcase($version);
// outputs [Pp][Hh][Pp] 4.0
?>
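Because sql_regcase() was removed along with the rest of the POSIX regex extension in PHP 7, a comparable result has to be produced by hand on current versions. The helper below, regcase_sketch(), is a hypothetical name invented for this illustration; it handles only the ASCII letters A through Z and a through z.

```php
<?php
// regcase_sketch() is a hypothetical stand-in, not a built-in PHP function.
// Each ASCII letter becomes a two-character bracket expression; every
// other character is left unchanged.
function regcase_sketch($string)
{
    return preg_replace_callback(
        '/[A-Za-z]/',
        function ($m) {
            return '[' . strtoupper($m[0]) . strtolower($m[0]) . ']';
        },
        $string
    );
}

echo regcase_sketch("php 4.0"); // [Pp][Hh][Pp] 4.0
```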

Regular Expression Syntax (Perl)

Perl has long been considered one of the most powerful parsing languages ever written, and it provides a comprehensive regular expression language that can be used to search and replace even the most complicated of string patterns. The developers of PHP felt that instead of reinventing the regular expression wheel, so to speak, they should make the famed Perl regular expression syntax available to PHP users.
Perl’s regular expression syntax is actually a derivation of the POSIX implementation, resulting in considerable similarities between the two. You can use any of the quantifiers introduced in the previous POSIX section. The remainder of this section is devoted to a brief introduction of Perl regular expression syntax. Let’s start with a simple example of a Perl-based regular expression:

/food/

Notice that the string food is enclosed between two forward slashes. Just as with POSIX regular expressions, you can build a more complex string through the use of quantifiers:

/fo+/

This will match f followed by one or more occurrences of o. Some potential matches include food,
fool, and fo4. Here is another example of using a quantifier:

/fo{2,4}/

This matches f followed by two to four occurrences of o. Some potential matches include fool,
fooool, and foosball.

Modifiers

Often you’ll want to tweak the interpretation of a regular expression; for example, you may want to tell the regular expression to execute a case-insensitive search or to ignore comments embedded within its syntax. These tweaks are known as modifiers, and they go a long way toward helping you to write short and concise expressions. A few of the more interesting modifiers are outlined in Table 9-1.

Table 9-1. Six Sample Modifiers

Modifier Description
i Perform a case-insensitive search.

g Find all occurrences (perform a global search).

m Treat a string as several (m for multiple) lines. By default, the ^ and $ characters match at the very start and very end of the string in question. Using the m modifier will allow for ^ and $ to match at the beginning and end, respectively, of any line in a string.

s Treat a string as a single line, so that the period (.) also matches newline characters, which it otherwise does not.

x Ignore white space and comments within the regular expression.

U Stop at the first match. Many quantifiers are “greedy”; they match the pattern as many times as possible rather than just stop at the first match. You can cause them to be “ungreedy” with this modifier.

These modifiers are placed directly after the regular expression—for instance, /string/i. Let’s consider a few examples:

/wmd/i: Matches WMD, wMD, WMd, wmd, and any other case variation of the string wmd.

/taxation/gi: Locates all occurrences of the word taxation. You might use the global modifier to tally up the total number of occurrences, or use it in conjunction with a replacement feature to replace all occurrences with some other string.
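The effect of several of these modifiers can be seen in a short sketch. It assumes PHP's preg functions and invented sample text. One caveat: PHP's preg functions do not accept the g modifier; preg_match_all() and preg_replace() already work on every occurrence, so g is not demonstrated here.

```php
<?php
$text = "first line\nSecond line"; // sample text containing one newline

// i: case-insensitive search
$i = preg_match('/second/i', $text);               // 1

// m: ^ and $ also match at line breaks inside the string
$withoutM = preg_match('/^Second/', $text);        // 0
$withM    = preg_match('/^Second/m', $text);       // 1

// s: the period also matches a newline
$withoutS = preg_match('/first.*Second/', $text);  // 0
$withS    = preg_match('/first.*Second/s', $text); // 1

// x: whitespace and # comments inside the pattern are ignored
$x = preg_match('/ line \z  # the word line at the very end /x', $text); // 1

// U: quantifiers become ungreedy
preg_match('/<b>.*<\/b>/',  '<b>one</b> <b>two</b>', $greedy);
preg_match('/<b>.*<\/b>/U', '<b>one</b> <b>two</b>', $ungreedy);

echo $greedy[0], "\n";   // <b>one</b> <b>two</b>
echo $ungreedy[0], "\n"; // <b>one</b>
```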

Metacharacters

Perl regular expressions also employ metacharacters to further filter their searches. A metacharacter is simply an alphabetical character preceded by a backslash that symbolizes special meaning. A list of useful metacharacters follows:

\A: Matches only at the beginning of the string.

\b: Matches a word boundary.

\B: Matches anything but a word boundary.

\d: Matches a digit character. This is the same as [0-9].

\D: Matches a nondigit character.

\s: Matches a whitespace character.

\S: Matches a nonwhitespace character.

[]: Encloses a character class.

(): Encloses a character grouping or defines a back reference.

$: Matches the end of a line.

^: Matches the beginning of a line.

.: Matches any character except for the newline.

\: Quotes the next metacharacter.

\w: Matches any word character, that is, an alphanumeric character or an underscore. This is the same as [a-zA-Z0-9_].

\W: Matches any character that is neither alphanumeric nor an underscore.

Let’s consider a few examples. The first regular expression will match strings such as pisa and
lisa but not sand:

/sa\b/

The next returns the first case-insensitive occurrence of the word linux:

/\blinux\b/i

The opposite of the word boundary metacharacter is \B, matching on anything but a word boundary. Therefore this example will match strings such as sand and sally but not Melissa:

/sa\B/

The final example returns all instances of strings matching a dollar sign followed by one or more digits:

/\$\d+/g
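These metacharacter examples can be checked with a few lines of PHP. The sketch below uses preg_grep() and preg_match_all(), both introduced in the next section; the word list and the sentence are made up. Since PHP has no g modifier, preg_match_all() is what collects every dollar amount.

```php
<?php
$words = array('pisa', 'lisa', 'sand', 'sally', 'Melissa');

// \b: "sa" must be followed by a word boundary
$endsWithSa = preg_grep('/sa\b/', $words);
// \B: "sa" must NOT be followed by a word boundary
$saInside   = preg_grep('/sa\B/', $words);

// A literal dollar sign followed by one or more digits, all occurrences
preg_match_all('/\$\d+/', 'Paid $42, then $560 and $3.', $amounts);

print_r(array_values($endsWithSa)); // pisa, lisa, Melissa
print_r(array_values($saInside));   // sand, sally
print_r($amounts[0]);               // $42, $560, $3
```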

PHP’s Regular Expression Functions (Perl Compatible)

PHP offers seven functions for searching strings using Perl-compatible regular expressions: preg_grep(), preg_match(), preg_match_all(), preg_quote(), preg_replace(), preg_replace_callback(), and preg_split(). These functions are introduced in the following sections.

Searching an Array

The preg_grep() function searches all elements of an array, returning an array consisting of all elements matching a certain pattern. Its prototype follows:

array preg_grep(string pattern, array input [, int flags])

Consider an example that uses this function to search an array for foods beginning with p:

<?php
$foods = array("pasta", "steak", "fish", "potatoes");
$food = preg_grep("/^p/", $foods);
print_r($food);
?>

This returns the following:

Array ( [0] => pasta [3] => potatoes )

Note that the array corresponds to the indexed order of the input array. If the value at an index position matches, it’s included in the output array under the same key; otherwise, that key is simply absent from the output (here, keys 1 and 2). If you want consecutive keys starting at 0, filter the output array through the function array_values(), introduced in Chapter 5.
The optional input parameter flags was added in PHP version 4.3. It accepts one value,
PREG_GREP_INVERT. Passing this flag will result in retrieval of those array elements that do not match the pattern.
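A brief sketch of both points, reusing the $foods array from the example above: array_values() renumbers the keys of the result, and PREG_GREP_INVERT returns the elements that do not match.

```php
<?php
$foods = array("pasta", "steak", "fish", "potatoes");

$startsWithP = preg_grep("/^p/", $foods);                   // keys 0 and 3 survive
$reindexed   = array_values($startsWithP);                  // keys become 0 and 1
$others      = preg_grep("/^p/", $foods, PREG_GREP_INVERT); // keys 1 and 2

print_r($reindexed); // pasta, potatoes
print_r($others);    // steak, fish
```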

Searching for a Pattern

The preg_match() function searches a string for a specific pattern, returning TRUE if it exists, and
FALSE otherwise. Its prototype follows:

int preg_match(string pattern, string string [, array matches [, int flags [, int offset]]])

The optional input parameter matches can contain various sections of the subpatterns contained in the search pattern, if applicable. Here’s an example that uses preg_match() to perform a case-insensitive search:

<?php
$line = "vim is the greatest word processor ever created!";
if (preg_match("/\bVim\b/i", $line, $match)) print "Match found!";
?>

For instance, this script will confirm a match if the word Vim or vim is located, but not simplevim,
vims, or evim.
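To show what the optional third argument receives, here is a small sketch with two parenthesized subpatterns added to the search. The extended pattern is an illustration rather than part of the original example.

```php
<?php
$line = "vim is the greatest word processor ever created!";

// Index 0 holds the entire match; 1 and 2 hold the parenthesized subpatterns.
if (preg_match('/\b(vim) is the (\w+)/i', $line, $match)) {
    echo $match[0], "\n"; // vim is the greatest
    echo $match[1], "\n"; // vim
    echo $match[2], "\n"; // greatest
}

// The word boundaries reject vim when it is embedded in a longer word.
$embedded = preg_match('/\bvim\b/i', "simplevim and vims"); // 0
```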

Matching All Occurrences of a Pattern

The preg_match_all() function matches all occurrences of a pattern in a string, assigning each occurrence to an array in the order you specify via an optional input parameter. Its prototype follows:

int preg_match_all(string pattern, string string, array pattern_array
[, int order])

The order parameter accepts two values:

• PREG_PATTERN_ORDER is the default if the optional order parameter is not included.
PREG_PATTERN_ORDER specifies the order in the way that you might think most logical:
$pattern_array[0] is an array of all complete pattern matches, $pattern_array[1] is an array of all strings matching the first parenthesized regular expression, and so on.
• PREG_SET_ORDER orders the array a bit differently than the default setting. $pattern_array[0] contains the first set of matches (the complete match followed by the string matched by each parenthesized regular expression), $pattern_array[1] contains the second set of matches, and so on.

Here’s how you would use preg_match_all() to find all strings enclosed in bold HTML tags:

<?php
$userinfo = "Name: <b>Zeev Suraski</b> <br> Title: <b>PHP Guru</b>";
preg_match_all("/<b>(.*)<\/b>/U", $userinfo, $pat_array);
printf("%s <br /> %s", $pat_array[0][0], $pat_array[0][1]);
?>

This returns the following:

Zeev Suraski
PHP Guru
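The two orderings are easiest to compare side by side. The sketch below reuses the $userinfo string from the example above and reads the text captured between the tags under each setting; the variable names $byPattern and $bySet are chosen for this illustration.

```php
<?php
$userinfo = "Name: <b>Zeev Suraski</b> <br> Title: <b>PHP Guru</b>";
$pattern  = "/<b>(.*)<\/b>/U";

preg_match_all($pattern, $userinfo, $byPattern, PREG_PATTERN_ORDER);
preg_match_all($pattern, $userinfo, $bySet, PREG_SET_ORDER);

// PREG_PATTERN_ORDER: index 0 = all complete matches, index 1 = all first groups
echo $byPattern[1][0], ", ", $byPattern[1][1], "\n"; // Zeev Suraski, PHP Guru

// PREG_SET_ORDER: one sub-array per match (complete match, then its groups)
echo $bySet[0][1], ", ", $bySet[1][1], "\n";         // Zeev Suraski, PHP Guru
```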

Delimiting Special Regular Expression Characters

The function preg_quote() inserts a backslash delimiter before every character of special significance to regular expression syntax. These special characters include . \ + * ? [ ^ ] $ ( ) { } = ! < > | and :. Its prototype follows:

string preg_quote(string str [, string delimiter])

The optional parameter delimiter specifies what delimiter is used for the regular expression, causing it to also be escaped by a backslash. Consider an example:

<?php
$text = "Tickets for the bout are going for $500.";
echo preg_quote($text);
?>

This returns the following:

Tickets for the bout are going for \$500\.
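A common use of preg_quote() is building a pattern from text that should be matched literally. The sketch below assumes a hypothetical search term held in $needle and passes / as the delimiter argument so that any slash in the term would be escaped as well.

```php
<?php
$needle   = '$500.'; // hypothetical search text, to be matched literally
$haystack = 'Tickets for the bout are going for $500.';

// Escape the special characters, then wrap the result in / delimiters.
$pattern = '/' . preg_quote($needle, '/') . '/';

echo $pattern, "\n";                         // /\$500\./
$quoted   = preg_match($pattern, $haystack); // 1

// Without quoting, $ is read as an end-of-string anchor and nothing matches.
$unquoted = preg_match('/' . $needle . '/', $haystack); // 0
```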

Replacing All Occurrences of a Pattern

The preg_replace() function operates identically to ereg_replace(), except that it uses a Perl-based regular expression syntax, replacing all occurrences of pattern with replacement, and returning the modified result. Its prototype follows:

mixed preg_replace(mixed pattern, mixed replacement, mixed str [, int limit])

The optional input parameter limit specifies how many matches should take place. Failing to set limit or setting it to -1 will result in the replacement of all occurrences. Consider an example:
<?php
$text = "This is a link to http://www.wjgilmore.com/.";
echo preg_replace("/http:\/\/(.*)\//", "<a href=\"\${0}\">\${0}</a>", $text);
?>

This returns the following:

This is a link to
<a href="http://www.wjgilmore.com/">http://www.wjgilmore.com/</a>.

Interestingly, the pattern and replacement input parameters can also be arrays. This function will cycle through each element of each array, making replacements as they are found. Consider this example, which could be marketed as a corporate report filter:

<?php
$draft = "In 2007 the company faced plummeting revenues and scandal.";
$keywords = array("/faced/", "/plummeting/", "/scandal/");
$replacements = array("celebrated", "skyrocketing", "expansion");
echo preg_replace($keywords, $replacements, $draft);
?>

This returns the following:

In 2007 the company celebrated skyrocketing revenues and expansion.
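The limit parameter and numbered references to parenthesized subpatterns can be sketched as follows. The sample sentence is invented; $1 in the replacement string refers to the first parenthesized subpattern.

```php
<?php
$text = "one fish, two fish, red fish";

// With a limit of 2, only the first two occurrences are replaced.
$limited = preg_replace('/fish/', 'cat', $text, 2);

// $1 inserts whatever the first parenthesized subpattern matched.
$swapped = preg_replace('/(\w+) fish/', 'fish $1', $text);

echo $limited, "\n"; // one cat, two cat, red fish
echo $swapped, "\n"; // fish one, fish two, fish red
```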

Creating a Custom Replacement Function

In some situations you might wish to replace strings based on a somewhat more complex set of criteria beyond what is provided by PHP’s default capabilities. For instance, consider a situation where you want to scan some text for acronyms such as IRS and insert the complete name directly following the acronym. To do so, you need to create a custom function and then use the function preg_replace_callback() to temporarily tie it into the language. Its prototype follows:
mixed preg_replace_callback(mixed pattern, callback callback, mixed str
[, int limit])

The pattern parameter determines what you’re looking for, while the str parameter defines the string you’re searching. The callback parameter defines the name of the function to be used for the replacement task. The optional parameter limit specifies how many matches should take place. Failing to set limit or setting it to -1 will result in the replacement of all occurrences. In the following example, a function named acronym() is passed into preg_replace_callback() and is used to insert the long form of various acronyms into the target string:

<?php

// This function will add the acronym's long form
// directly after any acronyms found in $matches
function acronym($matches) {
$acronyms = array(
'WWW' => 'World Wide Web',
'IRS' => 'Internal Revenue Service',
'PDF' => 'Portable Document Format');

if (isset($acronyms[$matches[1]]))
return $matches[1] . " (" . $acronyms[$matches[1]] . ")";
else
return $matches[1];
}

// The target text
$text = "The <acronym>IRS</acronym> offers tax forms in
<acronym>PDF</acronym> format on the <acronym>WWW</acronym>.";

// Add the acronyms' long forms to the target text
$newtext = preg_replace_callback("/<acronym>(.*)<\/acronym>/U", 'acronym',
$text);

print_r($newtext);

?>

This returns the following:

The IRS (Internal Revenue Service) offers tax forms in
PDF (Portable Document Format) format on the WWW (World Wide Web).

Splitting a String into Various Elements Based on a Case-Insensitive Pattern

The preg_split() function operates exactly like split(), except that pattern is defined as a Perl-compatible regular expression rather than a POSIX one. Its prototype follows:

array preg_split(string pattern, string string [, int limit [, int flags]])

If the optional input parameter limit is specified, only limit number of substrings are returned. Consider an example:

<?php
$delimitedText = "Jason+++Gilmore+++++++++++Columbus+++OH";
$fields = preg_split("/\+{1,}/", $delimitedText);
foreach($fields as $field) echo $field."<br />";
?>

This returns the following:

Jason
Gilmore
Columbus
OH
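The optional limit and flags parameters can be sketched with the same delimited string. PREG_SPLIT_NO_EMPTY is one of the flags defined by PHP; passing -1 as the limit means no limit.

```php
<?php
$delimitedText = "Jason+++Gilmore+++++++++++Columbus+++OH";

// limit = 2: the second element holds the unsplit remainder
$firstTwo = preg_split('/\++/', $delimitedText, 2);

// Splitting on single + signs yields empty pieces; the flag discards them.
$noEmpty = preg_split('/\+/', $delimitedText, -1, PREG_SPLIT_NO_EMPTY);

print_r($firstTwo); // Jason, Gilmore+++++++++++Columbus+++OH
print_r($noEmpty);  // Jason, Gilmore, Columbus, OH
```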

■Note Later in this chapter, the section titled “Alternatives for Regular Expression Functions” offers several standard functions that can be used in lieu of regular expressions for certain tasks. In many cases, these alternative functions actually perform much faster than their regular expression counterparts.

Other String-Specific Functions

In addition to the regular expression–based functions discussed in the first half of this chapter, PHP offers more than 100 functions collectively capable of manipulating practically every imaginable aspect of a string. To introduce each function would be out of the scope of this book and would only repeat much of the information in the PHP documentation. This section is devoted to a categorical FAQ of sorts, focusing upon the string-related issues that seem to most frequently appear within community forums. The section is divided into the following topics:

• Determining string length

• Comparing two strings

• Manipulating string case

• Converting strings to and from HTML

• Alternatives for regular expression functions

• Padding and stripping a string

• Counting characters and words

Determining the Length of a String

Determining string length is a repeated action within countless applications. The PHP function strlen() accomplishes this task quite nicely. This function returns the length of a string, where each character in the string is equivalent to one unit. Its prototype follows:

int strlen(string str)

The following example verifies whether a user password is of acceptable length:

<?php
$pswd = "secretpswd";
if (strlen($pswd) < 10)
echo "Password is too short!";
else
echo "Password is valid!";
?>

In this case, the error message will not appear because the chosen password consists of ten
characters, whereas the conditional expression validates whether the target string consists of less than ten characters.

Comparing Two Strings

String comparison is arguably one of the most important features of the string-handling capabilities of any language. Although there are many ways in which two strings can be compared for equality, PHP provides four functions for performing this task: strcmp(), strcasecmp(), strspn(), and strcspn(). These functions are discussed in the following sections.

Comparing Two Strings Case Sensitively

The strcmp() function performs a binary-safe, case-sensitive comparison of two strings. Its prototype follows:

int strcmp(string str1, string str2)

It will return one of three possible values based on the comparison outcome:

• 0 if str1 and str2 are equal

• A value less than 0 if str1 is less than str2

• A value greater than 0 if str1 is greater than str2

Web sites often require a registering user to enter and then confirm a password, lessening the possibility of an incorrectly entered password as a result of a typing error. strcmp() is a great function for comparing the two password entries because passwords are often case sensitive:
<?php
$pswd = "supersecret";
$pswd2 = "supersecret2";

if (strcmp($pswd, $pswd2) != 0)
    echo "Passwords do not match!";
else
    echo "Passwords match!";
?>

Note that the strings must match exactly for strcmp() to consider them equal. For example, Supersecret is different from supersecret. If you’re looking to compare two strings case insensitively, consider strcasecmp(), introduced next.
Another common point of confusion regarding this function surrounds its behavior of returning
0 if the two strings are equal. This is different from executing a string comparison using the == operator, like so:

if ($str1 == $str2)

While both accomplish the same goal, which is to compare two strings, keep in mind that the values they return in doing so are different.
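The difference in return values is easy to see in a short sketch. The password strings are sample values; the point is that strcmp() signals equality with 0, which PHP treats as false in a condition, whereas == yields a boolean.

```php
<?php
$pswd  = "supersecret";
$pswd2 = "Supersecret";

$cmp = strcmp($pswd, $pswd2); // greater than 0: lowercase s sorts after uppercase S
$eq  = ($pswd == $pswd2);     // false: the strings differ in case

// Equal strings give 0, so the test must be written as "== 0" or with "!".
$same = strcmp($pswd, $pswd); // 0
if ($same == 0) {
    echo "strcmp() reports a match\n";
}
if (!$eq) {
    echo "== reports a mismatch\n";
}
```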

Comparing Two Strings Case Insensitively

The strcasecmp() function operates exactly like strcmp(), except that its comparison is case insensitive. Its prototype follows:

int strcasecmp(string str1, string str2)

The following example compares two e-mail addresses, an ideal use for strcasecmp() because case does not determine an e-mail address’s uniqueness:

<?php
$email1 = "admin@example.com";
$email2 = "ADMIN@example.com";

if (! strcasecmp($email1, $email2))
echo "The email addresses are identical!";
?>

In this example, the message is output because strcasecmp() performs a case-insensitive comparison of $email1 and $email2 and determines that they are indeed identical.

Calculating the Similarity Between Two Strings

The strspn() function returns the length of the first segment in a string containing characters also found in another string. Its prototype follows:

int strspn(string str1, string str2)

Here’s how you might use strspn() to ensure that a password does not consist solely of numbers:

<?php
$password = "3312345";
if (strspn($password, "1234567890") == strlen($password))
echo "The password cannot consist solely of numbers!";
?>

In this case, the error message is returned because $password does indeed consist solely of digits.

Calculating the Difference Between Two Strings

The strcspn() function returns the length of the first segment of a string containing characters not found in another string. Its prototype follows:

int strcspn(string str1, string str2)

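Here’s an example of password validation using strcspn(). Because strcspn() returns 0 when the very first character of the string is found in the second string, this test rejects any password that begins with a digit:

<?php
$password = "a12345";
if (strcspn($password, "1234567890") == 0) {
    echo "Password cannot begin with a number!";
}
?>

In this case, the error message will not be displayed because $password begins with a letter, so strcspn() returns 1 rather than 0.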
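
As a rough illustration of how strspn() and strcspn() mirror each other, the following sketch records their return values for a few sample strings. The helper names leading_digits() and chars_before_digit() are hypothetical and not part of PHP:

```php
<?php
// Hypothetical helper: how many digits does $str start with?
function leading_digits($str) {
    return strspn($str, "0123456789");
}

// Hypothetical helper: how many characters precede the first digit?
function chars_before_digit($str) {
    return strcspn($str, "0123456789");
}

echo leading_digits("3312345"), "\n";     // every character is a digit
echo leading_digits("42abc"), "\n";       // only the first two are digits
echo chars_before_digit("a12345"), "\n";  // only "a" precedes the first digit
echo chars_before_digit("abc"), "\n";     // no digit, so the full length
?>
```

Comparing the strspn() result against strlen() is the test that confirms a string consists entirely of the listed characters; a strcspn() result of 0 only says that the first character is one of them.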

Manipulating String Case

Four functions are available to aid you in manipulating the case of characters in a string: strtolower(),
strtoupper(), ucfirst(), and ucwords(). These functions are discussed in this section.

Converting a String to All Lowercase

The strtolower() function converts a string to all lowercase letters, returning the modified string. Nonalphabetical characters are not affected. Its prototype follows:

string strtolower(string str)

The following example uses strtolower() to convert a URL to all lowercase letters:

<?php
$url = "http://WWW.EXAMPLE.COM/";
echo strtolower($url);
?>

This returns the following:

http://www.example.com/

Converting a String to All Uppercase

Just as you can convert a string to lowercase, you can convert it to uppercase. This is accomplished with the function strtoupper(). Its prototype follows:

string strtoupper(string str)

Nonalphabetical characters are not affected. This example uses strtoupper() to convert a string to all uppercase letters:
<?php
$msg = "I annoy people by capitalizing e-mail text.";
echo strtoupper($msg);
?>

This returns the following:

I ANNOY PEOPLE BY CAPITALIZING E-MAIL TEXT.

Capitalizing the First Letter of a String

The ucfirst() function capitalizes the first letter of the string str, if it is alphabetical. Its prototype follows:

string ucfirst(string str)

Nonalphabetical characters will not be affected. Additionally, any capitalized characters found in the string will be left untouched. Consider this example:
<?php
$sentence = "the newest version of PHP was released today!";
echo ucfirst($sentence);
?>

This returns the following:

The newest version of PHP was released today!

Note that while the first letter is indeed capitalized, the capitalized word PHP was left untouched.

Capitalizing Each Word in a String

The ucwords() function capitalizes the first letter of each word in a string. Its prototype follows:

string ucwords(string str)

Nonalphabetical characters are not affected. This example uses ucwords() to capitalize each word in a string:

<?php
$title = "O'Malley wins the heavyweight championship!";
echo ucwords($title);
?>

This returns the following:

O'Malley Wins The Heavyweight Championship!

Note that if O’Malley was accidentally written as O’malley, ucwords() would not catch the error, as it considers a word to be defined as a string of characters separated from other entities in the string by a blank space on each side.

Converting Strings to and from HTML

Converting a string or an entire file into a form suitable for viewing on the Web (and vice versa) is easier than you would think. Several functions are suited for such tasks, all of which are introduced in this section.

Converting Newline Characters to HTML Break Tags

The nl2br() function inserts the XHTML-compliant line break tag, <br />, before each newline (\n) character in a string. Its prototype follows:

string nl2br(string str)

The newline characters could be created via a carriage return, or explicitly written into the string. The following example translates a text string to HTML format:
<?php
$recipe = "3 tablespoons Dijon mustard
1/3 cup Caesar salad dressing
8 ounces grilled chicken breast
3 cups romaine lettuce";

// convert the newlines to <br />'s.
echo nl2br($recipe);
?>

Executing this example results in the following output:

3 tablespoons Dijon mustard<br />
1/3 cup Caesar salad dressing<br />
8 ounces grilled chicken breast<br />
3 cups romaine lettuce

Converting Special Characters to their HTML Equivalents

During the general course of communication, you may come across many characters that are not included in a document’s text encoding, or that are not readily available on the keyboard. Examples of such characters include the copyright symbol (©), the cent sign (¢), and the grave accent (è). To address such shortcomings, a set of universal key codes was devised, known as character entity references. When these entities are parsed by the browser, they will be converted into their recognizable counterparts. For example, the three aforementioned characters would be presented as &copy;, &cent;, and &egrave;, respectively.
To perform these conversions, you can use the htmlentities() function. Its prototype follows:

string htmlentities(string str [, int quote_style [, string charset]])

Because of the special nature of quote marks within markup, the optional quote_style parameter offers the opportunity to choose how they will be handled. Three values are accepted:

ENT_COMPAT: Convert double quotes and ignore single quotes. This is the default.

ENT_NOQUOTES: Ignore both double and single quotes.

ENT_QUOTES: Convert both double and single quotes.

A second optional parameter, charset, determines the character set used for the conversion. Table 9-2 offers the list of supported character sets. If charset is omitted, it will default to ISO-8859-1.

Table 9-2. htmlentities()’s Supported Character Sets

Character Set   Description

BIG5            Traditional Chinese
BIG5-HKSCS      BIG5 with additional Hong Kong extensions, traditional Chinese
cp866           DOS-specific Cyrillic character set
cp1251          Windows-specific Cyrillic character set
cp1252          Windows-specific character set for Western Europe
EUC-JP          Japanese
GB2312          Simplified Chinese
ISO-8859-1      Western European, Latin-1
ISO-8859-15     Western European, Latin-9
KOI8-R          Russian
Shift-JIS       Japanese
UTF-8           ASCII-compatible multibyte 8-bit Unicode

The following example converts the necessary characters for Web display:

<?php
$advertisement = "Coffee at 'Cafè Française' costs $2.25.";
echo htmlentities($advertisement);
?>

This returns the following:

Coffee at 'Caf&egrave; Fran&ccedil;aise' costs $2.25.

Two characters are converted, the grave accent (è) and the cedilla (ç). The single quotes are ignored due to the default quote_style setting ENT_COMPAT.
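
To see how the three quote_style values differ, consider the following sketch. It assumes the script file is saved as UTF-8 and therefore passes "UTF-8" explicitly as the charset; the sample sentence and variable names are illustrative only:

```php
<?php
// Assumption: this file is saved as UTF-8.
$quote = "O'Malley ordered \"café au lait\" & left";

$compat   = htmlentities($quote, ENT_COMPAT,   "UTF-8");
$noquotes = htmlentities($quote, ENT_NOQUOTES, "UTF-8");
$quotes   = htmlentities($quote, ENT_QUOTES,   "UTF-8");

echo $compat, "\n";    // double quotes become &quot;
echo $noquotes, "\n";  // both kinds of quotes are left alone
echo $quotes, "\n";    // single quotes become &#039; as well
?>
```

In all three cases the é becomes &eacute; and the ampersand becomes &amp;; only the treatment of the quote marks changes. Passing the charset explicitly avoids depending on the default, which matters whenever the input is not ISO-8859-1.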

Using Special HTML Characters for Other Purposes

Several characters play a dual role in both markup languages and the human language. When used in the latter fashion, these characters must be converted into their displayable equivalents. For example, an ampersand must be converted to &amp;, whereas a greater-than character must be converted to
&gt;. The htmlspecialchars() function can do this for you, converting the following characters into their compatible equivalents. Its prototype follows:

string htmlspecialchars(string str [, int quote_style [, string charset]])

The list of characters that htmlspecialchars() can convert and their resulting formats follow:

• & becomes &amp;

• " (double quote) becomes &quot;

• ' (single quote) becomes &#039; (only when quote_style is set to ENT_QUOTES)

• < becomes &lt;

• > becomes &gt;

This function is particularly useful in preventing users from entering HTML markup into an interactive Web application, such as a message board.
The following example converts potentially harmful characters using htmlspecialchars():

<?php
$input = "I just can't get <<enough>> of PHP!";
echo htmlspecialchars($input);
?>

Viewing the source, you’ll see the following:

I just can't get &lt;&lt;enough&gt;&gt; of PHP!

If the translation isn’t necessary, perhaps a more efficient way to do this would be to use
strip_tags(), which deletes the tags from the string altogether.

■Tip If you are using htmlspecialchars() in conjunction with a function such as nl2br(), you should execute nl2br() after htmlspecialchars(); otherwise, the <br /> tags that are generated with nl2br() will be converted to visible characters.
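
The ordering described in this tip can be sketched as follows. The sample comment text and the variable names are made up for illustration:

```php
<?php
$comment = "I <3 PHP\n& so should you";

// Escape first, then add the line breaks: the <br /> tag survives.
$right = nl2br(htmlspecialchars($comment));

// Reversed order: the generated <br /> tag is escaped along with
// everything else and shows up as visible text in the browser.
$wrong = htmlspecialchars(nl2br($comment));

echo $right, "\n";
echo $wrong, "\n";
?>
```

Viewing the source of the first result shows a real <br /> tag, while the second contains &lt;br /&gt;, which the browser displays literally.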

Converting Text into Its HTML Equivalent

Using get_html_translation_table() is a convenient way to translate text to its HTML equivalent, returning one of the two translation tables (HTML_SPECIALCHARS or HTML_ENTITIES). Its prototype follows:

array get_html_translation_table(int table [, int quote_style])

This returned value can then be used in conjunction with another predefined function, strtr()
(formally introduced later in this section), to essentially translate the text into its corresponding
HTML code.
The following sample uses get_html_translation_table() to convert text to HTML:

<?php
$string = "La pasta é il piatto piú amato in Italia";
$translate = get_html_translation_table(HTML_ENTITIES);
echo strtr($string, $translate);
?>

This returns the string formatted as necessary for browser rendering:

La pasta &eacute; il piatto pi&uacute; amato in Italia

Interestingly, passing the translation table through array_flip() reverses the translation, so that HTML entities are converted back into their text equivalents. The next example uses array_flip() to return the translated string from the preceding sample to its original value:

<?php
$entities = get_html_translation_table(HTML_ENTITIES);
$translate = array_flip($entities);
$string = "La pasta &eacute; il piatto pi&uacute; amato in Italia";
echo strtr($string, $translate);
?>

This returns the following:

La pasta é il piatto piú amato in Italia

Creating a Customized Conversion List

The strtr() function converts all characters in a string to their corresponding match found in a predefined array. Its prototype follows:

string strtr(string str, array replacements)

This example converts the presentational bold tag (<b>) to its XHTML-preferred equivalent, <strong>:

<?php
$table = array("<b>" => "<strong>", "</b>" => "</strong>");
$html = "<b>Today In PHP-Powered News</b>";
echo strtr($html, $table);
?>

This returns the following:

<strong>Today In PHP-Powered News</strong>
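
One property of strtr() worth knowing when building a conversion list is that it scans the string once and never re-examines text it has already replaced. The following sketch contrasts this with str_replace(), which applies each pair in turn; the table contents are invented for illustration:

```php
<?php
$table = array("Hi" => "Hello", "Hello" => "Goodbye");
$text  = "Hi all";

// strtr() makes a single pass; the inserted "Hello" is not searched again.
$once = strtr($text, $table);

// str_replace() handles the pairs one after another, so "Hi" becomes
// "Hello" and that "Hello" then becomes "Goodbye".
$chained = str_replace(array_keys($table), array_values($table), $text);

echo $once, "\n";     // Hello all
echo $chained, "\n";  // Goodbye all
?>
```

When the replacement values can contain other keys from the same table, this single-pass behavior makes strtr() the more predictable choice.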

Converting HTML to Plain Text

You may sometimes need to convert an HTML file to plain text. You can do so using the strip_tags() function, which removes all HTML and PHP tags from a string, leaving only the text entities. Its prototype follows:

string strip_tags(string str [, string allowable_tags])

The optional allowable_tags parameter allows you to specify which tags you would like to be skipped during this process. This example uses strip_tags() to delete all HTML tags from a string:

<?php
$input = "Email <a href='spammer@example.com'>spammer@example.com</a>";
echo strip_tags($input);
?>

This returns the following:

Email spammer@example.com

The following sample strips all tags except the <a> tag:

<?php
$input = "This <a href='http://www.example.com/'>example</a>
is <b>awesome</b>!";
echo strip_tags($input, "<a>");
?>

This returns the following:

This <a href='http://www.example.com/'>example</a> is awesome!

■Note Another function that behaves like strip_tags() is fgetss(). This function is described in Chapter 10.

Alternatives for Regular Expression Functions

When you’re processing large amounts of information, the regular expression functions can slow matters dramatically. You should use these functions only when you are interested in parsing relatively complicated strings that require the use of regular expressions. If you are instead interested in parsing for simple expressions, there are a variety of predefined functions that speed up the process considerably. Each of these functions is described in this section.

Tokenizing a String Based on Predefined Characters

The strtok() function parses the string based on a predefined list of characters. Its prototype follows:

string strtok(string str, string tokens)

One oddity about strtok() is that it must be continually called in order to completely tokenize a string; each call only tokenizes the next piece of the string. However, the str parameter needs to be specified only once because the function keeps track of its position in str until it either completely tokenizes str or a new str parameter is specified. Its behavior is best explained via an example:
<?php
$info = "J. Gilmore:jason@example.com|Columbus, Ohio";

// delimiters include colon (:), vertical bar (|), and comma (,)
$tokens = ":|,";
$tokenized = strtok($info, $tokens);

// Print each token in turn; strtok() returns FALSE when none remain.
while ($tokenized !== false) {
echo "Element = $tokenized<br>";
// Don't include the first argument in subsequent calls.
$tokenized = strtok($tokens);
}
?>

This returns the following:

Element = J. Gilmore
Element = jason@example.com
Element = Columbus
Element = Ohio
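
When writing a strtok() loop, it is worth comparing the result against FALSE with the !== operator, because a token such as "0" is itself treated as false in a Boolean test. A small sketch with made-up data shows the pattern:

```php
<?php
$scores = "5:0:7";
$tokens = array();

// Pass the string on the first call only.
$token = strtok($scores, ":");

// A plain while ($token) would stop at the "0" token.
while ($token !== false) {
    $tokens[] = $token;
    // Later calls take just the delimiter list.
    $token = strtok(":");
}

print_r($tokens);
?>
```

The loop collects all three tokens, including the "0" in the middle.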

Exploding a String Based on a Predefined Delimiter

The explode() function divides the string str into an array of substrings. Its prototype follows:

array explode(string separator, string str [, int limit])

The original string is divided into distinct elements by splitting it at each occurrence of the string specified by separator. The number of elements can be limited with the optional inclusion of limit. Let’s use explode() in conjunction with sizeof() and strip_tags() to determine the total number of words in a given block of text:

<?php
$summary = <<< summary
In the latest installment of the ongoing Developer.com PHP series, I discuss the many improvements and additions to
<a href="http://www.php.net">PHP 5's</a> object-oriented architecture.
summary;
$words = sizeof(explode(' ',strip_tags($summary)));
echo "Total words in summary: $words";
?>

This returns the following:

Total words in summary: 22

The explode() function will always be considerably faster than preg_split(), split(), and
spliti(). Therefore, always use it instead of the others when a regular expression isn’t necessary.

■Note You might be wondering why the previous code is indented in an inconsistent manner. The multiple-line
string was delimited using heredoc syntax, which requires the closing identifier to not be indented even a single space. Why this restriction is in place is somewhat of a mystery, although one would presume it makes the PHP engine’s job a tad easier when parsing the multiple-line string. See Chapter 3 for more information about heredoc.
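
The optional limit parameter of explode() is easiest to understand by example. In the following sketch (the record format is invented for illustration), a positive limit caps the number of elements, and the final element receives whatever remains of the string:

```php
<?php
$record = "Gilmore,Jason,Columbus,Ohio";

// No limit: every comma splits the string.
$all = explode(",", $record);

// Limit of 2: the second element holds the rest of the string.
$two = explode(",", $record, 2);

print_r($all);  // Gilmore, Jason, Columbus, Ohio
print_r($two);  // Gilmore and "Jason,Columbus,Ohio"
?>
```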

Converting an Array into a String

Just as you can use the explode() function to divide a delimited string into various array elements, you can concatenate array elements to form a single delimited string using the implode() function. Its prototype follows:

string implode(string delimiter, array pieces)

This example forms a string out of the elements of an array:

<?php
$cities = array("Columbus", "Akron", "Cleveland", "Cincinnati");
echo implode("|", $cities);
?>

This returns the following:

Columbus|Akron|Cleveland|Cincinnati

Performing Complex String Parsing

The strpos() function finds the position of the first case-sensitive occurrence of substr in a string. Its prototype follows:

int strpos(string str, string substr [, int offset])

The optional parameter offset specifies the position at which strpos() begins its search. If substr is not in str, strpos() will return FALSE. The following example determines the timestamp of the first time index.html is accessed:

<?php
$substr = "index.html";
$log = <<< logfile
192.168.1.11:/www/htdocs/index.html:[2006/02/10:20:36:50]
192.168.1.13:/www/htdocs/about.html:[2006/02/11:04:15:23]
192.168.1.15:/www/htdocs/index.html:[2006/02/15:17:25]
logfile;

// Where does $substr first occur in the log?
$pos = strpos($log, $substr);

// Find the numerical position of the end of the line
$pos2 = strpos($log,"\n",$pos);

// Calculate the beginning of the timestamp
$pos = $pos + strlen($substr) + 1;

// Retrieve the timestamp
$timestamp = substr($log,$pos,$pos2-$pos);

echo "The file $substr was first accessed on: $timestamp";
?>

This returns the time at which the file index.html was first accessed:

The file index.html was first accessed on: [2006/02/10:20:36:50]

The function stripos() operates identically to strpos(), except that it executes its search case insensitively.
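
Because strpos() returns FALSE on failure but can legitimately return position 0, its result is best checked with the strict === or !== operators. The following sketch, using made-up values, shows how a loose test can mislead:

```php
<?php
$path = "index.html";

// "index" begins at position 0 of $path.
$pos = strpos($path, "index");

// A loose test treats position 0 as "not found".
$looseFound = ($pos != false);

// A strict test distinguishes 0 from FALSE.
$strictFound = ($pos !== false);

// stripos() behaves the same way but ignores case.
$ipos = stripos($path, "HTML");

var_dump($pos, $looseFound, $strictFound, $ipos);
?>
```

Here the loose comparison wrongly reports a miss, while the strict comparison reports the match at position 0.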

Finding the Last Occurrence of a String

The strrpos() function finds the last occurrence of a string, returning its numerical position. Its prototype follows:

int strrpos(string str, string substr [, int offset])

The optional parameter offset determines the position from which strrpos() will begin searching. Suppose you wanted to pare down lengthy news summaries, truncating the summary and replacing the truncated component with an ellipsis. However, rather than simply cut off the summary explicitly at the desired length, you want it to operate in a user-friendly fashion, truncating at the end of the word closest to the truncation length. This function is ideal for such a task. Consider this example:
<?php
// Limit $summary to how many characters?
$limit = 100;

$summary = <<< summary
In the latest installment of the ongoing Developer.com PHP series, I discuss the many improvements and additions to
<a href="http://www.php.net">PHP 5's</a> object-oriented architecture.
summary;

if (strlen($summary) > $limit)
$summary = substr($summary, 0, strrpos(substr($summary, 0, $limit),
' ')) . '...';
echo $summary;
?>

This returns the following:

In the latest installment of the ongoing Developer.com PHP series, I discuss the many...

Replacing All Instances of a String with Another String

The str_replace() function case sensitively replaces all instances of a string with another. Its prototype follows:

mixed str_replace(mixed occurrence, mixed replacement, mixed str [, int &count])

If occurrence is not found in str, the original string is returned unmodified. If the optional parameter count is supplied, it is passed by reference and set to the number of replacements performed.
This function is ideal for hiding e-mail addresses from automated e-mail address retrieval programs:

<?php
$author = "jason@example.com";
$author = str_replace("@","(at)",$author);
echo "Contact the author of this article at $author.";
?>

This returns the following:

Contact the author of this article at jason(at)example.com.

The function str_ireplace() operates identically to str_replace(), except that it is capable of executing a case-insensitive search.
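
Two features of str_replace() are easy to overlook: the optional fourth argument is filled in by the function with the number of replacements made, and both the search and replacement arguments may be arrays. A brief sketch with illustrative addresses:

```php
<?php
$text = "Contact jason@example.com or sales@example.com";

// $count is set by str_replace() to the number of replacements.
$masked = str_replace("@", "(at)", $text, $count);

// Arrays are applied pairwise: "@" to "(at)", then "." to "(dot)".
$obscured = str_replace(array("@", "."), array("(at)", "(dot)"),
                        "jason@example.com");

echo $masked, "\n";    // both @ signs replaced
echo $count, "\n";     // 2
echo $obscured, "\n";  // jason(at)example(dot)com
?>
```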

Retrieving Part of a String

The strstr() function returns the remainder of a string beginning with the first occurrence of a predefined string. Its prototype follows:

string strstr(string str, string occurrence)

This example uses the function in conjunction with the ltrim() function to retrieve the domain name of an e-mail address:
<?php
$url = "sales@example.com";
echo ltrim(strstr($url, "@"),"@");
?>

This returns the following:

example.com

Returning Part of a String Based on Predefined Offsets

The substr() function returns the part of a string that begins at a predefined starting offset and spans a predefined length. Its prototype follows:

string substr(string str, int start [, int length])

If the optional length parameter is not specified, the substring is considered to be the string starting at start and ending at the end of str. There are four points to keep in mind when using this function:

• If start is positive, the returned string will begin at the start position of the string.

• If start is negative, the returned string will begin that many characters from the end of the string; for example, a start of -4 begins four characters from the end.

• If length is provided and is positive, the returned string will consist of the characters between start and start + length. If this distance surpasses the total string length, only the string between start and the string’s end will be returned.
• If length is provided and is negative, the returned string will end length characters from the end of str.

Keep in mind that start is the offset from the first character of str; therefore, the returned string will actually start at character position start + 1. Consider a basic example:
<?php
$car = "1944 Ford";
echo substr($car, 5);
?>

This returns the following:

Ford

The following example uses the length parameter:

<?php
$car = "1944 Ford";
echo substr($car, 0, 4);
?>

This returns the following:

1944

The final example uses a negative length parameter:

<?php
$car = "1944 Ford";
echo substr($car, 2, -5);
?>

This returns the following:

44
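
The rules for start and length listed earlier can be checked side by side. This sketch reuses the $car string from the preceding examples; the other variable names are illustrative:

```php
<?php
$car = "1944 Ford";

$fromFive  = substr($car, 5);       // positive start, no length
$lastFour  = substr($car, -4);      // negative start counts from the end
$firstFour = substr($car, 0, 4);    // positive length
$tooLong   = substr($car, 5, 100);  // length past the end is cut short
$middle    = substr($car, 2, -5);   // negative length stops 5 from the end

echo "$fromFive|$lastFour|$firstFour|$tooLong|$middle";
?>
```

This prints Ford|Ford|1944|Ford|44, showing that a negative start and an oversized length both resolve to the end of the string.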

Determining the Frequency of a String’s Appearance

The substr_count() function returns the number of times one string occurs within another. Its prototype follows:

int substr_count(string str, string substring)

The following example determines the number of times an IT consultant uses various buzzwords in his presentation:
<?php
$buzzwords = array("mindshare", "synergy", "space");

$talk = <<< talk
I'm certain that we could dominate mindshare in this space with
our new product, establishing a true synergy between the marketing and product development teams. We'll own this space in three months.
talk;

foreach($buzzwords as $bw) {
echo "The word $bw appears ".substr_count($talk,$bw)." time(s).<br />";
}
?>

This returns the following:

The word mindshare appears 1 time(s).
The word synergy appears 1 time(s).
The word space appears 2 time(s).

Replacing a Portion of a String with Another String

The substr_replace() function replaces a portion of a string with a replacement string, beginning the substitution at a specified starting position and ending at a predefined replacement length. Its prototype follows:

string substr_replace(string str, string replacement, int start [, int length])

If length is omitted, everything from start to the end of str is replaced. There are several behaviors you should keep in mind regarding the values of start and length:

• If start is positive, replacement will begin at character start.

• If start is negative, replacement will begin that many characters from the end of str.

• If length is provided and is positive, length characters of str will be replaced.

• If length is provided and is negative, replacement will stop that many characters from the end of str.

Suppose you built an e-commerce site and within the user profile interface you want to show just the last four digits of the provided credit card number. This function is ideal for such a task:

<?php
$ccnumber = "1234567899991111";
echo substr_replace($ccnumber,"************",0,12);
?>

This returns the following:

************1111
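
The preceding example assumes a 16-digit number. As a sketch of how a negative length can make the masking independent of the number's length, consider the following; mask_card() is a hypothetical helper written for this illustration, not a PHP function:

```php
<?php
// Hypothetical helper: hide all but the last four characters.
function mask_card($ccnumber) {
    $visible = 4;
    if (strlen($ccnumber) <= $visible) {
        return $ccnumber;  // nothing to hide
    }
    $stars = str_repeat("*", strlen($ccnumber) - $visible);
    // A negative length stops the replacement 4 characters from the end.
    return substr_replace($ccnumber, $stars, 0, -$visible);
}

echo mask_card("1234567899991111"), "\n";  // 16 digits
echo mask_card("378282246310005"), "\n";   // 15 digits

// A negative start with no length replaces through the end of the string.
echo substr_replace("1944 Ford", "Mercury", -4), "\n";  // 1944 Mercury
?>
```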

Padding and Stripping a String

For formatting reasons, you sometimes need to modify the string length via either padding or stripping characters. PHP provides a number of functions for doing so. This section examines many of the commonly used functions.

Trimming Characters from the Beginning of a String

The ltrim() function removes various characters from the beginning of a string, including white space, the horizontal tab (\t), newline (\n), carriage return (\r), NULL (\0), and vertical tab (\x0b). Its prototype follows:

string ltrim(string str [, string charlist])

You can designate other characters for removal by defining them in the optional parameter
charlist.

Trimming Characters from the End of a String

The rtrim() function operates identically to ltrim(), except that it removes the designated characters from the right side of a string. Its prototype follows:

string rtrim(string str [, string charlist])

Trimming Characters from Both Sides of a String

You can think of the trim() function as a combination of ltrim() and rtrim(), except that it removes the designated characters from both sides of a string:

string trim(string str [, string charlist])
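
The three trimming functions can be compared with a short sketch. The sample strings below are invented for illustration; the last two calls use the optional charlist parameter:

```php
<?php
$raw = "\t  Columbus, Ohio \n";

$both  = trim($raw);   // strips both sides
$left  = ltrim($raw);  // strips the beginning only
$right = rtrim($raw);  // strips the end only

// With charlist, only the listed characters are removed.
$title = trim("--Log Report--", "-");
$dir   = rtrim("/www/htdocs///", "/");

var_dump($both, $left, $right, $title, $dir);
?>
```

Note that supplying charlist replaces the default set of characters, so white space is no longer stripped unless it is listed explicitly.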

Padding a String

The str_pad() function pads a string with a specified number of characters. Its prototype follows:

string str_pad(string str, int length [, string pad_string [, int pad_type]])

If the optional parameter pad_string is not defined, str will be padded with blank spaces; otherwise, it will be padded with the character pattern specified by pad_string. By default, the string will be padded to the right; however, the optional parameter pad_type may be assigned the values STR_PAD_RIGHT, STR_PAD_LEFT, or STR_PAD_BOTH, padding the string accordingly. This example shows how to pad a string using str_pad():

<?php
echo str_pad("Salad", 10)." is good.";
?>

This returns the following:

Salad      is good.

This example makes use of str_pad()’s optional parameters:

<?php
$header = "Log Report";
echo str_pad ($header, 20, "=+", STR_PAD_BOTH);
?>

This returns the following:

=+=+=Log Report=+=+=

Note that str_pad() truncates the pattern defined by pad_string if length is reached before completing an entire repetition of the pattern.

Counting Characters and Words

It’s often useful to determine the total number of characters or words in a given string. Although PHP’s considerable capabilities in string parsing have long made this task trivial, two functions were recently added that formalize the process. Both functions are introduced in this section.

Counting the Number of Characters in a String

The function count_chars() offers information regarding the characters found in a string. Its prototype follows:

mixed count_chars(string str [, int mode])

Its behavior depends on how the optional parameter mode is defined:

0: Returns an array consisting of each found byte value as the key and the corresponding frequency as the value, even if the frequency is zero. This is the default.

1: Same as 0, but returns only those byte values with a frequency greater than zero.

2: Same as 0, but returns only those byte values with a frequency of zero.

3: Returns a string containing all located byte values.

4: Returns a string containing all unused byte values.

The following example counts the frequency of each character in $sentence:

<?php
$sentence = "The rain in Spain falls mainly on the plain";

// Retrieve located characters and their corresponding frequency.
$chart = count_chars($sentence, 1);

foreach($chart as $letter=>$frequency)
echo "Character ".chr($letter)." appears $frequency times<br />";
?>

This returns the following:

Character   appears 8 times
Character S appears 1 times
Character T appears 1 times
Character a appears 5 times
Character e appears 2 times
Character f appears 1 times
Character h appears 2 times
Character i appears 5 times
Character l appears 4 times
Character m appears 1 times
Character n appears 6 times
Character o appears 1 times
Character p appears 2 times
Character r appears 1 times
Character s appears 1 times
Character t appears 1 times
Character y appears 1 times
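
The string-returning modes are less obvious than the array modes, so here is a brief sketch contrasting mode 1 with mode 3 on a made-up input:

```php
<?php
$greeting = "hello world";

// Mode 1: byte value => frequency, for bytes that actually occur.
$counts = count_chars($greeting, 1);

// Mode 3: a string of each distinct byte, ordered by byte value.
$unique = count_chars($greeting, 3);

echo "The letter l appears ", $counts[ord("l")], " times\n";
echo "Distinct characters: [", $unique, "]\n";
?>
```

Because the space character has the lowest byte value of those present, it comes first in the mode 3 result, followed by the letters in alphabetical order.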

Counting the Total Number of Words in a String

The function str_word_count() offers information regarding the total number of words found in a string. Its prototype follows:

mixed str_word_count(string str [, int format])

If the optional parameter format is not defined, it will simply return the total number of words. If format is defined, it modifies the function’s behavior based on its value:

1: Returns an array consisting of all words located in str.

2: Returns an associative array, where the key is the numerical position of the word in str, and the value is the word itself.

Consider an example:

<?php
$summary = <<< summary
In the latest installment of the ongoing Developer.com PHP series, I discuss the many improvements and additions to PHP 5's
object-oriented architecture.
summary;
$words = str_word_count($summary);
printf("Total words in summary: %s", $words);
?>

This returns the following:

Total words in summary: 23

You can use this function in conjunction with array_count_values() to determine the frequency in which each word appears within the string:

<?php
$summary = <<< summary
In the latest installment of the ongoing Developer.com PHP series, I discuss the many improvements and additions to PHP 5's
object-oriented architecture.
summary;
$words = str_word_count($summary,2);
$frequency = array_count_values($words);
print_r($frequency);
?>

This returns the following:

Array ( [In] => 1 [the] => 3 [latest] => 1 [installment] => 1 [of] => 1 [ongoing] => 1 [Developer] => 1 [com] => 1 [PHP] => 2 [series] => 1
[I] => 1 [discuss] => 1 [many] => 1 [improvements] => 1 [and] => 1 [additions] => 1 [to] => 1 [s] => 1 [object-oriented] => 1 [architecture] => 1 )
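
The difference between the two format values is easiest to see on a short string. In this sketch the sample title is arbitrary:

```php
<?php
$title = "PHP and Oracle";

$total     = str_word_count($title);     // just the number of words
$list      = str_word_count($title, 1);  // words in order
$positions = str_word_count($title, 2);  // starting offset => word

echo $total, "\n";
print_r($list);
print_r($positions);
?>
```

With a format of 2, the keys 0, 4, and 8 are the offsets at which PHP, and, and Oracle begin within the string.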

Taking Advantage of PEAR: Validate_US

Regardless of whether your Web application is intended for use in banking, medical, IT, retail, or some other industry, chances are that certain data elements will be commonplace. For instance, it’s conceivable you’ll be tasked with inputting and validating a telephone number or a state abbreviation, regardless of whether you’re dealing with a client, a patient, a staff member, or a customer. Such repeatability certainly presents the opportunity to create a library that is capable of handling such matters, regardless of the application. Indeed, because we’re faced with such repeatable tasks, it follows that other programmers are, too. Therefore, it’s always prudent to investigate whether somebody has already done the hard work for you and made a package available via PEAR.

■Note If you’re unfamiliar with PEAR, take some time to review Chapter 11 before continuing.

Sure enough, a quick PEAR search turns up Validate_US, a package that is capable of validating various informational items specific to the United States. Although still in beta at press time, Validate_US was already capable of syntactically validating phone numbers, SSNs, state abbreviations, and ZIP codes. This section shows you how to install and implement this immensely useful package.

Installing Validate_US

To take advantage of Validate_US, you need to install it. The process for doing so follows:

%>pear install -f Validate_US
WARNING: failed to download pear.php.net/Validate_US within preferred state "stable", will instead download version 0.5.2, stability "beta"
downloading Validate_US-0.5.2.tgz ...
Starting to download Validate_US-0.5.2.tgz (6,578 bytes)
.....done: 6,578 bytes
install ok: channel://pear.php.net/Validate_US-0.5.2

Note that because Validate_US is a beta release (at the time of this writing), you need to pass the
-f option to the install command in order to force installation.

Using Validate_US

The Validate_US package is extremely easy to use; simply instantiate the Validate_US() class and call the appropriate validation method. In total there are seven methods, four of which are relevant to this discussion:

phoneNumber(): Validates a phone number, returning TRUE on success, and FALSE otherwise. It accepts phone numbers in a variety of formats, including xxx xxx-xxxx, (xxx) xxx-xxxx, and similar combinations without dashes, parentheses, or spaces. For example, (614)999-9999,
6149999999, and (614)9999999 are all valid, whereas (6149999999, 614-999-9999, and 614999 are not.

postalCode(): Validates a ZIP code, returning TRUE on success, and FALSE otherwise. It accepts ZIP codes in a variety of formats, including xxxxx, xxxxxxxxx, xxxxx-xxxx, and similar combinations without the dash. For example, 43210 and 43210-0362 are both valid, whereas 4321 and 4321009999 are not.

region(): Validates a state abbreviation, returning TRUE on success, and FALSE otherwise. It accepts two-letter state abbreviations as supported by the U.S. Postal Service (http://www.usps.com/ncsc/lookups/usps_abbreviations.html). For example, OH, CA, and NY are all valid, whereas CC, DUI, and BASF are not.

ssn(): Validates an SSN by not only checking the SSN syntax but also reviewing validation information made available via the Social Security Administration Web site (http://www.ssa.gov/), returning TRUE on success, and FALSE otherwise. It accepts SSNs in a variety of formats, including xxx-xx-xxxx, xxx xx xxxx, xxx/xx/xxxx, xxx\txx\txxxx (\t = tab), xxx\nxx\nxxxx (\n = newline), or any nine-digit combination thereof involving dashes, spaces, forward slashes, tabs, or newline characters. For example, 479-35-6432 and 591467543 are valid, whereas 999999999, 777665555, and 45678 are not.

Once you have an understanding of the method definitions, implementation is trivial. For example, suppose you want to validate a phone number. Just include the Validate_US class and call phoneNumber() like so:
<?php
include "Validate/US.php";
$validate = new Validate_US();
echo $validate->phoneNumber("(614)999-9999") ? "Valid!" : "Not valid!";
?>

Because phoneNumber() returns a Boolean, in this example the Valid! message will be returned.
Contrast this with supplying 614-876530932 to phoneNumber(), which will inform the user of an invalid phone number.

Summary

Many of the functions introduced in this chapter will be among the most commonly used within your PHP applications, as they form the crux of the language’s string-manipulation capabilities.
In the next chapter, we examine another set of well-worn functions: those devoted to working
with the file and operating system.
