Tuesday, July 14, 2009

Advanced PHP for Web Professionals By Christopher Cosentino Chapter 6

Overview
PHP is a very useful language for working with files. Although it may not be as robust as other languages, such as PERL, when it comes to parsing files, PHP still provides great power and flexibility when it comes to working with text files.

This chapter covers reading from, manipulating, and writing to text files. You can use the features covered in this chapter to create applications that can read in text from a file (locally or over the Internet using HTTP or FTP), then use the data gathered in a manner more suited to your needs.
Opening and Closing Files
Opening Files
Before you can use PHP to read a file, you must first open it. To open a file, you use the fopen() function.

$file_handler = fopen($file, $mode)

You must assign the function to a variable, as above. This is referred to as a file pointer. You use the file pointer to reference the open file throughout your script.

The fopen() function takes two arguments:

$file is the name (and path, if required) of the file.

$mode is one of a list of available modes:

r— Open the file as read-only. You cannot write anything to the file. Reading begins at the beginning of the file.

r+ — Open the file for reading and writing. Reading and writing begin at the beginning of the file. You will delete existing contents if you write to the file without first moving the internal pointer to the end of the file.

w— Open the file for write-only. You cannot read data from the file. Writing begins at the beginning of the file. You will again delete contents if you write to the file without first moving the pointer to the end of the file. If the file specified by the $fie argument does not exist, then PHP attempts to create the file. Make sure your permissions are set correctly!

w+ — As above, but the file may also be read.

a— This mode is the same as the 'w' mode, with the exception that the internal pointer is placed at the end of the file. Existing contents will not be overwritten unless you rewind the internal pointer to the beginning of the file.

a+ — As above, but the file may also be read.

Additionally, if the file is a binary file, you must include a "b" in the mode if your OS is Windows. This chapter only deals with ASCII text files, so you will not require the "b: mode in the following example. If you do need to open a binary file, an example is below:

$file_handle = fopen($filename, "rb+")

Note the differences between the "w" and "a" modes. Using "w" to open a file effectively deletes the contents of the file, while using the "a" mode retains the contents of the file and allows you to append additional data to the end of the file.

Once you have the file opened, you can then read from or write to the file.

Reading Files
There are multiple functions available to read from a file.

You can read data from the file using the fread() function:

$file_contents = fread($file_pointer, $length_to_read);

You must assign the value of fread() to a variable, which will contain the data that has been read. The fread() function takes two arguments:

$file_pointer is the file pointer to which you assigned the value from the fopen() function.

$length is the amount of bytes that you wish to read.

When using fread() to read a file, you can read in the entire file. To do this, you need to know the byte-size of the file. You can easily find the total byte-size of a file using the filesize() function:

$bytes = filesize($filename);

The following script demonstrates how to open a short text file and display its contents:

$file = "/path/to/text.txt";
$filepointer = fopen($file, "r");
$contents = fread($filepointer, filesize($file));
echo "<pre>" . htmlentities($contents) . "</pre>";

There is a slight problem when reading files from the Internet: You cannot easily determine the size of the file, since it does not exist on the local server. An easy fix for this is to attempt to read many more bytes than you think exist in the file. Since fread() stops when it reaches the end of the file (EOF), you don't have to worry about PHP wasting processing on the extra bytes you specified for the $length argument in fread(). For example:

$file = "http://www.example.com/text.txt";
$filepointer = fopen($file, "r");
$contents = fread($filepointer, 9999999);

Closing Files
When you have finished reading from or writing to a file that has been opened using the fopen() function, you should always close it using the fclose() function:

fclose($filepointer);

Closing the file allows you to clean up any memory that was used to open it and also prevents possible file corruption that sometimes (rarely) occurs when a file is left open.

Putting It to Use
This next script shows how to open a file from the Internet and display it on your page, as in Figure 6-1. The National Oceanic & Atmospheric Administration (NOAA) provides hourly weather information for almost every airport in the United States. They provide this information on their site at www.noaa.gov in many formats. One weather report format they provide is called METAR. METAR is a French acronym that roughly translates as "Aviation Routine Weather Report." METAR provides useful information like current winds, barometric pressure, cloud cover, etc. This script accesses the files stored on the NOAA's datastore for this information, relative to the airport identifier you provide to the script. If you enter "KJFK" for New York's JFK airport, it provides the current weather in the encoded (shorthand for aviation weather reporting) and decoded (readable by us normal people) formats.

Script 6–1 metar.php
[View full width]
1. <body bgcolor="#FFFFFF" text="#000000">
2. <h2>METAR Weather Retrieval</h2>
3. <p>This page retrieves the latest METAR information for the requested station.
4. <p>METAR stations are generally located at all major airports and many smaller ones.
Station identifiers are generally the letter "K" followed by the airport identifier. For
example, the station identifier for Logan International at Boston, MA is "KBOS".
5. <p>The information is retrieved directly from the NOAA Weather Database (see <i>http://
weather.noaa.gov/weather/metar.shtml</i>).
6. <p>Various METAR-related information can be found at <a href="http://205.156.54.206/oso/
oso1/oso12/metar.htm">http://205.156.54.206/oso/oso1/oso12/metar.htm</a>.
7. <p><strong>Enter the Station Identifier below:</strong>
8. <p>
9. <form action=metar.php mehod=get>
10. <p>METAR Station ID: <input type="text" name="metar_station" size="4" maxlength="4">
11. <input type="submit" name="submit" value="Submit">
12. </form>
13. <hr noshade>
14. <?
15. function connect_error($metar_station) {
16. ?>
17. <p><span>Sorry, either that station identifier, <strong><? echo strtoupper(
$metar_station) ?></strong>, is not valid, or the connection to the NOAA servers is
broken.</span>
18. <?
19. die();
20. }
21. if(isset($metar_station)) {
22. $file1 = "ftp://weather.noaa.gov/data/observations/metar/decoded/" . strtoupper(
$metar_station) . ".TXT";
23. $file2 = "ftp://weather.noaa.gov/data/observations/metar/stations/" . strtoupper(
$metar_station) . ".TXT";
24. if(!$fd1 = @fopen($file1, "rb")) {
25. connect_error($metar_station);
26. } else {
27. $decoded = fread($fd1, 9999999);
28. printf("<h3>METAR Decoded Observation for <span>%s</span></h3><pre>%s</
pre>", strtoupper($metar_station), htmlentities($decoded));
29. fclose($fd1);
30. }
31. if(!$fd2 = @fopen($file2, "rb")) {
32. connect_error($metar_station);
33. } else {
34. $encoded = fread($fd2, 9999999);
35. printf("<h3>METAR Encoded Observation for <span>%s</span></h3><pre>%s</
pre>", strtoupper($metar_station), htmlentities($encoded));
36. fclose($fd2);
37. }
38. }
39. ?>
40. </body>
41. </html>

Figure 6-1. metar.php

Script 6-1. metar.php Line-by-Line Explanation LINE
DESCRIPTION

1–7
Provide some basic HTML for the beginning of the page, with a style sheet and some instructions on how to use the script.

9–13
Provide a form asking for the airport station identifier.

14
Begin the PHP part of the page.

15–20
Define a function, connect_error(), to print out an error message in the event that the connection to the NOAA server is not working or an invalid station identifier was entered by the user. This function takes one argument, $metar_station, which is the station identifier entered by the user into the form.

21
If the $metar_station variable is set, then we know that the user has submitted the form, and can continue to the next line. If $metar_station has not been set, then do nothing.

22
Assign the decoded METAR file from the NOAA server to the $file1 variable. The files reside on the server with the format METAR_STATION.TXT—for example, "KBOS.TXT" for Logan airport in Boston, MA. The strtoupper() function makes the user's text all uppercase regardless of what case it was when the user entered it.

23
The same as above, only this time open the encoded version of the data and assign it to the $file2 variable.

24–25
Attempt to open the file on the NOAA server. If the file cannot be opened, then throw an error to the connect_error() function to notify the user.

26
If there were no errors opening the file, then execute lines 27–30.

27
Read in the entire contents of the file (we are assuming it is less than 9,999,999 bytes) and assign it to the $decoded variable.

28
Print out the data to the page. We use htmlentities() to translate any data that may be recognized as HTML code into the HTML code for that data.

29
Close the file using the fclose() function.

30
Close out the if statement started on line 26.

31–32
Attempt to open the other file on the NOAA server. If the file cannot be opened, then throw an error to the connect_error() function to notify the user.

33
If there were no errors opening the file, then execute lines 34–37.

34
Read in the entire contents of the file (we are assuming it is less than 9,999,999 bytes) and assign it to the $decoded variable.

35
Print out the data to the page. We use htmlentities() to translate any data that may be recognized as HTML code into the HTML code for that data.

36
Close the file using the fclose() function.

37
Close out the if statement started on line 33.

38
Close out the if statement started on line 31.

39–41
End the PHP and HTML for the script.

Data Mining
Many Web pages provide a lot of useful information. Sometimes the information you need is surrounded by a lot of other information that you don't need or want. Using PHP, it is possible to read a page on the Web and extract the information you need.

When extracting information from other sites, it is important to remember that any data on another site is most likely copyrighted. Do not use other people's data for your own purposes without express permission from the owner of that data.

This next example details how to retrieve a stock quote from the Reuters News Service Web page. This is merely an example and should not be used on your own site unless you have expressly written consent from Reuters to use their data.

HTML is a rather poor language to specify the type of data contained on a page. It works well for formatting data, but when it comes to being able to find the data you need, you sometimes have to get a little creative.

Before you can extract the data you require, you must look at the data in its "natural" state. In the example below, the data is fetched from the Reuters page at www.reuters.com/quote.jhtml after you search for a quote. Looking at the HTML for the page, you will notice that the quote prices are one line below the "Bid" and "Ask" lines, which are unique to the page. Therefore, you can extract the prices by searching for the unique text.

This script uses another method of reading a file, called the file() function:

$file = "/home/me/myfile.txt";
$filearray = file($file);

The file() function opens the file, reads the file contents into an array with each line in the file corresponding to an array element, and then closes the file. file() takes one argument, the name of the file you wish to open, including the path if applicable.

Once again, remember to respect copyright notices when using scripts that mine Web pages for data! The result of this script can be seen in Figure 6-2.

Script 6–2 stockquote.php
1. <html>
2. <head><title>Stock Quote Mining</title></head>
3. <body>
4. <form action="stockquote.php">
5. <p>Enter Symbol For Quote: <input type="text" name="symbol">
6. <br><input type="submit" name="submit" value="Fetch Quote">
7. </form>
8. <hr noshade>
9. <?
10. if(isset($symbol)) {
11. $file = "http://www.reuters.com/quote.jhtml?ticker=";
12. $contents = file($file . $symbol);
13. $size = sizeof($contents);
14. echo "<h3>Quote for $symbol:</h3>";
15. for($i = 0; $i < $size; $i++) {
16. $data = strip_tags($contents[$i]);
17. if(trim($data) == "Ask") {
18. $quote = strip_tags($contents[$i + 1]);
19. echo "<P>Ask: $quote";
20. }
21. if(trim($data) == "Bid") {
22. $quote = strip_tags($contents[$i + 1]);
23. echo "<P>Bid: $quote";
24. }
25. }
26. }
27. ?>
28. </body>
29. </html>

Figure 6-2. stockquote.php

Script 6-2. stockquote.php Line-by-Line Explanation LINE
DESCRIPTION

1–8
Print out the beginning HTML for the page, as well as a form asking for the symbol of the stock for which the user wishes to retrieve a quote.

9
Begin the PHP for the page.

10
Check to see if the $symbol variable has been set. If it has, then execute lines 11–26.

11
Assign the $file variable to the page that has stock quote information.

12
Assign the data opened from the file() function to the $contents variable. The file() function's argument ($file . $symbol) is the URL appended with the stock quote symbol entered by the user. For example, if the user entered "CSCO" (the symbol for Cisco Systems), then the argument is "http://www.reuters.com/quote.jhtml?ticker=CSCO".

13
Get the size of the $contents array. This is used below to loop through each record in the array.

14
Print out a heading.

15–25
Use a for loop to loop through each record in the array.

16
Strip out the HTML tags from the record retrieved from the array and assign this value to the $data variable.

17
If, after trimming whitespace from $data, the resulting value is "Ask", then we know, after analyzing the page we wish to mine, that the value we are looking for is on the next line. Execute lines 18–20.

18
Get the "Ask" quote from the next line in the array ($contents[$i +1]) and assign it to the quote variable.

19
Echo the "Ask" quote to the page.

20
Close out the if statement started on line 17.

21
If, after trimming whitespace from $data, the resulting value is "Bid", then we know, after analyzing the page we wish to mine, that the value we are looking for is on the next line. Execute lines 22–24.

22
Get the "Bid" quote from the next line in the array ($contents[$i +1]) and assign it to the quote variable.

23
Echo the "Bid" quote to the page.

24
Close out the if statement started on line 21.

25
Close out the for loop started on line 15

26
Close out the if statement started on line 10.

27–29
End the PHP and HTML for the page.

Parsing Large Files
The method above works well for relatively simple files, but what if you need to parse a large file with a basic structure that has a few subtle variations? One method is to use the ereg() function[1] to seek out patterns of data.

[1] See Chapter 4 for details on the ereg() function.

This next script parses a large text file and places its contents into a database. The file is a list of 789 multiple-choice questions that appear on the FAA's private-pilot written exam.[2]

[2] The questions are not under copyright and are freely available.

The document follows a basic structure, in that it has numbered questions followed by three possible answers. Some of the questions reference a figure, while other questions reference multiple figures. The goal of the script is to parse the text file and place each of the questions into a MySQL database, along with the corresponding choices for the answers. If there is a figure, or figures, that is referenced by the question, then the script also places it in the database.

The document is available on the Internet at av-info.faa.gov/data/airmanknowledge/par.txt. The document is very large, and the structure of some questions is slightly different, especially when referenced figures are mentioned.

The script takes the questions and places all of the information in a database, so that it is easier to manage the data and port it into a format usable by other applications.

This script makes use of another regular expression function called preg_split(). preg_split() is able to break up a string into an array, similar to the commonly used explode() function. Instead of breaking up the string based on a character, it breaks up the string based on a regular expression match.

Look at the file at av-info.faa.gov/data/airmanknowledge/par.txt to become familiar with the general struture of the data before reading over the script descriptions. Things will seem much more clear.

The script is comprised of three files:

parsefile.sql— The SQL statements required to create and set up the database.

parsefile.php— The PHP script that parses the text file and places the data into a database.

displaydata.php— The PHP script that reads the data from the database and displays it to the user. This script is just an example of what you can do with the data once it has been placed in the database.

Script 6–3 parsefile.sql
1. //mysqladmin create faa
2.
3. grant ALL on faa.* to php@localhost identified by "password";
4.
5. create table questions(
6. id INT NOT NULL,
7. code varchar(8),
8. question TEXT,
9. primary key(id));
10.
11. create table answers(
12. question INT NOT NULL,
13. letter CHAR(1),
14. answer TEXT);
15.
16. create table figures(
17. question INT NOT NULL,
18. figure VARCHAR(3),
19. additional VARCHAR(24));
20.
21. create table info(
22. id INT NOT NULL,
23. version VARCHAR(8),
24. date VARCHAR(10),
25. bank VARCHAR(16));

Script 6-3. parsefile.sql Line-by-Line Explanation LINE
DESCRIPTION

1
Use the mysqladmin command line program to create a database called "faa".

3
Grant the required privileges so that a user can access the database using the username "php" and the password "password".

5–9
Create a table to hold the text of the questions.

11–14
Create a table to hold the answers. There are three answers for each question.

16–19
Create a table to hold any figure references contained in a question.

21–25
Create a table to hold the version and date information for the question bank.

Script 6–4 parsefile.php
[View full width]
1. <?
2. function connect() {
3. if(!mysql_connect("localhost","php","password")) {
4. die("<h1>Could Not Connect To MySQL Server</h1>");
5. } elseif(!mysql_select_db("faa")) {
6. die("<h1>Could Not Access DB</h1>");
7. }
8. return 1;
9. }
10.
11. function parse($data) {
12. global $question_num;
13. //IF REFERRING TO A FIGURE, account for inconsistent way the FAA labels these
14. if(ereg("(\(Refer to figure )([0-9]+)(\.\)|(\)\.))", $data, $regs)) {
15. $data2 = preg_split("/(\(Refer to figure )([0-9]+)(\.\)|(\)\.))/", $data);
16. sql_query("insert into figures values('$question_num','" . $regs[2] . "', NULL)");
17. return trim($data2[1]);
18. //IF REFERRING TO MULTIPLE FIGURES, account for inconsistent way FAA labels these
19. } elseif (ereg("(\(Refer to figures |\(Refer to figure )([0-9]+|[0-9]+) (, and |and
figure )([0-9]+)(\.)(\))", $data, $regs)) {
20. $data2 = preg_split("/(\(Refer to figures |\(Refer to figure )([0-9]+|[0-9]+) (, and
|and figure )([0-9]+)(\.)(\))/", $data);
21. sql_query("insert into figures values('$question_num','" . $regs[2] . "', NULL)");
22. sql_query("insert into figures values('$question_num','" . $regs[4] . "', NULL)");
23. return trim($data2[1]);
24. //IF REFERRING TO A FIGURE WITH AREA(S), account for inconsistent way FAA labels these
25. } elseif(ereg("(\(Refer to figure )([0-9]+)(, | )([a-zA-Z]+)( [ a-zA-Z0-9]+)(\.)(\))",
$data, $regs)) {
26. $data2 = preg_split("/(\(Refer to figure )([0-9]+)(, | )([a-zA-Z]+)( [ a-zA-Z0-9]+)(\.)(\
))/", $data);
27. sql_query("insert into figures values('$question_num','" . $regs[2] . "', 'area " .
$regs[5] . "')");
28. return trim($data2[1]);
29. //FUNKY SEMICOLON AND MULTIPLE FIGURES AREAS SCREWING EVERYTHING UP...
30. } elseif(ereg("(\(Refer to figure )([0-9]+)(, | )([a-zA-Z]+)( [ a-zA-Z0-9]+)(; and
figure )([0-9]+)(\.)(\))", $data, $regs)) {
31. $data2 = preg_split("/(\(Refer to figure )([0-9]+)(, | )([a-zA-Z]+)( [ a-zA-Z0-9]+)(;
and figure )([0-9]+)(\.)(\))/", $data);
32. sql_query("insert into figures values('$question_num','" . $regs[2] . "', 'area " .
$regs[5] . "')");
33. sql_query("insert into figures values('$question_num','" . $regs[7] . "', NULL)");
34. return trim($data2[1]);
35. //EVERYTHING ELSE
36. } else {
37. return $data;;
38. }
39. }
40.
41. function sql_query($query){
42. if(!mysql_query($query)) {
43. die("<h1>Error with QUERY: $query</h1>");
44. }
45. //echo "<P>$query";
46. }
47.
48. /***** MAIN *****/
49. //$file = "http://av-info.faa.gov/data/airmanknowledge/par.txt";
50. $file = "par.txt";
51. $contents = file($file);
52. $size = sizeof($contents);
53. $questions = "not started";
54.
55. $answer_let = "A";
56. $question = "";
57.
58. connect();
59. mysql_query("delete from questions");
60. mysql_query("delete from answers");
61. mysql_query("delete from figures");
62. mysql_query("delete from info");
63. insert into info values ('1',NULL,NULL,NULL);
64. for($i = 0; $i < $size; $i++) {
65. if(trim($contents[$i]) != ""){
66. $data = $contents[$i];
67. if(ereg("^Version", $data)) {
68. $versioninfo = explode(":", $data);
69. $version = trim($versioninfo[1]);
70. sql_query("update info set version = '$version' where id = '1'");
71. } elseif(ereg("[0-9]+/[0-9]+/[0-9]+", $data)) {
72. $date = trim($data);
73. sql_query("update info set date = '$date' where id = '1'");
74. } elseif(ereg("^Bank", $data)) {
75. $bankinfo = explode(":", $data);
76. $bank = trim($bankinfo[1]);
77. sql_query("update info set bank = '$bank' where id = '1'");
78. } elseif(ereg("^[0-9]+\.", $data)) {
79. if($questions == "started") {
80. sql_query("update questions set question = '" . addslashes($question) . "' where id =
'$question_num'");
81. $question = "";
82. }
83. $questions = "started";
84. $data2 = preg_split("/\s+/", $data);
85. $question_num = rtrim($data2[0], ".");
86. $code = $data2[1];
87. sql_query("insert into questions values('$question_num', '$code', NULL)");
88. }elseif(ereg("(^A\.)|(^B\.)|(^C\.)", $data)) {
89. $answer = preg_split("/(^A\.)|(^B\.)|(^C\.)/", $data);
90. sql_query("insert into answers values ('$question_num','$answer_let','" . addslashes(
$answer[1]) . "')");
91. $answer_let++;
92. if($answer_let == "D"){
93. $answer_let = "A";
94. }
95. } else {
96. if($questions == "started") {
97. $question .= parse($data);
98. }
99. }
100. }
101. }
102. echo "Operation Completed with No Errors";
103. ?>

Script 6-4. parsefile.php Line-by-Line Explanation LINE
DESCRIPTION

2–9
Define a function, connect(), that is used to establish the connection to the database. Replace the mysql_connect variables with those for your MySQL server.

11–39
Define a function, parse(), that parses the data to extract figure references. Since we may want to hyperlink the figure references to actual figures, we need to parse the data using the ereg() function to look for these figure references in the text. This function takes one argument, $data, which is data that has been read by the main program and established as question data.

12
Allow the $question_num variable to be read by this function.

13
Provide a comment explaining the case that is to follow.

14
If $data matches the ereg() expression, then execute lines 15–17. If not, go on to line 19.

This line tries to match the data that has a reference to a figure. Since the text is inconsistent in how it labels figures, we need to use a regular expression match that takes into account several variances found in the text. The regular expression matches against the following (where # is any range of numerals):

(Refer to figure #.)

(Refer to figure #)

(Refer to figure #).

Notice also that we use the $regs argument to the ereg() function. The $regs argument, when used with ereg(), causes ereg() to return any matches that occur in the parentheses of the ereg search as items in the $regs array. For example, if the text was "(Refer to Figure 8). How many licks does it take to get to the center…", then ereg() would return the following:

$regs[0] = "(Refer to Figure 8.) How many licks does it take…" //the complete match

$regs[1] = "(Refer to figure "

$regs[2] = "8"

$regs[3] = ".)"

Nothing else is returned, because we did not specify anything else within parentheses in the ereg() expression.

15
Since the ereg() function matched on line 14, this line separates the figure reference from the actual question. The preg_split() function, as noted earlier, works like explode(). In this case, it takes the match above and "explodes" it so that $data is broken into two parts. The first part is the figure reference, and the second part is the actual question. If we follow the example of $data being "(Refer to Figure 8.) How many licks does it take…", then this line would create an array with two items:

$data2[0] = "(Refer to Figure 8.)"

$data2[1] = "How many licks does it take…"

16
Place the figure reference into the database using the $question_num variable as a pseudo-foreign key.

17
Return the actual questions, sans the figure reference, back to the main program. Since file() breaks the text file into an array based on each line, it is possible that the question spans more than one line, and that a line is only a fragment of the question. The main program concatenates the question parts from each line into a full question and later places the full question into the database.

18
Provide a comment explaining the case that is to follow.

19
This case is much the same as the previous, only it tries to match against questions that have multiple figure references, for example:

(Refer to figures # and #)

(Refer to figure # and figure #)

(Refer to figures #, and #)

Since the text has, at most, two figures per question, we need to extract both of the figure numbers to place in the database. Again we use the $regs argument to ereg(), and by carefully grouping the regular expression with parentheses, we know that the figure number will be found in the second and fourth groupings, $regs[2] and $regs[4].

20
As before, we need to split the text to remove the figure reference from the actual question, using the same regular expression as in the previous line for the split.

21–22
Place the multiple figure references in the database.

23
Return the actual question portion of the data back to the main program for further processing.

24
Provide a comment explaining the case that is to follow.

25
If you looked at the text file, you may have noticed that some questions with figure references also have a specific area of the figure that was to be referenced. This ereg() function takes these area references into account.

26
Split the text to extract the actual question portion.

27
Place the figure reference into the database, along with the area reference to the figure.

28
Return the question to the main program.

29
Provide a colorful comment explaining the case that is to follow.

30
Finally, we take into account multiple figures that include areas, with an odd semicolon or comma thrown in for good measure. This ereg() function matches against text such as the following:

(Refer to figure #, area A; and figure #)

(Refer to figure # area #, and figure #)

(Refer to figure # area A, and #)

31
Once again, split the text to extract the actual question.

32–33
Place the figure/area references into the database.

34
Return the actual question to the main program.

35
Provide a comment explaining the case that is to follow.

36–38
If there is no figure reference, then we assume that the remaining data is just lines of the question and return it to the main program to be appended to the other parts of the question that have already been sent.

39
End the function declaration.

41–46
Define a function, sql_query(), that attempts to send the query to the database. If there is an error, the script is killed. There is an echo line that is commented out. Uncomment this line if you'd like the query statements to be printed to the browser.

48
Begin the main program.

49–50
Define the $file variable that will be used to open the file. Each of these lines does the same thing. If you copied the file to your local machine, then uncomment line 50 and comment line 49. If you have a fast connection to the Internet, do the opposite.

51
Use the file() function to read the contents of $file into an array.

52
Determine how many lines are in the file by seeing how many records are in the array.

53
Define a variable, $questions, and assign it the value "not started". This is done because there is some extraneous information at the beginning of the file that must be parsed over before we can start parsing the questions themselves.

55
Define a variable, $answer_let, and assign it the value of "A". This is used to track the letter of the current answer (A, B, or C).

56
Define a variable, $question, and give it a blank value. This will be used to hold and append the question fragments as they are read from the file.

58
Establish a connection to the database using the connect() function.

59–62
Execute queries to delete the contents of the database. Since the database is only designed to hold one version of questions at a time, you must ensure that there is no existing data in the database.

63
Insert a "default" row into the info table as a placeholder.

64–102
Loop through each line in the text file and parse it.

65
If the line is not blank, then continue. Otherwise, go to line 101.

66
Assign the value of the current line ($contents[$i]) to the $data variable to save some typing.

67–70
If the line starts with "Version", then explode the line on the ":" character, trim any whitespace, and place the version number of the test bank into the database by sending the SQL to the sql_query() function.

71–73
If the line contains a date format—for example, "6/11/2002"—then trim the whitespace around the text and place it into the database.

74–77
If the line starts with "Bank", then explode the line on the ":" character, trim any whitespace, and place the bank information in the database.

78
If the line begins with a number immediately followed by a period, then we know, from studying the text, that it is a question number. Continue through line 87.

79–82
If the $question variable has been set to started, then we know we have some question data already assembled. Place that question in the database and clear the text from our $question variable, since we will be placing a new question in that variable.

83
Set the $questions variable to started (it may already be set that way, but if this is the first question encountered, it has not been).

84
Split the current line up along any whitespace. The initial line for a question looks like this:

"1.
A01
PVT"

This breaks the line into three array elements and assigns it to the $data2 variable.

85
Get the question number from the $data2 array, trim off the trailing period, and assign it to the $question_num variable.

86
Get the question code from the $data2 variable and assign it to the $code variable.

87
Insert the question number and code into the database. We leave the actual question part out at this time and only place a NULL into the database in its place. A little later, we'll put the complete question in, since it spans multiple lines.

88
If the line begins with an "A.", "B.", or "C.", then we know that this is one of the multiple choice answers to the question. Continue to line 95.

89
Split the answer letter from the actual answer by using the preg_split() function.

90
Insert the answer into the database using the question number as a foreign key. Use the $answer_let variable as the answer letter.

91
Increment the $answer_let variable, since we know the next answer letter is going to be one more than the current.

92–94
If the $answer_let variable is "D", then set it back to "A", since we know each question only has three possible answers, "A", "B", or "C".

95–99
If none of the above has matched, then we know we have an actual question, since it is not preceded by any of the above matches. Send this part of the text to be parsed by the parse() function.

100
End the if statement started on line 65.

101
End the for loop started on line 64.

102
If we've made it this far without the script dying, then we know that no errors have occurred. Print out a message telling yourself so!

103
End the PHP for the script. Now run displaydata.php to see the result of your labor.

This next script, shown in Figure 6-3, merely displays what we have put in the database. It's just a quick example to show you how thoroughly parsefile.php parsed the large text document.

Script 6–5 displaydata.php
[View full width]
1. <html>
2. <head><title>Display Parsed File From Database</title>
3. <style type=text/css>
4. h1, h2, h3, p, td, a {font-family: verdana, arial, helvetica, sans-serif;}
5. .navy {color: navy; }
6. </style>
7. </head>
8. <body>
9. <?
10.
11. function connect() {
12. if(!mysql_connect("localhost","php","password")) {
13. die("<h1>Could not connect to MySQL Server</h1>");
14. } elseif(!mysql_select_db("faa")) {
15. die("<h1>Could Not Access DB</h1>");
16. }
17. return 1;
18. }
19.
20. function display($sql) {
21. $color = "#FFFFFF";
22. connect();
23. $result = mysql_query($sql);
24. while($row = mysql_fetch_array($result)) {
25. $question_num = $row['id'];
26. echo "<tr bgcolor=\"$color\"><td>" . $question_num . "</td><td><pre>" .
stripslashes($row['question'] . "</pre>");
27. $sql2 = "select * from figures where question = '$question_num' order by figure";
28. $result2 = mysql_query($sql2);
29. if(mysql_num_rows($result2) > 0) {
30. while($row2 = mysql_fetch_array($result2)) {
31. echo "<br>See Figure: " . $row2['figure'] . " " . $row2['additional'];
32. }
33. }
34. echo "</td></tr>";
35. $sql3 = "select * from answers where question = '$question_num' order by letter";
36. $result3 = mysql_query($sql3);
37. while($row3 = mysql_fetch_array($result3)) {
38. echo "<tr bgcolor=\"$color\"><td colspan=2>";
39. echo $row3['letter'] . ". " . stripslashes($row3['answer']);
40. echo "</td></tr>";
41. }
42. if($color == "#FFFFFF") {
43. $color = "#FFFF80";
44. } else {
45. $color= "#FFFFFF";
46. }
47. echo "</td></tr>";
48. flush();
49. }
50. }
51. ?>
52. <table border=1 cellspacing=0 cellpadding=5>
53. <?
54. if(!isset($next)) {
55. $sql = "select * from questions limit 0, 200";
56. $next = 0;
57. } else {
58. $sql = "select * from questions order by id limit $next, 200";
59. }
60. display($sql);
61. $next += 200;
62. ?>
63. </table>
64. <p><a href="displaydata.php?next=<?=$next?>">Next 200 Questions</a>
65. </body>
66. </html>

Figure 6-3. displaydata.php

Script 6-5. displaydata.php Line-by-Line Explanation LINE
DESCRIPTION

1–9
Print out the beginning HTML for the script, including some style sheet formatting.

11–18
Define a function, connect(), to connect to the database.

20–50
Define a function, display(), that displays the data in the database. The display() function takes one argument, $sql, which is the initial SQL statement used to display a set number of questions.

21
Initialize the $color variable. This is used to alternate colors for each of the questions to make the page more readable.

22
Connect to the database using the connect() function.

23
Generate an SQL result using the $sql variable that was passed to the function.

24–34
Display the question and any figure references associated with the question.

35–41
Display the answers to the associated question.

42–46
Change the $color variable so that it is the opposite of the current color.

47
Close out the row.

48
Flush content to the browser so a timeout does not occur.

49
End the while loop started on line 24.

50
End the function declaration.

51
Get out of PHP mode for a moment.

52
Print out the beginning of a table.

53
Get back into PHP mode.

54–56
If the $next variable has not been set, then create an SQL statement to display the first 200 questions in the test bank and set $next to ")" to initialize it.

57–59
If the $next variable has been set, then we know the user has pushed the "Next…" link at the bottom of the page. Create an SQL statement to show the next 200 questions, depending on what questions they have already seen.

60
Execute the display() function using the generated SQL from the previous lines.

61
Increment the $next variable.

62
Get out of PHP mode again.

63
Close the table.

64
Print a link to display the next 200 questions in the bank.

65–66
Close out the HTML for the page.

0 comments: