Wednesday, August 12, 2009

KEEPING IT FRESH

There’s nothing quite like the excitement of discovering a new and interesting website. But this enthusiasm can quickly wane if, after a few visits, the content of the site hasn’t changed at all. The primary way of adding new content to a website is by using dynamic, database-driven pages. That’s why we’ve spent so much time discussing MySQL (and will later spend some time on SQLite). Another ideal way of keeping a site current and interesting is by using Rich Site Summary (RSS) feeds. RSS is a file format for web syndication that is widely used by various newsgroups but more commonly encountered in the form of a blog. An RSS file is an Extensible Markup Language (XML) formatted file that can be read using the SimpleXML extension to PHP 5. All you need in order to read an RSS feed is a little knowledge of how an RSS file is structured and an understanding of object-oriented programming (OOP). You’ll be surprised at just how easy it is once you’ve grasped a few basics of XML.

The downside to having a large website with numerous pages is that it can be difficult for casual web surfers to find what they’re looking for. For this reason I will also show you how to create a site-specific search. I’ll do this using the Google Application Programming Interface (API) and the Simple Object Access Protocol (SOAP) extension to PHP. The Google API will allow us to tap into Google’s search capabilities programmatically using the SOAP web service protocol. This protocol uses XML files over HTTP, so some familiarity with XML is required. If you don’t know anything about XML, don’t worry. You’ll learn enough to get you started, and besides, you already know HTML so you’re well on your way to understanding XML.
In this chapter you’ll also have the opportunity to see how asynchronous JavaScript and XML (AJAX) can work in unison with PHP. We’ll use AJAX to insert the Google search results, thus avoiding having to refresh the entire page. In situations where a page reload is overkill, using AJAX can greatly simplify the user interface to a website (though, of course, improper use can do the exact opposite).
The object-oriented (OO) programmer is ideally placed to program using SimpleXML and SOAP because, as you’ll see, both extensions are entirely object-oriented. Like it or not, knowledge of OOP is a requirement for taking full advantage of these and many other extensions to PHP.

SimpleXML

In PHP 5 all XML support is now provided by the libxml2 XML toolkit. By default PHP 5 supports SimpleXML, but if libxml2 is not installed on your machine or the version number is lower than 2.5.10, go to www.xmlsoft.org and download the latest version. (You can use the PHP function phpinfo to check which version of libxml is running on your server.) Without going into too many details, suffice it to say that support for XML has been brought into line with the standards defined by the World Wide Web Consortium (W3C). Unified treatment of XML under libxml2 makes for a more efficient and more easily maintained implementation of XML support.
Support for XML is much improved in PHP 5, in terms of both performance and functionality. The SimpleXML extension makes full use of the libxml2 toolkit to provide easy access to XML and a quick way of converting XML documents to PHP data types.
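
If you would rather run the version check from a script than read through the phpinfo page, something like the following should do. Treat it as a minimal sketch: it assumes the LIBXML_DOTTED_VERSION constant (defined by the libxml extension from PHP 5.1 on), and libxml_is_recent_enough is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: true when the libxml2 version PHP was built against
// is at least $minimum (2.5.10 is the floor mentioned in the text).
function libxml_is_recent_enough($minimum = '2.5.10')
{
    if (!defined('LIBXML_DOTTED_VERSION')) {
        return false; // libxml support is not compiled in
    }
    return version_compare(LIBXML_DOTTED_VERSION, $minimum, '>=');
}

echo 'SimpleXML loaded: ', extension_loaded('simplexml') ? 'yes' : 'no', "\n";
echo 'libxml2 recent enough: ', libxml_is_recent_enough() ? 'yes' : 'no', "\n";
```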

XML

Since an RSS document is an XML document, you need some understanding of the basics of XML if you want to be able to read a feed. XML is a markup language that is similar in many ways to HTML. This should come as no surprise given that both HTML and XML have a common heritage in Standard Generalized Markup Language (SGML). As a web developer, even if you have never seen an XML file before, it will look familiar, especially if you are coding to the XHTML standard. XML makes use of tags or elements enclosed by angle brackets. Just as in HTML, a closing tag is differentiated from an opening tag by preceding the element name with a forward slash. Also like HTML, tags can have attributes. The major difference between XML tags and HTML tags is that HTML tags are predefined; in XML you can define your own tags. It is this capability that puts the “extensible” in XML. The best way to understand XML is by examining an XML document. Before doing so, let me say a few words about RSS documents.

RSS

Unfortunately there are numerous versions of RSS. Let’s take a pragmatic approach and ignore the details of RSS’s tortuous history. With something new it’s always best to start with a simple example, and the simplest version of RSS is version 0.91. This version has officially been declared obsolete, but it is still widely used, and knowledge of its structure provides a firm basis for migrating to version 2.0, so your efforts will not be wasted. I’ll show you an example of a version 0.91 RSS file—in fact, it is the very RSS feed that we are going to use to display news items in a web page.

Structure of an RSS File

As we have done earlier with our own code, let’s walk through the RSS code, commenting where appropriate.
The very first component of an XML file is the version declaration. This declaration shows a version number and, like the following example, may also contain information about character encoding.

<?xml version="1.0" encoding="iso-8859-1"?>

After the XML version declaration, the next line of code begins the very first element of the document. The name of this element defines the type of XML document. For this reason, this element is known as the document element or root element. Not surprisingly, our document type is RSS. This opening element defines the RSS version number and has a matching closing tag that terminates the document in much the same way that <html> and </html> open and close a web page.

<rss version="0.91">

A properly formatted RSS document requires a single channel element. This element will contain metadata about the feed as well as the actual data that makes up the feed. A channel element has three required sub-elements: a title, a link, and a description. In our code we will extract the channel title element to form a header for our web page.

<channel>
<title>About Classical Music</title>
<link>http://classicalmusic.about.com/</link>
<description>Get the latest headlines from the About.com Classical Music
Guide Site.</description>

The language, pubDate, and image sub-elements all contain optional metadata about the channel.

<language>en-us</language>
<pubDate>Sun, 19 March 2006 21:25:29 -0500</pubDate>
<image>
<title>About.com</title>
<url>http://z.about.com/d/lg/rss.gif</url>
<link>http://about.com/</link>
<width>88</width>
<height>31</height>
</image>

The item element that follows is what we are really interested in. The three required elements of an item are the ones that appear here: the title, link, and description. This is the part of the RSS feed that will form the content of our web page. We’ll create an HTML anchor tag using the title and link elements, and follow this with the description.

<item>
<title>And the Oscar goes to...</title>
<link>http://classicalmusic.about.com/b/a/249503.htm</link>
<description>Find out who won this year's Oscar for Best Music...
</description>
</item>

Only one item is shown here, but any number may appear. It is common to find about 20 items in a typical RSS feed.

</channel>
</rss>

Termination of the channel element is followed by the termination of the rss element. These tags are properly nested one within the other, and each tag has a matching end tag, so we may say that this XML document is well-formed.
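
Well-formedness is something we can test for in code. The sketch below is one way to do it, under a few assumptions: is_well_formed is a hypothetical helper name, the two sample documents are made up, and the check says nothing about whether the document is a valid RSS feed, only whether its tags nest and close properly.

```php
<?php
// Hypothetical helper: returns true when $xml parses as well-formed XML.
function is_well_formed($xml)
{
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $parsed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting
    return $parsed !== false;
}

// Made-up sample documents.
$good = '<rss version="0.91"><channel><title>Demo</title></channel></rss>';
$bad  = '<rss version="0.91"><channel><title>Demo</channel></rss>'; // title is never closed

var_dump(is_well_formed($good));
var_dump(is_well_formed($bad));
```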

Reading the Feed

In order to read this feed we’ll pass its URI to the simplexml_load_file function and create a SimpleXMLElement object. This object has four built-in methods and as many properties or data members as its XML source file.

<?php
//point to an xml file
$feed = "http://z.about.com/6/g/classicalmusic/b/index.xml";
//create object of SimpleXMLElement class
$sxml = simplexml_load_file($feed);

We can use the attributes method to extract the RSS version number from the root element.

foreach ($sxml->attributes() as $key => $value){
echo "RSS $key $value";
}

The channel title can be referenced in an OO fashion as a nested property. Please note, however, that we cannot reference $sxml->channel->title from within quotation marks because it is a complex expression. Alternate syntax using curly braces is shown in the comment below.

echo "<h2>" . $sxml->channel->title . "</h2>\n";
//below won't work
//echo "<h2>$sxml->channel->title</h2>\n";
//may use the syntax below
//echo "<h2>{$sxml->channel->title}</h2>\n";
echo "<p>\n";
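
If you want to see the difference between the two string syntaxes for yourself, the following sketch may help. It uses a made-up one-line document rather than the real feed. Inside double quotes the simple syntax interpolates only the first property access ($sxml->channel) and treats the rest as literal text, whereas the curly-brace form evaluates the whole expression.

```php
<?php
// Made-up document, just enough structure to show the two syntaxes.
$sxml = simplexml_load_string('<rss><channel><title>Demo</title></channel></rss>');

// Simple syntax: only $sxml->channel is interpolated, "->title" stays literal.
$simple = "$sxml->channel->title";

// Curly-brace syntax: the whole nested expression is evaluated.
$curly = "{$sxml->channel->title}";

var_dump($simple);
var_dump($curly);
```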

As you might expect, a SimpleXMLElement supports iteration.

//iterate through items as though an array
foreach ($sxml->channel->item as $item){
$strtemp = "<a href=\"$item->link\">".
"$item->title</a> $item->description<br /><br />\n";
echo $strtemp;
}
?>
</p>

I told you it was going to be easy, but I’ll bet you didn’t expect so few lines of code. With only a basic understanding of the structure of an RSS file we were able to embed an RSS feed into a web page.
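
Since the feed URL above may not stay available forever, here is a self-contained sketch of the same idea that you can run offline. The feed content is made up, render_feed is a hypothetical helper name, and the calls to htmlspecialchars are my own addition: they assume the feed carries plain text rather than HTML in its description elements, which is not true of every real feed.

```php
<?php
// Made-up two-item feed in the RSS 0.91 shape described above.
$rss = <<<XML
<rss version="0.91">
<channel>
<title>Sample Feed</title>
<link>http://www.example.com/</link>
<description>A made-up feed.</description>
<item>
<title>First item</title>
<link>http://www.example.com/1</link>
<description>Ham &amp; eggs</description>
</item>
<item>
<title>Second item</title>
<link>http://www.example.com/2</link>
<description>More news</description>
</item>
</channel>
</rss>
XML;

// Hypothetical helper: turn a parsed feed into an HTML fragment.
function render_feed(SimpleXMLElement $sxml)
{
    $html = "<h2>" . htmlspecialchars((string) $sxml->channel->title) . "</h2>\n<p>\n";
    foreach ($sxml->channel->item as $item) {
        // Escape each value before placing it in the page.
        $html .= '<a href="' . htmlspecialchars((string) $item->link) . '">'
            . htmlspecialchars((string) $item->title) . '</a> '
            . htmlspecialchars((string) $item->description) . "<br /><br />\n";
    }
    return $html . "</p>\n";
}

$sxml = simplexml_load_string($rss);
if ($sxml === false) {
    exit("Could not parse the feed.\n");
}
echo render_feed($sxml);
```

Building the markup in a function that returns a string, rather than echoing as we go, makes it easy to cache the fragment or drop it into a template.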
The SimpleXML extension excels in circumstances such as this where the file structure is known beforehand. We know we are dealing with an RSS file, and we know that if the file is well-formed it must contain certain elements. On the other hand, if we don’t know the file format we’re dealing with, the SimpleXML extension won’t be able to do the job. A SimpleXMLElement cannot query an XML file in order to determine its structure. Living up to its name, SimpleXML is the easiest XML extension to use. For more complex interactions with XML files you’ll have to use the Document Object Model (DOM) or the Simple API for XML (SAX) extensions. In any case, by providing the SimpleXML extension, PHP 5 has stayed true to its origins and provided an easy way to perform what might otherwise be a fairly complex task.
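
To give a feel for the DOM route when the structure is not known in advance, here is a minimal sketch that lists the element names found in a document. The element_names helper and the sample document are hypothetical; DOMDocument, loadXML, and getElementsByTagName are part of PHP’s DOM extension.

```php
<?php
// Hypothetical helper: collect the distinct element names in a document,
// in the order they are first encountered.
function element_names($xml)
{
    $doc = new DOMDocument();
    if (!@$doc->loadXML($xml)) {
        return array(); // not well-formed, nothing to report
    }
    $names = array();
    foreach ($doc->getElementsByTagName('*') as $node) {
        $names[$node->nodeName] = true; // array keys keep the names unique
    }
    return array_keys($names);
}

// Made-up document whose structure we pretend not to know.
$unknown = '<catalog><cd id="1"><title>Demo</title></cd><cd id="2"><title>Other</title></cd></catalog>';
print_r(element_names($unknown));
```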

Site-Specific Search

In this portion of the chapter we are going to use the Google API and the SOAP extension to create a site-specific search engine. Instead of creating our own index, we’ll use the one created by Google. We’ll access it via the SOAP protocol. Obviously, this kind of search engine can only be implemented for a site that has been indexed by Google.

Google API

API stands for Application Programming Interface, and it is the means for tapping into the Google search engine and performing searches programmatically. You’ll need a license key in order to use the Google API, so go to www.google.com/apis and create a Google account. This license key will allow you to initiate up to 1,000 programmatic searches per day. Depending on the nature of your website, this should be more than adequate. As a general rule, if you are getting fewer than 5,000 visits per day then you are unlikely to exceed this number of searches.
When you get your license key, you should also download the API developer’s kit. We won’t be using it here, but you might want to take a look at it. This kit contains the XML description of the search service in the Web Services Description Language (WSDL) file and a copy of the file APIs_Reference.html.
If you plan to make extensive use of the Google API, then the information in the reference file is invaluable. Among other things, it shows the legal values for a language-specific search, and it details some of the API’s limitations. For instance, unlike a search initiated at Google’s site, the maximum number of words an API query may contain is 10.
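
Given the 10-word limit, it may be worth checking a query before sending it. The sketch below is one way to do that. Both function names are hypothetical, and I am assuming that a "word" means a whitespace-separated token; the reference file is the authority on how Google actually counts.

```php
<?php
// Hypothetical helper: count whitespace-separated words in a query.
function query_word_count($query)
{
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    return count($words);
}

// Hypothetical helper: true when the query stays within the word limit.
function within_api_word_limit($query, $limit = 10)
{
    return query_word_count($query) <= $limit;
}

var_dump(within_api_word_limit('object oriented php'));
```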

AJAX

This is not the place for a tutorial on AJAX (and besides, I’m not the person to deliver such a tutorial) so we’re going to make things easy on ourselves by using the prototype JavaScript framework found at http://prototype.conio.net. With this library you can be up and running quickly with AJAX.
You’ll find a link to the prototype library on the companion website or you can go directly to the URL referenced above. In any case, you’ll need the prototype.js file to run the code presented in this part of the chapter.

Installing SOAP

SOAP is not installed by default. This extension is only available if PHP has been configured with --enable-soap. (If you are running PHP under Windows, make sure you have a copy of the file php_soap.dll, add the line extension = php_soap.dll to your php.ini file, and restart your web server.)
If configuring PHP with support for SOAP is not within your control, you can implement something very similar to what we are doing here by using the NuSOAP classes that you’ll find at http://sourceforge.net/projects/nusoap. Even if you do have SOAP enabled, it is worth becoming familiar with NuSOAP not only to appreciate some well-crafted OO code, but also to realize just how much work this extension saves you. There are more than 5,000 lines of code in the nusoap.php file. It’s going to take us fewer than 50 lines of code to initiate our Google search. Furthermore, the SOAP client we create, since it’s using a built-in class, will run appreciably faster than one created using NuSOAP. (The NuSOAP classes are also useful if you need SOAP support under PHP 4.)
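
Before going any further you may want to confirm whether the built-in extension is available on your server. A quick sketch, with soap_support as a hypothetical helper name:

```php
<?php
// Hypothetical helper: report whether the built-in SOAP extension is usable.
function soap_support()
{
    if (extension_loaded('soap') && class_exists('SoapClient')) {
        return 'built-in';
    }
    return 'missing'; // consider a userland library such as NuSOAP instead
}

echo "SOAP extension: ", soap_support(), "\n";
```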

The SOAP Extension

You may think that the SOAP extension is best left to the large shops doing enterprise programming. Well, think again. Although the “simple” in SOAP is not quite as simple as the “simple” in SimpleXML, the PHP implementation of SOAP is not difficult to use, at least where the SOAP client is concerned. Other objects associated with the SOAP protocol, the SOAP server in particular, are more challenging. However, once you understand how to use a SOAP client, you won’t find implementing the server intimidating.
In cases where a WSDL file exists, as is the case with the Google API, we don’t really need to know much about a SOAP client beyond how to construct one, because the SOAP protocol is a way of executing remote procedure calls using a locally created object. For this reason, knowing the methods of the service we are using is paramount.

A SOAP Client

To make use of a web service, we need to create a SOAP client. The first step in creating a client for the Google API is reading the WSDL description of the service found at http://api.google.com/GoogleSearch.wsdl. SOAP allows us to create a client object using the information in this file. We will then invoke the doGoogleSearch method of this object. Let’s step through the code in our usual fashion beginning with the file dosearch.php. This is the file that actually does the search before handing the results over to an AJAX call.
The first step is to retrieve the search criterion variable.

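<?php
//the @ suppresses the notice raised when no criterion is supplied
$criterion = @htmlentities($_GET["criterion"], ENT_NOQUOTES);
//strpos returns 0 for a match at the very start, so compare against false
if(strpos($criterion, "\"") !== false){
$criterion = stripslashes($criterion);
echo "<b>$criterion</b>"."</p><hr style=\"border:1px dotted black\" />";
}else{
echo "\"<b>$criterion</b>\".</p><hr style=\"border:1px dotted black\" />";
}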
echo "<b>$criterion</b></p><hr style=\"border:1px dotted black\" /><br />";

Wrapping the retrieved variable in a call to htmlentities is not strictly necessary since we’re passing it on to the Google API and it will doubtless be filtered there. However, filtering input is essential for security and a good habit to cultivate.
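
As an alternative to suppressing errors with @, the sketch below reads the criterion explicitly and escapes it only at the point of output. Both function names are hypothetical, and this is a variation on the chapter’s approach rather than a replacement for it.

```php
<?php
// Hypothetical helper: fetch the criterion from a request array,
// falling back to an empty string when it is missing or not a string.
function read_criterion(array $request)
{
    if (!isset($request['criterion']) || !is_string($request['criterion'])) {
        return '';
    }
    return trim($request['criterion']);
}

// Hypothetical helper: escape the criterion for display, leaving quotes alone
// (the same ENT_NOQUOTES flag the chapter's code uses).
function criterion_for_html($criterion)
{
    return htmlentities($criterion, ENT_NOQUOTES);
}

$criterion = read_criterion($_GET);
echo "<b>", criterion_for_html($criterion), "</b>\n";
```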

Make It Site-Specific

A Google search can be restricted to a specific website in exactly the same way as when searching manually using a browser: you simply add site: followed by the domain you wish to search to the existing criterion. Our example searches the No Starch Press site; substitute your own domain and license key for the placeholder values in the code.

//put your site here
$query = $criterion . " site:www.yoursite.com";
//your Google key goes here
$key = "your_google_key";

In this particular case we are only interested in the top few results of our search. However, if you look closely at the code, you’ll quickly see how we could use a page navigator and show all the results over a number of different web pages. We have a $start variable that can be used to adjust the offset at which to begin our search. Also, as you’ll soon see, we can determine the total number of results that our search returns.

$maxresults = 10;
$start = 0;
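
To make the page navigator idea concrete, here is a small sketch that works out the $start value for each page. The page_offsets function is hypothetical, and it assumes offsets are zero-based, as the $start = 0 above suggests. Bear in mind that the total reported by the API is an estimate, so the last page or two may turn out to be empty.

```php
<?php
// Hypothetical helper: the start offset of every page needed to show
// $total results at $perPage results per page.
function page_offsets($total, $perPage = 10)
{
    $offsets = array();
    if ($perPage < 1) {
        return $offsets; // guard against a loop that never advances
    }
    for ($start = 0; $start < $total; $start += $perPage) {
        $offsets[] = $start;
    }
    return $offsets;
}

print_r(page_offsets(25, 10));
```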

A SoapClient Object

Creating a SOAP client may throw an exception, so we enclose our code within a try block.

try{
$client = new SoapClient("http://api.google.com/GoogleSearch.wsdl");

When creating a SoapClient object, we pass in the WSDL URL. There is also an optional second argument to the constructor that configures the options of the SoapClient object. However, this argument is usually only necessary when no WSDL file is provided. Creating a SoapClient object returns a reference to GoogleSearchService. We can then call the doGoogleSearch method of this service. Our code contains a comment that details the parameters and the return type of this method.

/*
doGoogleSearchResponse doGoogleSearch (string key, string q, int start, int maxResults, boolean filter, string restrict, boolean safeSearch, string lr, string ie, string oe)
*/
$results = $client->doGoogleSearch($key, $query, $start, $maxresults, false, '', false, '', '', '');

This method is invoked, as is any method, by using an object instance and the arrow operator. The purpose of each argument to the doGoogleSearch method is readily apparent except for the final three. You can restrict the search to a specific language by passing in a language name as the third-to-last parameter. The final two parameters indicate input and output character set encoding. They can be ignored; use of these arguments has been deprecated.

The doGoogleSearch method returns a GoogleSearchResult made up of the following elements:

/*
GoogleSearchResults are made up of
documentFiltering, searchComments, estimatedTotalResultsCount, estimateIsExact, resultElements, searchQuery, startIndex, endIndex, searchTips, directoryCategories, searchTime
*/
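
Returning for a moment to the optional second constructor argument: when no WSDL file is available, the options array has to say where the service lives. The sketch below shows the general shape. The endpoint and namespace are made-up placeholders, and nothing is contacted until a method is actually called on the client.

```php
<?php
// Hypothetical endpoint and namespace; replace with the real service's values.
$options = array(
    'location' => 'http://www.example.com/soap/endpoint', // where requests are sent
    'uri'      => 'urn:ExampleService',                   // target namespace
    'trace'    => true,  // keep the last request and response for debugging
);

if (class_exists('SoapClient')) {
    // Passing null instead of a WSDL URL selects non-WSDL mode.
    $client = new SoapClient(null, $options);
    echo get_class($client), " created\n";
} else {
    echo "SOAP extension not available\n";
}
```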

Getting the Results

We are only interested in three of the properties of the GoogleSearchResult: the time our search took, how many results are returned, and the results themselves.

$searchtime = $results->searchTime;
$total = $results->estimatedTotalResultsCount;
if($total > 0){

The results are encapsulated in the resultElements property.

//retrieve the array of result elements
$re = $results->resultElements;

ResultElements have the following characteristics:

/*
ResultElements are made up of summary, URL, snippet, title, cachedSize, relatedInformationPresent, hostName, directoryCategory, directoryTitle
*/

We iterate through the ResultElements returned and display the URL as a hyperlink along with the snippet of text that surrounds the search results.

foreach ($re as $key => $value){
$strtemp = "<a href= \"$value->URL\"> ".
" $value->URL</a> $value->snippet<br /><br />\n";
echo $strtemp;
}
echo "<hr style=\"border:1px dotted black\" />";
echo "<br />Search time: $searchtime seconds.";
}else{
echo "<br /><br />Nothing found.";
}
}

Our call to the Google API is enclosed within a try block so there must be a corresponding catch. A SOAPFault is another object in the SOAP extension. It functions exactly like an exception.

catch (SOAPFault $exception){
echo $exception;
}
?>
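
Echoing the exception object is fine while testing, but on a live page you will probably want something tidier. Here is a sketch of one possibility. The describe_fault function is hypothetical; it relies on SoapFault extending Exception and exposing a faultcode property, and it escapes the message before it reaches the browser.

```php
<?php
// Hypothetical helper: turn a fault into a short message that is safe
// to echo into an HTML page.
function describe_fault(Exception $e)
{
    $code = isset($e->faultcode) ? $e->faultcode : 'n/a';
    return htmlspecialchars("SOAP fault [$code]: " . $e->getMessage());
}

if (class_exists('SoapFault')) {
    try {
        // Simulate a failure; a real one would come from a call such as doGoogleSearch.
        throw new SoapFault('Client', 'Invalid <key>');
    } catch (SoapFault $fault) {
        echo describe_fault($fault), "\n";
    }
}
```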

Testing the Functionality

View the dosearch.php page in a browser, add the query string ?criterion=linux to the URL, and the SoapClient will return a result from Google’s API. You should get site-specific search results that look something like those shown in Figure 12-1.

There are hyperlinks to the pages where the search criterion was found, along with snippets of text surrounding this criterion. Within the snippet of text the criterion is bolded.
As already mentioned, this is not the solution for a high-traffic site where many searches will be initiated. Nor is it a solution for a newly posted site. Until a site is indexed by Google, no search results will be returned. Likewise, recent changes to a site will not be found until the Googlebot visits and registers them. However, these limitations are a small price to pay for such an easy way to implement a site-specific search capability.

Viewing the Results Using AJAX

Viewing the results in a browser confirms that the code we have written thus far is functional. We’re now ready to invoke this script from another page (search.html) using AJAX. The HTML code to do this is quite simple:

Search the No Starch Press site: <br />
<input type="text" id="criterion" style="width:150px" /><br />
<input style="margin-top:5px;width:60px;" type="button" value="Submit" onclick="javascript:call_server();" />
<h2>Search Results</h2>
<div id="searchresults" style="width:650px; display: block;"> Enter a criterion.
</div>

There’s a textbox for input and a submit button that, when clicked, invokes the JavaScript function, call_server. The results of our search will be displayed in the div with the id searchresults.
To see how this is done, let’s have a look at the JavaScript code:

<script type="text/javascript" language="javascript" src=
"scripts/prototype.js">
</script>
<script type="text/javascript" >
/*********************************************************************/
// Use prototype.js and copy result into div
/*********************************************************************/
function call_server(){
  var obj = $('criterion');
  if( not_blank(obj)){
    $('searchresults').innerHTML = "Working...";
    var url = 'dosearch.php';
    //encode the criterion so spaces and ampersands survive the query string
    var pars = 'criterion=' + encodeURIComponent(obj.value);
    new Ajax.Updater( 'searchresults', url,
    {
      method: 'get',
      parameters: pars,
      onFailure: report_error
    });
  }
}

We must first include the prototype.js file because we want to use the Ajax.Updater object contained in that file. This file also gives us the capability of simplifying JavaScript syntax. The reference to criterion using the $() syntax is an easy substitute for the document.getElementById DOM function. The if statement invokes a JavaScript function to check that there is text in the criterion textbox. If so, the text in the searchresults div is overwritten using the innerHTML property, indicating to the user that a search is in progress. The URL that performs the search is identified, as is the search criterion. These variables are passed to the constructor of an Ajax.Updater, as is the name of the function to be invoked upon failure. The Ajax.Updater class handles all the tricky code related to creating an XMLHttpRequest and also handles copying the results back into the searchresults div. All you have to do is point it to the right server-side script.
There are a number of other Ajax classes in the prototype.js file and the $() syntax is just one of a number of helpful utility functions. The companion website has a link to a tutorial on using prototype.js should you wish to investigate further.

Complex Tasks Made Easy

I’ve detailed just one of the services you can access using SOAP. Go to www.xmethods.net to get an idea of just how many services are available. Services range from the very useful—email address verifiers—to the relatively arcane—Icelandic TV station listings. You’ll be surprised at the number and variety of services that can be implemented just as easily as a Google search.
In this chapter you’ve seen how easy it is to create a SOAP client using PHP. We quickly got up and running with AJAX, thanks to the prototype.js framework, and you’ve seen that PHP and AJAX can work well together. Reading a news feed was simpler still. These are all tasks that rely heavily on XML, but minimal knowledge of this technology was required because PHP does a good job of hiding the messy details.

Would You Want to Do It Procedurally?

Knowledge of OOP is a requirement for anything beyond trivial use of the SimpleXML and SOAP extensions to PHP. OOP is not only a necessity in order to take full advantage of PHP, but it is by far the easiest way to read a feed or use SOAP. A procedural approach to either of the tasks presented in this chapter is not really feasible. Any attempt would unquestionably be much more difficult and require many, many more lines of code. Using built-in objects hides the complexity of implementing web services and makes their implementation much easier for the developer.
