Tuesday, July 14, 2009

Beginning PHP and Oracle From Novice to Professional by W. Jason Gilmore and Bob Bryla Chapter 24

The Web makes it incredibly easy for you to communicate your message to anybody with an Internet connection and a Web browser, no matter if they’re sitting in a café in Moscow’s Red Square, on a farm in Ohio, in a cubicle in a Shanghai high-rise, or in an Israeli classroom.
There is one tiny issue: only about 29 percent of the total Internet population actually speaks English.[1] The rest speak Chinese, Japanese, Spanish, German, French, and several dozen other languages. Therefore, if you’re interested in truly reaching a global audience, you’ll need to think about creating a Web site that conforms not only to the visitor’s native language but also to their standards for currency, dates, numbers, times, and so on.
But creating software that the global community can use is hard, and not only for the obvious reason that you need the resources to translate the Web site text. You also have to think about integrating the language and standards modifications into the existing application in a manner that precludes insanity. This chapter will help you meet this challenge.

■Note One of the key features planned for PHP 6 is native support for Unicode (http://www.unicode.org/), a standard that greatly reduces the overhead involved in creating applications and Web sites intended to be used on multiple platforms and to support multiple languages. While neither Unicode nor PHP’s implementation is discussed in this book, be sure to learn more about the topic if globally accessible applications are a crucial part of your project.

Approaches to Internationalizing and Localizing Applications

Supporting native languages and standards is a two-step process, requiring the developer to internationalize and localize the Web site. Internationalizing the Web site means making the changes necessary for it to be localized; localizing it means updating the site to offer the actual languages and features. In this section you’ll learn about an approach you might consider for internationalizing and localizing your Web site.

■Note Because programmers are lazy, you’ll often see internationalization written as i18n, and localization as l10n.

1. Internet World Stats: http://www.internetworldstats.com/


Translating Web Sites with Gettext

Gettext (http://www.gnu.org/software/gettext/) is one of the many great projects created and maintained by the Free Software Foundation, consisting of a number of utilities useful for internationalizing and localizing software. Over the years it’s become a de facto standard solution for maintaining translations for countless applications and Web sites. PHP interacts with Gettext through a namesake extension, meaning you’ll need to download the Gettext utility and install it on your system. If you’re running Windows, download it from http://gnuwin32.sourceforge.net/ and make sure you update the PATH environment variable to point to the installation directory.
Because PHP’s Gettext extension isn’t enabled by default, you’ll probably need to reconfigure PHP. If you’re on Linux you can enable it by rebuilding PHP with the --with-gettext option. On Windows all you need to do is uncomment the extension=php_gettext.dll line found in the php.ini file. See Chapter 2 for more information about configuring PHP.
The remainder of this section guides you through the steps necessary to create a multilingual Web site using PHP and Gettext.

Step 1: Update the Web Site Scripts

Gettext must be able to recognize which strings you’d like to translate. This is done by passing all translatable output through the gettext() function. Each time gettext() is encountered, PHP will look to the language-specific localization repository (more about this in Step 2) and match the string encompassed within the function to the corresponding translation. The script knows which translation to retrieve due to earlier calls to setlocale(), which tells PHP and Gettext which language and country you want to conform to, and then bindtextdomain() and textdomain(), which tell PHP where to look for the translation files.
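Because the extension may not be loaded on every server, one defensive pattern is to route translatable output through a small wrapper. The following is only a sketch of how you might structure this; the function name t() is hypothetical and is not something the Gettext extension provides. (PHP does offer _() as a built-in alias of gettext().)

```php
<?php
// Hypothetical wrapper: use gettext() when the extension is loaded,
// otherwise fall back to the untranslated string.
function t($message)
{
    if (function_exists('gettext')) {
        // gettext() returns $message unchanged when no translation is found
        return gettext($message);
    }
    return $message;
}

echo t("Enter your e-mail address:"), "\n";
```

With no translation repository in place yet, the wrapper simply echoes the original English string, which is the same fallback behavior gettext() itself exhibits.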
Pay special attention to the mention of both language and country, because you shouldn’t simply pass a language name (e.g., Italian) to setlocale(). Rather, you need to choose from a predefined combination of language and country codes as defined by the International Organization for Standardization (ISO). For example, you might want to localize to English but use the United States number and time/date format. In this case you would pass en_US to setlocale() as opposed to en_GB. Because the differences between British and United States English are minimal, largely confined to a few spelling variants, you’d only be required to maintain the few differing strings and allow gettext() to default to the strings passed to the function for those it cannot find in the repository.

■Note You can find both the language and country codes as defined by ISO on many Web sites; just search for the keywords ISO, country codes, and language codes. Table 24-1 offers a list of common code combinations.

Table 24-1. Common Country and Language Code Combinations

Combination   Locale
pt_BR         Brazil
fr_FR         France
de_DE         Germany
en_GB         Great Britain
he_IL         Israel
it_IT         Italy
es_MX         Mexico
es_ES         Spain
en_US         United States
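
The combinations in Table 24-1 all pair a two-letter lowercase language code with a two-letter uppercase country code. As a minimal sketch, assuming you only accept that simple form (locale names on a real system can also carry suffixes such as a character set), a hypothetical helper named split_locale() could check a string before it ever reaches setlocale():

```php
<?php
// Hypothetical helper (not part of PHP or Gettext): splits a locale string
// such as "it_IT" into its language and country parts.
// Assumes the simple ll_CC form only; returns null for anything else.
function split_locale($locale)
{
    if (!preg_match('/\A([a-z]{2})_([A-Z]{2})\z/', $locale, $parts)) {
        return null;
    }
    return ['language' => $parts[1], 'country' => $parts[2]];
}

var_dump(split_locale('it_IT'));   // array with language "it", country "IT"
var_dump(split_locale('Italian')); // NULL: a bare language name is rejected
```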

Listing 24-1 presents a simple example that seeks to translate the string Enter your e-mail address: to its Italian equivalent.

Listing 24-1. Using gettext() to Support Multiple Languages

<?php

// Specify the target language
$language = 'it_IT';

// Assign the appropriate locale
setlocale(LC_ALL, $language);

// Identify the location of the translation files
bindtextdomain("messages", "/usr/local/apache/htdocs/locale");

// Tell the script which domain to search within when translating text
textdomain("messages");
?>

<form action="subscribe.php" method="post">
<?php echo gettext("Enter your e-mail address:"); ?><br />
<input type="text" id="email" name="email" size="20" maxlength="40" value="" />
<input type="submit" id="submit" value="Submit" />
</form>

Of course, in order for Listing 24-1 to behave as expected, you need to create the aforementioned translation repository and translate the strings according to the desired language. You’ll learn how to do this in Steps 2 through 5.
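
It is also worth checking whether the server actually has the requested locale installed; if it doesn't, setlocale() returns false and the translations will typically not appear. The following sketch relies on setlocale()'s ability to accept a list of candidate names and use the first one the system supports. The helper name apply_locale() is hypothetical, and the candidate spellings (such as it_IT.UTF-8) are assumptions that vary by platform.

```php
<?php
// Hypothetical helper: tries each candidate locale name in order and
// returns the name the system accepted, or false if none is available.
function apply_locale(array $candidates)
{
    return setlocale(LC_ALL, $candidates);
}

// Candidate spellings differ between systems; these are examples only.
$active = apply_locale(['it_IT.UTF-8', 'it_IT', 'it']);

if ($active === false) {
    echo "Italian locale not available; strings will stay untranslated\n";
} else {
    echo "Locale set to $active\n";
}
```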

Step 2: Create the Localization Repository

Next you need to create the repository where the translated files will be stored. One directory should be created for each language/country code combination, and within that directory you need to create another named LC_MESSAGES. So for example, if you plan on localizing the Web site to support English (the default), German, Italian, and Spanish, the directory structure would look like this:

locale/
    de_DE/
        LC_MESSAGES/
    it_IT/
        LC_MESSAGES/
    es_ES/
        LC_MESSAGES/

You can place this directory anywhere you please because the bindtextdomain() function (shown in action in Listing 24-1) is responsible for mapping the path to a predefined domain name.
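
Gettext combines the directory passed to bindtextdomain(), the locale name, the fixed LC_MESSAGES directory, and the domain name to locate each compiled catalog. A small sketch of that mapping follows; catalog_path() is a hypothetical helper for checking your layout, and the .mo file it points to is the one produced later in Step 5.

```php
<?php
// Hypothetical helper: builds the path where Gettext expects to find the
// compiled catalog for a given locale and domain.
function catalog_path($baseDir, $locale, $domain = 'messages')
{
    return rtrim($baseDir, '/') . "/$locale/LC_MESSAGES/$domain.mo";
}

echo catalog_path('/usr/local/apache/htdocs/locale', 'it_IT'), "\n";
// /usr/local/apache/htdocs/locale/it_IT/LC_MESSAGES/messages.mo
```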

Step 3: Create the Translation Files

Next you need to extract the translatable strings from the PHP scripts. This is done with the xgettext command, which is a utility bundled with Gettext. xgettext offers an impressive number of options, each of which you can learn more about by executing xgettext with the --help option. Executing the following command will cause xgettext to examine all of the files found in the current directory ending in .php, producing a file consisting of the desired strings to translate:

%>xgettext -n *.php

The -n option causes the file name and line number to be included before each string entry in the output file. By default the output file is named messages.po, although you can change this using the --default-domain=NAME option, which writes the output to NAME.po instead. A sample output file follows:

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2007-05-16 13:13-0400\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: homepage.php:12
msgid "Subscribe to the newsletter:"
msgstr ""

#: homepage.php:15
msgid "Enter your e-mail address:"
msgstr ""

#: contact.php:12
msgid "Contact us at info@example.com!"
msgstr ""

Copy this file into the LC_MESSAGES directory of each language you plan to support and proceed to the next step.

Step 4: Translate the Text

Open the messages.po file residing in the directory of the language you’d like to translate to, and translate the strings by completing the empty msgstr entry that follows each extracted string. Then replace the placeholders represented in all capital letters with information pertinent to your application.

Pay particular attention to the CHARSET placeholder because the value you use will have a direct effect on Gettext’s ability to ultimately translate the application. You’ll need to replace CHARSET with the name of the appropriate character set used to represent the translated strings. For example, character set ISO-8859-1 is used to represent languages using the Latin alphabet, including English, German, Italian, and Spanish. Windows-1251 is used to represent languages using the Cyrillic alphabet, including Russian. Rather than exhaustively introduce the countless character sets here, I suggest you check out Wikipedia’s great summary at http://en.wikipedia.org/wiki/Character_encoding.
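
When a site has many strings, it can help to list the entries whose msgstr is still empty. The following is a deliberately simplified sketch: find_untranslated() is a hypothetical helper that understands only single-line msgid and msgstr pairs like those in the sample file, and it does not handle multi-line strings or plural forms.

```php
<?php
// Hypothetical helper: returns the msgid values whose msgstr is still empty.
// Simplification: only single-line msgid/msgstr pairs are recognized.
function find_untranslated($poText)
{
    $missing = [];
    $currentId = null;

    foreach (preg_split('/\R/', $poText) as $line) {
        $line = trim($line);
        if (preg_match('/^msgid "(.*)"$/', $line, $m)) {
            $currentId = $m[1];
        } elseif (preg_match('/^msgstr "(.*)"$/', $line, $m)) {
            // Skip the header entry, whose msgid is the empty string
            if ($currentId !== null && $currentId !== '' && $m[1] === '') {
                $missing[] = $currentId;
            }
            $currentId = null;
        }
    }
    return $missing;
}

// Typical use, assuming the file layout described in Step 2:
// print_r(find_untranslated(file_get_contents('locale/it_IT/LC_MESSAGES/messages.po')));
```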

■Tip Writing quality text in one’s own native tongue is difficult enough, so if you’d like to translate your Web site into another language, seek out the services of a skilled speaker. While professional translation services can be quite expensive, consider contacting your local university because there’s typically an abundance of foreign-language students who would welcome the opportunity to gain some experience in exchange for an attractive rate.

Step 5: Generate Binary Files

The final required preparatory step involves generating binary versions of the messages.po files, which will be used by Gettext. This is done with the msgfmt command. Navigate to the appropriate language directory and execute the following command:

%>msgfmt messages.po

Executing this command will produce a file named messages.mo, which is what Gettext will ultimately use for the translations.
Like xgettext, msgfmt also offers a number of features through options. Execute msgfmt --help to learn more about what’s available.
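
Because the .mo file has to be regenerated whenever the .po file changes, a quick freshness check can be handy during development. This is a sketch only: needs_recompile() is a hypothetical helper that compares file modification times, and the paths shown assume the directory layout from Step 2.

```php
<?php
// Hypothetical helper: reports whether the compiled .mo file is missing
// or older than the .po file it was generated from.
function needs_recompile($poFile, $moFile)
{
    clearstatcache(); // avoid stale cached modification times
    if (!file_exists($moFile)) {
        return true;
    }
    return filemtime($poFile) > filemtime($moFile);
}

// Example paths follow the layout from Step 2; adjust to your own.
$dir = 'locale/it_IT/LC_MESSAGES';
if (file_exists("$dir/messages.po")
    && needs_recompile("$dir/messages.po", "$dir/messages.mo")) {
    echo "Run msgfmt again in $dir\n";
}
```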

Step 6: Set the Desired Language Within Your Scripts

To begin taking advantage of your localized strings, all you need to do is set the locale using setlocale() and call the bindtextdomain() and textdomain() functions as demonstrated in Listing 24-1. The end result is the ability to use the same source code to present your Web site in multiple languages. For instance, Figures 24-1 and 24-2 depict the same form, the first with the locale set to en_US and the second with the locale set to it_IT.

Figure 24-1. A newsletter subscription form with English prompts

Figure 24-2. The same subscription form, this time in Italian

Of course there’s more to maintaining translations than what is demonstrated here. For instance, you’ll need to know how to merge and update .po files as the Web site’s content changes over time. Gettext offers a variety of utilities for doing exactly this; consult the Gettext documentation for more details.

While Gettext is great for maintaining applications in multiple languages, it still doesn’t satisfy the need to localize other data such as numbers and dates. This is the subject of the next section.

■Tip If your Web site offers material in a number of languages, perhaps the most efficient way to allow a user to set a language is to store the locale string in a session variable, and then pass that variable into setlocale() when each page is loaded. See Chapter 18 for more information about PHP’s session-handling capabilities.
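
One way to apply this tip safely is to accept only locales the site actually supports. The sketch below assumes the locale is stored under a session key named locale and that the supported list is maintained by hand; both the key name and the pick_locale() helper are hypothetical.

```php
<?php
// Hypothetical helper: returns the locale stored in the session data if it
// is one the site supports, otherwise the default.
function pick_locale(array $session, array $supported, $default = 'en_US')
{
    if (isset($session['locale'])
        && in_array($session['locale'], $supported, true)) {
        return $session['locale'];
    }
    return $default;
}

// Typical use at the top of each page (after session_start()):
// $locale = pick_locale($_SESSION, ['en_US', 'it_IT', 'de_DE', 'es_ES']);
// setlocale(LC_ALL, $locale);
```

Checking against a fixed list keeps unexpected values out of setlocale() and gives first-time visitors a sensible default.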

Localizing Dates, Numbers, and Times

The setlocale() function introduced in the previous section can go far beyond facilitating the localization of language; it can also affect how PHP renders dates, numbers, and times. This is important because of the variety of ways in which this often crucial data is represented among different countries. For example, suppose you are a United States–based organization providing an essential subscription-based service to a variety of international corporations. When it is time to renew subscriptions, a special message is displayed at the top of the browser that looks like this:

Your subscription ends on 3-4-2008. Renew soon to avoid service cancellation.

For the United States–based users, this date means March 4, 2008. However, for European users, this date is interpreted as April 3, 2008. The result could be that the European users won’t feel compelled to renew the service until the end of March, and therefore will be quite surprised when they attempt to log in on March 5. This is just one of the many issues that might arise due to confusion over data representation.
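
The ambiguity is easy to reproduce. The following sketch formats the same deadline with PHP's gmdate() function using hard-coded format strings (no locale involved), purely to illustrate how one date yields two conflicting renderings:

```php
<?php
// Build a timestamp for March 4, 2008 (UTC, to avoid time zone surprises)
$deadline = gmmktime(0, 0, 0, 3, 4, 2008);

echo gmdate('n-j-Y', $deadline), "\n";  // 3-4-2008   (month-day-year)
echo gmdate('j-n-Y', $deadline), "\n";  // 4-3-2008   (day-month-year)
echo gmdate('Y-m-d', $deadline), "\n";  // 2008-03-04 (year-month-day)
```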
You can eliminate such inconsistencies by localizing the information so it appears exactly as the user comes to expect it. PHP makes this a fairly easy task, done by setting the locale using setlocale() and then using functions such as money_format(), number_format(), and strftime() to output the data.
For example, suppose you want to render the renewal deadline date according to the user’s locale. Just set the locale using setlocale() and run the date through strftime() (also taking advantage of strtotime() to create the appropriate timestamp) like this:

<?php
setlocale(LC_ALL, 'it_IT');
printf("Your subscription ends on %s", strftime('%x', strtotime('2008-03-04')));
?>

This produces the following:

Your subscription ends on 04/03/2008

The same process applies to formatting numeric and monetary values. For instance, while the United States uses a comma as the thousands separator, European countries use a period, a space, or nothing at all for the same purpose. Making matters more confusing, while the United States uses a period for the decimal separator, much of Europe uses a comma for this purpose. Therefore the following numbers are ultimately considered identical:

• 523,332.98

• 523 332.98

• 523332.98

• 523.332,98
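
These variants differ only in their separators, which can be seen by formatting one value with PHP's number_format() function. The separators are hard-coded here for illustration rather than taken from a locale:

```php
<?php
$amount = 523332.98;

// number_format(number, decimals, decimal separator, thousands separator)
echo number_format($amount, 2, '.', ','), "\n"; // 523,332.98
echo number_format($amount, 2, '.', ' '), "\n"; // 523 332.98
echo number_format($amount, 2, '.', ''), "\n";  // 523332.98
echo number_format($amount, 2, ',', '.'), "\n"; // 523.332,98
```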

Of course it makes sense to render such information in a manner most familiar to the user, in order to reduce any possibility of confusion. To do so, you can use setlocale() in conjunction with number_format() and another function named localeconv(), which returns numerical formatting information about a defined locale. Used together, these functions can produce properly formatted numbers, like so:

<?php
setlocale(LC_ALL, 'it_IT');
$locale = localeconv();
printf("(it_IT) Total hours spent commuting %s <br />",
    number_format(4532.23, 2, $locale['decimal_point'], $locale['thousands_sep']));

setlocale(LC_ALL, 'en_US');
$locale = localeconv();
printf("(en_US) Total hours spent commuting %s",
    number_format(4532.23, 2, $locale['decimal_point'], $locale['thousands_sep']));
?>

This produces the following result:

(it_IT) Total hours spent commuting 4532,23
(en_US) Total hours spent commuting 4,532.23

Summary

Maintaining a global perspective when creating your Web sites can only serve to open up your products and services to a much larger audience. Hopefully this chapter showed you that the process is much less of a challenge than you previously thought.
The next chapter introduces you to one of today’s hottest Web development approaches: frameworks. You’ll put what you learn about this topic into practice by creating a Web site using the Zend Framework.
