MultiLingual Computing, Inc., Magazine
Featured Article
Thursday, September 2, 2010


Locale Definitions: The Ongoing Debate

Users discover limitations in conventions
designed to facilitate localization


SUZANNE TOPPING


In the world of multilingual computing, locales are a necessary evil. But what exactly is a locale? Let's start with a few formal definitions. The Sun Global Application Developer Corner site defines locale as "a specific geographical, political, or cultural region. It is usually identified by a combination of language and country, for example, en_US represents the locale US English." The Microsoft Global Software Development site defines locale as "a set of rules and data specific to a given language and geographic area. These rules and data include information on character classification; date and time formatting; numeric, currency, weight and measure conventions; and sorting rules." The IBM International Components for Unicode (ICU) glossary defines locale as "a set of conventions affected or determined by human language and customs, as defined within a particular geopolitical region. These conventions include (but are not necessarily limited to) the written language, formats for dates, numbers and currency, sorting orders, etc."

Essentially, locales are sets of rules which are usually based on a geographic location and which are used to localize software settings for users in regions around the world.

Debates on several e-mail discussion lists in recent months prompted the creation of a new list, devoted entirely to the subject of locales. But why does this particular topic warrant a list all its own? This article will tell you exactly that.

Why It's an Issue

As illustrated in the Sun definition, locale specifications currently take the form of a language/country combination. Therein lies the problem. A set connection between the two is inherently problematic given the diversity of the world's use of language. Graham Rhind, founder of GRC Database Information, summarizes the problem by saying, "The name of a country is not the same as a language, which is not the same as a currency, which is not the same as an address format, which is not the same as a name format, and so on."

David Possin, Globalization Engineer for Heartlab, Inc., concurs: "We need to break the tie between language and locale, and create real-world minimal defaults for the locale definition. While IBM, Sun, and Microsoft use predefined language_country locales for their operating systems, that does not mean they are right."

The bottom line is that existing approaches for locale handling are too limited and inflexible. Once a locale is specified, down comes a hard and fast set of characteristics which may or may not match actual user needs. Michael Kaplan, president of Trigeminal Software and author of the book Internationalization with Visual Basic, says, "The challenge with managing locales is that (1) every user has a specific idea of what he or she wants. (2) Individual users do not all agree about what they want, and so they end up with what they don't want. (3) Unhappy users do not understand the user interface well enough to fix the problem. Even if they know the user interface well, it may not be flexible enough to allow the degree of customization they desire."

Let's take a look at a real-world example of how this works. A global e-commerce site such as Amazon.com is localized in many languages, can handle different currencies and shipping methods, and can provide offers customized according to the user's locale setting. For example, a customer with an en_US locale would see text in English, view best-selling titles for the United States and see prices in US dollars. But many people in the United States may prefer to use a language other than English. These users might want to view the site in their native language, see the best-seller list for their native country, and still want pricing in dollars and shipping information appropriate to a US address.

Let's say that this situation applies to a Mexican-American user. A locale of es_US is invalid because Spanish is not an official language of the United States, and the bare es code also implies the wrong Spanish variant (Spain's, not Mexico's). A locale of mx_US is also invalid because mx is a country code, not a language code. A locale of es_MX would provide the correct Spanish variant, but the wrong region.
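The three candidates above fail in different ways, which a small validation sketch can make concrete. The code lists and the set of predefined combinations below are illustrative assumptions, not taken from any particular platform; the function name check_locale is hypothetical.

```python
# Illustrative subsets only; a real system would load complete code lists.
LANGUAGE_CODES = {"en", "es"}        # language codes (ISO 639 style)
COUNTRY_CODES = {"US", "MX", "ES"}   # country codes (ISO 3166 style)
# Hypothetical set of language_country combinations a platform ships with.
PREDEFINED_LOCALES = {"en_US", "es_MX", "es_ES"}


def check_locale(identifier):
    """Return a short diagnosis for a language_country identifier."""
    language, separator, country = identifier.partition("_")
    if not separator:
        return "not in language_country form"
    if language not in LANGUAGE_CODES:
        return "unknown language code"
    if country not in COUNTRY_CODES:
        return "unknown country code"
    if identifier not in PREDEFINED_LOCALES:
        return "combination not predefined"
    return "accepted"


for candidate in ("es_US", "mx_US", "es_MX"):
    print(candidate, "->", check_locale(candidate))
```

Under these assumptions es_MX is the only candidate accepted, yet it would bring Mexican regional conventions along with the language, which is exactly the coupling the user does not want.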

This is just one example of the potential complexities regarding locale-related options. But are these really locale issues, or do they cross over into some sort of user preference no-man's land?

Who Cares?

An interesting variety of groups and individuals are deeply interested in dealing with locale-related issues. Library of Congress software developers must deal with a staggering range of language and script issues. SIL International, producer of the Ethnologue (a comprehensive collection of information about the languages of the world), has a huge need for locale-related support, particularly due to its work with minority languages. Government agencies in multilingual regions must produce information which can be read by their various constituents. Commercial software developers create products that are used around the world. Companies from many industries want to sell products in other countries over the Web and need to offer information to potential customers in the most usable way. Each of these groups has a slightly different view on what is important, which makes the challenge of creating a single workable locale model even bigger.

What Should Be Included?

Now that we've established that better methods for dealing with locales are needed, let's take a closer look at what a locale is and isn't. Or perhaps more accurately, what it should and shouldn't be.

Mark Davis, Chief Globalization Architect for IBM and President of the Unicode Consortium, comments, "What people want is some structured way to indicate and/or communicate a raft of information about a user's preferences. That would presumably include the traditional features of a locale, such as how dates are formatted, but may also include currency, time zone, preferred character set, smoker/non-smoker, vegetarian or not, music preference, religion, party affiliation, favorite charity, window or aisle seat and so on."

Given the nearly infinite degree to which preference categorization could occur, just how far should a locale model go? Which of the myriad variables for localization belong in a locale and which belong in some other structure? Should some variables be associated with the locale, others with the data, and still others with the application?

Language, country, and region identification. While language identification is a critical piece in the locale management puzzle, a common concern is that language codes need to be improved to reflect dialects and local rules. A number of languages are not yet included in the ISO 639 language code standard. While new codes will undoubtedly be added and while four-character codes are also being evaluated, the standard still offers a fairly restricted language specification approach.

To complicate matters, many people believe that the concept of regions would be extremely useful in many situations, perhaps more so even than country codes. Countries within a particular area of the world often have elements in common with relatively small variations. Examples of regions are North America, Central America, South America, Western Europe, Central Asia, Western Asia, South Asia, the Pacific, North Africa and so on. But regional classifications can get tricky because definitions can be geographic, economic, political and so forth, with each classification potentially introducing variations in currency, time zone or other variables.

Clearly, it's a complicated subject.

User preference attributes. Davis' comment raises the question of how far a "locale" preference definition should go. What attributes should be included?

Some probable inclusions in a locale-related model are:
  Language
  Dialect
  Script
  Personal name format
  Address format
  Date/time format
  Calendar
  Time zone
  Data casing (upper/lower case)
  Telephone number format
  Currency format
  Measurement format
  Numeric format
  Collation/sorting

Graham Rhind comments in relation to locale attributes: "I cannot think of a single good reason why any of these unique characteristics should be inextricably linked to any other in any way or standard whatsoever, nor why they should be made unchangeable."

Given this view, a number of issues must be addressed. For example, should every attribute be modifiable? How many attributes are needed for a minimal default locale description? Should every attribute have at least one value and allow a null value? Is a fallback mechanism required for handling missing or invalid attributes? Should a minimal set of default attributes be required?
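One way to picture independently modifiable attributes with a fallback mechanism is sketched below. It is only an illustration of the questions above, not a proposed standard: the class LocalePreferences, the function resolve, the reduced attribute set and all the sample values are assumptions made for the example.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LocalePreferences:
    """Each attribute is independent; None means 'not specified'."""
    language: Optional[str] = None
    dialect: Optional[str] = None
    date_format: Optional[str] = None
    time_zone: Optional[str] = None
    currency: Optional[str] = None
    measurement: Optional[str] = None


def resolve(user, defaults):
    """Fill every attribute the user left unspecified from a default set."""
    merged = {}
    for field in fields(LocalePreferences):
        value = getattr(user, field.name)
        if value is None:
            value = getattr(defaults, field.name)
        merged[field.name] = value
    return LocalePreferences(**merged)


# Hypothetical defaults for a US-based site.
us_defaults = LocalePreferences(
    language="en", dialect="US", date_format="MM/DD/YYYY",
    time_zone="America/New_York", currency="USD", measurement="imperial",
)
# The shopper overrides only language, dialect and time zone.
shopper = LocalePreferences(language="es", dialect="MX",
                            time_zone="America/Los_Angeles")
resolved = resolve(shopper, us_defaults)
print(resolved)
```

Here the shopper keeps Mexican Spanish while currency, date format and measurement fall back to the US defaults, so no attribute is forced by any other.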

In addition to answering these questions, relationship subcategories may need to be defined. For example, there may not be any relationship between time zone and language, but dialect is definitely a subcategory of language and may not merit a separate category.

When one is generating a locale framework, the issues to be addressed are really quite complex.

Suggested Approaches

As the on-line locale discussion progresses, a few approaches have been offered as starting points. Carl W. Brown, president of X.Net, Inc., discusses single- and multiple-parameter approaches. He describes a single-parameter locale specification which supplies more information than the existing methods, for example: es-mx_US.iso-8859-1#America/Los_Angeles
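A minimal parsing sketch for such a string follows. The grammar is inferred from the single example above (language, optional dialect after a hyphen, country after an underscore, character set after a period, time zone after a hash sign); the article does not define it formally, and the function name parse_single_parameter is hypothetical.

```python
def parse_single_parameter(spec):
    """Parse 'language[-dialect]_COUNTRY.charset#timezone' (assumed grammar)."""
    locale_part, _, time_zone = spec.partition("#")
    language_country, _, charset = locale_part.partition(".")
    language_part, _, country = language_country.partition("_")
    language, _, dialect = language_part.partition("-")
    return {
        "language": language,
        "dialect": dialect or None,      # missing pieces become None
        "country": country or None,
        "charset": charset or None,
        "time_zone": time_zone or None,
    }


print(parse_single_parameter("es-mx_US.iso-8859-1#America/Los_Angeles"))
```

Because every piece after the language is optional in this sketch, a traditional identifier such as en_US still parses, which is part of the appeal of staying with a single parameter.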

Since many applications would require extensive changes to track multiple parameters rather than a single locale specification, some people feel that the single-parameter form is the best approach.

Brown also describes using separate parameters which could be treated independently to avoid problems like the ones resulting from the current language_country coupling. An example is lang_es-mx#loc_US#tz_America/Los_Angeles#char_iso-8859-1 or lang[es-mx]#loc[US]#tz[America/Los_Angeles]#char[iso-8859-1]
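Both spellings carry the same key/value pairs, as the following sketch suggests. It assumes that fields are separated by a hash sign and that a key is joined to its value either by the first underscore or by square brackets; that reading comes from the two examples only, and parse_parameters is a hypothetical name.

```python
def parse_parameters(spec):
    """Parse 'key_value#key_value' or 'key[value]#key[value]' into a dict."""
    result = {}
    for field in spec.split("#"):
        if field.endswith("]") and "[" in field:
            # Bracket form: key[value]
            key, value = field[:-1].split("[", 1)
        else:
            # Underscore form: split on the first underscore only, so that
            # values such as America/Los_Angeles keep their own underscores.
            key, value = field.split("_", 1)
        result[key] = value
    return result


underscore_form = parse_parameters(
    "lang_es-mx#loc_US#tz_America/Los_Angeles#char_iso-8859-1")
bracket_form = parse_parameters(
    "lang[es-mx]#loc[US]#tz[America/Los_Angeles]#char[iso-8859-1]")
print(underscore_form == bracket_form)
```

Since each parameter is looked up by key, an application can read or change the language without touching the location, and the order of the fields does not matter.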

Possin suggests that locale parameters be full-text-based rather than limiting specification to codes of a few characters as is currently in place. His view is that translations of many full-text parameters already exist in operating systems, programming languages, databases and so on, and they simply need to be standardized. For example,

*lang#German#Bavarian# would be equivalent to *lang#Deutsch#Bayrisch#

*region#Germany#Bavaria# would be equivalent to *region#Deutschland#Bayern#

*tz#Middle_European# would be equivalent to *tz#mitteleuropäisch#
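The equivalences above amount to mapping each translated name onto one canonical value. A rough sketch of that idea follows; the lookup table holds only the names from the three examples, English is chosen as the canonical form purely for illustration, and the names CANONICAL_NAMES and normalize are hypothetical.

```python
# Translated names mapped to one canonical spelling per category.
# Keys are stored case-folded so that lookups ignore capitalization.
CANONICAL_NAMES = {
    "lang": {
        "german": "German", "deutsch": "German",
        "bavarian": "Bavarian", "bayrisch": "Bavarian",
    },
    "region": {
        "germany": "Germany", "deutschland": "Germany",
        "bavaria": "Bavaria", "bayern": "Bavaria",
    },
    "tz": {
        "middle_european": "Middle_European",
        "mitteleuropäisch": "Middle_European",
    },
}


def normalize(parameter):
    """Turn '*category#value#value#' into (category, canonical values)."""
    category, *values = parameter.strip("*#").split("#")
    names = CANONICAL_NAMES[category]
    return category, tuple(names[value.casefold()] for value in values)


print(normalize("*lang#German#Bavarian#") == normalize("*lang#Deutsch#Bayrisch#"))
```

An unknown category or name raises a KeyError in this sketch; a standardized scheme would need to say how such cases are handled.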

Another suggestion runs along an entirely different line. A fear had been expressed that even if a standard were created, there would be no "industry-strength implementation" to provide a delivery mechanism for it. To deal with this situation, one person commented that he hoped to eventually see a Web "preference server" which would define a "preferences content model." This approach, however, relies on an implementer to develop a solution, rather than having a cross-industry standardized method for locale handling.

Conclusion

Building a better locale mousetrap is a complex problem. Existing platforms, operating systems, programming languages and markup languages all handle locales in slightly different ways. Microsoft methods are different from Linux methods, and HTML handling is different from XML handling. Because of this, even once a sound framework is developed, widespread implementation will still be some way down the road.

As Barry Caplan, founder of www.i18n.com, states, "one of the strengths of Unicode over the years has been the early recognition that the scheme decided upon would need to be extensible without breaking. I think at least as much thought needs to be given up front to the locale issue if we expect people to adopt it as widely as Unicode."

The issue of locales was discussed at the World Wide Web Consortium (W3C) Internationalization Workshop held in February 2002. Many participants felt that locale is an area that would benefit from W3C attention; however, as of the writing of this article, it is not yet known whether the consortium will choose to take it on. If this eventually does happen, it would go a long way in helping to ensure widespread adoption of an agreed-upon locale framework.

In the meantime, we'll all just keep trying to figure out what to call it, what to include in it, and how to deal with all the other pesky problems of delivering information electronically to diverse populations around the world.


References

  IBM ICU Glossary: http://oss.software.ibm.com/icu/userguide/glossary.html
  Microsoft Global Software Development: www.microsoft.com/globaldev/
  Sun Global Application Developer Corner: www.sun.com/developers/gadc/
  Locales discussion list: http://groups.yahoo.com/group/locales/




Suzanne Topping is vice president of BizWonk, Inc. She can be reached at stopping@bizwonk.com


This article is reprinted from #47 Volume 13 Issue 3 of
MultiLingual Computing & Technology published by MultiLingual Computing, Inc., 319 North First Ave., Sandpoint, Idaho, USA, 208-263-8178, Fax: 208-263-6310.

April/May, 2002


 
     

 

