MultiLingual
 
Wednesday, October 22, 2014
 


Text Fragmentation and Reuse in User Interfaces


Designers must consider linguistic differences when implementing messages that will be translated

RICHARD ISHIDA


In this article we look at some of the issues that are hardest to deal with in translation. Designers must be very careful about how they split up and reuse text on-screen, since the linguistic differences between languages can lead to real headaches for localizers and may in some cases make a reasonable translation impossible to achieve.

After reviewing the issues relating to text fragmentation and string reuse, we will look at what works and what doesn't.

Composite messages dynamically compose a single message from more than one text string. Typically, one or more parts of the composite message will also contain substrings that vary according to the context.


The [printer / stacker / stapler options] has been disabled.
A composite message that causes problems for translation

In the example The [printer/stacker/stapler options] has been disabled, the designer has stored the part common to three sentences (The ... has been disabled.) as a single string. Three alternative substrings have also been created, and the appropriate one is substituted at runtime according to the context.

This is generally a popular idea with designers because it appears to offer a way of reducing the work of the text author, improving message consistency and optimizing memory by reducing the identical parts of a number of messages to a single string.

Unfortunately, it can be either difficult or impossible to deal with such composite messages in translation because of the differences in the way other languages work — differences in sentence structure, agreement and so on. This is illustrated in the English example. If the alternative string stapler options were used at runtime, the word has would be incorrect. Since there is only one string containing the word has, it cannot be rendered in more than one way.

Types of Composite Messages

We will begin with a succinct summary of the aspects of composite messages that require our attention. Illustrative examples will follow. If this appears a little abstract at this stage, scan this section and then use it as a reference while looking at the examples that follow.

Subject:predicate vs. sentential messages. The text of a composite message is typically arranged in one of the following styles.

Subject:predicate. This arrangement states a topic and then states something about it, usually in a terse way. For example: Printer: enabled. Note that the colon is very commonly used to separate subject and predicate.

Sentential. This arrangement expresses an idea using a flowing, sentence-like syntax. For example: The [printer] has been enabled, where [printer] is inserted at runtime.

Substring-lists vs. variables. Where composite messages include alternative substrings, these substrings may be displayed as substring-lists or as variables.

Substring-list. In this scenario, all of the possible substrings are visible simultaneously. Examples of this arrangement include radio buttons and checklists. A substring-list cannot be embedded.

Variable. In this case, only the substring appropriate to the current context is displayed at any one time. If the context changes, the substring will be replaced by one of its alternatives or sometimes by nothing. An example of this is The [printer] has been enabled, where [printer] is inserted at runtime. A variable may be embedded or concatenated for display.

Concatenated vs. embedded messages. This distinction is purely an implementation decision. It is not always easy to tell whether a composite message is concatenated or embedded simply by looking at the user interface.

Embedded. The application inserts one or more substrings into a defined location within a parent string. Parent string and substrings are all displayed in the same text box/displayer. Embedded substrings are always variables.

Concatenated. A number of independent strings are placed near to or next to each other to convey the overall message. Strings may be concatenated within a single text box/displayer or across two or more different text boxes/displayers. These substrings may be displayed as variables or as substring-lists.

Substrings. There are several types of substring, each of which introduces slightly different requirements for translatability. In this article we will refer to the following types of substrings:

Translatable text. Pre-defined words or phrases in the message set that will need to be translated, such as printer, stacker or stapler options in the earlier example.

Non-translatable text. A non-translatable and non-numeric string that is generated by the user at runtime or a non-translatable name, as in Error occurred while processing job: XXXXX, where XXXXX is the name of the job as supplied by the user. These are usually variables and are often names. Note that this definition excludes translatable text substrings as defined above.

Numeric. A numeric string that is generated at runtime by the product or is one of a set of fixed values such as Pages printed: XXX, where XXX is the number of pages the machine has counted so far.

Graphic. A graphic selected from a number of alternatives as part of a composite message, for example, a symbol of a paper tray embedded in text related to paper trays. This is not strictly a substring, but we will treat it as such since it may be used in a similar way.

Non-translatable. This is a group term referring to non-translatable text, numeric and graphic substrings.

Concatenated Subject:Predicate Messages

In the Printer settings box that we will use as an example of concatenated subject:predicate messages, each subject is invariant and is followed by one or more pre-defined predicate strings in separate display areas. The predicates are all variables, since only one alternative is displayed at a time and it is replaced by another as the context changes. For example, On is replaced by Off or Disabled according to the current settings.


A concatenated subject:predicate message

In this example, the predicates are all translatable text substrings. Note that, due to the need in many languages to agree with the gender and number of the subject, the correct translation for words like On will vary, so re-using the same text variable in more than one place will lead to severe localization problems.

In this example, most messages are split across two text display areas. The message referring to the Binder, however, is split across three display areas, that is, the topic has two predicates. It is both Enabled and On. This is a perfectly valid approach and poses no issues for translation.

The next example shows the subject of the composite message, Image Quality, in the window header. The predicates appear on the main canvas of the dialog box. The fact that this is a composite message is readily understood by speakers of languages for whom the words Lighter, Darker and Normal will have to be translated in a specific way to agree with the number and gender of Image Quality. Again, all parts of the composite message are in separate text boxes/display areas, so this is an example of concatenation.


A second concatenated subject:predicate message

The predicates are, again, translatable text substrings, although in this case all the predicates are visible at the same time — that is, this is an example of a substring-list. In fact, two predicates, Lighter and Darker, appear twice!

In the next example we are dealing with a pull-down menu, but the concepts remain the same. The words Left, Centre and Right are translatable text substrings and in many languages must agree with the subject Alignment when translated. The substrings are also arranged here as a substring-list.


A concatenated subject:predicate message in a pull-down menu

Our final example of a concatenated subject:predicate message shows three views of the same display box over a period of time. It is slightly different from the previous examples since all the text appears within a single displayer. It is still concatenated, however, since the message is implemented as independent strings displayed alongside each other to show the status of the test. Three strings — one topic and two translatable predicates — can be displayed at a time. The predicates are translatable text substrings.


A final example of a concatenated subject:predicate message

Embedded Subject:Predicate Messages

Embedded composite messages are displayed in a single display area. Unlike concatenated messages, alternative substrings are never displayed simultaneously in embedded implementations, that is, they are always variables. Substrings are embedded into a parent string.

In the examples Line Monitor (On) and Volume - (Medium), the appropriate predicate substring is embedded into the parent string between the parentheses. All the predicate substrings are pre-defined translatable text substrings.


An embedded subject:predicate message

This is still a subject:predicate arrangement because we are making a clear distinction between what we are talking about — the subject — and what we are saying about it — the predicate.

In the next example, the predicates are a mixture of translatable and non-translatable text substrings and numeric variables. The values for each property are variables that are embedded into a parent string, such as File name: %s, and all are arranged in a subject:predicate style.

Default Template
Directory: C:\Workgroup\Scan
File Name: MyFile.tif
Image Quality: Text
Original Size: Auto
Resolution: 300 dpi

Embedded subject:predicate messages

Concatenated Sentential Messages

Sentential messages arrange the parts of the message in a sentence-like order and flow. In a concatenated sentential message, this effect is achieved by juxtaposing separate text items.

In the example Retrieve last [2] of [4] total log entries, there are two numeric substrings (variables) and three translatable text substrings, all placed side by side to create the appearance of a flowing sentence.


This concatenated sentential message will be problematic for translation into Japanese

There is a problem here for translation because the sentence has an English structure, and the order of all substrings and variables needs to be different for a language such as Japanese.

In addition, the words last and entries should have different endings in many languages, according to the numbers displayed.

Embedded Sentential Messages

These examples show substrings that are embedded in a parent string to create a sentence-like effect. In fact, it is difficult to tell by looking at the screen shots where the substring begins and ends. The substrings are all variables.

In the example The printer has been disabled, stacker or stapler options could replace the word printer. These are translatable text substrings. Unfortunately, the message as a whole cannot be translated sensibly because of linguistic issues relating to agreement.


An embedded sentential message

The example The file myfile.ext has been downloaded shows a non-translatable text variable, myfile.ext, embedded in a parent string. It is non-translatable because it is supplied by the user or system at runtime. It would be possible to translate this message into another language because the file name does not affect the syntax of the sentence.


A variable name in an embedded sentential message that does not affect translation

The example 26 pages have been printed shows a numeric variable, 26, embedded in a parent string. The number is generated at runtime. In many languages it is not possible to translate this message well, because the number affects the syntax of the sentence.


The number in this embedded sentential message affects translation

Linguistic Factors Affecting Composite Message Use

Concatenating sentential composite messages. Concatenated composite messages only work in a subject-predicate arrangement. These messages are hard to translate because in many cases the syntax of the source language cannot be replicated in other languages.

For example, the message Retrieve last [2] of [4] total log entries is very difficult to translate into Japanese because Japanese syntax is very different from English. The positioning of elements allows no possibility for variation in order or segment length.

The Japanese translation would put total log entries at the beginning of the sentence and last retrieve at the end. In addition, it would be necessary to reverse the order of the fields currently containing the terms 2 and 4. In fact, the geometry of the dialog box also has to be changed in order to accommodate the translation.


The dialog box is rebuilt to accommodate the concatenated sentential message in Japanese

Note that, in addition, it is very common that the substrings in such messages are supplied in scattered locations in the file and in random order. This means the translator has to become a detective and piece the original text back together before attempting a translation.

It is also often the case that a message such as that above would be supplied for translation as follows:

Text string 1: Retrieve last
Text string 2: of
Text string 3: total log entries.

In this case, even if we could reorder the Japanese text by putting the translations in different strings, we have no way of reordering the numeric variables, since these are not available to the translator.
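The difference between shipping fragments and shipping one complete string can be sketched in Python. This is a minimal illustration under stated assumptions: the names FRAGMENTS, TEMPLATE and REORDERED_TEMPLATE are hypothetical, and the reordered template is an English word-order gloss, not a real Japanese translation.

```python
# Fragmented implementation: the word order is fixed in code, and the
# numbers are inserted between strings the translator sees separately.
FRAGMENTS = ("Retrieve last", "of", "total log entries.")


def fragmented_message(shown: int, total: int) -> str:
    first, middle, last = FRAGMENTS
    return f"{first} {shown} {middle} {total} {last}"


# Single translatable string: the whole sentence, with named placeholders
# that the translator can move anywhere in the sentence.
TEMPLATE = "Retrieve last {shown} of {total} total log entries."

# Hypothetical reordered template: an English gloss standing in for a
# translation that puts the total first and the verb last.
REORDERED_TEMPLATE = "{total} total log entries: last {shown} retrieve."


def template_message(template: str, shown: int, total: int) -> str:
    # Named fields are filled by keyword, so their order in the string is free.
    return template.format(shown=shown, total=total)
```

Because the template is one string with named placeholders, a translator can reorder both the words and the numbers without any change to the calling code.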

Concatenated sentential messages become particularly difficult to manage when they contain translatable variables and numeric variables, since they also inherit the additional problems I will describe next.

Translatable text variables. Translatable text variables in a sentential arrangement can create insurmountable difficulties for the translator because of the linguistic properties of many languages. Two key linguistic properties in this regard are agreement and word/concept mappings.

The following provides an example of problems with agreement:

Message 1509: The [S120P1200] has been disabled.
Substring 1509.1: printer
Substring 1509.2: stacker
Substring 1509.3: stapler options

The problem here is visible even in English, since the word has is inappropriate for the substring stapler options. In French, the substrings above are, respectively, feminine singular, masculine singular and feminine plural, and would require three completely different translations of the parent string:

L'imprimante a été désactivée.
Le module de réception a été désactivé.
Les options d'agrafage ont été désactivées.

The word the may also be la in French if the next word is feminine and begins with a consonant, and the word disabled would need to be translated désactivés for a masculine plural noun. Such agreement is extremely common in languages other than English or Japanese and can often be more complicated than in French.

Since we only have one parent string to translate, it is impossible to obtain a sensible translation in French.
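A minimal Python sketch of the alternative to a parent string with interchangeable substrings: give each context its own complete sentence under its own message ID, so that every language can apply its own agreement. The message IDs and the CATALOG layout are hypothetical; the French sentences are the three given above.

```python
# Problematic composite: one parent string, three interchangeable substrings.
PARENT = "The {device} has been disabled."


def composite(device: str) -> str:
    # "has" is fixed in the parent string, so it cannot agree with a plural subject.
    return PARENT.format(device=device)


# Translatable alternative: one complete sentence per context (hypothetical IDs).
CATALOG = {
    "en": {
        "printer_disabled": "The printer has been disabled.",
        "stacker_disabled": "The stacker has been disabled.",
        "stapler_options_disabled": "The stapler options have been disabled.",
    },
    "fr": {
        "printer_disabled": "L'imprimante a été désactivée.",
        "stacker_disabled": "Le module de réception a été désactivé.",
        "stapler_options_disabled": "Les options d'agrafage ont été désactivées.",
    },
}


def message(locale: str, message_id: str, fallback: str = "en") -> str:
    # Look up the complete sentence for this locale, falling back to English.
    strings = CATALOG.get(locale, CATALOG[fallback])
    return strings.get(message_id, CATALOG[fallback][message_id])
```

Each French sentence is translated on its own, so gender and number agreement are handled by the translator rather than forced through a single shared string.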

Such an implementation probably arose from the designer's or developer's attempts to improve the situation, but unfortunately a lack of knowledge about what would happen in translation has created a major problem for the foreign versions of the product.

Word and concept mappings can cause additional problems. Take, for example, the sequence:

Message 1509: Turn on the [S120P1200]
Substring 1509.1: printer
Substring 1509.2: stacker
Substring 1509.3: stapler options

In some languages, the appropriate translation for Turn on may vary according to what is being turned on. For example, Spanish may translate this idea with distinct terms such as Conectar, Encender or Activar. There would also be four possible translations for the word the. Since, however, there is only one instance of the initial string, it is again impossible to provide a quality translation.

Non-translatable text variables in sentential composite messages. It is important to be clear that the term non-translatable text variable as we are using it in this section does not include pre-defined, translatable substrings. The term is used to refer to text supplied at runtime — such as a file name, job name, person's name and so on — or to non-translatable names.

In the next examples, the translated sentence does not need to agree with the text variable since the subject of the sentence is already defined, that is, file or section.

The file [file_name] has been scanned.
The section [section_title] gives further information.

In these cases the text of the variable is provided in apposition to the subject.

There is, however, an exception to this rule. If the text variable refers to a person rather than an object, many languages will still require changes to other parts of the sentence according to the gender of the person. For example:

The patient [person's_name] is ready.

The Spanish translations for this could include:

El enfermo Richard está listo.
La enferma Julia está lista.

In other words, embedded non-translatable text variables only work if they don't represent a proper noun.
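If an application genuinely needs a message like this, here is a sketch of what it would take, assuming the application stores each person's grammatical gender (the gender codes and the PATIENT_READY table are hypothetical): the single parent string becomes one complete sentence per gender, and the program selects among them at runtime. The Spanish sentences are the ones given above.

```python
# One complete sentence per grammatical gender, selected at runtime.
# The gender code is assumed to come from data the application already
# holds; it cannot be inferred from the name string itself.
PATIENT_READY = {
    "en": {"m": "The patient {name} is ready.", "f": "The patient {name} is ready."},
    "es": {"m": "El enfermo {name} está listo.", "f": "La enferma {name} está lista."},
}


def patient_ready(locale: str, name: str, gender: str) -> str:
    return PATIENT_READY[locale][gender].format(name=name)
```

English happens to need the same sentence twice; Spanish needs two different ones, which a single parent string cannot provide.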

Mobility of variables. It must be possible to move variables around relative to the surrounding text in whatever way the translator needs.

For example, the message There were %d spelling mistakes in file %s. is translated into German as Datei %s enthält %d Rechtschreibfehler.

Note how the order of the variables has been changed.

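If you are familiar with the C programming language, you will realize that with standard C printf-style format strings, whose arguments are consumed in the order in which they appear, there is no way to achieve this ordering without re-engineering the actual code. That is not something you want to do when localizing.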
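For comparison, Python's str.format lets replacement fields refer to arguments by index, so a translated string can use them in any order without touching the calling code. A minimal sketch using the German example above; the TRANSLATIONS table and the spelling_report function are hypothetical.

```python
# Indexed fields ({0}, {1}) refer to arguments by their position in the call,
# not by their position in the string, so a translation can reorder them freely.
TRANSLATIONS = {
    "en": "There were {0} spelling mistakes in file {1}.",
    "de": "Datei {1} enthält {0} Rechtschreibfehler.",
}


def spelling_report(locale: str, mistakes: int, filename: str) -> str:
    # The call site never changes: the count is always argument 0
    # and the file name is always argument 1.
    return TRANSLATIONS[locale].format(mistakes, filename)
```

POSIX printf (though not ISO C) provides a similar facility through positional conversions such as %2$s.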

The developer will need to figure out how to allow the design and development environment to cope with the reordered elements coming back from translation.

Translators will also need to know what the variable represents and have a way of uniquely identifying each variable. This was not a problem in the previous examples, but would be a significant issue in the example Unable to %s %s in the %s routine.

The Japanese translation would read: %s routine in %s %s unable to.

In addition, the translator will need to know how big the variable can be to assess how much space is available for text expansion.
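One way to give translators a unique ID, a description and a maximum size for each variable is to replace bare %s markers with named placeholders and keep metadata alongside the string. The following Python sketch assumes hypothetical placeholder names (action, item, routine), descriptions and size limits for the Unable to %s %s in the %s routine example. It also checks that a translation keeps the same set of placeholders, whatever their order.

```python
from dataclasses import dataclass
from string import Formatter


@dataclass(frozen=True)
class Placeholder:
    name: str     # unique ID the translator sees instead of a bare %s
    kind: str     # what the value represents (hypothetical descriptions)
    max_len: int  # longest value expected at runtime, in characters


SOURCE = "Unable to {action} {item} in the {routine} routine."
PLACEHOLDERS = [
    Placeholder("action", "operation name", 12),
    Placeholder("item", "object name", 40),
    Placeholder("routine", "routine name", 20),
]


def field_names(template: str) -> set[str]:
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text.
    return {field for _, field, _, _ in Formatter().parse(template) if field is not None}


def same_placeholders(source: str, translation: str) -> bool:
    # A translation may move placeholders, but must keep exactly the same set.
    return field_names(source) == field_names(translation)


def worst_case_length(template: str, placeholders: list[Placeholder]) -> int:
    # Literal text plus every placeholder at its maximum size.
    literal = sum(len(text) for text, _, _, _ in Formatter().parse(template))
    return literal + sum(p.max_len for p in placeholders)
```

With the word order of the Japanese gloss above, "{routine} routine in {item} {action} unable to." passes the placeholder check, while a translation that drops one of the placeholders fails it.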

Numeric variables. Numeric variables should be used in subject-predicate arrangements wherever possible.

In many languages the word that is qualified by a number changes according to how many we are talking about. Take for example the message %d pages were printed.

In English, pages were should become page was if only one page was printed. Sometimes authors try to get around this by saying %d page(s) printed.

Unfortunately, things are not so simple in other languages. For example, Arabic has different verb and noun endings for one page, two pages and more than two pages; that is, it has two different types of plural.

Russian is even more complicated. The accompanying table shows the endings for the word page in Russian when associated with different numbers.

Number of pages      Russian word for page
1                    страница
2-4                  страницы
5-10                 страниц
11-20 (irregular)    страниц
21                   страница
22-24                страницы
25-30                страниц
>30                  Repeat pattern of endings for 1 to 10.
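The selection a program has to make for Russian can be sketched as follows. This is a hand-written illustration based on the CLDR plural rules for Russian integers (categories one, few and many); fractional numbers are ignored. In practice the rule would come from locale data, for example in ICU or in the Plural-Forms header of a GNU gettext catalog, rather than from application code.

```python
def russian_plural_category(n: int) -> str:
    # CLDR rules for Russian integers:
    #   one:  last digit 1, except numbers ending in 11
    #   few:  last digit 2-4, except numbers ending in 12-14
    #   many: everything else (0, 5-9, and 11-14 at the end)
    n = abs(n)
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


PAGE_FORMS_RU = {"one": "страница", "few": "страницы", "many": "страниц"}


def pages_ru(n: int) -> str:
    return f"{n} {PAGE_FORMS_RU[russian_plural_category(n)]}"
```

Three forms are needed instead of English's two, and a %d page(s) style workaround has no equivalent.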

As a result, it is extremely difficult to deal with such a message expressed in a sentential arrangement. It is therefore better always to express messages containing numbers like this in a subject:predicate arrangement, in which the word pages remains invariant:

Pages printed: %d

Context in subject:predicate messages. For many languages it is usually not possible to translate the predicate part of a subject:predicate message unless you know what the subject is. The word enabled in French is translated in one of four different ways, according to whether the subject is masculine or feminine, singular or plural:

Subject            Gender and number      Translation of Enabled
Stacker            Masculine, singular    activé
Printer            Feminine, singular     activée
Bar codes          Masculine, plural      activés
Stapler options    Feminine, plural       activées

In other languages there are many other possibilities, since some languages have more than two genders and may also have case endings.

If the translator were presented with the word enabled on its own for translation, he or she would have no idea how it should be translated. Viewing the text on the product would not help in establishing the translation if the word enabled occurred with more than one subject.

Differences may also be semantic in nature. For example, if the word On were used here rather than Enabled, the appropriate translation in Spanish might be Encendida for the printer but Activadas for the stapler options. Conectado is another possible translation of On. Each of these three words also has four agreement forms, which gives 12 possible translations.

The translator must be provided with a means to associate a predicate with its subject in order to achieve a translation. This applies whether the predicates are coded as variables or as substring-lists.
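GNU gettext addresses this with message contexts (msgctxt), and Python's gettext module exposes the same idea as pgettext from version 3.8. The following plain-dictionary sketch, with hypothetical subject names as context keys, shows the principle using the French forms from the table above: the predicate is looked up together with its subject, never on its own.

```python
# Hypothetical French catalog keyed by (context, source text), in the spirit
# of gettext message contexts. The context names are invented for this sketch.
FR_STATUS = {
    ("stacker", "Enabled"): "activé",
    ("printer", "Enabled"): "activée",
    ("bar codes", "Enabled"): "activés",
    ("stapler options", "Enabled"): "activées",
}


def translate_predicate(subject: str, predicate: str) -> str:
    # The subject travels with the predicate, so the translator can pick the
    # right agreement form for each entry.
    return FR_STATUS[(subject, predicate)]


def translations_of(predicate: str) -> set[str]:
    # Every distinct translation a context-free "Enabled" would have to cover.
    return {text for (_, source), text in FR_STATUS.items() if source == predicate}
```

Here translations_of("Enabled") returns four different words, which is exactly what a single context-free string cannot hold.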

Rules for Creating Translatable Composite Messages

The accompanying table summarizes which combinations of composite message and substring type will generally work or not work in translation. In summary, subject-predicate arrangements work well for translation, in both embedded and concatenated implementations, with any type of substring; concatenated sentential arrangements should always be avoided; and sentential embedded arrangements do not work with translatable text substrings or numeric substrings. They often work with non-translatable text substrings unless the substring represents a proper noun. They usually work with graphics.

 
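Composite message type            Translatable text   Numeric         Non-translatable text         Graphic
Subject:predicate, concatenated   Works               Works           Works                         Works
Subject:predicate, embedded       Works               Works           Works                         Works
Sentential, concatenated          Avoid               Avoid           Avoid                         Avoid
Sentential, embedded              Does not work       Does not work   Often (not proper nouns)      Usually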

The composite messages listed above are only truly translatable if certain additional requirements are met for their supply to the localization group. For example, for a composite message to be translatable, contextual information must be provided to show how the various parts of a composite message relate to each other; and composite message parts should be grouped together for delivery to the translator.

It must be possible to reorder variables in sentential arrangements and reposition them in any way relative to the text.

Information should be provided to the translator about variable type to aid comprehension, and each variable should preferably be labeled with a unique ID (think about the previous example with several %s variables).

Information should also be provided where necessary about variable size to allow for estimation of the impact of text expanding in translation.

Text String Reuse

Many designers decide that if a particular string is used in many places on the user interface, they will copy the same string to different locations rather than implement many identical strings. The perceived advantages to this are to save on memory, to promote consistency in the source and, sometimes, to save on translation cost.

This is very commonly seen in association with composite messages. We discuss it separately here purely to reduce the complexity in explaining it.

In the first example, the designer has decided to copy the single English string On to several different locations on the user interface, rather than create three separate instances of the string.


A possible (poor) implementation of multi-context string reuse

Unfortunately, the advantages for the source text developer become major problems for the translator. Multi-context string reuse typically creates some of the greatest problems for achieving a quality translation.

Take the example of On copied to several different locations in the user interface. In Spanish, whereas the printer would normally be encendida, that is, on-line, the stacker would be encendido and the stapler options would be activadas, that is, enabled. If there is only a single available instance of the string expressing the English idea On, these variations cannot be expressed. It is therefore impossible to obtain a correct translation, and the quality of the Spanish user interface is seriously impaired.

These differences arise out of the way agreement and concept mappings change in translation and cannot be avoided. The English word On can be translated into Spanish in at least 12 ways: activado, conectado, encendido, activada, conectada, encendida, activados, conectados, encendidos, activadas, conectadas and encendidas.

Note: optimizing memory usage by reducing multiple instances of the same string to one can still be done. But it should be done after translation, not before. This would mean that the word On is optimized to one string in English, but three or more strings in Spanish.
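Pooling identical strings per language after translation can be sketched in Python as follows. The message IDs and the pool_strings function are hypothetical; the Spanish words are those from the example above.

```python
# Per-language catalogs keyed by UI location (hypothetical IDs).
# English uses "On" in three places; Spanish needs different words.
CATALOGS = {
    "en": {"printer_status": "On", "stacker_status": "On", "stapler_status": "On"},
    "es": {
        "printer_status": "Encendida",
        "stacker_status": "Encendido",
        "stapler_status": "Activadas",
    },
}


def pool_strings(catalog: dict[str, str]) -> tuple[list[str], dict[str, int]]:
    # Store each distinct string once and point every message ID at its index.
    pool: list[str] = []
    index_of: dict[str, int] = {}
    ids: dict[str, int] = {}
    for message_id, text in catalog.items():
        if text not in index_of:
            index_of[text] = len(pool)
            pool.append(text)
        ids[message_id] = index_of[text]
    return pool, ids
```

Applied to English, the pool holds a single On; applied to Spanish, it keeps three separate strings, so memory is saved without forcing one translation on all three contexts.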

Examples of Good and Bad String Reuse

String reuse is not necessarily a bad thing. It sometimes makes very good sense to ask the translator to translate a string once rather than 50 times. The trick is to know what constitutes a good candidate for reuse and what does not.

The key is whether or not the string is used in different contexts.

In many languages an adjective like none usually agrees with its subject. If the word none is used to refer to shading, line and background color, it is used in three different contexts and so there should be three different strings. If, however, the adjective none were only ever used to refer to shading, it would have only one context and would therefore be a good candidate for reuse.
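The context test can be expressed as a small check over a list of string usages. In this Python sketch the usage records are hypothetical, modeled on the None example and on the Cancel example discussed later; a string is flagged as a reuse candidate only if every use shares one context.

```python
from collections import defaultdict

# Each usage records the source string and the context it refers to
# (hypothetical data).
USAGES = [
    ("None", "shading"),
    ("None", "line"),
    ("None", "background color"),
    ("Cancel", "dialog box"),
    ("Cancel", "dialog box"),
]


def reuse_candidates(usages: list[tuple[str, str]]) -> dict[str, bool]:
    contexts: defaultdict[str, set[str]] = defaultdict(set)
    for text, context in usages:
        contexts[text].add(context)
    # A string is safe to share only if it is always used in exactly one context.
    return {text: len(ctx) == 1 for text, ctx in contexts.items()}
```

The check only groups what it is told; deciding what counts as a context still needs the localization group's review, as the Reset example shows.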

Bad candidates for reuse. The following examples are taken from a real product. The product team submitted a list of strings they were thinking of reusing to the localization group, who provided the following feedback.

Any word that will change its shape through agreement is a bad candidate — for example, all adjectives. Many ordinary words are also subject to differences in translation according to the context because of different concept mappings between languages.

Here are some examples of bad candidates for reuse from the product review.


Adjectives: Advanced, Casual, Collated, Dedicated, High, Metric, Mixed, None, Same, Standard, Uncollated
Ordinary words: Back, Down, Front, Insert before job is in progress., Low, Medium, Off, On, Output, Reset, Time, Up

Here are the issues with a few of these unsuitable terms.

Time in German can be expressed as Uhr for current time, that is, nine o'clock, but as Dauer for duration, that is, the file was downloading for nine hours.

Reset appears to be a technical word, but is a special case. It has two translations in Dutch. The general translation for this term is Herstel. However, when talking about a System Reset, the appropriate translation is Herstarten or Herstart. This illustrates the need to have the localization group review the strings you propose to reuse, since it would be difficult for an engineer who doesn't speak the language to spot this.

The word Low will have different endings in French and German according to whether it refers to something singular or plural and something masculine, feminine or (in German) neuter. This is the case for most adjectives. Don't forget that none is also an adjective!

Insert before job is in progress has no object, that is, it doesn't state what you are inserting. In languages like Japanese and German the “what” may lead to different translations of the word insert.

Good candidates for reuse. Most technical words are reasonably safe candidates, but see the previous example of Reset. Any self-contained phrase such as a complete sentence or a heading is likely to be safe. Any word — even an adjective — is a good candidate as long as it is always used in exactly one context.

Here are some examples of good candidates for reuse from the same product review.


Word/phrase Explanation
Cancel Used 48 times, but always in this product to cancel a dialog box. (If it had been used to cancel an operation, an additional string would be needed.)
Open the front door. Used 16 times, but a self-contained sentence.
Misfed original Used 14 times; although original could be read as an adjective, in this product it always refers to the original document.
Paper supply Used 4 times, but sufficiently technical and specific to have a single translation.

Rules for Text String Reuse

Reused strings must not refer to more than one text, graphic or conceptual context. Strings should be reused where appropriate, that is, where text is always used in exactly the same context, or where the string is a self-contained, independent sentence or phrase.

Reused strings should be displayed in displayers of the same size, or the translator should be warned to fit the text into the smallest bounding box used.
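The size rule amounts to taking the smallest bounding box among all the places a reused string appears. A minimal Python sketch, assuming for simplicity that widths are measured in characters; the function names are hypothetical, and a real check would measure rendered text in the actual font.

```python
def length_budget(displayer_widths: list[int]) -> int:
    # A reused string must fit the smallest displayer it appears in.
    return min(displayer_widths)


def fits_everywhere(translation: str, displayer_widths: list[int]) -> bool:
    # Character count stands in for rendered width in this sketch.
    return len(translation) <= length_budget(displayer_widths)
```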



Richard Ishida is a global design consultant at GKLS (Global Knowledge & Language Services), a division of Xerox. He can be reached at richard.ishida@gbr.xerox.com


This article is reprinted from #43 Volume 12 Issue 7 of
MultiLingual Computing & Technology published by MultiLingual Computing, Inc., 319 North First Ave., Sandpoint, Idaho, USA, 208-263-8178, Fax: 208-263-6310.

October/November, 2001