Wikimedia Internationalization Library
======================================

This library provides interfaces and value objects for internationalization (i18n)
of applications in PHP.

It is based on the i18n code used in MediaWiki, and is also intended to be
compatible with [jQuery.i18n], a JavaScript i18n library.

Any text string that is needed in an application is a **message**. This might
be something like a button label, a sentence, or a longer text. Each message is
assigned a **message key**, which is used as the identifier in code.

Each message is translated into various languages, each represented by a
**language code**. The message's text (as translated into each language) can
contain **placeholders**, each of which represents a place in the message where
a **parameter** is to be inserted, and **formatting commands**. It might be
plain text other than these placeholders and formatting commands, or it might
be in a **markup language** such as wikitext or Markdown.

A **formatter** is used to convert the message key and parameters into a text
representation in a particular language and **output format**.

The library itself imposes few restrictions on all of these concepts; this
document contains recommendations to help various implementations interoperate.

    use Wikimedia\Message\MessageValue;
    use Wikimedia\Message\ScalarParam;
    use Wikimedia\Message\ParamType;

    // Constructor interface
    $message = new MessageValue( 'message-key', [
        'parameter',
        new MessageValue( 'another-message' ),
        new ScalarParam( ParamType::NUM, 12345 ),
    ] );

    // Fluent interface
    $message = ( new MessageValue( 'message-key' ) )
        ->params( 'parameter', new MessageValue( 'another-message' ) )
        ->numParams( 12345 );

    // Formatting
    $messageFormatter = $serviceContainer->get( 'MessageFormatterFactory' )->getTextFormatter( 'de' );
    $output = $messageFormatter->format( $message );

Messages and their parameters are represented by newable value objects.

**MessageValue** represents an instance of a message, holding the key and any
parameters. It is mutable in that parameters can be added to the object after
it is created.

**MessageParam** is an abstract value class representing a parameter to a message.
It has a type (using constants defined in the **ParamType** class) and a value. It
has two implementations:

- **ScalarParam** represents a single-valued parameter, such as a text string, a
  number, or another message.
- **ListParam** represents a list of values, which will be joined together with
  appropriate separators. It has a "list type" (using constants defined in the
  **ListType** class) defining the desired separators.

A formatter for a particular language is obtained from an implementation of
**IMessageFormatterFactory**. No implementation of this interface is provided by
this library. If an environment needs its formatters to vary behavior on things
other than the language code, for example selecting among multiple sources of
messages or markup language used for processing message texts, it should define
a MessageFormatterFactoryFactory of some sort to provide appropriate
IMessageFormatterFactory implementations.

There is no single base interface for all formatters; the intent is that type
hinting will ensure that the formatter being used will produce output in the
expected output format. The defined output formats are:

- **ITextFormatter** produces plain text output.

No implementation of these interfaces is provided by this library.

Formatter implementations are expected to perform the following procedure to
generate the output string:

1. Fetch the message's translation in the formatter's language. Details of this
   fetching are unspecified here.
   - If no translation is found in the formatter's language, the formatter
     should attempt to fall back to appropriate other languages. Details of
     the fallback are unspecified here.
   - If no translation can be found in any fallback language, a string should
     be returned that indicates at minimum the message key that could not be
     found.
2. Replace placeholders with parameter values.
   - Note that placeholders must not be replaced recursively. That is, if a
     parameter's value contains text that looks like a placeholder, it must not
     be replaced as if it really were a placeholder.
   - Certain types of parameters are not substituted directly at this stage.
     Instead their placeholders must be replaced with an opaque representation
     that will not be misinterpreted during later stages.
     - Parameters of type RAW or PLAINTEXT
     - TEXT parameters with a MessageValue as the value
     - LIST parameters with any late-substituted value as one of their values.
3. Process any formatting commands.
4. Process the source markup language to produce a string in the desired output
   format. This may be a no-op, and may be combined with the previous step if
   the markup language implements compatible formatting commands.
5. Replace any opaque representations from step 2 with the actual values of
   the corresponding parameters.

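The five steps can be sketched in plain PHP. This is only an illustration and not part of the library: the function name `formatSketch`, the array-based translation store, the `(key)` output for a missing message and the marker format are all assumptions. It handles only `$1` to `$9`, and steps 3 and 4 are left as no-ops.

```php
<?php
/**
 * Hypothetical sketch of the five-step procedure; not part of the library.
 *
 * @param array $store Translations as [ language => [ key => text ] ]
 * @param string[] $langs The formatter's language followed by its fallbacks
 * @param string $key Message key
 * @param string[] $params Parameter values, in order
 * @param int[] $lateIndexes 1-based positions of late-substituted parameters
 */
function formatSketch(
    array $store, array $langs, string $key, array $params, array $lateIndexes = []
): string {
    // Step 1: fetch the translation, walking the fallback chain.
    $text = null;
    foreach ( $langs as $lang ) {
        if ( isset( $store[$lang][$key] ) ) {
            $text = $store[$lang][$key];
            break;
        }
    }
    if ( $text === null ) {
        // Nothing found anywhere: return a string that names the key.
        return "($key)";
    }

    // Step 2: one pass over the text, so placeholder-like text inside a
    // parameter value is never expanded again.
    $late = [];
    $text = preg_replace_callback(
        '/\$([1-9])/',
        static function ( array $m ) use ( $params, $lateIndexes, &$late ): string {
            $i = (int)$m[1];
            if ( !array_key_exists( $i - 1, $params ) ) {
                return $m[0]; // Not an actual parameter: leave unchanged.
            }
            if ( in_array( $i, $lateIndexes, true ) ) {
                // Opaque marker, swapped for the real value in step 5.
                $marker = "\x7fP{$i}\x7f";
                $late[$marker] = $params[$i - 1];
                return $marker;
            }
            return $params[$i - 1];
        },
        $text
    );

    // Steps 3 and 4: formatting commands and markup processing would run
    // here; this sketch treats both as no-ops.

    // Step 5: replace the opaque markers with the actual values.
    return strtr( $text, $late );
}

$store = [ 'en' => [ 'greet' => 'Hello, $1! See $2.' ] ];
echo formatSketch( $store, [ 'de', 'en' ], 'greet', [ 'price: $2', '<b>docs</b>' ], [ 2 ] ), "\n";
// Hello, price: $2! See <b>docs</b>.
```

The `$2` inside the first parameter's value survives untouched, and the second parameter bypasses the intermediate steps entirely.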

Guidelines for Interoperability
-------------------------------

Following these guidelines lets libraries safely supply their own translations
to every app that uses them, and lets apps use those translations instead of
having to retranslate everything. It also helps open source projects use
[translatewiki.net] for crowdsourced volunteer translation into many languages.

[BCP 47] language tags should be used for language codes. If a supplied
language tag is not recognized, at minimum the corresponding tag with all
optional subtags stripped should be tried as a fallback.

All messages must have a translation in English (code "en"). All languages
should fall back to English as a last resort.

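One way to build such a fallback list is sketched below. The helper name `fallbackChain` is hypothetical, and two choices go beyond the text: subtags are stripped one at a time (which includes the required fully stripped tag), and tags are lowercased before comparison.

```php
<?php
/**
 * Hypothetical helper: the language codes to try, in order.
 *
 * @return string[]
 */
function fallbackChain( string $tag ): array {
    $subtags = explode( '-', strtolower( $tag ) );
    $chain = [];
    // Drop one trailing subtag per iteration: zh-hant-tw, zh-hant, zh.
    while ( $subtags ) {
        $chain[] = implode( '-', $subtags );
        array_pop( $subtags );
    }
    // English is always the last resort.
    if ( !in_array( 'en', $chain, true ) ) {
        $chain[] = 'en';
    }
    return $chain;
}

echo implode( ', ', fallbackChain( 'zh-Hant-TW' ) ), "\n";
// zh-hant-tw, zh-hant, zh, en
```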
The English translations should use `{{PLURAL:...}}` and `{{GENDER:...}}` even
when English doesn't make a grammatical distinction, to signal to translators
that plural/gender support is available.

Language code "qqq" is reserved for documenting messages. Documentation should
describe the context in which the message is used and the values of all
parameters used with the message. Generally this is written in English.
Attempting to obtain a message formatter for "qqq" should return one for "en"
instead.

Language code "qqx" is reserved for debugging. Rather than retrieving
translations from some underlying storage, every key should act as if it were
translated as something like `(key-name: $1, $2, $3)`, with the number of
placeholders depending on how many parameters are included in the message.

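The "qqx" pseudo-translation is simple to generate. In the sketch below the function name `qqxText` is hypothetical, and the output for a message with no parameters (`(key-name)`) is an assumption, since the text only shows the form with parameters.

```php
<?php
/**
 * Hypothetical helper: the text a "qqx" formatter could use for any key.
 */
function qqxText( string $key, int $paramCount ): string {
    if ( $paramCount === 0 ) {
        return "($key)"; // Assumed form when there are no parameters.
    }
    $placeholders = [];
    for ( $i = 1; $i <= $paramCount; $i++ ) {
        $placeholders[] = '$' . $i;
    }
    return "($key: " . implode( ', ', $placeholders ) . ')';
}

echo qqxText( 'applepicker-title', 3 ), "\n";
// (applepicker-title: $1, $2, $3)
```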
Message keys intended for use with external implementations should follow
certain guidelines for interoperability:

- Keys should be restricted to the regular expression `/^[a-z][a-z0-9-]*$/`.
  That is, a key should consist of lowercase ASCII letters, numbers, and
  hyphens only, and should begin with a letter.
- Keys should be prefixed to help avoid collisions. For example, a library
  named "ApplePicker" should prefix its message keys with "applepicker-".
- Common values needing translation, such as names of months and weekdays,
  should not be prefixed by each library. Libraries needing these should use
  keys from the [Common Locale Data Repository][CLDR] and document this
  requirement, and environments should provide these messages.

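A library could check its own keys against the first two guidelines with something like the following. The function name `isInteroperableKey` is hypothetical, and the sketch does not exempt the unprefixed common keys described in the third guideline.

```php
<?php
/**
 * Hypothetical check of the key guidelines. $prefix is the library's own
 * prefix, for example "applepicker-".
 */
function isInteroperableKey( string $key, string $prefix ): bool {
    // The D modifier makes "$" match only at the very end of the string,
    // so a trailing newline is rejected too.
    return preg_match( '/^[a-z][a-z0-9-]*$/D', $key ) === 1
        && strncmp( $key, $prefix, strlen( $prefix ) ) === 0;
}

var_dump( isInteroperableKey( 'applepicker-title', 'applepicker-' ) ); // bool(true)
var_dump( isInteroperableKey( 'ApplePicker_Title', 'applepicker-' ) ); // bool(false)
```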
Placeholders are represented by `$1`, `$2`, `$3`, and so on. Text like `$100`
is interpreted as a placeholder for parameter 100 if 100 or more parameters
were supplied, as a placeholder for parameter 10 followed by text "0" if
between ten and 99 parameters were supplied, and as a placeholder for parameter
1 followed by text "00" if between one and nine parameters were supplied.

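PHP's built-in `strtr()` in its array form happens to give exactly this behavior: it prefers the longest matching key and never rescans text it has already replaced. The sketch below relies on that; the function name `replacePlaceholders` and the `$numbered` test helper are hypothetical.

```php
<?php
/**
 * Hypothetical helper: substitute $1, $2, ... using only as many
 * placeholders as there are parameters.
 *
 * @param string[] $params Parameter values, in order
 */
function replacePlaceholders( string $text, array $params ): string {
    $pairs = [];
    foreach ( array_values( $params ) as $i => $value ) {
        $pairs['$' . ( $i + 1 )] = (string)$value;
    }
    // Longest key wins, and replaced text is not searched again.
    return strtr( $text, $pairs );
}

// Builds n dummy parameter values "<1>", "<2>", ..., "<n>".
$numbered = static fn ( int $n ): array => array_map(
    static fn ( int $i ): string => "<$i>",
    range( 1, $n )
);

echo replacePlaceholders( '$100', $numbered( 100 ) ), "\n"; // <100>
echo replacePlaceholders( '$100', $numbered( 12 ) ), "\n";  // <10>0
echo replacePlaceholders( '$100', $numbered( 3 ) ), "\n";   // <1>00
```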
All formatting commands look like `{{NAME:$value1|$value2|$value3|...}}`. Braces
are to be balanced, e.g. `{{NAME:foo|{{bar|baz}}}}` has `$value1` as "foo" and
`$value2` as "{{bar|baz}}". The name is always case-insensitive.

Anything syntactically resembling a placeholder or formatting command that does
not correspond to an actual parameter or known command should be left unchanged
for processing by the markup language processor.

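Splitting a command into its name and values requires tracking brace depth, so that a `|` inside a nested `{{...}}` does not start a new value. The following is a minimal sketch under the assumption that the input is a single, well-formed command; the function name `parseCommand` and the returned array shape are hypothetical.

```php
<?php
/**
 * Hypothetical parser for one command such as "{{NAME:foo|{{bar|baz}}}}".
 *
 * @return array|null [ 'name' => string, 'args' => string[] ], or null if
 *   the text does not look like a command
 */
function parseCommand( string $command ): ?array {
    if ( !preg_match( '/^\{\{([^:{}|]+):(.*)\}\}$/s', $command, $m ) ) {
        return null;
    }
    $body = $m[2];
    $args = [];
    $current = '';
    $depth = 0;
    $len = strlen( $body );
    for ( $i = 0; $i < $len; $i++ ) {
        $two = substr( $body, $i, 2 );
        if ( $two === '{{' ) {
            // Entering a nested command.
            $depth++;
            $current .= $two;
            $i++;
        } elseif ( $two === '}}' && $depth > 0 ) {
            // Leaving a nested command.
            $depth--;
            $current .= $two;
            $i++;
        } elseif ( $body[$i] === '|' && $depth === 0 ) {
            // Only a top-level pipe separates values.
            $args[] = $current;
            $current = '';
        } else {
            $current .= $body[$i];
        }
    }
    $args[] = $current;
    // Names are case-insensitive, so normalize to upper case.
    return [ 'name' => strtoupper( $m[1] ), 'args' => $args ];
}

print_r( parseCommand( '{{name:foo|{{bar|baz}}}}' ) );
// name is "NAME"; args are "foo" and "{{bar|baz}}"
```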
Libraries providing messages for use by externally-defined formatters should
generally assume no markup language will be applied, and should avoid
constructs used by common markup languages unless they also make sense when
read as plain text.

### Formatting commands

The following formatting commands should be supported.

`{{PLURAL:$count|$formA|$formB|...}}` is used to produce plurals.

`$count` is a number, which may have been formatted with ParamType::NUM.

The number of forms and which count corresponds to which form depend on the
language; for example, English uses `{{PLURAL:$1|one|other}}` while Arabic uses
`{{PLURAL:$1|zero|one|two|few|many|other}}`. Details are defined in
[CLDR][CLDR plurals].

It is not possible to "skip" positions while still supplying later ones. If too
few values are supplied, the final form is repeated for subsequent positions.

If there is an explicit plural form to be given for a specific number, it may
be specified with syntax like `{{PLURAL:$1|one egg|$1 eggs|12=a dozen eggs}}`.

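Form selection can be sketched for English, where the CLDR rule for whole numbers is "one" for exactly 1 and "other" for everything else. The function name `pluralEn` is hypothetical, it accepts only integer counts, and it assumes placeholders in the forms have already been replaced. Other languages need their own CLDR rules.

```php
<?php
/**
 * Hypothetical PLURAL evaluation for English only.
 *
 * @param string[] $forms The values after the count, e.g.
 *   [ 'one egg', '5 eggs', '12=a dozen eggs' ]
 */
function pluralEn( int $count, array $forms ): string {
    $positional = [];
    foreach ( $forms as $form ) {
        // Explicit forms look like "12=a dozen eggs" and win on exact match.
        if ( preg_match( '/^(\d+)=(.*)$/s', $form, $m ) ) {
            if ( (int)$m[1] === $count ) {
                return $m[2];
            }
            continue;
        }
        $positional[] = $form;
    }
    if ( !$positional ) {
        return '';
    }
    // English: position 0 is "one", position 1 is "other".
    $index = $count === 1 ? 0 : 1;
    // Too few forms: the final form is reused for later positions.
    $index = min( $index, count( $positional ) - 1 );
    return $positional[$index];
}

echo pluralEn( 1, [ 'one egg', '1 eggs', '12=a dozen eggs' ] ), "\n";   // one egg
echo pluralEn( 5, [ 'one egg', '5 eggs', '12=a dozen eggs' ] ), "\n";   // 5 eggs
echo pluralEn( 12, [ 'one egg', '12 eggs', '12=a dozen eggs' ] ), "\n"; // a dozen eggs
```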
`{{GENDER:$name|$masculine|$feminine|$unspecified}}` is used to handle
grammatical gender, typically when messages refer to user accounts.

This supports three grammatical genders: "male", "female", and a third option
for cases where the gender is unspecified, unknown, or neither male nor female.
It does not attempt to handle animate-inanimate or [T-V] distinctions.

`$name` is a user account name or other similar identifier. If the name given
does not correspond to any known user account, it should probably use the
`$unspecified` form.

If `$feminine` and/or `$unspecified` is not specified, the value of `$masculine`
is normally used in its place.

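The selection and fallback rules can be sketched as follows. How a name is mapped to a gender depends on the environment, so this hypothetical `genderForm` function takes the already resolved gender as a string; any value other than "male" or "female", including that of an unknown account, selects the unspecified form.

```php
<?php
/**
 * Hypothetical GENDER evaluation.
 *
 * @param string $gender "male", "female", or anything else for unspecified
 * @param string[] $forms [ masculine, feminine, unspecified ]; later entries
 *   may be omitted
 */
function genderForm( string $gender, array $forms ): string {
    $masculine = $forms[0] ?? '';
    // Missing forms fall back to the masculine one.
    $feminine = $forms[1] ?? $masculine;
    $unspecified = $forms[2] ?? $masculine;
    switch ( $gender ) {
        case 'male':
            return $masculine;
        case 'female':
            return $feminine;
        default:
            return $unspecified;
    }
}

echo genderForm( 'female', [ 'his', 'her', 'their' ] ), "\n";  // her
echo genderForm( 'unknown', [ 'his', 'her', 'their' ] ), "\n"; // their
echo genderForm( 'female', [ 'his' ] ), "\n";                  // his
```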
`{{GRAMMAR:$form|$term}}` converts a term to an appropriate grammatical form.

If no mapping for `$term` to `$form` exists, `$term` should be returned unchanged.

See [jQuery.i18n § Grammar][jQuery.i18n grammar] for details.

`{{BIDI:$text}}` applies directional isolation to the wrapped text, to attempt
to avoid errors where directionally-neutral characters are wrongly displayed
when between LTR and RTL content.

This should output U+202A (left-to-right embedding) or U+202B (right-to-left
embedding) before the text, depending on the directionality of the first
strongly-directional character in `$text`, and U+202C (pop directional
formatting) after, or do something equivalent for the target output format.

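For plain text output this could look like the sketch below. It is a rough approximation: the function name `bidiWrap` is hypothetical, "strongly-directional" is approximated as "first letter", only four right-to-left scripts are recognized, and returning the text unchanged when it contains no letter is an assumption.

```php
<?php
/**
 * Hypothetical BIDI wrapper for plain text output.
 */
function bidiWrap( string $text ): string {
    // Approximation: treat the first letter as the first strong character.
    if ( !preg_match( '/\p{L}/u', $text, $m ) ) {
        return $text; // No letter found: leave unchanged (assumption).
    }
    // Approximation: a small, incomplete set of right-to-left scripts.
    $isRtl = preg_match( '/^[\p{Hebrew}\p{Arabic}\p{Syriac}\p{Thaana}]$/u', $m[0] ) === 1;
    $open = $isRtl
        ? "\u{202B}" // right-to-left embedding
        : "\u{202A}"; // left-to-right embedding
    return $open . $text . "\u{202C}"; // pop directional formatting
}

echo bin2hex( bidiWrap( 'abc' ) ), "\n";
// e280aa616263e280ac
```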
### Supplying translations

Code intending its messages to be used by externally-defined formatters should
supply the translations as described by
[jQuery.i18n § Message File Format][jQuery.i18n file format].

In brief, the base directory of the library should contain a directory named
"i18n". This directory should contain JSON files named by code such as
"en.json", "de.json", "qqq.json", each with contents like:

    {
        "@metadata": {
            "last-updated": "2012-09-21"
        },
        "appname-title": "Example Application",
        "appname-sub-title": "An example application",
        "appname-header-introduction": "Introduction",
        "appname-about": "About this application",
        "appname-footer": "Footer text"
    }

Formatter implementations should be able to consume message data supplied in
this format, either directly via registration of i18n directories to check or
by providing tooling to incorporate it during a build step.

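Reading one such file at runtime might look like the following sketch. The function name `loadMessages` is hypothetical. It assumes the jQuery.i18n convention that keys beginning with `@`, such as `@metadata`, carry metadata rather than messages, and it treats a missing or invalid file as an empty set so that the caller can continue down its fallback chain.

```php
<?php
/**
 * Hypothetical loader for a single language file in an "i18n" directory.
 *
 * @return array<string,string> Message key => message text
 */
function loadMessages( string $i18nDir, string $lang ): array {
    $file = $i18nDir . '/' . $lang . '.json';
    if ( !is_readable( $file ) ) {
        return [];
    }
    $data = json_decode( file_get_contents( $file ), true );
    if ( !is_array( $data ) ) {
        return []; // Invalid JSON or not an object.
    }
    // Drop "@metadata" and any other "@"-prefixed entries.
    return array_filter(
        $data,
        static fn ( $key ): bool => strncmp( (string)$key, '@', 1 ) !== 0,
        ARRAY_FILTER_USE_KEY
    );
}

// Example: $messages = loadMessages( __DIR__ . '/i18n', 'en' );
```

A real implementation would also want to validate `$lang` before building a file path from it.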

[jQuery.i18n]: https://github.com/wikimedia/jquery.i18n
[BCP 47]: https://tools.ietf.org/rfc/bcp/bcp47.txt
[CLDR]: http://cldr.unicode.org/
[CLDR plurals]: https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html
[jQuery.i18n grammar]: https://github.com/wikimedia/jquery.i18n#grammar
[jQuery.i18n file format]: https://github.com/wikimedia/jquery.i18n#message-file-format
[translatewiki.net]: https://translatewiki.net/wiki/Translating:New_project
[T-V]: https://en.wikipedia.org/wiki/T%E2%80%93V_distinction