Wikimedia Internationalization Library
======================================

This library provides interfaces and value objects for internationalization (i18n)
of applications in PHP.

It is based on the i18n code used in MediaWiki, and is also intended to be
compatible with [jQuery.i18n], a JavaScript i18n library.

Concepts
--------

Any text string that is needed in an application is a **message**. This might
be something like a button label, a sentence, or a longer text. Each message is
assigned a **message key**, which is used as the identifier in code.

Each message is translated into various languages, each represented by a
**language code**. The message's text (as translated into each language) can
contain **placeholders**, which mark places in the message where a
**parameter** is to be inserted, and **formatting commands**. Apart from these
placeholders and formatting commands, the text might be plain text, or it might
be in a **markup language** such as wikitext or Markdown.

A **formatter** is used to convert the message key and parameters into a text
representation in a particular language and **output format**.

The library itself imposes few restrictions on any of these concepts; this
document contains recommendations to help various implementations operate in
compatible ways.

Usage
-----

<pre lang="php">
use Wikimedia\Message\MessageValue;
use Wikimedia\Message\ParamType;
use Wikimedia\Message\ScalarParam;

// Constructor interface
// (MessageParam is abstract, so the numeric parameter uses ScalarParam.)
$message = new MessageValue( 'message-key', [
	'parameter',
	new MessageValue( 'another-message' ),
	new ScalarParam( ParamType::NUM, 12345 ),
] );

// Fluent interface
$message = ( new MessageValue( 'message-key' ) )
	->params( 'parameter', new MessageValue( 'another-message' ) )
	->numParams( 12345 );

// Formatting ($serviceContainer is the environment's service container;
// this library does not provide a formatter factory implementation)
$messageFormatter = $serviceContainer->get( 'MessageFormatterFactory' )->getTextFormatter( 'de' );
$output = $messageFormatter->format( $message );
</pre>

Class Overview
--------------

### Messages

Messages and their parameters are represented by newable value objects.

**MessageValue** represents an instance of a message, holding the key and any
parameters. It is mutable in that parameters can be added to the object after
creation.

**MessageParam** is an abstract value class representing a parameter to a message.
It has a type (using constants defined in the **ParamType** class) and a value. It
has two implementations:

- **ScalarParam** represents a single-valued parameter, such as a text string, a
  number, or another message.
- **ListParam** represents a list of values, which will be joined together with
  appropriate separators. It has a "list type" (using constants defined in the
  **ListType** class) defining the desired separators.

### Formatters

A formatter for a particular language is obtained from an implementation of
**IMessageFormatterFactory**. No implementation of this interface is provided by
this library. If an environment needs its formatters to vary behavior on things
other than the language code, for example selecting among multiple sources of
messages or the markup language used for processing message texts, it should
define a MessageFormatterFactoryFactory of some sort to provide appropriate
IMessageFormatterFactory implementations.

There is no single base interface for all formatters; the intent is that type
hinting will ensure that the formatter being used will produce output in the
expected output format. The defined output formats are:

- **ITextFormatter** produces plain text output.

No implementation of these interfaces is provided by this library.

Formatter implementations are expected to perform the following procedure to
generate the output string:

1. Fetch the message's translation in the formatter's language. Details of this
   fetching are unspecified here.
   - If no translation is found in the formatter's language, it should attempt
     to fall back to appropriate other languages. Details of the fallback are
     unspecified here.
   - If no translation can be found in any fallback language, a string should
     be returned that indicates at minimum the message key that was unable to
     be found.
2. Replace placeholders with parameter values.
   - Note that placeholders must not be replaced recursively. That is, if a
     parameter's value contains text that looks like a placeholder, it must not
     be replaced as if it really were a placeholder.
   - Certain types of parameters are not substituted directly at this stage.
     Instead, their placeholders must be replaced with an opaque representation
     that will not be misinterpreted during later stages. These are:
     - Parameters of type RAW or PLAINTEXT
     - TEXT parameters with a MessageValue as the value
     - LIST parameters with any late-substituted value as one of their values
3. Process any formatting commands.
4. Process the source markup language to produce a string in the desired output
   format. This may be a no-op, and may be combined with the previous step if
   the markup language implements compatible formatting commands.
5. Replace any opaque representations from step 2 with the actual values of
   the corresponding parameters.

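Steps 2 and 5 can be illustrated with a minimal PHP sketch. It is not part of this library. `substituteParams()` and `restoreLateParams()` are hypothetical helpers, and the sketch assumes the caller has already split the parameters into directly substituted and late-substituted ones. Using Unicode private-use characters as the opaque markers is just one possible choice. Placeholder matching is simplified to `$` followed by digits.

```php
<?php
// Hypothetical sketch of steps 2 and 5; not part of Wikimedia\Message.

/**
 * Step 2: replace $N placeholders in a single pass.
 * $early maps N => string to substitute now; $late maps N => string that
 * must survive later stages untouched. Returns [ text, marker => value ].
 */
function substituteParams( string $text, array $early, array $late ): array {
	$markers = [];
	$result = preg_replace_callback(
		'/\$(\d+)/',
		static function ( array $m ) use ( $early, $late, &$markers ): string {
			$n = (int)$m[1];
			if ( isset( $late[$n] ) ) {
				// Private-use characters are unlikely to be altered by steps 3-4.
				$marker = "\u{E000}" . count( $markers ) . "\u{E001}";
				$markers[$marker] = $late[$n];
				return $marker;
			}
			// The callback's return value is never rescanned, so a value
			// containing "$2" is not treated as a placeholder.
			return $early[$n] ?? $m[0];
		},
		$text
	);
	return [ $result, $markers ];
}

/** Step 5: swap each opaque marker for its parameter's actual value. */
function restoreLateParams( string $text, array $markers ): string {
	return strtr( $text, $markers );
}
```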
Guidelines for Interoperability
-------------------------------

Following these guidelines lets libraries safely supply their own translations
to every application that uses them, and lets applications use those
translations instead of having to retranslate everything. It also helps open
source projects use [translatewiki.net] for crowdsourced volunteer translation
into many languages.

### Language codes

[BCP 47] language tags should be used for language codes. If a supplied
language tag is not recognized, at minimum the corresponding tag with all
optional subtags stripped should be tried as a fallback.

All messages must have a translation in English (code "en"). All languages
should fall back to English as a last resort.

The English translations should use `{{PLURAL:...}}` and `{{GENDER:...}}` even
when English doesn't make a grammatical distinction, to signal to translators
that plural/gender support is available.

Language code "qqq" is reserved for documenting messages. Documentation should
describe the context in which the message is used and the values of all
parameters used with the message. Generally this is written in English.
Attempting to obtain a message formatter for "qqq" should return one for "en"
instead.

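The fallback and lookup rules above can be sketched as follows, assuming translations are already loaded into a nested array. `fallbackChain()` and `fetchTranslation()` are hypothetical names. Lowercasing the tag (BCP 47 tags are case-insensitive) and the `⧼key⧽` marker for a missing message are choices made for this sketch, not requirements of this document.

```php
<?php
// Hypothetical sketch: tag as given, then primary subtag only, then "en".

function fallbackChain( string $code ): array {
	$code = strtolower( $code );
	if ( $code === 'qqq' ) {
		// Documentation code: behave as English.
		return [ 'en' ];
	}
	// Strip all optional subtags, e.g. "de-at" -> "de".
	$primary = explode( '-', $code )[0];
	return array_values( array_unique( [ $code, $primary, 'en' ] ) );
}

/**
 * $translations maps language code => [ message key => text ].
 * Returns a marker naming the key if no language in the chain has it.
 */
function fetchTranslation( array $translations, string $code, string $key ): string {
	foreach ( fallbackChain( $code ) as $lang ) {
		if ( isset( $translations[$lang][$key] ) ) {
			return $translations[$lang][$key];
		}
	}
	return "⧼{$key}⧽";
}
```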
Language code "qqx" is reserved for debugging. Rather than retrieving
translations from some underlying storage, every key should act as if it were
translated as something like `(key-name: $1, $2, $3)`, with the number of
placeholders depending on how many parameters are included in the
MessageValue.

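A qqx pseudo-translation could be generated as shown below. `qqxText()` is a hypothetical helper. The text above only covers messages that have parameters, so showing just `(key-name)` when there are none is an assumption. Because the result acts as the translation, it then goes through normal placeholder replacement.

```php
<?php
// Hypothetical sketch of the "qqx" debugging pseudo-translation.

function qqxText( string $key, int $paramCount ): string {
	if ( $paramCount === 0 ) {
		// Assumption: with no parameters, only the key is shown.
		return "($key)";
	}
	$placeholders = [];
	for ( $i = 1; $i <= $paramCount; $i++ ) {
		$placeholders[] = '$' . $i;
	}
	return "($key: " . implode( ', ', $placeholders ) . ')';
}
```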
### Message keys

Message keys intended for use with external implementations should follow
certain guidelines for interoperability:

- Keys should be restricted to the regular expression `/^[a-z][a-z0-9-]*$/`.
  That is, they should consist of lowercase ASCII letters, digits, and hyphens
  only, and should begin with a letter.
- Keys should be prefixed to help avoid collisions. For example, a library
  named "ApplePicker" should prefix its message keys with "applepicker-".
- Common values needing translation, such as names of months and weekdays,
  should not be prefixed by each library. Libraries needing these should use
  keys from the [Common Locale Data Repository][CLDR] and document this
  requirement, and environments should provide these messages.

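A validation helper for these guidelines might look like the following. `isInteroperableKey()` is hypothetical. In PHP's PCRE, `$` also matches just before a trailing newline, so the sketch adds the `D` modifier to reject keys such as `"applepicker-title\n"`.

```php
<?php
// Hypothetical helper applying the message key guidelines.

function isInteroperableKey( string $key, ?string $prefix = null ): bool {
	// The "D" modifier stops "$" from also matching before a trailing newline.
	if ( preg_match( '/^[a-z][a-z0-9-]*$/D', $key ) !== 1 ) {
		return false;
	}
	// Optionally require the library's prefix, e.g. "applepicker-".
	return $prefix === null || strncmp( $key, $prefix, strlen( $prefix ) ) === 0;
}
```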
### Message format

Placeholders are represented by `$1`, `$2`, `$3`, and so on. Text like `$100`
is interpreted as a placeholder for parameter 100 if 100 or more parameters
were supplied, as a placeholder for parameter 10 followed by the text "0" if
between ten and 99 parameters were supplied, and as a placeholder for parameter
1 followed by the text "00" if between one and nine parameters were supplied.

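One way to implement this rule is sketched below with a hypothetical `replacePlaceholders()` helper that takes a 0-indexed list of already formatted strings. A placeholder consumes at most as many digits as the parameter count has. Placeholders that do not refer to a supplied parameter are left as they are. Because `preg_replace_callback()` does not rescan its replacement text, parameter values are never treated as placeholders.

```php
<?php
// Hypothetical sketch of the $N placeholder rule; $params is a 0-indexed list.

function replacePlaceholders( string $text, array $params ): string {
	$count = count( $params );
	if ( $count === 0 ) {
		return $text;
	}
	// A placeholder uses at most as many digits as the parameter count has.
	$maxDigits = strlen( (string)$count );
	return preg_replace_callback(
		'/\$(\d+)/',
		static function ( array $m ) use ( $params, $count, $maxDigits ): string {
			$n = (int)substr( $m[1], 0, $maxDigits );
			$rest = (string)substr( $m[1], $maxDigits );
			if ( $n < 1 || $n > $count ) {
				// Not an actual parameter: leave the text for later stages.
				return $m[0];
			}
			// Placeholders are 1-indexed; leftover digits stay literal text.
			return $params[$n - 1] . $rest;
		},
		$text
	);
}
```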
All formatting commands look like `{{NAME:$value1|$value2|$value3|...}}`. Braces
must be balanced: for example, `{{NAME:foo|{{bar|baz}}}}` has $value1 "foo" and
$value2 "{{bar|baz}}", because the `|` inside the nested braces does not
separate values. The name is always case-insensitive.

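A parser for one command can be sketched as follows, assuming the input is exactly one `{{...}}` expression. `parseCommand()` is hypothetical. It uppercases the name so that matching is case-insensitive, and it treats `|` as a separator only at brace depth zero. Scanning byte by byte is safe for UTF-8 input because `{`, `}` and `|` never occur inside multibyte sequences.

```php
<?php
// Hypothetical parser for a single {{NAME:a|b|...}} command.
// Returns [ NAME, [ values ] ], or null if the input is not one balanced command.

function parseCommand( string $text ): ?array {
	if ( !preg_match( '/^\{\{([^:{}|]+):(.*)\}\}$/s', $text, $m ) ) {
		return null;
	}
	$name = strtoupper( trim( $m[1] ) ); // command names are case-insensitive
	$body = $m[2];
	$args = [];
	$current = '';
	$depth = 0;
	$len = strlen( $body );
	for ( $i = 0; $i < $len; $i++ ) {
		$char = $body[$i];
		if ( $char === '{' ) {
			$depth++;
		} elseif ( $char === '}' ) {
			$depth--;
			if ( $depth < 0 ) {
				return null; // closes a brace that was never opened
			}
		} elseif ( $char === '|' && $depth === 0 ) {
			// Separator at the top level: finish the current value.
			$args[] = $current;
			$current = '';
			continue;
		}
		$current .= $char;
	}
	if ( $depth !== 0 ) {
		return null; // unclosed nested braces
	}
	$args[] = $current;
	return [ $name, $args ];
}
```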
Anything syntactically resembling a placeholder or formatting command that does
not correspond to an actual parameter or known command should be left unchanged
for processing by the markup language processor.

Libraries providing messages for use by externally-defined formatters should
generally assume no markup language will be applied, and should avoid
constructs used by common markup languages unless they also make sense when
read as plain text.

### Formatting commands

The following formatting commands should be supported.

#### PLURAL

`{{PLURAL:$count|$formA|$formB|...}}` is used to produce plurals.

$count is a number, which may have been formatted with ParamType::NUM.

The number of forms and which count corresponds to which form depend on the
language: for example, English uses `{{PLURAL:$1|one|other}}` while Arabic uses
`{{PLURAL:$1|zero|one|two|few|many|other}}`. Details are defined in
[CLDR][CLDR plurals].

It is not possible to "skip" positions while still supplying later ones. If too
few values are supplied, the final form is repeated for subsequent positions.

If there is an explicit plural form to be given for a specific number, it may
be specified with syntax like `{{PLURAL:$1|one egg|$1 eggs|12=a dozen eggs}}`.

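The form selection can be sketched as follows. Mapping a count to its CLDR plural category is language-specific and out of scope here. The hypothetical `selectPluralForm()` therefore takes that mapping as a callback, and `englishPluralIndex()` hard-codes the English one/other rule for illustration only. This sketch checks explicit `N=` forms before the positional ones.

```php
<?php
// Hypothetical sketch of PLURAL form selection.

/** English CLDR rule, for illustration: index 0 = "one", index 1 = "other". */
function englishPluralIndex( float $count ): int {
	return $count == 1 ? 0 : 1;
}

/**
 * $forms are the command's values after $count, e.g.
 * [ 'one egg', '$1 eggs', '12=a dozen eggs' ]. $categoryIndex maps a
 * count to the position of its plural category for the language.
 */
function selectPluralForm( float $count, array $forms, callable $categoryIndex ): string {
	$positional = [];
	foreach ( $forms as $form ) {
		if ( preg_match( '/^(\d+)=(.*)$/s', $form, $m ) ) {
			// Explicit form for one specific number.
			if ( (float)$m[1] == $count ) {
				return $m[2];
			}
			continue;
		}
		$positional[] = $form;
	}
	if ( $positional === [] ) {
		return '';
	}
	// With too few forms, the final form covers all later positions.
	$index = min( $categoryIndex( $count ), count( $positional ) - 1 );
	return $positional[$index];
}
```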
#### GENDER

`{{GENDER:$name|$masculine|$feminine|$unspecified}}` is used to handle
grammatical gender, typically when messages refer to user accounts.

This supports three grammatical genders: "male", "female", and a third option
for cases where the gender is unspecified, unknown, or neither male nor female.
It does not attempt to handle animate-inanimate or [T-V] distinctions.

$name is a user account name or other similar identifier. If the name given
does not correspond to any known user account, the $unspecified form should
probably be used.

If $feminine and/or $unspecified are omitted, the value of $masculine is
normally used in their place.

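Form selection can be sketched with a hypothetical `selectGenderForm()`. Looking up an account's gender depends on the environment, so the sketch assumes the gender has already been resolved to "male", "female", or another value that means unspecified. Unknown accounts fall into that last group.

```php
<?php
// Hypothetical sketch of GENDER form selection.

function selectGenderForm( string $gender, array $forms ): string {
	if ( $forms === [] ) {
		return '';
	}
	// Omitted forms default to the masculine form.
	$masculine = $forms[0];
	$feminine = $forms[1] ?? $masculine;
	$unspecified = $forms[2] ?? $masculine;
	switch ( $gender ) {
		case 'male':
			return $masculine;
		case 'female':
			return $feminine;
		default:
			return $unspecified;
	}
}
```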
#### GRAMMAR

`{{GRAMMAR:$form|$term}}` converts a term to an appropriate grammatical form.

If no mapping for $term to $form exists, $term should be returned unchanged.

See [jQuery.i18n § Grammar][jQuery.i18n grammar] for details.

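At its simplest, the required fallback is a table lookup that returns the term when no entry matches. In this sketch, `applyGrammar()` is hypothetical and the table is a made-up example entry. Real per-language grammar rules are covered by the jQuery.i18n documentation.

```php
<?php
// Hypothetical sketch of GRAMMAR as a table lookup.

/** $table maps form => [ term => inflected term ] for one language. */
function applyGrammar( string $form, string $term, array $table ): string {
	// No mapping for this form and term: return the term unchanged.
	return $table[$form][$term] ?? $term;
}

// Made-up example table for illustration.
$exampleTable = [
	'genitive' => [ 'Wikipedia' => 'Wikipedian' ],
];
```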
#### BIDI

`{{BIDI:$text}}` applies directional isolation to the wrapped text, to help
avoid errors where directionally neutral characters are displayed in the wrong
place when they appear between LTR and RTL content.

This should output U+202A (left-to-right embedding) or U+202B (right-to-left
embedding) before the text, depending on the directionality of the first
strongly directional character in $text, and U+202C (pop directional
formatting) after it, or do something equivalent for the target output format.

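A simplified PHP sketch of the plain-text behavior follows. `embedBidi()` is a hypothetical name. Its RTL test covers only Hebrew, Arabic, Syriac and Thaana letters and treats every other letter as LTR, which approximates the Unicode strong-direction classes. Returning text that contains no letters unchanged is an assumption.

```php
<?php
// Simplified, hypothetical sketch of BIDI for plain-text output.

function embedBidi( string $text ): string {
	// Approximation: the first letter is taken as the first strong character.
	if ( !preg_match( '/\p{L}/u', $text, $m ) ) {
		// Assumption: text without strongly directional letters is unchanged.
		return $text;
	}
	$isRtl = preg_match( '/^[\p{Hebrew}\p{Arabic}\p{Syriac}\p{Thaana}]$/u', $m[0] ) === 1;
	$start = $isRtl ? "\u{202B}" : "\u{202A}"; // RLE or LRE
	return $start . $text . "\u{202C}"; // PDF closes the embedding
}
```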
### Supplying translations

Code intending its messages to be used by externally-defined formatters should
supply the translations as described by
[jQuery.i18n § Message File Format][jQuery.i18n file format].

In brief, the base directory of the library should contain a directory named
"i18n". This directory should contain JSON files named by language code, such
as "en.json", "de.json", and "qqq.json", each with contents like:

```json
{
	"@metadata": {
		"authors": [
			"Alice",
			"Bob",
			"Carol",
			"David"
		],
		"last-updated": "2012-09-21"
	},
	"appname-title": "Example Application",
	"appname-sub-title": "An example application",
	"appname-header-introduction": "Introduction",
	"appname-about": "About this application",
	"appname-footer": "Footer text"
}
```

Formatter implementations should be able to consume message data supplied in
this format, either directly via registration of i18n directories to check or
by providing tooling to incorporate it during a build step.

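A minimal loader for this layout is sketched below. It assumes one JSON file per language code in the `i18n` directory. `loadMessageFile()` is a hypothetical helper. It drops the `@metadata` block and returns an empty array when a language has no file, which leaves fallback to the caller.

```php
<?php
// Hypothetical loader for an i18n/<code>.json file.

function loadMessageFile( string $i18nDir, string $code ): array {
	$path = $i18nDir . '/' . $code . '.json';
	if ( !is_readable( $path ) ) {
		// No file for this language; the caller handles fallback.
		return [];
	}
	$data = json_decode( (string)file_get_contents( $path ), true );
	if ( !is_array( $data ) ) {
		throw new RuntimeException( "Invalid JSON in $path" );
	}
	// Authors and dates are not messages.
	unset( $data['@metadata'] );
	return $data;
}
```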
---
[jQuery.i18n]: https://github.com/wikimedia/jquery.i18n
[BCP 47]: https://tools.ietf.org/rfc/bcp/bcp47.txt
[CLDR]: http://cldr.unicode.org/
[CLDR plurals]: https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html
[jQuery.i18n grammar]: https://github.com/wikimedia/jquery.i18n#grammar
[jQuery.i18n file format]: https://github.com/wikimedia/jquery.i18n#message-file-format
[translatewiki.net]: https://translatewiki.net/wiki/Translating:New_project
[T-V]: https://en.wikipedia.org/wiki/T%E2%80%93V_distinction