Universal Language Selector

'''This document is a work in progress. Comments are appreciated but this is not a final draft.'''

This document describes the design of a language selector widget. It is intended as a starting point for conversations about how best to handle language selection on Wikimedia sites.

IMPORTANT: This feature does not differentiate between selection of language at specific scales or contexts. That is, a user may choose to have Spanish as their "chrome" language, and then use the selector to view an article written in German, or maybe change their keyboard mapping to Russian, etc.

Rationale

 * Wikimedia properties exist in hundreds of languages. Simple selectors do not scale beyond ten or twenty items. Wikipedia is offered in 284 languages, and there are 424 available translations for MediaWiki.

Identified language selection contexts include but are not limited to:
 * Site-wide language selection (e.g., user preferences) for the site "chrome"
 * Editing language selection, especially for input modification (e.g., Narayam)
 * Interwiki link selection
 * Portal page language selection (such as on the Wikipedia home page).

Hypothesis

 * Making language selection easier and more intuitive will allow for greater penetration into all languages
 * Making language selection easier will enable the sites to be more culturally neutral
 * Making language selection easier will promote cross-wiki collaboration

Feature Requirements

 * The language selector should be built as a "widget" that can be called from multiple contexts (i.e., it can be reused)
 * The language selector should attempt to aid the user as much as possible
 * The language selector should be language agnostic

User Scenarios


To better illustrate the intended purpose of the Universal Language Selector, several scenarios have been defined. A summary is included in this section; the complete definition is available separately.

To create representative scenarios, the behavioral variables that affect interaction have been analyzed and structured on four user profiles.

Behavioral variables
A set of factors that affect the way in which a user selects a language has been defined. These variables were identified by analyzing different sources of information (bug reports, talk pages for existing documentation) and by gathering feedback from members of the Wikipedia community and of the Localisation team.

The variables identified are the following:


 * Number of languages spoken by the user (one, two, several)
 * Skill level for each language (no, bad, basic, good/native)
 * Kind of language spoken by the user:
   * Writing direction for the language (LTR, RTL)
   * Number of speakers (many, few, none/historical)
   * Geographic distribution (shared with other languages, spoken in multiple geographic areas, spoken in a single area)
 * User main activity (consuming content, searching, contributing content)
 * User location (native country, non-native country where the user can communicate, foreign country)
 * Familiarity with language features (not familiar, basic use, advanced, expert)
 * Device used for access (tablet, desktop/laptop)
 * User account (anonymous, registered)

This is not intended to be a complete scientific taxonomy. The purpose is to facilitate the evaluation of design solutions.

User Profiles and scenarios
Each combination of values of the behavioral variables above defines a specific kind of user for whom the ideal interaction may be different. A small subset of user archetypes has been defined to cover the behavioral variables to the widest possible degree. A summary of the profiles and scenarios is provided below; the attached PDF includes a more detailed description.

George
George is an architect from Georgia who speaks only Georgian and consumes content from his tablet. He is not a registered user and he is not aware of language tools such as the interwiki links.

His goal is to learn about new topics in his own language:
 * Properly displayed content. George accesses the Georgian version of an article and everything is properly displayed.
   * Features: non-intrusive UI, proper font display for non-Latin scripts.
 * Recover from a foreign language. George accesses the English Wikipedia through a friend's link, and figures out how to access the Georgian version.
   * Features: intuitive location for language selection.

Nambi
Nambi is a nurse from Paraguay who speaks Guaraní and Spanish. She contributes content in Guaraní and consumes content in both languages. She is a registered user and makes basic use of the existing language tools.

Her goal is to contribute to the Guaraní community:
 * Contribute local content. Nambi realizes that there is no Guaraní version of an article she is reading in the Spanish Wikipedia. She creates a new article based on a translation of the Spanish one. She keeps jumping from one version to the other during the translation process.
   * Features: Support for recurrent language changes among a small set, access to locally spoken languages, and indication of lack of language support.
 * Setting the UI. Nambi prefers the user interface menus to be in Guaraní. She becomes aware that registered users can customize their UI language and changes this.
   * Features: Discoverability of UI language settings, and distinction between UI and content language settings.

Rakha
Rakha is a musician from India, currently in Belgium. He speaks Tamil (native) and English (very basic). He searches and consumes content from the public PC at the hotel. He is not a registered user and he is not familiar with language tools.

His goal is to feel at home (despite being abroad):
 * Search in his language. Rakha accesses the Wikipedia in Tamil but the public PC at the hotel has a French keyboard. Rakha finds guidance on how to enter a few Tamil characters for the search. When he finds little information on the subject, he switches to the English version for a moment.
   * Features: Discoverability of input method configuration, aids for accessing input method configuration during text input, terminology used to communicate input method configuration, and discoverability of relevant content in other languages.
 * Cross-language information. Rakha has accessed the Tamil version of Wikipedia, but he is looking for information on a Belgian dish he has only seen written in French. He searches for the dish name in French, but he wants to read the information in Tamil.
   * Features: Disable/re-enable input methods, and cross-language access to search results.

Lev
Lev is a professor from Israel. He speaks Russian and Hebrew, and studies Old Aramaic. He is a registered user with deep knowledge of language tools and contributes content in these three languages from a PC.

His goal is to share knowledge about his topics of interest:
 * Extensive contributions. Lev makes large contributions in Cyrillic to the Russian Wikipedia. He uses different keyboards at work (lacking Cyrillic mapping) and at home (a Latin-Hebrew-Russian keyboard).
   * Features: Input method configuration UI support for long texts, integration of system configured input methods.
 * Mixed content. Lev studies Old Aramaic and includes content in Aramaic in Hebrew and Russian articles.
   * Features: Indication of current language for each piece of content.
 * Translation across languages. Lev helps to translate content between Russian and Hebrew and registers as a translator for both languages.
   * Features: Multiple language selection.

Design goals
The following goals have been defined:


 * Discoverability. The selector and its features are easy to find.
 * Fluency. The solution should interfere minimally with the main flow of tasks for the user. Changing settings related to language should be fast and not distracting.
 * Learnability. The way in which the selector is operated and the effects of each action should be easy to anticipate.

An initial plan for validation of the design solutions has been defined, and it is summarized below.

How to measure that the goals are achieved
Prototypes will be built to test the different design solutions. For the 2-3 most promising design concepts, two rounds of testing will be carried out with 3-5 users per prototype. Each validation stage will consist of the following steps:
 * 1) Classify the participants. A short survey based on the behavioral variables identified will be completed by each participant. This will determine how the user fits in the types of users defined.
 * 2) Perform the usability tests. Users will use prototypes to carry out some test scenarios.
 * 3) Analyze the results. Design solutions will be modified/discarded according to the feedback obtained.

What to observe
Criteria have been defined to evaluate the different design solutions and determine whether they improve on the current solution. This includes gathering quantitative data (which can be measured) and qualitative data (which can be inferred by watching the product in use and confirmed by asking). The observations of interest for each design goal are the following:


 * Discoverability. Language selector is easy to find.
   * Measure the success ratio: how many users found the way to change the language.
   * Measure the time to find the selector: time spent from when the user starts looking for the language selector until they click on it.
   * Ask: Do you think this information is provided in other languages?
 * Fluency. Language selector is quick to operate and non-distracting.
   * Measure the time to select languages.
   * Ask: Do you think changing between languages is fast?
   * Ask: Did the language selector distract you from your main goal of looking for information?
 * Learnability. Language selector is easy to learn.
   * Measure the number of aids used when changing language.
   * Ask: Which elements do you see in the selector, and what do you think their purpose is?
   * Ask: What effect do you think selecting a language from this list will have?
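
As a rough illustration of the two quantitative discoverability measures above, the following sketch computes a success ratio and a time-to-find figure from a handful of test sessions. The `Session` record, its field names and the sample values are hypothetical, and using the median (rather than the mean) is an assumption made here because of the small number of participants per round.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class Session:
    """One participant's attempt to find the selector (hypothetical record)."""
    participant: str
    started_looking_s: float             # when the participant began looking
    clicked_selector_s: Optional[float]  # when they clicked it; None if never found


def success_ratio(sessions: List[Session]) -> float:
    """Fraction of participants who found the way to change the language."""
    if not sessions:
        return 0.0
    found = sum(1 for s in sessions if s.clicked_selector_s is not None)
    return found / len(sessions)


def median_time_to_find(sessions: List[Session]) -> Optional[float]:
    """Median seconds between starting to look and clicking the selector.

    Only successful sessions are counted; returns None if nobody succeeded.
    """
    times = [s.clicked_selector_s - s.started_looking_s
             for s in sessions if s.clicked_selector_s is not None]
    return median(times) if times else None


# Illustrative values only, not real test results.
sessions = [
    Session("P1", 0.0, 4.0),
    Session("P2", 2.0, 12.0),
    Session("P3", 1.0, None),
    Session("P4", 0.0, 6.0),
]
print(success_ratio(sessions))        # 0.75
print(median_time_to_find(sessions))  # 6.0
```

Unsuccessful sessions are excluded from the timing figure so that a participant who gives up does not distort it; the success ratio already accounts for them.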

Limitations for the testing
Several issues specific to the project limit the testing process:
 * Prototype-user language alignment. The prototype language must be aligned with the language of the user testing it. This requires either limiting the pool of participants to recruit, or building the prototypes in such a way that texts can be easily replaced by appropriate translations (which makes the prototypes more complex to build).
 * Configuration for text input. Advanced input features such as transliteration are difficult to integrate in prototypes. Tests will be defined in such a way that these features can be simulated as much as possible.

User Experience
In most cases, the language selector must be "activated". Clicking on the activation link will (normally) cause the selector system to be loaded into a jQuery window. In cases where there is no "activator" (such as a portal page), the selector is displayed "in page".

Activation
The mechanism for activating the selector is something that requires much thought and care. Users who do not speak English, for example, may not understand that the glyphs "Select Language" mean exactly that. It must be assumed that the user is not familiar with the language used by the site chrome.

Iconography is thus required. Unfortunately, there is (at the time of this writing) no universal icon for "select language". Some sites utilize "flag" iconography, but this is not acceptable because:


 * Flags are not culturally neutral. India, for instance, has one of the largest English-speaking populations in the world.
 * Language is not "owned" by any one political organization. Is "English" best served by the flag of the United Kingdom, Canada, United States of America, Australia, or any of the other nations that recognize English as an official language?
 * Users typically have very low "flag literacy". Not all flags are known to all peoples.
 * Flags can be political hot-buttons. Not all nations are recognized by all nations.
 * Flags have potential "expiration dates". Nations do not exist in perpetuity; they rise and fall. They may also change their flag symbology and colors.
 * Flag systems require maintenance for accuracy.

At this point in time, the best developed icon for this type of function is a stylized, wireframe globe. The globe should not display any land masses for two main reasons: cultural neutrality (which country would be on the front of the globe?) and legibility at small sizes (reading the icon at 16x16 pixels).

Process
Once the selector is shown, the user may go about selecting the language of choice.

Note that all languages are to be displayed in their native spelling and with native fonts. Additionally, ISO codes should be displayed next to them.
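
A minimal sketch of this labeling rule follows. The `AUTONYMS` table is a tiny illustrative subset keyed by ISO 639-1 code, and the "autonym (code)" layout and the fallback to the bare code are assumptions; the text above only requires that both pieces of information appear next to each other.

```python
# Native spellings (autonyms) keyed by ISO 639-1 code; illustrative subset only.
AUTONYMS = {
    "es": "español",
    "gn": "Avañe'ẽ",
    "ka": "ქართული",
    "ru": "русский",
    "ta": "தமிழ்",
}


def display_label(iso_code: str) -> str:
    """Return 'autonym (code)', or the bare code if no autonym is known."""
    autonym = AUTONYMS.get(iso_code)
    return f"{autonym} ({iso_code})" if autonym else iso_code


print(display_label("ka"))  # ქართული (ka)
```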

The selector should begin with two panes, one on top of the other.

Top Pane
The first pane shall include a flattened map of the world divided into basic "regions" (not nation borders). The system should auto-detect the region that the user is located in (either via geoip lookup or GPS coordinate lookup in the case of mobile phones) and have that region auto-selected (and displayed in the lower pane).

Additionally, there will be an ordered list of the world's most common languages displayed (for quick selection).

The map is interactive: selecting any region will reload the bottom pane with that region's languages.
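
The region pre-selection described for the top pane could be sketched as below. This assumes that some external GeoIP or GPS lookup has already produced an ISO 3166-1 country code; the region names, the country-to-region table, the `detect_region` function and the preference for GPS over GeoIP are all hypothetical, since this document does not define the actual regions.

```python
from typing import Optional

# Hypothetical regions and assignments, for illustration only.
REGION_BY_COUNTRY = {
    "GE": "Western Asia",
    "IL": "Western Asia",
    "IN": "South Asia",
    "PY": "South America",
    "BE": "Europe",
}
DEFAULT_REGION = "World"  # shown when the location cannot be determined


def detect_region(geoip_country: Optional[str] = None,
                  gps_country: Optional[str] = None) -> str:
    """Pick the region to pre-select in the top pane.

    Both arguments are country codes assumed to come from an external lookup.
    GPS is tried first on the assumption that it is more precise.
    """
    for country in (gps_country, geoip_country):
        if country and country.upper() in REGION_BY_COUNTRY:
            return REGION_BY_COUNTRY[country.upper()]
    return DEFAULT_REGION


print(detect_region(geoip_country="be"))  # Europe
```

The detected region is only a default. A user like Rakha, who is in Belgium but reads Tamil, would see a European region pre-selected and would still rely on the map, the common-languages list or the search to reach his language.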

Bottom Pane
This pane includes a language search function as well as two separate lists of languages, in sectioned columnar format.
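
One possible shape for the language search is a case-insensitive prefix match against several names for each language. The sketch below is an assumption rather than a specified behavior: the `Language` record, the sample entries, and the choice to match on ISO code, native name and English name are all illustrative.

```python
from typing import List, NamedTuple


class Language(NamedTuple):
    code: str          # ISO 639-1 code
    autonym: str       # name in the language itself
    english_name: str  # name in English


# Illustrative subset only.
LANGUAGES = [
    Language("es", "español", "Spanish"),
    Language("ka", "ქართული", "Georgian"),
    Language("ru", "русский", "Russian"),
    Language("ta", "தமிழ்", "Tamil"),
    Language("tr", "Türkçe", "Turkish"),
]


def search_languages(query: str,
                     languages: List[Language] = LANGUAGES) -> List[Language]:
    """Return languages whose code or names start with the query.

    Matching is case-insensitive; an empty query returns no results.
    """
    q = query.strip().casefold()
    if not q:
        return []
    return [
        lang for lang in languages
        if lang.code.casefold().startswith(q)
        or lang.autonym.casefold().startswith(q)
        or lang.english_name.casefold().startswith(q)
    ]


print([lang.code for lang in search_languages("T")])    # ['ta', 'tr']
print([lang.code for lang in search_languages("рус")])  # ['ru']
```

Matching on the native name matters because the user cannot be assumed to know the language of the site chrome.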

The first section shall show between 5 and 10 of the most common languages in the selected region, ordered by number of speakers.

The second section again lists the languages used in the region, this time including as many as will fit. The ordering of this list is still to be decided (alphabetical, or by ISO code). Simple pagination controls (next/previous) will allow the user to browse the full list if it does not fit within the confines of the dialog.
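
The two sections and the next/previous paging can be sketched as follows. The function names are hypothetical, the sample uses codes from the ISO 639-2 range reserved for local use (qaa to qtz) with invented speaker counts, and the full list is sorted by ISO code, which is only one of the two orderings the text above leaves open.

```python
from typing import List, Tuple


def split_sections(languages: List[Tuple[str, int]],
                   common_count: int = 5) -> Tuple[List[str], List[str]]:
    """Build the two bottom-pane sections for one region.

    `languages` holds (iso_code, speakers_in_region) pairs. Returns the most
    common languages ordered by number of speakers, and the full list ordered
    by ISO code (an assumed ordering).
    """
    if not 5 <= common_count <= 10:
        raise ValueError("common_count must be between 5 and 10")
    by_speakers = sorted(languages, key=lambda item: item[1], reverse=True)
    common = [code for code, _ in by_speakers[:common_count]]
    full = sorted(code for code, _ in languages)
    return common, full


def paginate(items: List[str], page: int,
             page_size: int) -> Tuple[List[str], bool]:
    """Return the items on a zero-based page and whether a next page exists."""
    start = page * page_size
    return items[start:start + page_size], start + page_size < len(items)


# Placeholder codes and invented counts, for illustration only.
region = [("qad", 50), ("qaa", 900), ("qac", 300), ("qag", 10),
          ("qab", 700), ("qaf", 20), ("qae", 120)]

common, full = split_sections(region)
print(common)                 # ['qaa', 'qab', 'qac', 'qae', 'qad']
print(paginate(full, 0, 3))   # (['qaa', 'qab', 'qac'], True)
print(paginate(full, 2, 3))   # (['qag'], False)
```

The boolean returned by `paginate` is what a "next" control would use to decide whether it should be enabled.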

Non-Javascript Browsers
TBD

Arun Ganesh's Design
At the 2011 Mumbai Hackathon, a small team began discussions addressing a major problem facing Wikimedia sites: language selection. The team came up with several ideas, and Arun Ganesh captured the basic workflow in a PDF file.

Specification Callouts
''This text is copied and pasted directly from the PDF file. Formatting has been wikified slightly.''

The widget has 4 parts:

1. Widget shortcut:
 * a. Permanent link displaying selected language along with icons to indicate it is language related.
 * b. Separate icon to directly open keyboard input or on-screen keyboard widget

2. Primary selector:
 * a. Static world map with marker showing user’s location guessed from geolocation, user timezone or system language options. Clicking map changes the data in the secondary selector (3)
 * b. Static list of primary global languages or biggest Wikipedia projects. This enables quickly selecting a major language which has a high probability of usage.

3. Secondary selector:
 * a. Text box allowing a user to search language or countries. Autocomplete list
 * b. Country flag icons based on the region selected in the primary selector map or guessed user location. User can select his country flag if he identifies it
 * c. Regional map of the user’s area of interest. No markers needed, but clicking can change the ordering of languages
 * d. One column listing primary languages of the region or country chosen
 * e. Two columns with a comprehensive listing of all the minor languages. 3 text sizes to indicate the number of speakers of the language in the region.

4. Keyboard Input:
 * a. Options for keyboard mapping. Selecting a language would automatically load the available options and switch to the recommended input method for that language. Disable option switches to system settings
 * b. Keyboard mapping reference and on-screen keyboard. Dockable widget which can be dragged and detached from the language widget
 * c. Dropdown font selector to switch webfonts
 * d. Link to typing help or reference material

The universal language selector will allow a user friendly method for a user to switch to his native language from a foreign language interface with the help of visual aids.

The language selector widget can be activated by a link placed as close to the top-left or top-right of the page as possible. This is usually the first place a person looks for customization options in a foreign interface. Once activated, the widget expands to display the language selection options.
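
Callout 3e above proposes three text sizes to indicate how many speakers a language has in the region. One way to derive them is to compare each language with the most widely spoken language in the same region, as in the sketch below. The function name, the size labels and the 50% and 10% thresholds are assumptions, not values taken from the PDF.

```python
def text_size(speakers: int, largest: int) -> str:
    """Map a speaker count to one of three text sizes.

    `largest` is the speaker count of the region's most widely spoken
    language. The thresholds are illustrative assumptions.
    """
    if largest <= 0:
        return "small"
    share = speakers / largest
    if share >= 0.5:
        return "large"
    if share >= 0.1:
        return "medium"
    return "small"


print(text_size(900, 900))  # large
print(text_size(120, 900))  # medium
print(text_size(10, 900))   # small
```

Using a share of the regional maximum, rather than absolute counts, keeps the three sizes meaningful in regions where every language has comparatively few speakers.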

Related bug reports

 * 33677 Enable setting language preference without requiring login.
 * 28900 Add a keyboard layout display option to Narayam.
 * 33460 WebFonts "Select font" portlet menu should only be shown if there are options to choose from
 * 30595 Inconsistent search results with language links.
 * 28970 There's no clean way to indicate the directionality of the content of the page.
 * 17107 Fallback languages for CentralNotice.
 * 29788 Swedish-language wikis should use Swedish-locale sorting (ie. ÅÄÖ should sort correctly).