User:Liangent/wb-lang


 * Target: Language fallback & conversion feature for data stored in Wikibase.
 * Mentor: User:Denny?

Although Wikidata is still in a fast development stage, a lot of data has already been added to it. The later we resolve these issues, the more duplication may be created, which will require more cleanup work in the future, much like what we had to face before and when the language converter (the transliteration system) was introduced for the Chinese Wikipedia. So I'm planning to do this project this summer.

Introduction
Currently Wikidata stores multilingual content. Labels (names, descriptions, etc.) are expected to be written in every language, so that every user can read them in their own language. However, there are some problems at present:


 * If some content doesn't exist in a specific language, users with exactly that language set in their preferences see something meaningless (the ID instead). This makes languages with fewer users (and thus fewer labels filled in) nearly unusable.
 * Some similar languages often share the same value. Populating strings for every language one by one wastes resources and may let them fall out of sync later.
 * Even for languages which are not "that similar", MediaWiki already has a facility to transliterate (a.k.a. convert) content from a sister language (a.k.a. variant), which can be used to provide better results for users.

This proposal aims to resolve these issues by displaying content from another language to users, chosen based on user preferences (some users may know more than one language), language similarity (the language fallback chain), or the possibility of transliteration, and by allowing proper editing of this content.

More

 * Every user may define their own language preference order
 * Every language has its system fallback order
 * Some languages can be derived from other (prime & sister) languages automatically
 * Display what the user prefers most, to the extent that it is available in the current data
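
The rules above could be combined roughly as in the following sketch. It is only an illustration under stated assumptions: the names `SYSTEM_FALLBACKS`, `CONVERTERS` and `resolve_label` are hypothetical, the fallback chains and the three-character conversion table are toy data rather than what MediaWiki or LanguageConverter actually ship, and the order in which exact match, conversion and fallback are tried is one possible choice, not a settled design.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy system fallback chains (assumed for illustration only).
SYSTEM_FALLBACKS: Dict[str, List[str]] = {
    "zh-cn": ["zh-hans", "zh"],
    "de-at": ["de"],
}

# Toy stand-in for a real converter: maps three traditional characters
# to their simplified forms.
_TO_SIMPLIFIED = str.maketrans({"維": "维", "數": "数", "據": "据"})

# Target language -> list of (source language, conversion function).
CONVERTERS: Dict[str, List[Tuple[str, Callable[[str], str]]]] = {
    "zh-cn": [("zh-tw", lambda text: text.translate(_TO_SIMPLIFIED))],
}


def resolve_label(
    labels: Dict[str, str], user_langs: List[str]
) -> Optional[Tuple[str, str, str]]:
    """Return (text, source language, method), or None if nothing is usable.

    For each language in the user's preference order, try an exact match,
    then automatic conversion from a sister language, then the system
    fallback chain, before moving on to the user's next language.
    """
    for lang in user_langs:
        if lang in labels:
            return labels[lang], lang, "exact"
        for source, convert in CONVERTERS.get(lang, []):
            if source in labels:
                return convert(labels[source]), source, "converted"
        for fallback in SYSTEM_FALLBACKS.get(lang, []):
            if fallback in labels:
                return labels[fallback], fallback, "fallback"
    return None  # caller would fall back to showing the ID


labels = {"zh-tw": "維基數據", "en": "Wikidata"}
result = resolve_label(labels, ["zh-cn", "en"])
print(result)
```

Returning the source language and the method alongside the text is deliberate: the front-end needs to know where a displayed value came from in order to offer proper editing of it.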

Reasons to do this

 * Something readable is better than nothing (especially in small languages, whose users usually know some other languages too)
 * Languages with heavy subtag / variant use (such as zh: zh-cn, zh-tw etc., which can be derived from others automatically with LanguageConverter)
 * Wikibase / Wikidata is under development. Having this (fallback + conversion) included in the design is better than patching it in ad hoc ways later.
 * WMF cares more and more about multilingual content these days, and lacks people who know about LanguageConverter :)

Technical notes

 * Caching issues need some care
 * Wikibase is under fast development. Talk with others to minimize merge conflicts

Timeline

 * May 27, 1900 UTC: Announced
 * May 28 - June 6
 * (I'll be busy for the first week or two after June 17, so I may have to start early to compensate for that.)
 * Investigate where visible work is needed (visible to users and to other external developers such as bot authors). This may include the API (new interfaces may be needed), the repo front-end (obviously), the client front-end (for example, the add-link dialog) and exported data (for example, data dumps, if we plan to provide per-language dumps at some point). Design the interface where needed.


 * June 7 - June 16
 * Investigate the current data exchange structures (API, embedded JavaScript data, etc.) and see whether they still meet my needs. Design new data structures where necessary.


 * June 17: Beginning
 * June 17 - June 30 (and as soon as any design is done)
 * Send the designs of the interface and data structures to my mentor and others for review. I may be somewhat busy during this period, so some delay in others' responses won't block me much.


 * July 1 - July 20
 * Code up anything internal (data structures, API, etc.) based on the design done in the previous period.


 * July 21 - July 29
 * Front-end development based on design, part I.


 * July 29, 1900 UTC - August 2, 1900 UTC: Mid-term
 * Write a summary of the current design and coding work as the mid-term evaluation document.


 * August 3 - August 11
 * Front-end development, part II.


 * August 12 - August 25
 * Test it and see whether it works; tweak the code where necessary.


 * August 26 - September 2
 * Test it on a larger data set? (optional, continue coding work if it's not done)


 * September 3 - September 16
 * Try to have it deployed on Wikidata and test it in the real world? (optional, continue coding work if it's not done)


 * September 16 / September 23, 0900 UTC - September 27, 1900 UTC: Final documentation and reports


 * Expected target: Have it deployed on Wikidata
 * Minimal target: Have a working codebase done

Links

 * Wikidata/Data model
 * Wikidata/Notes/Data model primer
 * Wikidata/Notes/Language fallback
 * Writing systems
 * Language in MediaWiki