Language tools/Impact Measurement and Metrics

Introduction
The primary objective for developing language tools is to help lower the barriers to access and participation on the Wikimedia websites as well as the Open Web. Language tools rarely serve this objective on their own: they provide supporting mechanisms inside other services (for example, search that supports a different keyboard layout or uses a morphological engine). This makes it difficult to isolate the impact of language-related tools from the impact of the service that uses them.

Measuring how language tools affect general goals
One way to measure the impact of language tools is to measure general goals across communities with different levels of language tool support. These general goals include:


 * Growth in the number of readers
 * "In the past I only read the English Wikipedia and now I also read the Assamese Wikipedia."
 * "In the past I didn't read Wikipedia, because I know only Amharic and there was no Amharic font on computers, and now I can read Wikipedia in Amharic."


 * Conversion ratio of readers becoming editors
 * "In the past I only wrote in the English Wikipedia and now I also contribute to the Assamese Wikipedia."
 * "In the past I didn't write in Wikipedia, because I know only Amharic and there was no Amharic keyboard on computers, and now I can contribute to Wikipedia in Amharic."


 * Coverage
 * Is it possible in MediaWiki to document, in text, all human knowledge that is available?
 * “In the past I couldn't quote a chapter from the Rig Veda in Wikipedia, because it had a character that wasn't supported in any font, or I didn't have any way to type it, or the MediaWiki Unicode normalization corrupted it, and now I can do it.”


 * Time spent on a Wikimedia website
 * Number of contributions
 * Quality of contributions
 * Growth in number of page views as search tools improve
 * “I wrote a great article about Utrecht in Telugu, but it didn't appear in search results, because Lucene was buggy, so it had 0 readers according to grok.se. Now Lucene is fixed and 50 people read it every day.”

Comparing how well these goals are achieved before and after a feature or language tool (e.g. input method support) is added for a specific community or group of users (determined by geographic location, browser language settings, or operating platform) can help determine the impact of each feature. It may nevertheless be difficult to measure the direct impact of improved language tools on increased activity on any of the Wikimedia websites, because other factors can also influence that activity, such as:
 * Efforts by independent Wikipedians
 * "I love my language, so I started contributing to Wikipedia in it and organizing meetups in my city. I used to be the only one writing and now there are ten of us, and we wrote five hundred articles in three months."
 * Possibility 1: "We wouldn't be able to do it without Narayam."
 * Possibility 2: "I taught my friends how to configure their keyboards for our language. What is Narayam?"


 * Efforts by Chapters and WMF
 * "The conference in Mumbai brought 50 new contributors to the Marathi Wikipedia. The monthly number of Marathi Wikipedia readers grew by 20%."
 * Release of a large number of works in a language under a free license
 * "The government of Kazakhstan released the rights to the Great Kazakh Encyclopedia and uploaded all the articles to the Kazakh Wikipedia."
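One way to account for the other factors listed above is to compare the change on a wiki that received a tool with the change on a comparable wiki that did not, over the same period. The sketch below illustrates this difference-in-differences comparison. It is a standard technique and not something this page prescribes; the function name and all numbers are hypothetical.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change on the wiki that got the tool minus the change on a comparison wiki."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Hypothetical monthly active editors before and after a tool deployment.
estimate = diff_in_diff(
    treated_before=40, treated_after=65,   # wiki where the tool was enabled
    control_before=38, control_after=48,   # similar wiki without the tool
)
print(estimate)  # 15
```

The estimate is only as good as the choice of comparison wiki: a meetup, a conference or a large content donation on either wiki during the period would still distort it.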

Metrics clusters
1. Tool preference settings


 * Many of our tools have preferences that are set in user preferences by logged-in users or stored in cookies for anonymous users. Statistics on the user settings for these tools would be useful for impact measurement. Collecting such stats may have security and privacy implications that need advice from WMF legal.

2. Measuring user activity


 * 1) by edit frequency
 * 2) by reader usage patterns
 * 3) by user demographics (e.g. age)
 * 4) by time period (monthly, weekly)
 * 5) by default browser language
 * 6) by UI language
 * 7) by Geo location
 * 8) by content language
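The breakdowns above amount to counting the same activity records along different dimensions. A minimal sketch, assuming each record is a dictionary; the field names and the sample records are hypothetical and do not describe any existing log format.

```python
from collections import Counter


def count_by(events, *dimensions):
    """Count events per combination of the given dimension keys."""
    return Counter(tuple(event[d] for d in dimensions) for event in events)


# Hypothetical edit records; the field names are assumptions.
edits = [
    {"content_lang": "as", "ui_lang": "en", "month": "2013-01"},
    {"content_lang": "as", "ui_lang": "as", "month": "2013-01"},
    {"content_lang": "am", "ui_lang": "am", "month": "2013-02"},
]

per_lang_month = count_by(edits, "content_lang", "month")
per_ui_lang = count_by(edits, "ui_lang")
```

Keeping the dimensions as parameters makes it cheap to ask the same question by UI language, geo location or time period without writing a new report each time.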

Metrics for Universal Language Selector

 * Measuring user experience
 * Time taken to locate language selection mechanisms.
 * E.g. Once a page loads, if the first action is to select the language, we can assume this time has been spent looking for the language selection mechanism.


 * Success rate for language change.
 * Number of times the user chooses a language with respect to the number of times ULS is clicked.
 * Time to find desired language
 * Once the selector is opened, time spent until the user finds the desired language.


 * Interface translations
 * Users changing the interface from their language to English. This can indicate how confident people are in the quality of the local content and UI translations.
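The user experience metrics above could be derived from a per-page event log. The sketch below assumes each session is a list of (timestamp in seconds, action) pairs with the hypothetical action names "page_load", "uls_open", "language_selected" and "uls_close"; no such logging schema is defined on this page.

```python
from statistics import median


def uls_metrics(sessions):
    """Compute success rate and timing metrics from hypothetical ULS event logs."""
    opens = 0
    selections = 0
    locate_times = []  # page load -> ULS opened, only when opening was the first action
    find_times = []    # ULS opened -> language selected
    for events in sessions:
        load_time = None
        opened_at = None
        actions_since_load = 0
        for timestamp, action in events:
            if action == "page_load":
                load_time, opened_at, actions_since_load = timestamp, None, 0
                continue
            actions_since_load += 1
            if action == "uls_open":
                opens += 1
                opened_at = timestamp
                if load_time is not None and actions_since_load == 1:
                    locate_times.append(timestamp - load_time)
            elif action == "language_selected" and opened_at is not None:
                selections += 1
                find_times.append(timestamp - opened_at)
                opened_at = None
            elif action == "uls_close":
                opened_at = None
    return {
        "success_rate": selections / opens if opens else None,
        "median_time_to_locate": median(locate_times) if locate_times else None,
        "median_time_to_find": median(find_times) if find_times else None,
    }


# Two hypothetical sessions: one successful language change, one abandoned.
sample = [
    [(0, "page_load"), (4, "uls_open"), (10, "language_selected")],
    [(0, "page_load"), (2, "scroll"), (8, "uls_open"), (9, "uls_close")],
]
metrics = uls_metrics(sample)
```

Medians are used instead of means because a few users who leave the page open for a long time would otherwise dominate the timings.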

Metrics for Narayam

 * Number of users using Narayam
 * The number of people using Narayam and changes in these numbers over time


 * Default Input Method
 * Number of users who read/edit the language wiki project using default input method.
 * E.g. "I can type my language in Wikipedia without doing anything. I use Wikipedia to type my emails."


 * Disabled Narayam
 * Number of users who permanently disabled the input tool in user preferences.
 * E.g. "Bug 42128 – Disable Narayam for all projects in Chechen" – is it just an annoyed user, or is it really bad?


 * Non default input method
 * Number of users who used non default input method.
 * E.g. "I clear my cookies every now and then, and it's annoying to select a different layout in Wikipedia after that. This layout should be the default."

 * Using "Other language" input method
 * Number of users who used an "other language" input tool.
 * E.g. "I write articles about German cities and the German keyboard is very useful." – If such data exists, it is a good response to people who don't like it.


 * Number of canceled edits (not saved).
 * Can represent problems creating content. The standard figure is very high, about 70 % (source: wikiHow Wikimania talks and Erik Zachte), so it needs comparison/context.


 * Number of reverted edits (saved and reverted).
 * Can represent problems creating content.
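Because the canceled-edit figure is already high in general, the rates are only meaningful next to a baseline. A minimal sketch, assuming counts of started, saved and reverted edits are available per wiki; the function name and sample counts are hypothetical, and the 70% baseline is the figure quoted above.

```python
BASELINE_CANCELED_RATE = 0.70  # approximate general figure quoted in the list above


def edit_outcome_rates(started, saved, reverted):
    """Return canceled and reverted rates for a set of edit attempts."""
    if started <= 0 or not 0 <= reverted <= saved <= started:
        raise ValueError("expected 0 <= reverted <= saved <= started and started > 0")
    return {
        "canceled_rate": (started - saved) / started,
        "reverted_rate": reverted / saved if saved else None,
    }


# Hypothetical counts for one wiki in one month.
rates = edit_outcome_rates(started=200, saved=50, reverted=5)
excess_canceled = rates["canceled_rate"] - BASELINE_CANCELED_RATE
```

A positive excess over the baseline is a hint that something specific to that wiki, possibly the input tool, is making it harder to finish an edit; it is not proof.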

Measuring impact of Narayam on the quality of content editing
 * Qualitative measurement needed for:
 * E.g. "I am an admin in the Urdu Wikipedia. Since you installed Narayam, we do a lot more reverts. We even had to create an Abuse Filter rule to revert edits with common mistakes caused by Narayam."


 * How long does it take to create content.
 * E.g. By measuring the ratio of content produced per time spent editing, we can determine whether editing becomes easier with tools such as input method support. (Indirect metric analysis - average data)
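The ratio of content produced per time spent editing can be sketched as follows. The record fields "chars_added" and "seconds_editing" and the sample numbers are hypothetical; characters are counted instead of bytes as an assumption, because byte counts would favor scripts that need more bytes per character in UTF-8.

```python
def content_per_minute(edit_sessions):
    """Average characters added per minute across a set of edit sessions."""
    total_chars = sum(s["chars_added"] for s in edit_sessions)
    total_seconds = sum(s["seconds_editing"] for s in edit_sessions)
    if total_seconds == 0:
        return None
    return total_chars / (total_seconds / 60)


# Hypothetical sessions before and after input method support was enabled.
before = [
    {"chars_added": 600, "seconds_editing": 300},
    {"chars_added": 300, "seconds_editing": 300},
]
after = [
    {"chars_added": 1200, "seconds_editing": 400},
    {"chars_added": 600, "seconds_editing": 200},
]
rate_before = content_per_minute(before)  # 90.0
rate_after = content_per_minute(after)    # 180.0
```

This is an average over all sessions, so it says nothing about individual editors, and it cannot distinguish fast typing from pasted text.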

Metrics for WebFonts

 * Number of users using WebFonts per time period and measuring change over time
 * Number of users who permanently disabled webfonts in user preferences
 * Number of users who used the reset option to disable webfonts
 * Number of users who change the default font
 * "This font is much more readable and should be the default font."

Metrics for Translate

 * Size of the active translators group using Translate and its activity, preferably combined for all Translate-enabled wikis
 * Number of contributions/reviews/translators/reviewers to MediaWiki core L10n on TWN
 * Number of contributions/reviews/translators/reviewers to Wikimedia extensions on TWN
 * Number of contributions/reviews/translators/reviewers to all MediaWiki extensions on TWN
 * Number of contributors in a language (+ over time)
 * Languages with more than # contributions
 * Translators for the most spoken languages: https://translatewiki.net/wiki/Project:MediaWiki_localisation_in_the_50_most_spoken_languages
 * Usage of various translation helpers via the add link (down arrow)
 * Which translation helper is the most frequently used? TM, Microsoft, Apertium?
 * Page translation
 * Languages that translation administrators are interested in but are not getting translations for (probably via the Translation Notifications extension).
 * Percentage of new content namespace pages using PT
 * Page views of translatable and translation pages
 * Not necessarily Translate, but related: Which messages are most often customized by the projects?

Metrics on technical and user documentation and built-in Help

 * Clicks on the help link for WebFonts, Narayam, Translate
 * For Narayam and Translate: To which help pages did the user eventually go? There are many of them.
 * Access to help links for language tools. Frequent access may indicate that features are not intuitive enough.
 * In which language was the page actually read?
 * From which language did the user come?
 * Determine whether the features were used after the user consulted the help (sounds hard, but should be possible)

Metrics for Visual Editor which measure input method tool usage
This is not a tool that our team develops, but it has i18n characteristics such as the special characters insertion toolbar. It may be useful to know how it is used, because we sometimes fix bugs in it. It may also need a major overhaul of the Visual Editor. Finally, knowing which characters people need may influence the development of Narayam.


 * How many people open it and never actually insert characters?
 * Are there scripts that are never used? (It doesn’t mean that we have to remove them, because it probably doesn’t waste a lot of resources, but it may give some insights.)
 * Which characters are used most often? Which characters are never used?

Metrics on Wikimedia wikis configuration

 * Number of tool enabling requests over time
 * Collection and graph of these requests over time across number of wikis where the tool was deployed
 * Number of languages supported in TWN, MediaWiki, Wikimedia and Incubator over a specified time period
 * We don't directly deal with Incubator, but if languages pile up there, then something might be wrong. Maybe the Langcom should change its policies and maybe we can make better tools.

Qualitative metrics
Some metrics are hard to measure precisely and can only be estimated roughly.


 * Appreciation of genderised namespaces - can be achieved by interviewing language representatives.
 * Measuring user experience
 * E.g. surveys with editor feedback from language communities

Other metrics

 * Open bugzilla issues per maintained category
 * Number of e-mails on mediawiki-i18n mailing lists
 * Number of people using a site with a different interface language (preferences and uselang). Try to filter out developers.
 * A bit tricky: Number of issue reports by language and script
 * Count requests on the TWN Support page per language
 * Count relevant bug reports (tracking bugs for such features would be useful; language representatives can track and manage them)

External data

 * There is data that doesn't depend on our products, but is useful for our products' success. Some of it may be hard or impossible to get. Maybe we can get it from other organizations – chapters, Red Hat, Canonical, India office, mobile partners. For every language on which we focus, we need:
 * The number of people who know this language and know (or don't know) English
 * The number of people who use each operating system, including version and interface language (including phones and tablets)
 * The kind of keyboards that people use (for example, for India – how many have InScript and how many have only English)

Workflow Analysis
User steps have been captured for the use of Translate and Narayam extensions. For each, the critical points to measure success or failure have been identified (and metrics have been proposed).


 * Translate: http://commons.wikimedia.org/wiki/File:Workflow_for_the_Translate_Extension.pdf
 * Fluency of translations:
 * Time per length of messages translated.
 * Use of keyboard shortcuts


 * Inability to find messages:
 * Number of searches that produce no results.
 * Number of users that leave the search results without selecting a result.


 * Inability to understand the context of the message to translate:
 * Number of questions asked on the support page per translation message (clicks on the ask a question button).
 * Time to obtain a response in the support page/number of messages without a response.
 * Number of translations that are modified without the translation being updated (can we assume the modified translation was wrong or low quality?).


 * Lack of help during translation:
 * Number of messages without translation containing no information about the message.
 * Number of messages containing parameters ($1) that are not explained (info about the message does not include $1) [needs to expand templates].


 * Narayam: File:Workflow_for_Narayam_Extension.pdf
 * Inability to find input settings:
 * Canceled searches or edits


 * Inability to understand how to operate the tool:
 * Number of people disabling the tool (with and without using it)
 * Number of accesses to help information (with and without using it)


 * Web fonts:
 * Number of people modifying the default configuration.
 * Time measurements
 * Interaction time vs. target size (Fitts's law)
 * Interaction time vs. available options (Hick's law)
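For reference, both laws predict interaction time as a logarithmic function. The sketch below uses the common Shannon formulation of Fitts's law and the usual form of Hick's law. The coefficients a and b are not known in advance: they would have to be fitted from measured interaction times, and the values used here are hypothetical.

```python
import math


def fitts_time(distance, width, a, b):
    """Predicted pointing time: a + b * log2(distance / width + 1)."""
    return a + b * math.log2(distance / width + 1)


def hick_time(n_options, a, b):
    """Predicted decision time: a + b * log2(n_options + 1)."""
    return a + b * math.log2(n_options + 1)


# Hypothetical coefficients (seconds); real values must be fitted from data.
small_target = fitts_time(distance=300, width=20, a=0.2, b=0.1)
large_target = fitts_time(distance=300, width=100, a=0.2, b=0.1)
short_list = hick_time(n_options=7, a=0.2, b=0.15)
long_list = hick_time(n_options=63, a=0.2, b=0.15)
```

Measured times that sit well above the fitted curve, for example for a language selector with many entries, would point to a usability problem rather than to the size of the target or the number of options.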

Esoteric/Tough

 * Quality of localization (based on existing localisations) and the effect on reader numbers and editor activity (Experts say this will not yield meaningful numbers)
 * Does localization take precedence over contributors/readers, or do contributors/readers take precedence over localization? In any case, current stats contain a localization score, but that should probably be compared within a time box (+/- 6 months), and then analyzed.
 * Number of strings in a script other than the default script of a language that are not enclosed in lang tags; measure how this number changes over time

These are tough to measure (or extremely resource-intensive). Taking such measurements in the web browser as a quality measure, using a Gadget, may be a Good Thing (tm), and they are good projects for volunteer contributors.
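As an illustration of the last item, a rough count of words in a non-default script outside lang-tagged elements could look like the sketch below. It rests on loud assumptions: the script is approximated by the first word of each character's Unicode name (Python's standard library does not expose the Unicode Script property), the lang-tagged elements are found with a simple regular expression that does not handle nesting, and the function names are hypothetical.

```python
import re
import unicodedata

# Simplified matcher for elements carrying a lang attribute (no nesting support).
LANG_TAGGED = re.compile(r'<(\w+)[^>]*\blang="[^"]*"[^>]*>.*?</\1>', re.DOTALL)


def script_of(char):
    """Approximate the script as the first word of the Unicode character name."""
    if not char.isalpha():
        return None
    try:
        return unicodedata.name(char).split()[0]
    except ValueError:  # character has no name in the Unicode database
        return None


def foreign_script_words(text, default_script):
    """Return words outside lang-tagged elements that contain no default-script letters."""
    untagged = LANG_TAGGED.sub(" ", text)
    foreign = []
    for word in untagged.split():
        scripts = {script_of(c) for c in word} - {None}
        if scripts and default_script not in scripts:
            foreign.append(word)
    return foreign


# Hypothetical page text: one tagged and one untagged Latin-script word.
page = 'नगर <span lang="nl">Utrecht</span> Holland'
untagged_foreign = foreign_script_words(page, default_script="DEVANAGARI")
```

Run periodically over page text, the length of the returned list gives the statistic whose change over time is of interest.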