Content translation/Product Definition/analytics

ContentTranslation is an extension developed by the Wikimedia Foundation to help multilingual Wikipedia editors create pages. This page defines a set of metrics for understanding the impact of Content Translation.

These measurements can be collected using EventLogging and other methods: analyzing Wikipedia dumps, direct queries of backend storage data, etc.

The purpose of this document is to give a general picture of what we need to measure, so that for each feature story we can plan an appropriate EventLogging schema, write appropriate logging functions and prepare appropriate queries.



High priorities for product management

 * How often target and source languages are used. This is not immediately actionable, because the feature won’t be rolled out to all languages from the start, but it will help us get a sense of what the main language pairs are. Analysing this per language also makes it possible to identify the languages whose number of articles has grown the most thanks to the tool.
   * Technically: for the whole cluster, which source and target languages are used most often, per month.
 * How many users saved at least one draft translation. Take these as a cohort and:
   * Describe their CX activity: number of drafts created through CX.
   * How often the draft is edited before it is moved to the main namespace, and who edits it: the translator or other users.
   * Describe the draft creators’ overall editing activity within the same time period.
   * Describe the draft creators’ cross-project behavior.
 * How many draft translations are created in each language.
 * How many pages are eventually moved to the main namespace as real articles.
 * What editing activity was done before moving the article to the main namespace.
 * How many people click the red link.
 * Of the people who click the link, how many people accept the invitation and go on to translate.
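
As an illustration of the language-pair metric in the list above, here is a minimal sketch that tallies translations per month and language pair. The event records and their field names (month, source, target) are assumptions made for this example and do not correspond to an existing EventLogging schema.

```python
from collections import Counter

# Hypothetical "draft saved" events; the field names are assumptions,
# not an existing schema.
events = [
    {"month": "2014-07", "source": "es", "target": "ca"},
    {"month": "2014-07", "source": "es", "target": "ca"},
    {"month": "2014-07", "source": "en", "target": "he"},
    {"month": "2014-08", "source": "es", "target": "ca"},
]


def language_pairs_per_month(events):
    """Count events per (month, source language, target language)."""
    return Counter((e["month"], e["source"], e["target"]) for e in events)


pair_counts = language_pairs_per_month(events)
# The most frequent (month, source, target) combination and its count.
top_pair = pair_counts.most_common(1)[0]
print(top_pair)
```

The same tally could be produced with a grouped query on the logging tables; the point here is only that each logged event needs to carry the source language, the target language and a timestamp.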

Other metrics for created articles
The main goal of Content Translation is to increase the content available in all languages. New articles created with the tool will be the main element to observe.

Quantity of content

 * Number of articles created. Articles created per week, per user, per language.
 * Length of articles. This gives an idea of the kind of articles produced. It can be useful to compare to the length of the original article (e.g., "users translate only 30% of the original article on average").
 * Links to/from articles. Articles that include links to other articles may indicate more complete articles. New articles that become linked are also a sign of being considered usable articles.
 * Time spent in creating a translation (per paragraph). How fast can translators produce content?
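
The length comparison mentioned above ("users translate only 30% of the original article on average") could be computed roughly as follows. This is a sketch under the assumption that article lengths are available as character counts; the sample numbers are invented, and character counts are only a rough proxy because languages differ in how much text they need for the same content.

```python
from statistics import mean

# Hypothetical (source_length, translation_length) pairs, in characters.
articles = [(10000, 3000), (4000, 2000), (2000, 400)]


def translated_fraction(source_length, translation_length):
    """Return the translation length as a fraction of the source length."""
    if source_length <= 0:
        raise ValueError("source_length must be positive")
    return translation_length / source_length


fractions = [translated_fraction(s, t) for s, t in articles]
# Unweighted mean over articles: every article counts equally.
average_fraction = mean(fractions)
print(f"{average_fraction:.0%} of the original article on average")
```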

Impact and quality

 * Number of readers of the original article. Does the availability of an article in other languages reduce the readership of the original one?
 * Number of readers of new articles. How many people are accessing the new content?
 * Amount of machine translation.
 * Translations vs. regular edits per user. How many users contribute only as translators? Is translation adopted mostly by prolific editors, or by users with fewer regular edits?
 * Number of translators (users who have created a translation).
 * Number of articles that translate an existing article (possibly trying to add a new paragraph to an existing translation).
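
The "translations vs. regular edits per user" question above can be sketched as a simple classification of users. The per-user counts below are invented, and the rule "at least one translation and no regular edits" is an assumed definition of a translation-only contributor.

```python
# Hypothetical per-user counts; names and numbers are made up.
users = {
    "UserA": {"translations": 5, "regular_edits": 0},
    "UserB": {"translations": 2, "regular_edits": 340},
    "UserC": {"translations": 0, "regular_edits": 25},
    "UserD": {"translations": 1, "regular_edits": 0},
}


def translators(users):
    """Users who have created at least one translation."""
    return sorted(n for n, c in users.items() if c["translations"] > 0)


def translation_only(users):
    """Translators with no regular edits (assumed definition)."""
    return sorted(
        n for n, c in users.items()
        if c["translations"] > 0 and c["regular_edits"] == 0
    )


translator_names = translators(users)
translation_only_names = translation_only(users)
# Share of translators who contribute only through translation.
share = len(translation_only_names) / len(translator_names)
print(translator_names, translation_only_names, share)
```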

Evolution in time
 * Deletion rate. How many articles produced by the tool are deleted by the community? It will be useful to correlate the deleted articles with other metrics (article length, amount of automatic translation, editing expertise of the user).
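
One way to relate the deletion rate to another metric is to group articles by that metric and compare the rates. The sketch below groups by article length; the records, the field names and the 1000-character threshold are assumptions chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical records for articles created with the tool.
articles = [
    {"length": 500, "deleted": True},
    {"length": 800, "deleted": False},
    {"length": 5000, "deleted": False},
    {"length": 7000, "deleted": False},
]


def deletion_rate_by_length(articles, threshold=1000):
    """Return the deletion rate for "short" and "long" articles.

    The threshold (in characters) is an arbitrary, assumed cut-off.
    """
    totals = defaultdict(int)
    deleted = defaultdict(int)
    for article in articles:
        bucket = "short" if article["length"] < threshold else "long"
        totals[bucket] += 1
        if article["deleted"]:
            deleted[bucket] += 1
    return {bucket: deleted[bucket] / totals[bucket] for bucket in totals}


rates = deletion_rate_by_length(articles)
print(rates)
```

The same grouping could be repeated with the amount of machine translation or the author's edit count in place of the length.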

General technical principles

 * Don’t log everything. Log only what’s useful for product metrics.
 * Try to make several schemas, one per feature or so. This would be easier to query and it may also help avoid changes that would break continuity (a new table is created every time we change a schema version).
 * Server-side logging doesn't directly reflect user actions, because we'll do a lot of caching and pre-fetching.
 * We are interested in how users move between paragraphs and segments, because it relates to caching.
 * If we use VE, then changes in the Document Model (in the browser) can be logged easily (they're already stored for undo purposes). This generally includes user selection events. The VE DataModel (DM) contains a representation of the cursor selection.
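
To make the "one schema per feature" principle above more concrete, here is a minimal sketch of small per-feature schemas with a check that an event matches its schema. The schema names, the fields and the validation helper are all hypothetical; real EventLogging schemas are defined and validated differently.

```python
# Hypothetical per-feature schemas: field name -> expected Python type.
# None of these names exist in EventLogging; they only illustrate the
# idea of keeping each schema small and specific to one feature.
SCHEMAS = {
    "DraftSaved": {
        "user_id": int,
        "source_language": str,
        "target_language": str,
    },
    "LinkInserted": {
        "user_id": int,
        "link_source": str,
    },
}


def is_valid_event(schema_name, event):
    """Check that the event has exactly the schema's fields and types."""
    schema = SCHEMAS[schema_name]
    if set(event) != set(schema):
        return False
    return all(isinstance(event[name], kind) for name, kind in schema.items())


ok = is_valid_event(
    "DraftSaved",
    {"user_id": 1, "source_language": "es", "target_language": "ca"},
)
# A field that belongs to another feature makes the event invalid.
bad = is_valid_event("LinkInserted", {"user_id": 1, "target_language": "ca"})
print(ok, bad)
```

Keeping the schemas separate means that a change to link logging does not create a new table version for draft logging.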

Tagging
To support measuring the metrics above, the articles created by Content Translation will be tagged. Tags will make it possible to identify, directly or indirectly, the following context information:
 * The article was created by Content Translation.
 * The language of the translation.
 * The source article used for the translation.

Translation center

 * Dashboard usage: Coming back to complete articles, leaving articles half baked, etc.

Translation editing UI

 * Buttons and links in translation view:
   * Clear translation.
   * Paste source text.
   * "view article" (trivial, but we may find that nobody uses it)
 * Which types of interactive segments the users are accessing (suggestions, links, plain segments, templates, etc.).
 * Time spent on the page: seconds per paragraph, word, article.
   * This can later be used to tell people something like “It will take 20 minutes to translate this article.”
 * It’s easy to track clicks, but we may also want something more complicated, like logging of selecting and deleting everything.
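
The time measurements above could feed an estimate such as "It will take 20 minutes to translate this article" along the following lines. This is a sketch that assumes timing is logged per paragraph together with the paragraph's word count; the sample values are invented.

```python
import math

# Hypothetical timing samples: (words in paragraph, seconds spent on it).
samples = [(100, 300), (50, 100), (150, 500)]


def average_seconds_per_word(samples):
    """Total time divided by total words, so long paragraphs weigh more."""
    total_words = sum(words for words, _ in samples)
    total_seconds = sum(seconds for _, seconds in samples)
    return total_seconds / total_words


def estimate_minutes(word_count, seconds_per_word):
    """Estimated translation time, rounded up to whole minutes."""
    return math.ceil(word_count * seconds_per_word / 60)


rate = average_seconds_per_word(samples)
minutes = estimate_minutes(400, rate)
print(f"It will take {minutes} minutes to translate this article.")
```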

Machine translation “abuse”

 * Does the user’s behavior change after a warning about too much machine translation?
 * How many users are shown the MT abuse warning?
 * What percentage of machine translation did the article contain when the warning was shown? What percentage of MT did the article contain when it was later published?
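
Comparing the machine translation percentage at warning time and at publication time requires a definition of that percentage. The sketch below assumes one possible definition, the share of characters in segments that still contain unmodified MT output; the segment records and the unmodified_mt flag are hypothetical.

```python
def mt_percentage(segments):
    """Percentage of characters that sit in unmodified MT segments.

    This is an assumed definition, used only for this example.
    """
    total = sum(len(s["text"]) for s in segments)
    if total == 0:
        return 0.0
    machine = sum(len(s["text"]) for s in segments if s["unmodified_mt"])
    return 100.0 * machine / total


# Hypothetical snapshots of the same draft at two points in time.
at_warning = [
    {"text": "aaaa", "unmodified_mt": True},
    {"text": "bbbb", "unmodified_mt": True},
    {"text": "cc", "unmodified_mt": False},
]
at_publication = [
    {"text": "aaaa", "unmodified_mt": True},
    {"text": "bbbb", "unmodified_mt": False},  # edited after the warning
    {"text": "cc", "unmodified_mt": False},
]

before = mt_percentage(at_warning)
after = mt_percentage(at_publication)
print(before, after)
```

A drop between the two values would suggest that the user edited the MT output after seeing the warning.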

Automatic link insertion

 * How often each source is automatically chosen for the guessed link target:
   * Wikidata sitelinks
   * Wikidata labels
   * Wikidata aliases
   * manual interlanguage links
   * machine translation
   * dictionary
 * How often do people choose a different source manually?
 * How does the (non-)availability of certain sources affect this choice?
 * How often do people remove the automatically inserted links (similar to measuring content change above)?
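
The link questions above could be answered from one event per automatically inserted link, as in this sketch. The record layout (auto_source, final_source, with None meaning the link was removed) and the source identifiers are assumptions for the example.

```python
from collections import Counter

# Hypothetical events: the source chosen automatically and the source
# in use when the translation was saved (None means the link was removed).
link_events = [
    {"auto_source": "wikidata_sitelink", "final_source": "wikidata_sitelink"},
    {"auto_source": "wikidata_sitelink", "final_source": "dictionary"},
    {"auto_source": "wikidata_label", "final_source": "wikidata_label"},
    {"auto_source": "machine_translation", "final_source": None},
]

# How often each source was chosen automatically.
auto_counts = Counter(e["auto_source"] for e in link_events)

# Links where the user manually picked a different source.
overridden = sum(
    1 for e in link_events
    if e["final_source"] is not None and e["final_source"] != e["auto_source"]
)
# Links the user removed altogether.
removed = sum(1 for e in link_events if e["final_source"] is None)

override_rate = overridden / len(link_events)
removal_rate = removed / len(link_events)
print(auto_counts, override_rate, removal_rate)
```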

Dictionaries

 * How often is each dictionary used for each language?

Entry points

 * How often is each entry point used?
 * How often do people create articles from scratch as opposed to translating using CX?

Other

 * Which types of articles are translated? By category and by type, for example geography, mathematics, biography, history (ref: http://lists.wikimedia.org/pipermail/wiki-research-l/2014-March/003335.html )