Content translation/Deletion statistics comparison

Data on the deletion ratio of Wikipedia articles created with and without Content Translation.

Wikis with higher deletion ratios for CX-created articles
As part of T286636, we reviewed wikis where the deletion rate of articles created with Content Translation is higher than the deletion rate of articles created with other tools during a specified timeframe. The data is updated quarterly (every three months) to track how deletion rates evolve as improvements are made. The timeframe was selected to give editors sufficient time to review content and to limit seasonality effects.

Data comes from mediawiki_history and reflects the deletion ratios of main namespace articles that were created using Content Translation compared to the deletion ratios of main namespace articles created without the tool. Bots were excluded. We also removed wikis where 15 or fewer articles were created with content translation during the reviewed timeframe to reduce noise in the data and focus on wikis with more representative data.
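The filtering described above can be sketched in Python. This is only an illustration under stated assumptions, not the query the team actually runs: the record fields (`wiki`, `created_with_cx`, `is_bot`, `namespace`, `deleted`) and the function name are hypothetical and are not the real mediawiki_history column names.

```python
from collections import defaultdict


def deletion_ratios(articles, max_excluded_cx_articles=15):
    """Return per-wiki deletion ratios for CX and non-CX articles.

    `articles` is an iterable of dicts with the hypothetical keys
    wiki, created_with_cx, is_bot, namespace and deleted.
    """
    # Per wiki and per group: [articles created, articles deleted].
    counts = defaultdict(lambda: {"cx": [0, 0], "non_cx": [0, 0]})

    for article in articles:
        # Bots are excluded and only the main namespace (0) is counted.
        if article["is_bot"] or article["namespace"] != 0:
            continue
        group = "cx" if article["created_with_cx"] else "non_cx"
        tally = counts[article["wiki"]][group]
        tally[0] += 1
        tally[1] += int(article["deleted"])

    ratios = {}
    for wiki, groups in counts.items():
        cx_created, cx_deleted = groups["cx"]
        # Drop wikis with 15 or fewer CX articles to reduce noise.
        if cx_created <= max_excluded_cx_articles:
            continue
        non_cx_created, non_cx_deleted = groups["non_cx"]
        ratios[wiki] = {
            "cx_ratio": cx_deleted / cx_created,
            # None when the wiki has no non-CX articles in the sample.
            "non_cx_ratio": (
                non_cx_deleted / non_cx_created if non_cx_created else None
            ),
        }
    return ratios
```

A wiki belongs in this section when its `cx_ratio` is higher than its `non_cx_ratio`.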

Monthly deletion ratios for representative wikis (Jan 2016 - Jan 2019)
Monthly data about the deletion of articles created with and without Content Translation on several Wikipedias was prepared as part of T215397. This data is for whole months, not quarters, from January 2016 until January 2019.

The results can be found in a Google spreadsheet.

This spreadsheet is publicly shared and can be filtered, copied, etc. Note that only pages in the main namespace are counted. This may lead to discrepancies between this data and the data at Special:CXStats, which includes all namespaces.

The query
To examine the query used to create this data, and to run it yourself, see query 53775 in Quarry. To look at different languages and dates, replace the database name and the timestamp value.

The use of the CX deletion statistics comparison data
One of the ways the WMF Language team uses this data is to decide when to adjust the machine translation limit on a wiki. The limit enforces the review and modification of the initial machine translation before articles are published, which encourages quality translations. Below are some criteria for changing Machine Translation limits in the tool when a Wikipedia has a high deletion rate for CX articles.

{| class="wikitable"
! SN
! Instance/case
! Problem<br />(check Content translation/Deletion statistics comparison to determine this)
! Frequency
! Possible causes<br />(To be populated based on findings, experiences and feedback)
! Action<br />(This column may change as we iterate on learnings)
|-
| rowspan="4" |
| rowspan="4" | Translations deleted
| rowspan="2" | More than 50% of articles are deleted. (Please note that this threshold applies only when ten or more translated articles were created in the quarter; it does not apply when fewer than ten were created.)
| Occurs in one quarter
|
| Investigate
|-
| Occurs consecutively in two quarters (5 to 6 months)
|
| Take action based on findings from the earlier investigation. If there are no tangible findings, make the Machine Translation (MT) limit more strict by 5% and monitor for changes.
|-
| rowspan="2" | More than 75% of articles are deleted. (Please note that this threshold applies only when ten or more translated articles were created in the quarter; it does not apply when fewer than ten were created.)
| Occurs in one quarter
|
| Make MT limit more strict by 10% and monitor changes.
|-
| Extends to another quarter
|
| Community consultation required to:
* Gather samples of problematic translations so that machine translation limits can be adjusted on a reasoned basis.
* Understand if there are underlying issues with the tool.
|-
| rowspan="4" |
| rowspan="4" | Difference in deletions: translations (CX) vs. new articles (non-CX)
| rowspan="2" | The difference between the two deletion ratios (non-CX and CX) is above -50%
| Occurs in a quarter
|
| Make MT limit more strict by 5% and monitor changes.
|-
| Extends to another quarter
|
| Make MT limit more strict by another 5% and monitor changes.
|-
| rowspan="2" | The difference between the two deletion ratios (non-CX and CX) is above -80%
| Occurs in a quarter
|
| Make MT limit more strict by 10% and monitor changes.
|-
| Extends to another quarter
|
| Community consultation required to:
* Gather samples of problematic translations so that machine translation limits can be adjusted on a reasoned basis.
* Understand if there are underlying issues with the tool.
|-
| rowspan="2" |
| rowspan="2" | Translations created
| rowspan="2" | Articles created with CX decrease by 90% from the previous month
| First quarter
|
| Investigate and check for technical issues.
|-
| Three consecutive months
|
| Community consultation; determine the next step from the outcome. The next step can be making the MT limit less strict by 10% and monitoring changes.
|}
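
The "Translations deleted" criteria from the table can be read as a small decision rule. The Python sketch below is one possible reading, not a tool the Language team uses: the function name, its parameters, and the returned strings are hypothetical. It also assumes that `quarters_in_a_row` counts consecutive quarters, including the current one, in which the same threshold was exceeded, and that the minimum of ten translated articles applies to both thresholds.

```python
def deleted_share_action(deleted, created, quarters_in_a_row, min_articles=10):
    """Suggest an action for the 'Translations deleted' case.

    deleted, created: CX articles deleted and created in the quarter.
    quarters_in_a_row: consecutive quarters (including this one) in
        which the threshold was exceeded (an assumption of this sketch).
    """
    # The thresholds do not apply when fewer than ten articles were created.
    if created < min_articles:
        return "no action (too few translations)"

    share = deleted / created

    if share > 0.75:
        if quarters_in_a_row >= 2:
            return "community consultation"
        return "tighten MT limit by 10% and monitor"

    if share > 0.50:
        if quarters_in_a_row >= 2:
            return "act on findings, else tighten MT limit by 5% and monitor"
        return "investigate"

    return "no action"


# Example: 6 of 10 translations deleted, first quarter this happens.
print(deleted_share_action(deleted=6, created=10, quarters_in_a_row=1))
```

The other two cases in the table (difference in deletion ratios, and drop in translations created) could be encoded the same way once their thresholds are agreed on.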