Growth/Personalized first day/Structured tasks/Copyedit

This page describes work on a "copyedit" structured task, which is a type of structured task that the Growth team may offer through the newcomer homepage. This page contains major assets, designs, open questions, and decisions. Most incremental updates on progress will be posted on the general Growth team updates page, with some large or detailed updates posted here.

Current status

 * 2021-07-19: create project page and begin background research.
 * 2022-08-12: add initial research results.
 * Next: complete manual evaluation.

Summary
Structured tasks are meant to break down editing tasks into step-by-step workflows that make sense for newcomers and make sense on mobile devices. The Growth team believes that introducing these new kinds of editing workflows will allow more new people to begin participating on Wikipedia, some of whom will learn to do more substantial edits and get involved with their communities. After discussing the idea of structured tasks with communities, we decided to build the first structured task: "add a link".

Even as we built that first task, we have been thinking about what subsequent structured tasks could be; we want newcomers to have multiple task types to choose from so that they can find the ones that they like to do, and can increase in difficulty as they learn more. The second task we started working on was "add an image". But in our initial community discussions of the idea of structured tasks, the task type that communities desired most was a task around copyediting -- something related to spelling, grammar, punctuation, tone, etc. Here are our initial notes from looking into this and discussing with community members.

We know that there are many open questions around how this would work, many potential reasons that it might not go right: what kind of copyediting are we talking about? Just spelling, or something more? Is there any sort of algorithm that will work well across all languages? These questions are why we are hoping to hear from lots of community members and have an ongoing discussion as we decide how to proceed.

Goals

 * We want to understand the types of copyediting tasks it might be possible to assist with algorithms.
 * We want to use an algorithm that can suggest tasks for a type of copyediting in articles across different languages.
 * We want to know how good the algorithm works (e.g. know which model works best from a set of existing models).

Literature review

 * What different subtasks are considered copyediting?
 * Identify different aspects of copyediting across the spectrum: typo/spelling to grammar to style/tone
 * What are existing approaches to copyediting in Wikipedia?
 * Communities such as Guild of Copy Editors or the Typo Team.
 * Maintenance-templates such as the copyedit-template.
 * Tools such as the moss-tool to identify typos (also JarBot in Arabic Wikipedia)
 * What are existing public commonly-used tools for spell-checking/grammar etc such as [ http://hunspell.github.io/ hunspell], [ https://languagetool.org/ LanguageTool], or [ https://www.grammarly.com/ Grammarly]?
 * We know that our communities prefer transparent algorithms, so it is easy for everyone to understand where suggestions come from.
 * What are available models from research in NLP and ML, for example for the task of [ http://nlpprogress.com/english/grammatical_error_correction.html Grammatical Error Correction].

Defining the task

 * Which aspect of copyediting will we model for the structured task?
 * Type of task: spelling, grammar, tone/style
 * For example: What can browser-spellcheckers do?
 * Granularity -- highlighting task on the level of: article, section, paragraph, sentence, word, sub-word
 * Depends on the task
 * Surface known items (e.g. from templates) or predict new ones?
 * Only suggest that improvement is needed, or suggest how to improve?
 * Suggesting improvement is easier for simpler tasks.
 * Simply highlighting that work is needed is easier for more complex tasks (e.g. style or tone)
 * Language support: how many languages do we aim to support?
 * Include Spanish and Portuguese as target languages alongside Arabic, Vietnamese, Bengali, Czech.
 * We ideally want to cover all languages, but will realistically need to evaluate solutions based on the depth of their language coverage.

Building a dataset for evaluation

 * Generate a test-dataset (ideally in multiple languages) for the task for which we can compare different algorithms. This can be achieved in different ways
 * An existing benchmark dataset, such as [ https://aclanthology.org/W14-1701/ CoNLL-2014 Shared Task on Grammatical Error Correction], or approaches for corpora generation (from Wikipedia)
 * Generate our own dataset from revision history using templates (copyedit) or edit summaries (typo)
 * Manual evaluation of output of models run on a set of sentences from Wikipedia.

Research results
A full summary of Research is available on MetaWiki: Research:Copyediting as a structured task.

Literature Review
Background research and literature review are captured in Copyediting_as_a structured_task/Literature_Review

Main findings:
 * Simple spell- and grammar checkers such as LanguageTool or Enchant are most suitable for supporting copyediting across many languages and are open/free.
 * Some adaptation to the context of Wikipedia and structured task will be required in order to decrease the sensitivity of the models; common approaches are to ignore everything in quotes or text that is linked.
 * The challenge will be to develop a ground-truth dataset for backtesting. Likely, some manual evaluation will be needed.
 * Long-term: Develop a model to highlight sentences that require editing (without necessarily suggesting a correction) based on copyediting templates. This could provide a set of more challenging copyediting tasks compared to spellchecking.

LanguageTool
We have identified LanguageTool as a candidate to surface possible copyedits in articles because:
 * It is open, is being actively developed, and supports 30+ languages
 * The rule-based approach has the advantage that errors come with an explanation why they were highlighted and not just due to a high score from a ML-model. In addition, it provides functionality for adding custom rules by the community https://community.languagetool.org/
 * The copyedits from LanguageTool go beyond spellchecking of single words using a dictionary but also capture grammatical errors and style.

We can get a very rough approximation of how well LanguageTool works for detecting copyedits in Wikipedia articles by comparing the amount of errors in featured articles with those in articles containing a copyedit-template. We find that the performance is reasonable in many languages after applying a post-processing step in which we filter some of the errors from LanguageTool (e.g. those overlapping with links or bold text).

We also compared the performance of simple spellcheckers which are available for more languages than supported by LanguageTool. They can also surface many meaningful errors for copyediting but suffer from a much higher rate of false positives. This can be partially addressed by post-processing steps to filter the errors. Another disadvantage is that spellcheckers perform considerably worse than LanguageTool in suggesting the correct improvement for the error.

One potentially substantial improvement could be to develop a model which assigns a confidence score to the errors surfaced by LanguageTool/spellchecker. This would allow us to prioritize those errors for the structured task copyediting task for which we have a high confidence that they are true copyedits. Some initial thoughts are in T299245.

Read here for more details: Research:Copyediting as a structured task/LanguageTool

Evaluation
We are now working towards building a list with examples of automatically suggested copyedits for manual evaluation by the Growth Ambassadors T315086.