Growth/Personalized first day/Structured tasks/Copyedit/ja

This page describes work on a "copyedit" structured task, which is a type of structured task that the Growth team may offer through the newcomer homepage. This page contains major assets, designs, open questions, and decisions. Most incremental updates on progress will be posted on the general Growth team updates page, with some large or detailed updates posted here.

Current status

 * 2021-07-19: create project page and begin background research.
 * Next: continue background research

Summary
Structured tasks are meant to break down editing tasks into step-by-step workflows that make sense for newcomers and make sense on mobile devices. The Growth team believes that introducing these new kinds of editing workflows will allow more new people to begin participating on Wikipedia, some of whom will learn to do more substantial edits and get involved with their communities. After discussing the idea of structured tasks with communities, we decided to build the first structured task: "add a link".

Even as we built that first task, we have been thinking about what subsequent structured tasks could be; we want newcomers to have multiple task types to choose from, so that they can find the ones they like to do and can take on more difficult ones as they learn more. The second task we have been working on is "add an image". However, in our initial community discussions about structured tasks, the task that communities requested most was copyediting: spelling, typos, grammar, punctuation, tone, and so on. The discussions with community members from when we first looked into this are summarized in our initial notes.

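We know that there are many open questions about how this would work, and several reasons why it might not go well. What kind of copyediting are we talking about, concretely? Only spelling and typo errors, or something more? Is there already an algorithm that works well regardless of the target language? Because of these questions, we want to hear from community members broadly and to continue the discussion in parallel with deciding how to proceed.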

Goals

 * We want to understand which types of copyediting tasks algorithms might be able to assist with.
 * We want to use an algorithm that can suggest tasks for a type of copyediting in articles across different languages.
 * We want to know how well the algorithm works (e.g. know which model works best from a set of existing models).
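
One way to make the third goal concrete is to score each candidate model against a small hand-labeled test set. The following is a minimal sketch, not a Growth team implementation: it assumes a "model" is any function that takes a sentence and returns the set of word positions it flags, and the two toy models and the labeled sentences are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Assumption: a "model" maps a sentence to the set of word positions it flags.
Model = Callable[[str], Set[int]]
Labeled = List[Tuple[str, Set[int]]]


def precision_recall(model: Model, labeled: Labeled) -> Tuple[float, float]:
    """Micro-averaged precision and recall over all labeled sentences."""
    tp = fp = fn = 0
    for sentence, gold in labeled:
        predicted = model(sentence)
        tp += len(predicted & gold)   # flagged and truly an error
        fp += len(predicted - gold)   # flagged but not an error
        fn += len(gold - predicted)   # error that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def rank_models(models: Dict[str, Model], labeled: Labeled) -> List[Tuple[str, float, float]]:
    """Return (name, precision, recall) rows, highest precision first."""
    rows = [(name, *precision_recall(model, labeled)) for name, model in models.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical toy models, for illustration only.
def flag_nothing(sentence: str) -> Set[int]:
    return set()


def flag_known_typos(sentence: str) -> Set[int]:
    known = {"teh", "recieve"}
    return {i for i, word in enumerate(sentence.lower().split()) if word in known}


# Hypothetical labeled data: (sentence, positions of words that contain an error).
LABELED: Labeled = [
    ("teh cat sat", {0}),
    ("I recieve mail", {1}),
    ("all good here", set()),
]
MODELS: Dict[str, Model] = {"nothing": flag_nothing, "known_typos": flag_known_typos}

for name, precision, recall in rank_models(MODELS, LABELED):
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

Sorting by precision is only one possible choice; which metric matters most for newcomer-facing suggestions is an open question that this sketch does not settle.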

Literature review

 * What different subtasks are considered copyediting?
 * Identify different aspects of copyediting across the spectrum: typo/spelling to grammar to style/tone
 * What are existing approaches to copyediting in Wikipedia?
 * Communities such as the Guild of Copy Editors or the Typo Team.
 * Maintenance templates such as the copyedit template.
 * Tools such as the moss tool for identifying typos (also JarBot on Arabic Wikipedia).
 * What publicly available, commonly used tools exist for spell-checking, grammar checking, and so on, such as hunspell, LanguageTool, or Grammarly?
 * We know that our communities prefer transparent algorithms, so it is easy for everyone to understand where suggestions come from.
 * What models are available from research in NLP and ML, for example for the task of grammatical error correction?

Defining the task

 * Which aspect of copyediting will we model for the structured task?
 * Type of task: spelling, grammar, tone/style
 * For example: What can browser-spellcheckers do?
 * Granularity: at which level do we highlight the task (article, section, paragraph, sentence, word, or sub-word)?
 * The appropriate granularity depends on the task.
 * Surface known items (e.g. from templates) or predict new ones?
 * Only suggest that improvement is needed, or suggest how to improve?
 * Suggesting improvement is easier for simpler tasks.
 * Simply highlighting that work is needed is easier for more complex tasks (e.g. style or tone)
 * Language support: how many languages do we aim to support?
 * Include Spanish and Portuguese as target languages alongside Arabic, Vietnamese, Bengali, and Czech.
 * We ideally want to cover all languages, but will realistically need to evaluate solutions based on the depth of their language coverage.
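
The design questions above can be read as the fields of a single suggestion record. The following is a minimal sketch under that assumption; the names `CopyeditSuggestion`, `TaskType`, and `Granularity`, and the choice of character offsets, are hypothetical and only meant to show how the options relate to each other.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TaskType(Enum):
    SPELLING = "spelling"
    GRAMMAR = "grammar"
    TONE_STYLE = "tone/style"


class Granularity(Enum):
    ARTICLE = "article"
    SECTION = "section"
    PARAGRAPH = "paragraph"
    SENTENCE = "sentence"
    WORD = "word"
    SUB_WORD = "sub-word"


@dataclass(frozen=True)
class CopyeditSuggestion:
    language: str                      # wiki language code, e.g. "cs"
    task_type: TaskType
    granularity: Granularity
    start: int                         # character offset where the highlight begins
    end: int                           # character offset just past the highlight
    source: str                        # "template" (known item) or "model" (predicted)
    replacement: Optional[str] = None  # None: only flags that work is needed

    @property
    def proposes_fix(self) -> bool:
        """True when the suggestion says how to improve, not just where."""
        return self.replacement is not None


# A simple task can carry a concrete fix ...
typo = CopyeditSuggestion("cs", TaskType.SPELLING, Granularity.WORD, 10, 13, "model", "the")
# ... while a complex one may only highlight that work is needed.
tone = CopyeditSuggestion("es", TaskType.TONE_STYLE, Granularity.PARAGRAPH, 0, 240, "template")
print(typo.proposes_fix, tone.proposes_fix)
```

Making `replacement` optional keeps both answers to "only suggest that improvement is needed, or suggest how to improve?" representable in the same record.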

Building a dataset for evaluation

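 * Build a test dataset for the specific task (ideally in several languages) so that different algorithms can be compared. Different approaches in practice:
 * Use existing benchmark datasets, such as the CoNLL-2014 Shared Task on Grammatical Error Correction, or existing approaches to building corpora (including for Wikipedia).
 * Build our own dataset from the revision history, using templates (copyedit) or edit summaries (typo).
 * Run the models on sentences extracted from Wikipedia and manually evaluate the model output.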
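
The revision-history approach listed above could start from a filter over edit summaries. The following is a minimal sketch: it assumes revisions have already been fetched and are available as dictionaries with hypothetical `comment`, `old_text`, and `new_text` keys, and the keyword pattern is an illustrative English-only assumption that each language would need to replace.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Illustrative English-only keywords (assumption); not a vetted list.
TYPO_SUMMARY = re.compile(r"\b(typos?|spelling|misspell\w*)\b", re.IGNORECASE)


def single_word_change(old: str, new: str) -> Optional[Tuple[str, str]]:
    """Return (before, after) if exactly one word differs, else None."""
    old_words, new_words = old.split(), new.split()
    if len(old_words) != len(new_words):
        return None  # words were added or removed, so not a simple typo fix
    diffs = [(a, b) for a, b in zip(old_words, new_words) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def typo_pairs(revisions: Iterable[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Collect (misspelled, corrected) pairs from revisions whose summary mentions a typo."""
    pairs = []
    for rev in revisions:
        if not TYPO_SUMMARY.search(rev.get("comment", "")):
            continue
        change = single_word_change(rev["old_text"], rev["new_text"])
        if change is not None:
            pairs.append(change)
    return pairs


# Hypothetical sample revisions, for illustration only.
SAMPLE_REVISIONS = [
    {"comment": "fix typo", "old_text": "The cat recieved food", "new_text": "The cat received food"},
    {"comment": "expanded history section", "old_text": "Short text", "new_text": "Longer text"},
    {"comment": "spelling", "old_text": "one two", "new_text": "one two three"},
]
print(typo_pairs(SAMPLE_REVISIONS))
```

Restricting to single-word changes trades coverage for cleaner pairs; edits that fix a typo while also changing other text are dropped by this filter.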