JADE/Scalability FAQ

These are thoughts about potential scalability problems with JADE, an extension that allows editors to submit editorial judgments about wiki pages, revisions, and diffs.

What is JADE?

JADE, the Judgment and Dialogue Engine, is a new extension to MediaWiki. JADE provides a new namespace called "Judgment" that stores annotations to the wiki's content. Annotations are stored on these pages in a machine-readable format called JSON. These machine-readable annotations can be used to provide feedback to automated systems like ORES, for example by challenging the assertions that ORES makes. Users can edit Judgment-namespace pages directly, or they may interact with them indirectly using tools, including those used in counter-vandalism work.
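
As a concrete illustration, a Judgment page body might look something like the following. This is a sketch only: the field names ("entity", "schema", "notes") are illustrative and are not taken from the actual JADE content schema.

```python
import json

# A sketch of what a Judgment page body might contain. The field
# names here ("entity", "schema", "notes") are illustrative only;
# they are not taken from the actual JADE content schema.
judgment = {
    "entity": {"type": "diff", "id": 456},
    "judgments": [
        {
            "schema": {"damaging": False, "goodfaith": True},
            "notes": "Typo fix; challenging the ORES 'damaging' score.",
        }
    ],
}

page_body = json.dumps(judgment, indent=2)
print(page_body)
```

Because the body is plain JSON, both humans (via the page editor) and tools (via the API) can read and write the same annotation.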

What is ORES?

ORES is a machine learning service run by the Wikimedia Foundation which makes extremely fast predictions on matters such as a given article's quality or whether a given edit to a page damages that page. It helps make the work of human reviewers more efficient by providing data which helps those reviewers triage their work.

ORES is not JADE, and JADE is not ORES. However, human contributions made through JADE can be used to help make ORES better.

What is being judged?

In the first phase of deployment, JADE will be used to judge wiki pages, revisions (individual versions of pages), and diffs (differences between revisions). Later, we'll want to judge other entities such as admin actions (using log entries as a proxy), users, and more. Each entity type can be judged according to several quantitative schemas, for example the "Wikipedia 1.0" assessment scale.

The quantitative scales in JADE closely mirror ORES predictions, making the data easy to feed back into AI training, but we'll also be studying the full range of rich expression afforded by free text and talk pages.

What kind of annotations are being made?

These annotations are necessarily of a subjective nature, including judgments on matters such as the quality of a given page or whether a given diff is damaging to a page. For that reason, Judgment-namespace pages should not be used to store data from automated sources like ORES.

How are these annotations stored?

In the Judgment namespace, annotations for a given page, revision, or diff are stored on a single wiki page. For example, the annotations concerning the page with an ID of 123 would be at Judgment:Page/123. The annotations concerning revision 456 would be stored at Judgment:Revision/456. The annotations concerning the difference between revision 456 and its parent revision would be stored at Judgment:Diff/456.
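
The title scheme above can be sketched as a simple mapping from entity type and ID to a page title; the helper function below is illustrative, not part of the extension's API:

```python
# Sketch of the title scheme described above: one Judgment-namespace
# page per entity, keyed by entity type and ID. This helper is
# illustrative, not part of the actual JADE codebase.
def judgment_title(entity_type: str, entity_id: int) -> str:
    """Map an entity to its Judgment-namespace page title."""
    if entity_type not in ("Page", "Revision", "Diff"):
        raise ValueError(f"unsupported entity type: {entity_type}")
    return f"Judgment:{entity_type}/{entity_id}"

print(judgment_title("Page", 123))  # Judgment:Page/123
print(judgment_title("Diff", 456))  # Judgment:Diff/456
```

Note that a diff is keyed by the ID of its newer revision, since the parent revision is implied.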

Why did you implement JADE like this?

Our architectural goal is to make Judgment pages match other wiki pages as much as possible. This allows their integration into important workflows such as vandalism reversion, deletion, and suppression. If we instead pursued implementation in more novel ways, we would be required to expend additional engineering effort on re-inventing those workflows.

Note that these requirements must be satisfied by any alternative solution we might use.


 * Collaboration.  We encourage editing judgments, by the original author or others, for any of the usual wiki reasons: copyediting, elaboration, changing and reversing in a Bold-Revert-Discuss cycle, and so on.


 * Dialogue.  An important part of this collaboration will be the discussion and consensus processes.  These discussions will probably be disjoint: a thread might focus on the change made by a specific revision, for example, or on the overall topical categorization of the article, but not both.


 * Suppression.  Free-form text must be patrolled; vandalism and libel must be reverted.  Edits must appear in Special:RecentChanges and integrate with AbuseFilter.


 * Auditability.  We will analyze the changes to judgments over time.  We'll need the history of changes to the judgment for suppression, if nothing else.


 * Analysis.  Judgments can be dumped to a flat format convenient for third-parties.


 * EventStream integration. Tool developers can track judgments in real-time.

These requirements were partly informed by https://www.mediawiki.org/wiki/Everything_is_a_wiki_page and https://en.wikipedia.org/wiki/User:Risker/Risker%27s_checklist_for_content-creation_extensions. For more about why these guidelines matter, see https://meta.wikimedia.org/wiki/Trust_and_Safety: unless we can integrate with MediaWiki curation, the Trust and Safety team will be asking us to suppress user-generated content from raw database tables by hand.

Can we use multi-content revisions?

Currently, we're choosing to avoid MCR because:


 * Judgments will be added after the fact, so revision judgments will be separated from the revision being judged.


 * MCR is not ready to deploy and is blocked on capabilities we require, such as integration with AbuseFilter.


 * Using MCR wouldn't have any impact on revision and page metadata size.  All we're doing is shifting content to a new table, and text table data length is not one of the concerns here.

Otherwise, here is what would make MCR a good fit:


 * There are a finite number of judgment entity types and schemas, so mapping either types or schemas to slots would be intuitive and efficient.

Can we save space by combining all judgments about a page into a single JADE page?

This is tempting at first sight, but:


 * JADE page size will grow with the number of revisions of the page.  Revisions are saved wholesale in content storage, and pages with multiple judgments will be edited more often.  Therefore, content size, traffic, and storage will be substantially higher for these records, and much of that will be wasted in the common case of working with judgments on a single revision.


 * We save on the number of pages created but there is no reduction in the number of revisions made.


 * This trick only works to compress revisions, and doesn't have an easy analogue for other entity types being judged.


 * Storing multiple, unrelated items in one page is an antipattern.  Two revisions on a page are only indirectly related.


 * Edit conflicts become more likely as the pages grow longer.


 * UI will need to be developed to keep things straightforward for end-users and tool developers.
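
The content-size argument in the first bullet can be made concrete. Because MediaWiki saves each revision's full text wholesale, a combined page whose body grows by one judgment per edit costs quadratic total storage, while one page per revision costs linear storage. A rough model (the judgment size and revision count below are assumptions for illustration):

```python
# Rough model: assume each judgment serializes to ~300 bytes and an
# article accumulates judgments on 1,000 of its revisions. Both
# figures are illustrative assumptions.
JUDGMENT_BYTES = 300
N = 1000

# One Judgment page per revision: each edit stores one judgment.
separate_total = N * JUDGMENT_BYTES

# One combined page: the k-th edit re-stores all k judgments so far,
# because revisions are saved wholesale in content storage.
combined_total = sum(k * JUDGMENT_BYTES for k in range(1, N + 1))

print(separate_total)  # 300000 bytes, linear in N
print(combined_total)  # 150150000 bytes, quadratic in N
```

Under these assumptions the combined page costs about 500 times more content storage, which is the waste the first bullet describes.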

Can judgments be stored in an isolated database? / Can judgments be stored on a separate, central wiki?

Definitely a possibility. However, storing judgments in the same wiki as the entities being judged means we keep the same editor community, writing in the same language. It also means that a user who is locally blocked from a wiki won't be able to participate in judgment of that wiki, which reduces administrative burden.

Likewise, the main *disadvantage* of using a central wiki is the forced social separation it incurs. Rather than Dutch Wiktionary posting its judgments to Dutch Wiktionary, Dutch Wiktionary would be surrendering its reviews to a different wiki with its own administrative structures and social norms. This would be comparable to the current controversy between certain wiki communities and Wikidata, the former viewing the latter as unreliable, prone to vandalism, and out of compliance with higher quality standards. While Wikidata has valid reasons to pursue centralization in spite of the costs, JADE would gain nothing, as the content this extension produces is inherently per-wiki.

Another way of implementing the storage isolation would be to use a custom table rather than wiki page storage, but I consider that a deal-breaker, or at the very least a serious obstacle.

"Everything is a wiki page" is a huge limitation on large wikis. For example, if one day we migrate away from wiki pages, we won't be able to delete the storage that was used. If we can't isolate the storage, we'll need to keep JADE's volume small relative to the existing volume of revisions.

How will we control JADE's growth?

There are several approaches we will take.

1. Staged deployment to various integration points (Huggle, New Pages Patrol, etc.)

First of all, we're conducting user testing with the help of Daisy Chen and Prateek Saxena in order to validate and explore potential use cases. Then, we're planning to integrate with existing workflows by storing judgments that are already being made into our JADE storage. The workflows we're looking at are currently New Pages Patrol, Recent Changes Patrol, FlaggedRevs, and PageTriage.

For each of these workflows, we'll be able to set up integration that can be controlled by configuration, per-wiki. We'll turn on a single workflow for a single wiki, analyze the impact, and iterate. During this transparent integration phase, we'll be able to revert any configuration as needed. In a later phase, we may introduce a dedicated JADE workflow on-wiki, or we may augment existing workflows to take advantage of JADE capabilities, for example collecting a freeform text comment when patrolling pages.

In the event that one of these integrations has to be disabled, the procedure will be to notify the community and deploy the configuration change to disable that integration. The effect will be that new data stops flowing in from that workflow, which does not prevent the workflow from continuing with its legacy stores, but does have an impact on data consumers. During the first phase, integrated workflows will be "soft" coupled, so they gracefully degrade to not duplicating data into JADE, transparently to the end user. In the future, an outage will be more significant, because software will rely on incoming data and might be unusable or useless without JADE.
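
The per-wiki, per-workflow toggles described above can be sketched as a nested feature-flag lookup. The variable and workflow names below are hypothetical, not actual JADE configuration keys:

```python
# Hypothetical per-wiki feature flags; these names are illustrative,
# not actual JADE configuration keys.
JADE_INTEGRATIONS = {
    "enwiki": {"PageTriage": True, "RecentChangesPatrol": False},
    "nlwiktionary": {"RecentChangesPatrol": True},
}

def integration_enabled(wiki: str, workflow: str) -> bool:
    """Soft coupling: default to off, so a missing key degrades
    gracefully and the workflow keeps writing to its legacy store."""
    return JADE_INTEGRATIONS.get(wiki, {}).get(workflow, False)

print(integration_enabled("enwiki", "PageTriage"))   # True
print(integration_enabled("enwiki", "FlaggedRevs"))  # False
```

Disabling an integration is then a one-line configuration change: flip the flag and new data stops flowing, without touching the workflow itself.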

2. Limit editing by user right

MediaWiki provides the functionality to limit edits to various namespaces by user right. On large wikis, "rollback" functionality is limited to those within the "rollbacker" or "sysop" user groups. Similarly, the New Pages Feed only works for users in the "patrollers" group. We will limit editing of the JADE namespace to users in these groups to start off. Through working with these users, we'll learn their work patterns and decide, based on empirical data, when to open up contribution to less privileged users.
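
The group-based restriction described above amounts to a simple membership check. The group names mirror the text, but the function below is an illustration, not MediaWiki's actual permission API:

```python
# Sketch of limiting Judgment-namespace edits by user group. The
# group names mirror the text above; the function is illustrative,
# not MediaWiki's actual permission API.
ALLOWED_GROUPS = {"rollbacker", "sysop", "patroller"}

def may_edit_judgment(user_groups: set) -> bool:
    """A user may edit the namespace if they hold any allowed group."""
    return bool(user_groups & ALLOWED_GROUPS)

print(may_edit_judgment({"sysop"}))          # True
print(may_edit_judgment({"autoconfirmed"}))  # False
```

Opening contribution to less privileged users later is then just a matter of widening the allowed set, per wiki.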

3. Bot supervision and blocking

We will rely on the ordinary community mechanisms for managing bot privileges and behavior. Extension deployment will come with a phase of community discussion, in which we explain why bots are not welcome to spam this namespace with automated predictions. If the community disagrees on this point, we can pause deployment until a resolution is reached.

In the event that a bot goes haywire and begins to add JADE content such as automated predictions, it can be blocked temporarily.

How much storage do we need?

Our limiting factor is the human labor time needed to create judgments. We can use the sum of all existing review workflow volumes as an upper bound, which is around 1% of total edits. This gives us a maximum of c. 500,000 annual JADE actions on a large wiki such as English Wikipedia, or 4,000,000 annual JADE actions over all wikis combined. We have fairly granular control over the rate of growth as explained above, so this volume won't be reached for several years. We're able to keep growth within whatever limits are dictated by the available resources.
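
The bound above follows from simple arithmetic. As a sketch, with round figures for total edit volume (the edit counts and per-judgment size below are my assumptions, not measurements):

```python
# Order-of-magnitude check of the upper bound above. The total edit
# volumes and per-judgment size are rough assumptions, not measured
# figures.
ENWIKI_EDITS_PER_YEAR = 50_000_000      # assumed: one large wiki
ALL_WIKIS_EDITS_PER_YEAR = 400_000_000  # assumed: all wikis combined

# Roughly 1 in 100 edits passes through an existing review workflow.
enwiki_bound = ENWIKI_EDITS_PER_YEAR // 100      # judgments/year, enwiki
global_bound = ALL_WIKIS_EDITS_PER_YEAR // 100   # judgments/year, global

# At an assumed ~500 bytes per judgment, the global bound costs about
# 2 GB of new content per year.
annual_bytes = global_bound * 500
print(enwiki_bound, global_bound, annual_bytes)
```

Even at the ceiling, the annual content volume is modest compared to ordinary revision storage on a large wiki.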

Is this scaling with revisions?

No.

Mathematically, it's possible to create a judgment for every revision, and even to create loops where judgments are themselves being judged. Luckily, the human labor limit mentioned above will come into play long before we reach scaling proportional to revisions. The risk is the same as that posed by editors' ability to make multiple talk page edits for every content page edit: the software affords the capability when needed, but editors in aggregate don't have the energy to ever do such a thing.

The potential scenario of judging a judgment is also fine, an intended use case even. Since only 1% of edits are reviewed, we can assume that the same ratio will hold for judgments themselves, therefore out of every 10,000 judgments there will be 100 judgments of judgments, and 1 judgment of a judgment of a judgment. This is clearly a vanishing term as we go towards higher-order judgments.
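
The decay described above is a geometric series, which a few lines of arithmetic make concrete:

```python
# Higher-order judgments form a geometric series: if only ~1% of
# contributions are themselves reviewed, each additional "judgment
# of a judgment" level shrinks by a factor of 100.
first_order = 10_000
second_order = first_order // 100   # judgments of judgments
third_order = second_order // 100   # judgments of judgments of judgments
print(first_order, second_order, third_order)  # 10000 100 1

# The total across all orders converges to
# 10000 * (1 + 1/100 + 1/100**2 + ...) = 10000 * 100/99,
# i.e. only about 1% more than the first-order count alone.
```

So recursive judging adds a vanishing overhead rather than a new growth term.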

Aren't we planning to record bot judgments?

We're focusing on human judgment. Bot and other heuristic judgments are much lower quality and not particularly interesting for our purposes. In the very long term (several years), we may accept this data, since JADE is a logical place to store it, but this is a stretch goal.

What about rogue bots?

One of the main concerns that has been raised is that we're creating a potential crime of opportunity: a vast new namespace which invites massive, meaningless contributions where every new revision gets judged by bots.

The plan is that rogue editing which goes against these expectations will be flagged for blocking. Normal rate limiting should limit the damage until stronger measures can be taken.

Bots and their activities are highly regulated -- especially on big wikis. When they run out of control, the event is short-lived. See https://commons.wikimedia.org/wiki/File:ORES_-_Facilitating_re-mediation_of_Wikipedia%27s_socio-technical_problems.pdf for a comprehensive review.

How are schema migrations handled?

The JSON content schema for the Judgment namespace, the public API, and the PHP API will all be stable interfaces. We'll follow https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy, so on the occasions that we can't provide a transparent migration or permanent access using an old API, we'll notify stakeholders of the breaking change well in advance and will help migrate legacy usages and content. Unfortunately, in this scenario we might need to take downtime for some of the more extreme migrations.

Another possibility is that breaking changes will be handled by introducing a new ContentModel, but it's not a very satisfactory thought.

What if we're wrong and need to migrate away from wiki pages entirely?

This is the scariest of the disaster scenarios, because wiki page storage is not meant to be deleted. If we migrate away, we never get to reclaim the storage, and we leave a big mess behind us.

What is the long-term future of JADE storage?

We may decide to migrate to "structured storage" once it's mature, assuming it can support our requirements above. Our ideal is to have a native, structured store where analytical queries are able to access the judgment fields, but with all the affordances of wiki pages. Word on the street is that Marko Obrovac et al. are working on exactly this.

See also

https://phabricator.wikimedia.org/T196547 - Prior discussion of scalability concerns

https://www.mediawiki.org/wiki/Extension:JADE - Current implementation