JADE/Implementations

This page describes a set of thoughts about implementations of JADE. See T166053 for the tracking task.

System requirements

 * REST HTTP API
 * Accessible from the public Internet.
 * Allows all relevant operations (save/update judgment, suppress comment, etc.)
 * Integrates with MediaWiki authentication (OAuth2). Unprivileged operations don't require credentials, privileged ones require appropriate onwiki privileges.
 * Queryable in some basic ways (e.g. give me all judgments for an artifact)
 * Note that the API will only be able to access artifacts by their primary ID. We cannot join on, for example, page_id to get all judged revisions of that page.
 * Operations are checked for validity before we respond. If the user is disallowed, the entity doesn't exist, or there's another reason preventing this action, the HTTP response will indicate failure.
 * Operations are atomic. If the HTTP response indicates success, then the client can safely assume that the operation will eventually complete successfully.  This "eventual" caveat is because our event-passing takes some time to propagate.
 * Public query-ability
 * Users are able to run SQL (or a similarly easy query language) that joins judgments with the production database tables holding the wiki entities being judged.
 * Queries can be run from the public Quarry interface.
 * JADE comments can be joined and full-text searched.
 * Full and partial dumps available for internal and external use
 * WMF will maintain a full, private dump of every action JADE has recorded.
 * Dumps with suppressed content redacted will be made available to the public. Something will have to be done about future, retroactive suppressions.
 * Curation activities
 * Full integration with MediaWiki curation tools.
 * Comments and judgments appear in Special:RecentChanges in an obvious and usable way.
 * Comments and judgments appear in Special:Contributions.
 * Comments and judgments appear on Special:Watchlist when the judged wiki entity is being watched.
 * Blocked users cannot submit any content to wikis where they are blocked.
 * Extension:AbuseFilter integration.
 * When a user account is suppressed or renamed, their judgments are modified accordingly.
 * Users (which privileges?) will be able to suppress comments and usernames that are offensive or violate privacy.
 * We cannot add significantly to the existing review load, currently c. 350-500 suppressions per month.
 * Event-based, can easily add consumers
 * The feed of JADE actions will be available, through a stable event-based interface. Consumers will be able to catch up with any missed messages, and don't need to be constantly online to capture the entire feed.
 * Third-party consumers will need to agree to our redaction terms, so that suppressions are propagated.
 * Judgments can include quantitative scores on multiple optional "schemas", and freeform comments.
 * Schemas look like "Is this good faith?": "good faith" : "bad faith". They are versioned.
 * We support discussion threads. They are rooted at a wiki entity and may refer to specific judgments or specific scores.
 * We won't be implementing our own discussion threading, but will rely on an existing platform such as Structured Discussions.
 * It will be possible to navigate back from the discussion to the wiki entity and judgments.
 * Judgments will target wiki entities "edit" (revision as a change) and "version" (revision as snapshot of a page) separately, with different schemas available for each type.
 * A given wiki entity will have at most one "preferred" JADE judgment.
 * Editors can Bold-Revert-Discuss to collaboratively set the "preferred" judgment.
 * There's some open discussion at "Talk/JADE#Judgment Bold-Revert-Discuss vs. optional consensus"
 * Any UI for JADE will be fully translatable and language will be drawn from a shared store.
 * Metrics will be made available for JADE activity per-wiki, and for system monitoring.
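
The judgment-related requirements above (scores on optional schemas, freeform comments, separate "edit" and "version" targets, and at most one "preferred" judgment per entity) could be modeled roughly as follows. This is only a sketch to make the data shape concrete. All class, field, and schema names here are assumptions, not part of any JADE specification, and schema versioning is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical schema registry: schema name -> allowed labels.
# Versioning is omitted to keep the sketch short.
SCHEMAS: Dict[str, Tuple[str, ...]] = {
    "goodfaith": ("good faith", "bad faith"),
}


@dataclass
class Judgment:
    """One user's judgment of a wiki entity (field names are illustrative)."""
    user: str
    scores: Dict[str, str]   # schema name -> chosen label
    comment: str = ""        # optional freeform comment
    preferred: bool = False


@dataclass
class JudgedEntity:
    """All judgments for one entity, addressed by wiki, type, and primary ID."""
    wiki: str
    entity_type: str         # "edit" or "version"
    entity_id: int
    judgments: List[Judgment] = field(default_factory=list)

    def add(self, judgment: Judgment) -> None:
        # Validate before changing any state, so a rejected judgment has no effect.
        for schema, label in judgment.scores.items():
            if label not in SCHEMAS.get(schema, ()):
                raise ValueError(f"invalid label {label!r} for schema {schema!r}")
        if judgment.preferred:
            self._clear_preferred()
        self.judgments.append(judgment)

    def set_preferred(self, index: int) -> None:
        target = self.judgments[index]  # raises IndexError before any change
        self._clear_preferred()
        target.preferred = True

    def preferred_judgment(self) -> Optional[Judgment]:
        return next((j for j in self.judgments if j.preferred), None)

    def _clear_preferred(self) -> None:
        for j in self.judgments:
            j.preferred = False
```

Marking a judgment as preferred clears the flag on every other judgment for the same entity, which is one simple way to keep the "at most one" rule from being violated.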

Auditing

 * Reporting problems with ORES
 * User sees an incorrect label, responds by creating a new judgment on that artifact.
 * Grounded theory
 * Researcher makes queries using the API, makes SQL queries, or processes the raw data dump.
 * Effective refutation of ORES predictions
 * User can see other users' judgments. They are first-class in the API, and presented alongside ORES scores.
 * Showing improvements in ORES
 * Researcher uses JADE to analyze ORES's performance.

Label storage

 * Judgments can be used as our label store
 * Wiki Labels reads from and writes to the JADE store.
 * Researcher can upload lists of artifacts needing judgments.
 * Improve labels by directing attention to borderline cases
 * Researcher identifies borderline judgments.
 * Wiki Labels presents chosen artifacts for judgment.

Implementation strategies
Our initial strategy was to install the JADE extension on each wiki using JADE, and to store judgments as wiki pages. We've been warned about scalability constraints when using wiki pages for judgment storage, so there are two alternatives we'll explore. The first is to store judgment actions as events in the log table, and the second is to sandbox ourselves onto a central wiki. Judgments as a log allows us to use a different table, which allows page and revision actions to be unaffected by our activity. A central wiki lets us contain the damage if wiki pages grow unmanageably.

Per-wiki
With per-wiki deployment the two options are to create judgments as pages and to create them as log entries. Both options offer significant re-use of MediaWiki components, which creates a better experience for the users and less engineering work overall. The details of these two proposals are outlined below.

Each solution offers its own advantages and disadvantages relative to the other one. Pages offer much more expressive content (entire JSON blobs vs. log entry summaries), the ability to revise judgments after the fact (including to designate one as a "preferred" judgment), and the ability to quickly roll back many judgments as needed. However, page revisions would be left behind on the wiki (with no ready ability to purge them all) should we choose to remove the extension, and some UI work would be needed so that wiki users would not need to interact with raw JSON. With judgments as log entries, we avoid the need for a new namespace and pages to maintain and we gain the ability to purge the logs en masse (if needed) but would need special API and UI affordances for listing judgments, designating a preferred judgment, and deprecating old ones.

Pages
With this deployment approach, JADE is deployed as an extension to each individual wiki using it, enabling the Judgment namespace on that wiki. This is the approach preferred by the Scoring Platform Team for the following reasons:


 * If we instead pursued implementation in more novel ways, we would be required to expend additional engineering effort on re-inventing wiki workflows.


 * We encourage editing judgments, by the original author or others, for any of the usual wiki reasons: copyediting, elaboration, changing and reversing in a Bold-Revert-Discuss cycle, and so on.


 * An important part of this collaboration will be the discussion and consensus processes.  This communication is probably going to be disjoint, focused on the change made by a specific revision for example, or the overall topical categorization of the article, but not both in the same thread.


 * Free-form text must be patrolled, vandalism and libel must be reverted, and occasionally pages may need to be deleted and suppressed.  Edits must appear in Special:RecentChanges and integrate with AbuseFilter.


 * We need to be able to analyze the changes to judgments over time, and we need basic version control functionality for quality control purposes.


 * We need to be able to dump judgments to a flat format convenient for third-parties.


 * Integration into EventStream is necessary, so tool developers can track judgments in real-time.
 * This gives the community the most control over content produced through the extension. This, in turn, makes it more likely for JADE to be accepted by those communities.
 * This is also logically consistent with the purpose of the extension: to create annotations of content and activity on a given wiki.
 * We avoid some of the social complications that a central wiki would pose; see below.

These requirements were partly informed by https://www.mediawiki.org/wiki/Everything_is_a_wiki_page and https://en.wikipedia.org/wiki/User:Risker/Risker's_checklist_for_content-creation_extensions. For more about why these guidelines matter, see https://meta.wikimedia.org/wiki/Trust_and_Safety: unless we integrate with MediaWiki curation, that team will be asking us to suppress user-generated content from raw database tables by hand.

To mitigate the risk of runaway growth, and as a general good practice, the Scoring Platform Team would deploy starting with smaller wikis and gradually move up to larger wikis. We would start with one wiki and gradually branch out according to what we learn from that wiki's usage of JADE. Initially, we would limit ourselves to the set of wikis that have ORES deployed. These are wikis that are already familiar with our work and therefore more likely to understand the role of JADE. They are also wikis that stand to benefit the most from JADE, since JADE will be used to provide feedback for ORES. For performance reasons, we will avoid deploying to wikis that have revision tables that are either over 100 GB in size or approach that limit. This rules out English Wikipedia and Wikidata and makes us wary of deploying to German and French Wikipedias, at least until long-term improvements are made to the revision table.
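
The selection rules in the paragraph above can be sketched as a small filter. The 100 GB limit comes from the text. Everything else is an assumption made for illustration: the `Wiki` record, the reading of "approach that limit" as 80% of the limit, and the use of revision table size as the measure of a "smaller" wiki.

```python
from dataclasses import dataclass
from typing import Iterable, List

REVISION_TABLE_LIMIT_GB = 100.0  # limit stated in the text
APPROACH_FRACTION = 0.8          # assumption: "approaching" means >= 80% of the limit


@dataclass
class Wiki:
    """Illustrative record; field names are hypothetical."""
    dbname: str
    has_ores: bool
    revision_table_gb: float


def deployment_order(
    wikis: Iterable[Wiki],
    limit_gb: float = REVISION_TABLE_LIMIT_GB,
    approach_fraction: float = APPROACH_FRACTION,
) -> List[Wiki]:
    """Return eligible wikis, smallest revision table first."""
    cutoff = limit_gb * approach_fraction
    eligible = [
        w for w in wikis
        if w.has_ores and w.revision_table_gb < cutoff
    ]
    return sorted(eligible, key=lambda w: w.revision_table_gb)
```

Called with a list of `Wiki` records, `deployment_order` drops wikis without ORES and wikis at or above the cutoff, then returns the rest in a smallest-first rollout order.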

For the list of which wikis we are considering, see this table.

We would work with the pilot wiki's community to adapt their tools to integrate into JADE. For example, countervandalism tools should report back judgments made by human reviewers so that the same edit need not be reviewed twice. We would encourage one integration to be deployed at a time, making sure that growth levels remain acceptable. (What we need is for Ops to provide a threshold for "acceptable" growth.) If growth occurs at an acceptable rate, we can continue deploying new integrations to wiki tools and deploying JADE to new communities. We would choose new wikis to deploy to based on support for tools and general demand for JADE, while preferring small wikis to larger ones. Generally, it is up to that wiki's community to determine what exact role JADE will have on that wiki, and editors on that given wiki would be free to devise their own integrations.

Our deployment strategy is designed for controlled growth. However, if it is determined that the rate of growth poses a threat to Wikimedia's infrastructure, the following actions will be taken:


 * We would cancel future deployments of JADE if any are scheduled.
 * The community would be informed that JADE is being discontinued, and a timeline would be set for shutting off the extension.
 * We would work with tool developers to remove JADE integration from their tools.
 * Existing Judgment namespace pages would be converted into regular JSON pages. We believe it would be up to the community to decide what to do with existing Judgment pages and the Judgment namespace in general. (Options include deleting all of the pages or moving them to an archive of sorts.)
 * The extension would be uninstalled or switched to a functionally equivalent "disabled" mode.

Log entries
With this strategy, JADE would be deployed to each wiki, but instead of creating a new namespace, judgments would be a kind of log entry. Log entries nicely match the format of judgments ("so-and-so assessed revision 12345 as damaging, bad-faith") and log entries can be deleted and suppressed as needed. Log entries cannot be edited in place but new ones can be created as needed for updated judgments. From an operational perspective it is easier to purge log entries than revisions.
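
To make the log-entry model concrete, here is a minimal in-memory sketch, assuming an append-only list where an updated judgment is simply a newer entry and suppression hides an entry without editing its content. The class and method names are hypothetical and do not correspond to the MediaWiki logging API.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class JudgmentLogEntry:
    """Immutable log entry: judgments are never edited in place."""
    log_id: int
    user: str
    rev_id: int
    labels: Tuple[str, ...]
    suppressed: bool = False

    def summary(self) -> str:
        return f"{self.user} assessed revision {self.rev_id} as {', '.join(self.labels)}"


class JudgmentLog:
    """Append-only log; the newest visible entry is the current judgment."""

    def __init__(self) -> None:
        self._entries: List[JudgmentLogEntry] = []

    def append(self, user: str, rev_id: int, labels: Iterable[str]) -> JudgmentLogEntry:
        entry = JudgmentLogEntry(len(self._entries) + 1, user, rev_id, tuple(labels))
        self._entries.append(entry)
        return entry

    def suppress(self, log_id: int) -> None:
        index = log_id - 1
        if not 0 <= index < len(self._entries):
            raise KeyError(log_id)
        # Swap in a copy flagged as suppressed; the judgment content is untouched.
        self._entries[index] = replace(self._entries[index], suppressed=True)

    def current(self, user: str, rev_id: int) -> Optional[JudgmentLogEntry]:
        """Latest non-suppressed entry by this user for this revision."""
        for entry in reversed(self._entries):
            if entry.user == user and entry.rev_id == rev_id and not entry.suppressed:
                return entry
        return None
```

Because older entries stay in the log, listing judgments and deprecating old ones become read-time concerns, which is where the special API and UI affordances mentioned earlier would come in.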

The main downside is that there is no obvious user interface here. With judgments as revisions, we would be able to make use of the wiki editing interface. With judgments as log entries, we would need to build a custom UI for creating such log entries. Such an approach may also result in poor AbuseFilter integration, since AbuseFilter isn't typically designed to screen log entries.

It is also worth noting that this removes the ability for users to interact with the raw JSON, which is probably good from a usability perspective.

This approach can be combined with a custom table implementation to make the data queryable.

Central wiki
The alternative option is to install the JADE extension on one wiki, jade.wikimedia.org or a similar address. This will allow all wikis to benefit from JADE immediately, which could lead to faster growth than anticipated. It would also avoid contributing to revision table growth problems anticipated by Site Reliability Engineering. It can also help develop a community of practice around JADE by encouraging the different users of JADE to work together.

The main disadvantage of using a central wiki is the forced social separation it incurs. Rather than Dutch Wiktionary posting its judgments to Dutch Wiktionary, Dutch Wiktionary would be surrendering its reviews to a different wiki with its own administrative structures and social norms. This would be comparable to the current controversy between certain wiki communities and Wikidata, the former viewing the latter as unreliable, prone to vandalism, and out of compliance with higher quality standards. While Wikidata has valid reasons to pursue centralization in spite of the costs, JADE would gain nothing, as the content this extension produces is inherently per-wiki. The benefits of storing judgments in the same wiki as the entities being judged are that we have the same editor community, writing in the same language. If a user is locally blocked from a wiki, they won't be able to participate in judgment of that wiki, which means reduced administrative burden. Judgments are specific to a target wiki entity; there is no natural sharing across wikis like we see for Commons or Wikidata. In some ways, you can think of the judgments as a structured talk page. This makes the central wiki a dubious conceptual fit.

The following software changes would need to be made to JADE:


 * We would need to ensure that the JADE wiki has the software tools it needs for effective self-policing.
 * We would need to build out functionality for judgment pages to show up on the recent changes and watchlist feeds for the affected wiki and to otherwise make judgments show up in the interface of that wiki, possibly through a JADE-Client extension. This would give us the benefits of per-wiki integration but requires significant design, product, and engineering efforts.
 * JADE would need to change its page title structure to distinguish between the different wikis. For example, "Judgment:Revision/12345" would need to change to "Judgment:Revision/enwiki/12345" or something similar.
 * Namespace and title would be in a single language, and there's probably no easy way to accomplish i18n. Judgment text and edit summary would probably be in different languages, following the target wiki language. Otherwise, they would all be in English, creating a barrier to participation.
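
The title change described in the list above could look something like the following sketch. It assumes the "Judgment:Revision/enwiki/12345" form given as an example; since the text says "or something similar", the exact layout and the helper names are illustrative only.

```python
from typing import NamedTuple

NAMESPACE = "Judgment"


class JudgmentTitle(NamedTuple):
    entity_type: str
    wiki: str
    entity_id: int


def build_title(entity_type: str, wiki: str, entity_id: int) -> str:
    """Build a central-wiki title such as 'Judgment:Revision/enwiki/12345'."""
    return f"{NAMESPACE}:{entity_type}/{wiki}/{entity_id}"


def parse_title(title: str) -> JudgmentTitle:
    """Split a central-wiki title back into its parts."""
    namespace, _, rest = title.partition(":")
    parts = rest.split("/")
    if namespace != NAMESPACE or len(parts) != 3 or not parts[2].isdecimal():
        raise ValueError(f"not a central judgment title: {title!r}")
    return JudgmentTitle(parts[0], parts[1], int(parts[2]))


def migrate_local_title(title: str, wiki: str) -> str:
    """Convert a per-wiki title ('Judgment:Revision/12345') to the central form."""
    namespace, _, rest = title.partition(":")
    parts = rest.split("/")
    if namespace != NAMESPACE or len(parts) != 2 or not parts[1].isdecimal():
        raise ValueError(f"not a per-wiki judgment title: {title!r}")
    return build_title(parts[0], wiki, int(parts[1]))
```

Keeping the wiki identifier as its own path segment means the target wiki can be recovered from the title alone, which any cross-wiki feed or client extension would need.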

A central JADE wiki poses additional social challenges:


 * We would be creating a new wiki. Generally speaking, once you create a new wiki, it is very hard to get rid of it.
 * This new wiki would need to be patrolled for vandalism and spam and need to support a community large enough to carry out this work.
 * Conflicts that emerge in this new environment would have no clear form of resolution, since this work would occur outside of the "walls" of the community.
 * We would need to create processes to promote JADE-wiki admins from these local communities to ensure that libelous or otherwise forbidden content can be suppressed in a reasonable amount of time.
 * New communities take a long time to mature, or may fail to do so. It could easily take two years of work to build a self-sufficient community.
 * In addition to the overhead of running a separate wiki, we would need to create some bridge between that wiki, its community, and the other wikis and communities that are contributing work to be stored in this central wiki. The social and technical overhead required poses a significant risk for product adoption.
 * There would be boundary issues. Content representing judgments of edits and actions on an individual wiki, made by editors of that wiki, would be mixed in with data from other wikis and other communities; meanwhile, yet another community may form around this JADE wiki. This has the potential to build significant conflict. Social conventions would need to be established around how to best use such a shared space. Wikidata and Commons historically demonstrate these conflicts.
 * Even with software features that integrate the JADE user experience into Wikimedia projects, there may still be a perception that JADE content is "external" and therefore cannot be trusted. Compare this to Wikidata-skepticism that exists on some Wikimedia projects.
 * Admins on local wikis won't have admin-level access on the JADE wiki. We would need to actively promote admins from various communities and hope that they can coordinate together.
 * We would want to obtain local wiki consensus before allowing targeting of entities on any local wiki, which sounds like a tricky cross-wiki consensus process, and may require some technical means to prevent edits that target non-consenting wikis.

Our understanding from Site Reliability Engineering is that a standalone wiki would address the performance concerns that were raised. However, if there is a need to decommission the JADE wiki for whatever reason, we would need to remove existing software integrations and then lock the wiki for editing, eventually taking it down.

Other ideas

 * Multi-content revisions: Currently, we're choosing to avoid MCR because: MCR is not ready to deploy and is blocked on capabilities we require, such as integration with AbuseFilter; using MCR wouldn't have any impact on the revision and page metadata size (all we're doing is shifting content to a new table, but text table data length is not one of the concerns here); and judgments will be added after the fact, so revision judgments will be separated from the revision being judged and we'll have to maintain a separate index in order to find judgments. Otherwise, what would make MCR a good fit is that there are a finite number of judgment entity types and schemas, so mapping either types or schemas to slots would be intuitive and efficient.
 * Custom tables: The X1 experimental cluster hosts miscellaneous production databases. We would be required to develop a custom database schema with no possibility of replication to Cloud Services' Wiki Replicas or ability to perform join operations against other production tables. We would need to reinvent the fundamentals of editing, including recent changes integration, abuse filters, edit hooks, and so on.

FAQ
For more information see JADE/Scalability FAQ.

Prototype
We'll create a working API and deploy to the Beta Cluster. This release is for developers only, and can be used to test client implementations. It will not be able to produce any content visible from Wikimedia sites. See the Phabricator task for a detailed specification, and the list of out-of-scope features.

Minimum Viable Product (MVP)
This is our first release that is capable of being written to and read from Wikimedia sites.