About this board

Jeblad (talkcontribs)

This interface has the same core problems as many other such rating systems: (1) It introduces a way too heavy workload, (2) it is outside the normal workflow, and (3) it is not clear how it is actually meant to be used.

(1) Writing a manual reason is not efficient. Making an interface like this will only lead to users writing crappy gadgets to work around the limitation. It would be far better (faster) to have prepared reasons, and that would also make it a lot easier to analyse the reasons later on. (The four prepared reasons do not really fit very well. Bad faith? How can you really know if something is done in bad faith? This is plain crystal ball!)

(2) Whether something like this is used depends on it being part of a workflow. This seems to be outside all workflows, and then I start wondering why anyone should use the page. The only process I know of that really needs something like this is Featured Articles, but I'm not sure there is enough willingness to adapt current processes.

(3) Core problem: should you describe a change or a revision? You identify and post a comment on a revision, yet what is shown is a diff. What do you comment on in this case? I have no idea. See also phab:T185247: you need a rationale for your classification, but then what do you classify?

Jeblad (talkcontribs)

A fourth problem with this is that it creates a new point of vandalism. This happens because of the free form judgement. By using prepared judgements it becomes harder to use for targeted vandalism.

One way to make vandalism harder is to only allow non-autoconfirmed users to use prepared judgements, while autoconfirmed users are allowed to use free form judgements. That sort of works. (That is; autoconfirm is used as the cost the user must bypass to be allowed to use the free form judgement.)
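The rule proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `User` class, the `may_submit` function, and the contents of `PREPARED_REASONS` are hypothetical names made up for this sketch, not part of Jade.

```python
from dataclasses import dataclass

# Hypothetical set of prepared reasons; the real list is not given in this thread.
PREPARED_REASONS = ("damaging", "not damaging", "good faith", "bad faith")


@dataclass(frozen=True)
class User:
    name: str
    autoconfirmed: bool


def may_submit(user: User, reason: str) -> bool:
    """Return True if this user is allowed to submit this reason."""
    if reason in PREPARED_REASONS:
        # Prepared judgements are open to everyone.
        return True
    # Anything else is free-form text, which is limited to autoconfirmed users.
    return user.autoconfirmed
```

Here autoconfirmation is the only "cost" a user has to pay before free-form text is accepted; a different threshold could be swapped in without changing the shape of the check.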

EpochFail (talkcontribs)

It is designed to not incur a new workload. Patrollers would add labels to Jade by going about their normal patrolling activity via integrations with the tools that they are using. This is a long-standing ask from patrollers and tool developers working on patrolling support.

The only time where there is new work to do is in the case where a disagreement occurs. Say for example, if one editor thinks an edit is damaging and another does not, they might need to talk to each other to work out their differences. I expect this to be rare as it is currently a rare occurrence.

If you think that we can't determine good-faith from bad-faith, I might agree in principle, but in practice, people send counter-vandalism warnings to people they find vandalizing Wikipedia (bad-faith) and they send kinder revert notices to people they find making mistakes that don't look like vandalism (good-faith). We have had over 30 wikis perform labeling work to differentiate good edits from bad-faith vandalism and good-faith damage and people seem to do a good job of making sense of this distinction. Further, patrolling tools have this distinction built in. E.g. Huggle has separate buttons and user-warning actions in these two different cases.

We differentiate revisions (versions of the article) from changes (diffs/edits) as different "entity" types. If you are labeling the Diff, you are labeling the change. If you are labeling the Revision, you are labeling that version of the article. Jade's entity visualization makes this clear by either loading the version of the page or the diff of the edit above the proposed labels.

Jade does create a new point where vandals will eventually try to cause problems. Your proposal for "prepared judgments" is interesting, but I don't see that it would be necessary before we see a problem. Surely, we can just prevent contributions in the Jade namespace by anons, new users, etc. after it becomes a problem. If it does become a problem, the same old revert mechanisms will still apply and changes to Jade entities will still appear in the RecentChanges feed -- ala en:User:Risker/Risker's_checklist_for_content-creation_extensions and Everything is a wiki page.

Jeblad (talkcontribs)

(1) This is not a patroller-only feature as the subject page explains, but a feature the patrollers might use. If it is an additional feature for the patrollers, then it will give rise to an additional workload. Even loading an additional page will incur an additional workload.

(2) Either you have a system where one user takes charge, like present editing where the last editor overrides all, or you have a system where some sort of consensus is built. This is not a consensus-based tool; it is last editor overrides all.

(3) No, you can't determine what is good-faith and what is bad-faith. You can only observe the effects of the users' editing. (Patrollers can't even agree on what should and should not be reverted, so how can they agree on why some edit was made?) And no, continuing to do something because it is tradition in some other tool is a fallacy.

(4) I would like to see an interface that is able to make that distinction, and convey it to the editors. I really do, but I doubt it is possible.

(5) leads to (1) but now you have patrolling of judgements from earlier patrolling. This adds up layer by layer. The approach is wrong, as it often is in this "wikiway"-thinking. You must make it easier to patrol and/or harder to vandalize. Making more advanced patrolling, for whatever reason, will make the content harder to maintain, which will lead to fewer users maintaining the actual content, which will lead to lower quality.

The question you should ask is: What are the physical features we can exploit to create a force-factor that works with the patroller to make it harder for a vandal?

EpochFail (talkcontribs)

(1) Oh yes. For sure. This tool captures information about the work that people are already doing. Right now, this information is either lost or hidden from view. When making this visible, there will be work involved in ensuring it doesn't fill up with garbage. The level of that workload has yet to be seen and we can respond to issues as they arise. In the meantime, we've designed this system to look and function like wiki pages so that the same old workflows apply.

(1.5) Maybe we should have separate threads rather than numbering things? This is getting hard to follow.

(2) Right. So all wiki pages have the same problem then. This system mimics a strawpoll-style discussion. In many wikis, there are rules and processes that avoid the "last editor override" problem that you describe. I would imagine those policies and processes to apply to Jade too.

(3) I do think that people make judgments about this all of the time. This tool is intended to track human judgment, not truth. Differentiating what appears to be good-faith mistakes from what appears to be intentional vandalism is important work. I'm not saying it is tradition. I'm saying it happens, it's consistent, and we can train machines to help. I'm honestly not sure what the problem is. Jade is actually designed to support disagreements about what should or should not be considered "good-faith". I think we'll see some interesting debates come out of that for sure, but for the most part, I expect people to agree 99% of the time (or more) based on all of the edit labeling work people have already been doing to train ORES.

(4) The revision entity will be named "Jade:Revision/123456" and will include a visualization of the entire article. The edit entity will be named "Jade:Diff/123456" and will include a visualization of the diff that the edit creates. It'll be hard to mix the two up because you'll be looking at what you're judging while you're judging it.

(5) This will be patrolling in the same old way. It's certainly not more complicated and we're not adding new layers -- just another namespace to patrol. It'll have structured data so it'll be harder to vandalize, but vandalism will still be possible. The whole goal is to use this data to train, test, and govern machine learning models (see mw:ORES) that make patrolling easier. I see no good reason why we wouldn't also run ORES on Jade pages. In this way, you could say that making patrolling easier is Jade's primary purpose.

Jeblad (talkcontribs)

(1) It creates additional work, but in some cases it replaces existing work already done. I believe it will invite more discussion and more work.

(2) This has left my original reasoning and question, but anyhow, it is not about consensus as it is now.

(3) What you try to do by classifying “good vs bad faith” is to add a pseudo-orthogonal dimension that contradicts the “destructive vs non-destructive” dimension, or in other words you add a variant uncertainty.

(3, your 4) Perhaps people will learn to distinguish them, but I believe it would take time. If people start discussing individual diffs the productive work could potentially grind to a near halt.

(your 5) If you use ORES to patrol JADE entries, then you should not use JADE entries to train ORES. You will then create a closed feedback loop, and over time you will get increased bias. This is due to some highly active (dominant) users which ORES will learn, and then ORES will facilitate those (dominant) users' patrolling, which is then fed back into ORES. (You can avoid this by tracking “agents” aka users, and controlling individual feedback. [Agents act like a feature vector.] Given that IP-users are pretty near random, it could be infeasible.)

EpochFail (talkcontribs)

Given what I have seen about auditing behavior so far, I'm estimating that less than 1% of judgments will attract any additional discussion. But I do admit that this additional discussion is more work. Ultimately, this additional work is needed to get a decision right so I see it as a gap that we *aren't* discussing these decisions now. We don't even know how often someone is reverting something and sending a warning message when it was actually a good edit.

When I classify good vs. bad-faith, I'm mimicking the structure that Huggle and Twinkle (and other tools) use to decide what warning message to send to people in the process of reverting their edits. People *are* already making this distinction. It is a real distinction. It certainly doesn't contradict the "damaging" dimension -- but rather it adds nuance. For example, "What type of damage?" is an important question for people like those who host the en:WP:Teahouse. They invite newcomers who are doing good-faith damaging things but would like to avoid newcomers who are intentionally vandalizing Wikipedia.

I don't think that discussing diffs is going to be more interesting and rewarding than contributing new, productive work. Surely editors can already discuss diffs (and they do; see en:WP:BRD), but this hasn't led to a big dip in productivity as far as I can tell. It's regrettable that we don't have structured information about what opinions were discussed and what conclusions were reached because that would be really useful.

I think you're misunderstanding what would happen when we use Jade to train ORES. In the case that we use ORES to track vandalism in Jade's namespace, we'd be looking at diffs of Jade entity pages. In the case that we use Jade to train/test ORES, we'd be looking at the judgment data itself. This is where the potential feedback loop breaks. We'll need independent judgments of those edits to the Jade namespace to train anything.

Honestly, when it comes back to feedback loops, I'm far more concerned about ORES predictions leading Jade judgments in a particular direction. *That* is a complete loop. Jade captures data about the editor who submits a judgment and about what they are looking at when they submit it (e.g. a UI that includes an ORES prediction vs. just looking at a bare version of Special:Diff). We need to do more work into exploring how to get good data out of Jade for training. Right now, we train our models using meta:Wiki labels -- a tool that shows random samples of diffs to editors specifically for labeling (no prediction, minimal context about the edit) to break out of these loops. I plan to move towards using Jade data cautiously, using science to study processes that produce data that is consistent -- where new reviewers are likely to agree with the consensus judgment.

Long term, it might be that Wiki labels continues to play a critical role in making sure we train/test ORES with good data. Regardless, Jade will be essential infrastructure for finding out if ORES is going off of the rails or to make sure that we can track and deal with subtle prediction bugs. AIs aren't cheap. In order to reap the efficiency they bring to quality control work, we must track their behavior. Jade is intended to be a powerful tool for keeping ORES in check. Right now, we're tracking ORES behavior sporadically. I'd like to track it consistently leveraging the judgments that people are already making. That's the ask I have gotten from patrollers and patrolling tool developers. Have you ever participated in an audit of ORES? See for a great example of an audit that helped us fix some subtle issues with the model.

Jeblad (talkcontribs)

I'm probably mistaken, but is JADE supposed to replace user feedback?

Whatever program acts as a crystal ball will not change that it is a crystal ball.

Discussing diffs is mostly an enwiki phenomenon, and its existence on other projects is mostly limited to edit wars. I am afraid it will spread.

In a feedback loop you get the same result whether A B A, or B A B. You close the loop and the system will spiral out of control unless you use some form of countermeasures. You may not even see what happens before you lose control, i.e. observability vs controllability. (Claiming that you use “science” will not save you from the effects of a closed feedback loop.)

This time around I'm going to wait for real numbers before I promote the use of additional ORES-based tools. I want to see real numbers for the workload, both added and removed workload, before I promote anything.

That is my final word on this, good luck!

EpochFail (talkcontribs)

Jade is supposed to supplement user feedback. It would not replace anything. In fact, Jade is an excellent mechanism to make a case that an individual user feedback corresponds to a larger trend.

In the feedback loop example, I'm saying that A --> B, C --> A is not a complete loop. Ultimately, B and C do not directly correspond.

Jade is the counter-measure to prevent ORES from spiraling out of control. The application of scientific methods will help us detect feedback loops. E.g., we can take observations outside the context of ORES and compare them to how judgments work when ORES is present in order to learn the effect of ORES on producing feedback loops.

We won't have real numbers until Jade is adopted and used. I'm not asking you to promote anything. But thanks for your thoughts.

Reply to "Problems…"

Judgment Bold-Revert-Discuss vs. optional consensus

Adamw (talkcontribs)

@Halfak (WMF) and I identified a design assumption in IRC last week, which has been implicit in a few of the other threads here, especially in "Judgments, Endorsements, and Preference" below, but is worth surfacing as its own issue. I'd like to open this for more comments.

In the current design of JADE, there can only be one "correct" judgment per wiki entity. Most entities will simply have one editor making one judgment, but in the cases that multiple editors supply a judgment, the second, third, etc. editors will be overwriting each other's judgments as the "correct" answer. This follows the BOLD,_revert,_discuss_cycle on English Wikipedia. @Halfak (WMF) has pointed out that this is the best process for the problem domain, that we want to encourage consensus.

An alternative workflow would be something like the Wikidata model, in which multiple judgments on a wiki entity default to having equal "rank", and it would be optional to go through a consensus process. Any editor could mark one or more of the judgments as "preferred", with or without discussion, or the editors could leave all judgments as "normal" rank and we would consider the set of judgments to be unresolved and equally valid.

I think there are two questions here: what is the preferable workflow for the various use cases in which editors add JADE judgments, and what are the best requirements on the data in order to support algorithms and reasoning using the judgments?

Jeblad (talkcontribs)

Actually “…this is the best process for the problem domain, that we want to encourage consensus.” is wrong. It focuses on the strongest contender of the content, or in other words, “the single user that gives a rat ass about the consensus” to quote a user at nowiki. I believe it is important to say that this approach does NOT give the consensus; it gives the opinion of the user that does not give in and continues fighting. I'm not sure that creates a viable solution, and it definitely does not create a welcoming and inclusive environment.

EpochFail (talkcontribs)

I think this discussion of the "Wikidata" model misses that it has BRD as well. In this case, "rank" is part of the data model itself. There's still one true representation that is identified via a consensus process. But with "rank" a more accurate reflection of a messy reality is represented in the data model itself. If we wanted to represent messiness like this, it might be better captured in JADE's schema itself -- depending on the type of judgement.

E.g. if a user decides that one judgement should be ranked "preferred" while another should be ranked "normal", they are essentially marking one as best and another as "safe to ignore". You might show up and disagree with how they have ranked and revert them. Preferably, you'd then discuss your differences in ranking.

I think when it comes down to it, I want to know why having ranks makes any sense. What information does it capture that we need to capture here? In what case, do we want one judgement to be "preferred" when another is simply "normal"?

BTW, I'm Halfak (WMF); I accidentally made this edit through my volunteer account.

Keegan (WMF) (talkcontribs)

Is there a compelling user story that would have a ranking/preference system? How likely is this to be encountered in the wild?

I think the simple BRD approach is fine, with the one "true" judgement that can be revised further by others. If I could think of an imperfect analogy, it'd be kind of like File History on Commons. The latest upload is always displayed as the canon version, but previous revisions are still visible/available. Does that make sense?

EpochFail (talkcontribs)

Good Q. Here's the thread where I originally brought up the proposal for "preference": Tzw0uv2bucrdprm4 It discusses a use-case.

I think that generally, we should allow people to disagree with consensus -- much like what happens in a straw poll.

Adamw (talkcontribs)

I can buy this argument. I was thinking it would be simpler for a user to "make a judgment, ignore what else is there as you wish" rather than "make a judgment, yours will be the new authoritative judgment and will overrule the others". But I know I haven't had my fill of wiki Kool-Aid yet ;-)

Keegan (WMF) (talkcontribs)

In that case, I still think limiting it to one preferred judgement is what we'd want to go for. People are welcome to discuss it, but splitting it off into rankings and separate judgements seems unnecessary IMHO.

Jeblad (talkcontribs)

There are several articles that discuss rating and point out the need for a cost function. One article is Jøsang, Audun, Roslan Ismail, and Colin Boyd. “A Survey of Trust and Reputation Systems for Online Service Provision.” Decision Support Systems 43, no. 2 (March 2007): 618–44. Note that the article is mostly about trust and reputation given by open rating systems, but it is very similar to trust and reputation for rating articles.

The conclusion is pretty simple: if the cost goes towards zero, then the system becomes easy to game, and then users will try to game the system. To avoid that you need some kind of small cost, typically that you have done something positive. For example, you must contribute (non-reverted) content to be able to rate (non-reverted) content; still, you may be able to rate according to prepared reasons, as they are hard to use for vandalism.

Reply to "Judgment Bold-Revert-Discuss vs. optional consensus"

Answers to a few questions per email

Jo-Jo Eumerus (talkcontribs)

Got a request to answer the following questions from @Harej (WMF):

  • Please read through and let us know: Is this page clear? Does it explain what JADE is and what it is used for? No opinion.
  • Regarding the wiki(s) you edit, how familiar are you with how anti-vandalism patrollers coordinate their activity? Might there be interest in a system for better coordination? I don't think that vandal patrol is coordinated, really. Only thing is that regular users report vandals that don't respond to warnings to administrators and these block/protect/delete as appropriate.
  • How concerned are you, personally, about the potential for AI to become biased and perpetuate bias on the Wikimedia projects? Would you be interested in helping us identify patterns of bias or inaccuracy in ORES? I'd be a bit wary of homophobia mainly, as "gay" is a common vandal term.
  • Do you know who else I should talk to? I'd be tempted to suggest User talk:Iridescent; they often have insightful stances on such projects.
Reply to "Answers to a few questions per email"

Wiki entity types: Splitting revision into "edit" and "version"

Summary by Adamw

Where we decide to use "page", "revision", and "diff" as the target wiki entities for judgment.

EpochFail (talkcontribs)

Hey folks. In order to attach judgments to wiki things, we needed to formalize the concept of "wiki thing". Originally, we came up with the name "wiki artifact" but it turns out that the EventBus people already came across this problem and decided to call them "wiki entities". So in the spirit of conformity, I think we should use "entity" and "wiki entity" from here forward.

OK onto types. Right now, the first thing we want to target is a revision. In the future, we'll likely want to target users, pages, and other wiki entity types.

I've recently realized that "revision" is ambiguous with regards to judgement. We've been using the term to refer to two types of things. In the case of ORES' article quality models, "revision" refers to a specific version of a page while for the edit quality models, "revision" actually refers to a diff (or more specifically an edit action). An edit has different metadata than a version of a page. E.g. an edit has a parent revision and that contains a lot of essential meaning.

So, I've been thinking that we should split "revision" into "version" and "edit" to remove the ambiguity. This is really cool because interfaces that are trying to automatically discover what to render next to a judgement will know that an "edit" judgement should be shown next to a diff and a "version" judgement should be shown next to a full rendering of a page.

These things are encoded into m:Wiki labels as "views". E.g. "PageAsOfRevision" and "DiffToPrevious".
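As a rough sketch of how an interface might pick what to render from the entity type: the two view names are the Wiki labels views mentioned above, while the lookup table `VIEW_FOR_ENTITY_TYPE`, the function `view_for`, and the exact pairing of type to view are assumptions made for illustration.

```python
# Assumed pairing of the proposed entity types with Wiki labels view names.
VIEW_FOR_ENTITY_TYPE = {
    "edit": "DiffToPrevious",       # show the diff against the parent revision
    "version": "PageAsOfRevision",  # show the full page as of that revision
}


def view_for(entity_type: str) -> str:
    """Return the name of the view to render next to a judgement."""
    try:
        return VIEW_FOR_ENTITY_TYPE[entity_type]
    except KeyError:
        # Refuse ambiguous or unknown types such as the old "revision".
        raise ValueError(f"unknown entity type: {entity_type!r}") from None
```

With the split in place, the ambiguous "revision" type has no entry at all, so a caller is forced to say which of the two things it is judging.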

Adamw (talkcontribs)

I like "edit".

There's something fishy about "version". I'd like it to be more obvious, like "article snapshot".

+1 "wiki entity", but only because it already exists :-)

EpochFail (talkcontribs)

hmm maybe re "version" we can stick with "revision" since it's common language in MediaWiki.

Thematic and quant analysis of judgements

Summary by Adamw

Judgments and comments should be available for analysis from Quarry, and joinable against wiki entities. Comments should be full-text searchable.

EpochFail (talkcontribs)

A common pattern when discussing ORES mistakes is to do an en:Grounded theory analysis (as seen here: it:Progetto:Patrolling/ORES) where misclassifications are grouped by what was going on in the edit. This is a really cool and useful bit of work because it helps me understand the trends in a machine learning model's mistakes and work to address them.

I'd like to see JADE support this pattern. Obviously, people should be able to link to judgments on Wiki pages. But I think we can get more out of having judgments query-able via m:Quarry and other types of public analytics systems. So we should be able to use our event-based system to route judgement data to LabsDB (probably soon to be renamed to CloudDB or maybe ForgeDB) so that people can query it via Quarry's online interface.

I'd like to run queries like: "Give me all of ORES false positives for 'damaging' where the editor is anonymous and the page was longer than 30KB before they saved their edit." This would involve joining judgments with MediaWiki's internal databases.
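To make the shape of such a query concrete, here is a minimal sketch using an in-memory SQLite database. Everything about the schema is an assumption for illustration: the tables `judgment`, `ores_score`, and `revision`, their columns, and the 30,000-byte reading of "30KB" are simplified stand-ins, not MediaWiki's or Jade's actual tables.

```python
import sqlite3


def false_positive_revisions(conn, min_parent_bytes=30_000):
    """Revisions ORES flagged as damaging that a human judged not damaging,
    made by anonymous editors on pages larger than the given size."""
    query = """
        SELECT r.rev_id
        FROM judgment AS j
        JOIN ores_score AS o ON o.rev_id = j.rev_id
        JOIN revision AS r ON r.rev_id = j.rev_id
        WHERE o.predicted_damaging = 1   -- ORES said "damaging"
          AND j.damaging = 0             -- the human judgment disagreed
          AND r.editor_is_anon = 1
          AND r.parent_len > ?           -- page size before the edit was saved
        ORDER BY r.rev_id
    """
    return [row[0] for row in conn.execute(query, (min_parent_bytes,))]


def demo_connection():
    """Build a tiny hypothetical dataset to run the query against."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY,
                               editor_is_anon INTEGER, parent_len INTEGER);
        CREATE TABLE ores_score (rev_id INTEGER PRIMARY KEY,
                                 predicted_damaging INTEGER);
        CREATE TABLE judgment (rev_id INTEGER PRIMARY KEY, damaging INTEGER);
        INSERT INTO revision VALUES (1, 1, 45000), (2, 1, 1200),
                                    (3, 0, 90000), (4, 1, 60000);
        INSERT INTO ores_score VALUES (1, 1), (2, 1), (3, 1), (4, 1);
        INSERT INTO judgment VALUES (1, 0), (2, 0), (3, 0), (4, 1);
    """)
    return conn
```

In the demo data only revision 1 matches: revision 2 is on a small page, revision 3 is by a registered editor, and revision 4 was judged damaging, so ORES was right about it.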

Adamw (talkcontribs)

Yes! I think you solved this with EventBus, where we provide a Kafka consumer that ingests messages into a MediaWiki database.

This would mirror only the public data. For the type of analysis you're describing we probably want the full history of judgments, minus redactions. Seems like the discussion threads should be included as well, which is a scary challenge to our plan to offload this complexity. Maybe the MVP will just have aggregate discussion stats like number of posts, number of participants, and a link to the Flow thread?

EpochFail (talkcontribs)

Seems like we should store these basic stats in a table in MediaWiki so that people can query it, flow, etc. via quarry.

Adamw (talkcontribs)

Something that came up in IRC: Our current design materializes JADE data in MediaWiki MariaDB as JSON blobs under the JADE namespace. This is pretty unusable from Quarry, to my knowledge.

Reply to "Thematic and quant analysis of judgements"

Judgments, Endorsements, and Preference

EpochFail (talkcontribs)

I've been reviewing some of the pages of ORES' mistakes (e.g. it:Progetto:Patrolling/ORES) and some of our review of anomalies in training data (e.g. Phab:T171497). I think we'll need a nice pattern by which multiple judgments can be submitted about a single wiki entity (revision, page, user, etc.), a conversation can take place, and consensus can be recorded. In order to manage all of this in the JADE/Schema, we've split up the notion of a "judgement" from an "endorsement" and we've borrowed the concept of "preference" from Wikidata.

Let me illustrate with an example. Let's say that I want to judge the quality of en:Special:Diff/327645238. I'd create a new "judgement" with damaging=true and file an "endorsement" of the judgement with my comment "Removing cleanup tags without fixing the problems." Because (in this example) I was the first person to submit a judgement/endorsement, that judgement would be marked "preferred".

At a later date, Awight is reviewing some anomalies in our training set and comes across this judgement. He reviews the diff and comes to a different conclusion. So he creates a new "judgement" damaging=false and files an "endorsement" of that judgement with the comment "Good removal of cleanup tags". He's feeling pretty confident, so he moves the "preference" to his non-damaging judgement and starts a new discussion thread tied to this wiki entity titled "Cleanup was performed in past edits" where he points out that the cleanup had already occurred before the edit in question.

I get a notification that an entity I judged has a new discussion thread. After reviewing the edit again and seeing Awight's reasoning, I still disagree, but I don't feel strongly enough to dispute the new preferred judgement. I could change my endorsement and edit my comment to reflect Awight's, but I don't want to. So I leave my endorsement on damaging=true and leave the "preference" alone (so it's still pointing to damaging=false).

Does this make sense? Should it be possible to have two different "preferred" judgments or should we limit it to just one that captures consensus? Does it make sense to allow individuals to keep their endorsements where they like and set the preference bit separately? What do you think?
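The walkthrough above could be modelled roughly like this. It is a sketch, not the JADE/Schema: the class names, fields, and methods are hypothetical, and it assumes the "just one preferred judgement" answer to the open question.

```python
from dataclasses import dataclass, field


@dataclass
class Endorsement:
    user: str
    comment: str


@dataclass
class Judgement:
    damaging: bool
    endorsements: list = field(default_factory=list)
    preferred: bool = False


@dataclass
class Entity:
    title: str
    judgements: list = field(default_factory=list)

    def judge(self, user, damaging, comment):
        """Add a judgement together with its first endorsement."""
        judgement = Judgement(damaging, [Endorsement(user, comment)])
        # The first judgement filed on an entity starts out preferred.
        judgement.preferred = not self.judgements
        self.judgements.append(judgement)
        return judgement

    def prefer(self, judgement):
        """Move the single preference bit; endorsements are left untouched."""
        for candidate in self.judgements:
            candidate.preferred = candidate is judgement


# Replaying the example from this post.
entity = Entity("en:Special:Diff/327645238")
first = entity.judge("EpochFail", True,
                     "Removing cleanup tags without fixing the problems.")
second = entity.judge("Awight", False, "Good removal of cleanup tags")
entity.prefer(second)
```

After the replay, the preference points at the damaging=false judgement while the original endorsement of damaging=true is still on record, which is the separation between endorsement and preference that the questions above are asking about.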

Adamw (talkcontribs)

I'd like to see the judgment and comment more tightly coupled, for a few reasons:

  • It's meaningful to distinguish between a comment left during judgment, a comment added later, and a comment left during judgment but edited later.
  • We should be encouraging tool authors to always provide the free-form input field during judgment, because we think it's a best practice and results in more thoughtful and higher-quality labeling. The easiest way to do this is to make it a field on "create judgment".
  • What is an endorsement without a judgment, if not a discussion thread? I'm interested in keeping discussion threads out of our model and keeping that in pure Flow-land.

Good question about multiple preferred rankings. I would follow Wikidata's lead, but their documentation is deliciously ambiguous about this. Looking at how their data is used in practice, I'd guess most tools will query our data using an equivalent of "rank=best". If a preferred judgment exists, only return that. If several rank=normal judgments are present, return them all... What if we leave it to consensus to set zero, one, or multiple preferred judgments? We're already in a plural paradigm in which we're going to return a machine judgment along with a human judgment, might as well allow multiple human judgments e.g. in the case of "zero preferred judgments but multiple unranked human judgments".
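A minimal sketch of that "rank=best" selection, assuming each judgment is a plain dict with a `rank` key (a made-up representation for illustration, not a JADE format):

```python
def best_judgments(judgments):
    """Return the preferred judgments if any exist, else all normal-rank ones."""
    preferred = [j for j in judgments if j["rank"] == "preferred"]
    if preferred:
        return preferred
    return [j for j in judgments if j["rank"] == "normal"]
```

Note that this handles zero, one, or several preferred judgments without any special cases, so the selection rule itself does not force a decision on how many preferred judgments consensus is allowed to set.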

EpochFail (talkcontribs)

"What is an endorsement without a judgment": such a thing should be impossible. I agree that discussions should stay in Flow land. A comment with an endorsement serves another purpose. It supports an endorsement. It's a special kind of statement. In Wikipedia, !votes take the form where endorsements have comments that directly support the endorsement, but comments are made orthogonally. Granted, discussion threads will spawn from endorsements, but I believe this is an anti-pattern since these discussions get messy quickly and much work is done to re-summarize and recover.

Adamw (talkcontribs)

In other venues, we decided to drop endorsements for now. Instead, free-form discussion can happen on the talk page. Disagreements about judgments can also follow the bold-revert-discuss workflow, where the judgment is simply replaced by a new, proposed judgment.

Reply to "Judgments, Endorsements, and Preference"

Free text comments and suppression

Summary by EpochFail

See task T183276 "Design curation/suppression integration with MediaWiki (for JADE)"

EpochFail (talkcontribs)

There's one thing that's very clear from reading through ORES' mistake reports online: People want to include freetext comments with their judgements.

However, there's a problem where any time you open up a new freetext field on the internet, someone's going to use it to Dox someone else or otherwise cause harm. So we're going to need some mechanism for curation and suppression. There are two options we're kicking around.

First class suppression support

Text fields would be stored within JADE and a full suite of suppression tools would be made available. Integrations with MediaWiki would allow patrollers to see JADE comments appear in the RecentChanges feed on their local wiki.

Pro: Text is part of the system and any analysis users will want to do.
Con: Way more work to implement and easy to get wrong.

Outsource to Flow

Flow already integrates with MediaWiki's RecentChanges feed and has the capacity for suppression events. If we store freetext comments in a Flow post, then we don't need to re-implement the same suppression mechanisms that Flow already has.

Pro: Much easier to implement.
Con: Analyses will need to join to Flow tables (??) to search text comments. A Flow post is also a little overboard for our goals.

See also en:User:Risker/Risker's_checklist_for_content-creation_extensions

EpochFail (talkcontribs)

I was just re-reading Risker's checklist. If we want first class suppression support, it looks like we'll want to have some direct MediaWiki integration so that judgements/endorsements show up in RecentChanges for the relevant wiki.

We should be able to also accept suppression actions from within that wiki. E.g. a user clicks on "delete" or "suppress" and the JADE API is hit with a request, the user's credentials/rights are checked, and if the check passes, JADE's API responds positively and the integrated MediaWiki UI signals success. None of this sounds too crazy to me.
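A rough sketch of the request flow described above, for illustration only. The names `handle_suppression_request`, `User`, and `REQUIRED_RIGHT`, the in-memory `store`, and the mapping from action to right are all assumptions, not an existing JADE or MediaWiki API.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a curation action to the right a user must hold.
# The mapping is an assumption made for this sketch.
REQUIRED_RIGHT = {"delete": "deleterevision", "suppress": "suppressrevision"}


@dataclass
class User:
    name: str
    rights: set = field(default_factory=set)


def handle_suppression_request(user, action, judgement_id, store):
    """Check the user's rights, then hide the judgement if the check passes."""
    right = REQUIRED_RIGHT.get(action)
    if right is None:
        return {"ok": False, "error": "unknown-action"}
    if right not in user.rights:
        return {"ok": False, "error": "permission-denied"}
    if judgement_id not in store:
        return {"ok": False, "error": "not-found"}
    # Record a visibility change instead of destroying the data,
    # so the action can be logged and reversed later.
    store[judgement_id]["hidden_by"] = user.name
    store[judgement_id]["visibility"] = action
    return {"ok": True}
```

The integrated MediaWiki UI would only need to look at the `ok` flag to signal success or failure.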

Adamw (talkcontribs)

Implementing a first-class suppression system is a huge undertaking. I'd like to see how many times this has been done successfully. As far as I understand, this means that our JADE text will have to be a first-class wiki entity. Being text, there are already two ways to store the data that will enable existing suppression mechanisms: article or structured discussion post. I think that our only realistic options are:

  • Store the comments as one of these two types of entity, and allow JADE users to access as that native entity type publicly, i.e. edit using VisualEditor and linking to the "page" logs.
  • Store as an existing entity, but hide the fact that we've done this and only allow access through JADE APIs. Admins would be able to access as the native MediaWiki entity type.

The first option makes more sense to me. Either way, we do provide a thin layer on top of the native entities, to implement our basic API actions. If we go the latter route, I expect we'll eventually end up having to implement the entire suite of MediaWiki actions in JADE.

I agree about the requirement to make the text available for analysis. Maybe our schema includes both a URL to the native entity (e.g. Flow topic or post), and a primary key into the table used to store the text.

Here's some fun reading about search for Flow:

EpochFail (talkcontribs)

I don't think a page or a discussion post makes sense as a comment/summary. I don't see the difficulties involved in implementing a curation/suppression system if we can leverage recentchanges (and why not). Further, there are other things to suppress, not just comments (e.g. usernames and a judgement's content).

It seems to me that, given Risker's recommendations, we can integrate with recentchanges and be mostly done.

P.S., the question about Search for flow was in the "Flow integration" thread.

Alsee (talkcontribs)

A text content system is a big and heavily interconnected wheel that should not be reinvented. Discussion and similar content should be on wikipages any time it's possible.

It's not just recent changes. Will everything show up in user contribution history? Will a blocked user be able to take any actions / submit any content? Are you building a history for it? Non-public logs and tables for CheckUser and suppression? Will it run through abuse filter? Is anything going to fail or blow up if a username has to be suppressed or a user account is renamed, or whatever other corner cases exist?

Risker (talkcontribs)

I worry that the "variant" you are proposing is not all that far off from what we saw on other alternative editing systems. These were extremely problematic, in that they didn't show up in most of the logs, it wasn't real suppression but a jury-rigged system that inaccurately mimicked it. And perhaps most importantly, the volume of work was overwhelming to oversighters. Keep in mind that enwiki has roughly 8-10x the number of oversighters of any other project (including many of the largest ones), and we have enough work right now, with about 350-500 suppressions a month. In the past, some extensions (since disabled) were running us up over 1000 suppressions a month, mostly because nobody really understood the potential for inappropriate comments. It wasn't that long ago that we had to have a whole extension/namespace shut down because it was just a nightmare waiting to happen. People who've been suppressing material for a couple of years can tell you horror stories about the imaginative ways that people can manage to include inappropriate/private/potentially libelous/truly scary material when given the chance. (Example: credible bomb threat against a head of state visiting Canada - I had a lovely chat with the RCMP about that...) Sure, JADE should only be used by experienced users - but it may be difficult to prevent others from using it, including those with malicious intent. (And you'd be surprised at how often we wind up suppressing "accidental" edits by longtime users.) So I keep going back to...will it show up in the user's contribs, will it be showing up in recent changes, will it show up in the Checkuser tables, will the normal suppression tools work on it, will they show up in the suppression logs, etc. If it is easily accessible and uses the standard interface, we can probably live with it. Outside of that...things get complicated. Incidentally, suppressing on Flow is an absolute nightmare.

EpochFail (talkcontribs)
will it show up in the user's contribs, will it be showing up in recent changes, will it show up in the Checkuser tables, will the normal suppression tools work on it, will they show up in the suppression logs, etc.

Yes. That is part of the plan.

Ping JMatazzoni re. Flow's suppression nightmare. I don't think I understand what's bad about Flow's suppression system. Did they not do what you prescribed in the quoted block above?

Alsee (talkcontribs)

Why Flow? Most editors hate it. It's been uninstalled from EnWiki, uninstalled from Meta, and Commons currently has an RFC with most editors wanting it uninstalled. The WMF ran a global survey which was canvassed/votestacked with as many Flow-enthusiasts as possible and it still came out heavily against Flow.

Adamw (talkcontribs)

Hi @Alsee, I'm reading through some of the debates you mention here. I found the survey, but cannot find a related Commons RFC. Can you post a link?

We're open to looking at alternatives to Flow. Our technical requirements are simple: the platform we choose must allow for discussion and curation, we must be able to link to a discussion thread via URL, and a discussion thread must be able to include a URL linking to the JADE judgments about the wiki entity being discussed. And we're not going to implement a new structured discussion platform.

Alsee (talkcontribs)
JMatazzoni (WMF) (talkcontribs)
Ping JMatazzoni re. Flow's suppression nightmare. I don't think I understand what's bad about Flow's suppression system. Did they not do what you prescribed in the quoted block above?

We are scheduled to make improvements on the Structured Discussion (formerly Flow) moderation system this year. I'm going to ping @Mattflaschen-WMF, who understands these issues much more fully than I. Matt, can you comment on the state of the SD suppression system?

Roan Kattouw (WMF) (talkcontribs)

I know some of the things that are lacking in SD's suppression system, but I'd be interested to hear why @Risker says "suppressing in Flow is an absolute nightmare". Not because I disagree, but because she's an experienced oversighter and it's highly likely I'd learn something from it.

Adamw (talkcontribs)
Mattflaschen-WMF (talkcontribs)

There are (different) limitations in both the standard deletion/suppression system and in SD's. Some of the SD limitations are major. Note, the following three links are all security bugs. Adam, Aaron, Risker, I have added you. However, please do not discuss them in public channels.

However, SD mostly solves all of the items Risker mentioned:

  • "will it show up in the user's contribs" - Yes (Special:Contributions), though not in the API yet (T88753)
  • "will it be showing up in recent changes" - Yes, though again not in the API (T88753)
  • "will it show up in the Checkuser tables" - Yes, SD has CheckUser support.
  • "will the normal suppression tools work on it" - There is deletion and suppression support, though it doesn't have full feature-parity.
  • "will they show up in the suppression logs" - Yes
Mattflaschen-WMF (talkcontribs)
Adamw (talkcontribs)

Thanks @Mattflaschen-WMF, this is a sobering look at how complex it is to support suppression. It also led me to Manual:RevisionDelete, which is clear and will help us.

It's a bit off-topic, but the RevisionDelete page reveals one more place where we should be coercing our data structures to map 1:1 to wiki concepts. We're already mapping judgment->page content, and judgment author->editor username/IP address, but we could also map freeform justification->edit summary. Edit summaries will finally be freed from the 255-char limitation as part of multi-content revision work, so technically such a mapping would work just fine for us.

@Halfak (WMF) this also makes me wonder how well MCR will be integrated with suppression, or whether we'll be confronted with edge cases for the next few years... We should probably do the drudge work of defining all of the suppression scenarios in Cucumber or something, to make it easier for editors to review in the pre-implementation phase.

Mattflaschen-WMF (talkcontribs)

Does it make sense to edit a free-form justification (e.g. to clarify it, or even change the justification in a meaningful way)? If so, edit summaries might not be a good fit (summaries are not editable).

Or would that always be considered an entirely new judgement?

Adamw (talkcontribs)

Oh, of course... Thanks for noticing. Edit summaries should be editable :-)

Also, we'll want to have VE for justifications, and render any wiki markup.

Halfak (WMF) (talkcontribs)

IMO, I do think it would be great if edit summaries were editable, but I don't think that's a blocker for us. It'll certainly be familiar to Wikipedians to have non-editable summaries. If further justification is necessary, one could employ talk pages or structured discussion (Flow).

Alsee (talkcontribs)

Some months ago there was a discussion somewhere (I don't recall where), where someone suggested/requested making edit-summaries editable. My recollection of the prevailing view was that it was undesirable due to the messy extra layer of meta-history it would entail.

Halfak (WMF) (talkcontribs)

After doing a bit of research and talking to the folks on the MW Platform Team, I think it makes the most sense to try to map JADE's structure of Judgements into page content and build a content handler for it.

  • A: A collection of pages with a sequence of revisions where the most recent is the current revision.
  • B: A page/revision-like judgement schema. A collection of wiki entities has a sequence of judgements where the most recent is the current judgement.
  • C: A collection of wiki entities that contain a set of judgements that have a set of endorsing contributors -- one of which is set to be "preferred" and therefore current.

This diagram is a little bit backwards. After reviewing false positive reports and considering consensus discussion patterns, I settled on shape C as a good way to represent "current good" as well as "disagreement". A is how I see the shape of page/revisions. B is me trying to cram JADE into the page/revision shape.

There are a few patterns from C that would need to be abstracted on top of page histories. E.g. changing the "preference" in shape C would be a revert in shape B. Signalling support for a judgement is possible in shape C but not in shape B. Rather, signalling support would need to be done in an unstructured way.

By seeking to formalize this via a page/revision structure, we also address a bit of the problem with *choosing* Flow vs. talk pages ahead of time. If we were to use the page/revision structure, we could set up a namespace for JADE and have the judgement for an entity_type/identifier exist as a content page. Either a talk page or a Flow board could be behind that content page. From JADE's point of view, I'm not sure it matters.
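To make the difference between shapes B and C concrete, here is a minimal sketch. The class and method names (`EntityB`, `EntityC`, `revert_to`, `prefer`, and so on) are hypothetical, and treating the author of a judgement as its first endorser is an assumption, not something settled in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Judgement:
    author: str
    data: dict


@dataclass
class EntityB:
    """Shape B: a sequence of judgements; the most recent is current."""
    history: list = field(default_factory=list)

    def save(self, judgement):
        self.history.append(judgement)

    def current(self):
        return self.history[-1] if self.history else None

    def revert_to(self, index):
        # A "revert" is a new revision that repeats an older judgement.
        self.save(self.history[index])


@dataclass
class EntityC:
    """Shape C: a set of judgements with endorsers; one is preferred."""
    judgements: list = field(default_factory=list)
    endorsers: list = field(default_factory=list)  # one set per judgement
    preferred: int = -1

    def add(self, judgement):
        self.judgements.append(judgement)
        # Assumption: the author counts as the first endorser.
        self.endorsers.append({judgement.author})
        if self.preferred < 0:
            self.preferred = 0
        return len(self.judgements) - 1

    def endorse(self, index, user):
        # Structured support for a judgement; shape B has no equivalent.
        self.endorsers[index].add(user)

    def prefer(self, index):
        # Changing the preference here corresponds to a revert in shape B.
        self.preferred = index

    def current(self):
        return self.judgements[self.preferred] if self.judgements else None
```

In this sketch, changing the preference in shape C leaves the set of judgements untouched, while the same change in shape B grows the history by one revision.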

Mattflaschen-WMF (talkcontribs)

I think this is a good option. If you're just creating talk pages (old-style talk pages or StructuredDiscussion talk pages) in e.g. a Judgement_talk namespace, you have flexibility (you don't even need all talk pages in the namespace to be the same kind), and will benefit from existing work.

For the record, I do not agree with "I don't see the difficulties involved in implementing a curation/suppression system if we can leverage recentchanges (and why not)." I could go into more detail why, but it seems we're already on the same page.

Adamw (talkcontribs)

I have a two-part proposal for just the free-form text aspect of our MediaWiki integration:

  • We store judgment comments (justifications) as the edit summary on the revision where the judgment is added.
  • We store discussion as a single URL pointing to any type of talk thread, be it Wikitext or Structured Discussion.
Mattflaschen-WMF (talkcontribs)

@Adamw If there is an entirely separate talk page (e.g. Judgement_talk:123) for every judgement, this might not be an issue.

However, I wanted to note that for long-running old-style talk pages (e.g. Village_pump_(technical)), there is no way to permalink to an active thread. If you just link to a section, it will be archived away (e.g. to Wikipedia:Village_pump_(technical)/Archive_161). If you link to an oldid, you are not linking to the active version of a thread. You can only link to it reliably after it has been archived.

If the talk page is short, and will never be archived, this is not an issue.

SD solves this issue through the Topic namespace.

Adamw (talkcontribs)

My first point about using the edit summary is a bad idea, thanks for the feedback in the earlier thread.

Our working theory is that JADE discussions are not 1:1 with specific judgments, but are 1:1 with the wiki entity being discussed. Since a topic on the entity's talk page would be the most natural place to have such a discussion, I think your critique is correct, and the lack of permalinks might be fatal to using wikitext and the Talk namespace for discussions.

However, I've been thinking that we need a reverse link from the discussion pointing to the JADE context. It's wacky, but do you think we could use the "what links here" table to maintain a map from JADE-subject wiki entity to the discussions about judging it?

Mattflaschen-WMF (talkcontribs)

To answer the WhatLinksHere question, I think I need to understand the proposed setup better (e.g. to know if we're talking about cross-wiki). If it's within a single wiki, and judgements are stored in a restricted Judgement namespace (meaning there is no free-form text), it could work.

Are you envisioning something like (I'm assuming for all, the discussion would just be on the talk page of the mentioned page):

  1. Entity:Page:Earth on the regular wiki, and Entity:Revision:1234?
  2. Entity:Page:Earth on a special wiki
  3. Earth (the actual Earth article), with the JADE discussion(s) on Talk:Earth with everything else. If so, what are you doing about revision entities?
  4. Something totally different

If I understand right, there would be one discussion page (e.g. Entity_talk:Page:Earth) for each entity, but there could be multiple topics within the page. Each judgement/justification could have its own topic then.

If more than one judgement shares the same topic, the reverse link(s) would have to go to multiple judgements.

If you're worried about people losing the context, another option is to embed the UI into JADE, so (regardless of where the discussion is technically stored), you see (and can even reply to) the discussion on the same UI page as the judgement. SD has robust APIs, so this is doable.

Adamw (talkcontribs)

Sorry that this is such a moving target! My current thoughts are,

  • Main:Earth is the article being judged.
  • Jade:Earth is the structured judgment, which has some kind of restrictions on editing (at least, you can't save if you broke the schema), and is normally edited via the JADE API and always through our custom ContentHandler.
  • Jade_talk:Earth is the free-form discussion about things on Jade:Earth.

I think that solves the context questions, since we can rely on the ordinary talk page relationship, and the Jade: namespace correlates to the Main: namespace in a similar way.
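The "can't save if you broke the schema" restriction could look something like the check below. This is only a sketch: the real ContentHandler would live in MediaWiki, and the `SCHEMA` fields (`damaging`, `goodfaith`) and the function name `validate_judgment` are assumptions, since the actual JADE schema is not specified here.

```python
import json

# Hypothetical minimal schema: field name -> required Python type.
# The real JADE schema is not defined in this thread.
SCHEMA = {"damaging": bool, "goodfaith": bool}


def validate_judgment(text):
    """Return a list of problems; an empty list means the save may proceed."""
    try:
        data = json.loads(text)
    except ValueError as error:
        return ["not valid JSON: %s" % error]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    for name, expected in SCHEMA.items():
        if name not in data:
            problems.append("missing field: " + name)
        elif not isinstance(data[name], expected):
            problems.append("wrong type for field: " + name)
    for name in data:
        if name not in SCHEMA:
            problems.append("unknown field: " + name)
    return problems
```

A save through either the JADE API or a manual edit would be rejected whenever the returned list is non-empty.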

EpochFail (talkcontribs)

One minor note. We'll probably need to name that page something like: Jade:Page/Earth. Or better yet, Jade:Wikidata item/Q2 since judgments at the page level address the "concept" more than the content. Regardless, we'll need to save space for something like Jade:Revision/1234567 and Jade:Diff/1234567.

Adamw (talkcontribs)

Oops, that's a major oversight on my part! I think your suggestion of Jade:(Revision|Diff)/1234567 is the sanest way to go.

If we decide to get fancy, Jade:Page/Earth could use the JADE API to generate an index of all judgments on Earth revisions, but then we're straying into UI territory, which we probably don't want to do. Unless you had something else in mind for what this URL would contain?

EpochFail (talkcontribs)

Jade:Page/Earth would be great for labeling the concept space of "Earth". E.g. geography, astronomy, etc. We do this for the draft topic model. We could apply it to the first revision of the article, but it really represents the concept as a whole.

Adamw (talkcontribs)

I like it! It does feel a bit asymmetrical to use page name rather than ID, since the revision and diff paths will be canonical by ID, but better for humans. If MediaWiki allows it, maybe we can think about providing a few aliases to be more friendly, e.g. Jade:Revision/Earth/1234567. I keep wanting to put the article name first, i.e. "Jade:Earth/Revision/123456", but that falls apart if Earth happens to have a subpage "Revision"...
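A small sketch of how the type-first titles discussed above might be built and parsed. The helper names `make_title` and `parse_title` and the regular expression are assumptions, and the friendlier aliases are not covered.

```python
import re

# Title shapes proposed in this thread; the regular expression itself is
# an assumption about how they might be parsed.
_TITLE = re.compile(r"^Jade:(Revision|Diff|Page)/(.+)$")


def make_title(entity_type, identifier):
    """Build e.g. 'Jade:Revision/1234567' or 'Jade:Page/Earth'."""
    return "Jade:%s/%s" % (entity_type, identifier)


def parse_title(title):
    """Split a Jade title into (entity_type, identifier), or return None."""
    match = _TITLE.match(title)
    if match is None:
        return None
    entity_type, identifier = match.groups()
    if entity_type in ("Revision", "Diff"):
        # Revisions and diffs are canonical by numeric ID.
        if not identifier.isdigit():
            return None
        return entity_type, int(identifier)
    # Pages keep their full name, including any subpage path.
    return entity_type, identifier
```

With the entity type first, a page that happens to have a subpage named "Revision" stays unambiguous: `Jade:Page/Earth/Revision` parses as the page `Earth/Revision`, whereas an article-first title such as `Jade:Earth/Revision/123456` is simply not recognized by this sketch.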

Reply to "Free text comments and suppression"
Summary by Adamw

ORES needs to support more languages.  +1 for Swahili support.

Baba Tabita (talkcontribs)

My experience with ORES was not encouraging. I had applied for ORES to be initiated on sw:wp. At the preliminary step of creating a blacklist, the algorithm identified predominantly English words rather than unacceptable Swahili ones. And then, I had too many other (more?) worthwhile projects on my hands (and no other enthusiastic cooperators to go through stuff manually) ... Despite all the rhetoric about supporting small, local, indigenous, vernacular, minority, under-represented and/or endangered languages, internet technology is still largely anglocentric :(

EpochFail (talkcontribs)

Hi Baba Tabita. I'm sorry you seem to be having a bad time with ORES. I remember reaching out to you to ask for help with alternative means of getting blacklists for ORES to work with. We've had to do that for a few languages. As far as I know, we've been waiting on that. FWIW, our BWDS system picked up English words for Swahili because many of the damaging edits in Swahili wiki add English language content. So that's not really a bug but rather a limitation in our process for auto-detecting badwords using wiki edits. It does, however, tell us that an English language dictionary would be useful for damage detection in Swahili wiki.

Regardless, I'm not sure that you could accuse us of not matching our rhetoric about working with anyone who will work with us. I believe we have been quick to respond to your questions and to make suggestions about next steps. Also, FWIW, we're not focused on small, local languages. We're focused on supporting growing communities -- small language or not! See m:Community Engagement/Defining Emerging Communities.

In the end, there's only two of us who are staffing the team that does something Wikimedia has never done before so I hope you'll understand that we need your help in order to support your wiki. I have lots of other wikis and operational concerns to track and I put in a lot of volunteer time to make this experiment in community AI resources work at all.

Currently, we have support for 34 languages -- at least 11 of which are not heavily used in the Western world -- so I think we're doing OK in most other instances.

Baba Tabita (talkcontribs)

Thanks for the timely reply!

And sorry if I came over as accusing or complaining, none of which was intended. I think you *are* doing a great job. It's just that circumstances are so much more favourable for European languages than, say, for African ones. I'm not a tech guy, so that adds to my frustration of not being able to do what I would like for Swahili localization in the little time available to me. Definitely not your fault! Still, just saying ...

Best wishes, and please keep up the good work!

EpochFail (talkcontribs)

Thanks Baba Tabita. Maybe I could look up some supposed bad word lists to help get us started. I'll ping on the ticket.

Should we integrate JADE with Structured Discussions?

Summary by Adamw

Judgments and discussions are rooted at wiki entities. We need more discussion before committing to using Structured Discussions; it's not a perfect fit. Having ORES and JADE entities available when reading or editing these threads would be great.

EpochFail (talkcontribs)

See Phab:T153147 for the relevant task.

Structured Discussion provides a well supported, wiki-integrated environment in which we can hold threaded discussions. In Judgments, Endorsements, and Preference, I described a JADE workflow that involved a "discussion post" between two users. I'm imagining that we can mint a Structured Discussion board as needed for discussions about judgments. These will help users use a central location to negotiate whether or not an edit was damaging or likely to have been saved in good-faith -- or if an article is C-class or B-class.

My two big questions are (1) can we mint a new Structured Discussion board for an arbitrary wiki entity and (2) will people find having a whole Structured Discussion board for discussing Judgments about a wiki entity intuitive?

Doug Weller (talkcontribs)

Put me down as someone who doesn't find Flow intuitive or user friendly.

EpochFail (talkcontribs)

All lab studies we have run have suggested that people find Structured Discussion much more user friendly and intuitive than, for example, talk pages. The big benefits seem to be auto-threading, "reply" buttons that work, and not needing to manage signatures.

I can certainly see how Structured Discussion doesn't have explicit functionality for many talk page hacks we've gotten used to, but in this case, I think we are actually targeting threaded discussion -- something that Structured Discussion does very well.

Doug Weller (talkcontribs)

My experience was that it was hard to search a Flow talk page. Has that changed?

Adamw (talkcontribs)

@Ebernhardson brought up one consideration, that we need to attach the Flow board to an existing wiki page. Since Flow is only enabled on some wikis, it's probably impossible to have this page be on the wiki where the artifact lives. We would have to make a strange structure like, "", and maybe even have another directory level to group into hundreds of posts or so.

EpochFail (talkcontribs)

I'm not sure I understand the groupings, but I think that having all JADE discussions focused on a single wiki could work just fine. In the end, I imagine, we'll have topic IDs/URLs that people can be directed to through any UI that shows JADE stuff.

EpochFail (talkcontribs)

One more thing I was thinking... It would be really cool if our MediaWiki integration allowed judgement information to be rendered as part of a Flow topic/board page.

Adamw (talkcontribs)

At the very least, it has to be possible to navigate from a topic to the judgment.

Maybe, when a topic is first created, it contains an introductory template like "{{JADE/Judgment | jid = 123}}" which keeps the glue on-wiki and makes it easy to refine collaboratively. This could in turn be wrapped in a div, making it easy to find and hide when displaying the topic from places this info would be redundant, such as the JADE UI.
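As a sketch of the idea above, the introductory block could be generated and later stripped like this. The CSS class name `jade-judgment-intro` and the helper names are hypothetical; only the template call itself comes from this post.

```python
import re

INTRO_CLASS = "jade-judgment-intro"  # hypothetical CSS class name


def topic_intro(jid):
    """Wikitext for the first post of a new topic about judgment `jid`."""
    return '<div class="%s">{{JADE/Judgment | jid = %d}}</div>' % (INTRO_CLASS, jid)


# Matches the wrapper div and trailing whitespace. This simple pattern
# assumes the wrapper contains no nested <div> elements.
_INTRO = re.compile(
    r'<div class="%s">.*?</div>\s*' % re.escape(INTRO_CLASS), re.DOTALL
)


def strip_intro(wikitext):
    """Drop the introductory block where it would be redundant."""
    return _INTRO.sub("", wikitext, count=1)
```

A client such as the JADE UI could call `strip_intro` before rendering, while the on-wiki view keeps the template visible. In a browser, hiding the div with CSS would achieve the same effect without touching the text.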

I'm suddenly doubtful about one of our assumptions: are discussion threads only useful on a judgment, or would we also want to attach to a score? Consensus discussion would make the most sense at the score- or wiki-entity-level, perhaps?

EpochFail (talkcontribs)

Scores change when we deploy new models. I feel like the proposals for attaching a judgement to a score do not account for this fact. E.g. false-positives turn to true-negatives as we refine a model. Further, we'll want judgments for scores that don't exist yet. So what would we do in that circumstance? I think it would make much more sense to attach judgements to wiki-entities and merely present the current score (if available) in the UI for reflection & discussion.
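One way to picture this: judgements are keyed by wiki entity alone, while scores are keyed by entity and model version and may be absent. The function name `view` and the dictionary layout below are assumptions for illustration, not a proposed storage format.

```python
def view(entity, judgments, scores, model_version):
    """Combine the stored judgment with whatever score exists right now."""
    return {
        "entity": entity,
        # Judgments are attached to the entity and survive model changes.
        "judgment": judgments.get(entity),
        # The score may be missing, and it changes with each model version.
        "score": scores.get((entity, model_version)),
    }
```

Deploying a new model then changes only what `view` reports under `score`; the judgement and any discussion attached to the entity stay where they are.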

Adamw (talkcontribs)

+1 let's do that.

So then, discussions always begin with a judgment comment, or can they begin at a wiki entity?

EpochFail (talkcontribs)

Wiki entity IMO. We should have a nice way to attach a historical score to a discussion. E.g. {{subst:ores|revision|2131211|damaging}}

Reply to "Should we integrate JADE with Structured Discussions?"

FYI: An implementation discussion

EpochFail (talkcontribs)
Reply to "FYI: An implementation discussion"