Topic on Talk:JADE

Free text comments and suppression

Summary by EpochFail

See task T183276 "Design curation/suppression integration with MediaWiki (for JADE)"

EpochFail (talkcontribs)

There's one thing that's very clear from reading through ORES' mistake reports online: People want to include freetext comments with their judgements.

However, any time you open up a new freetext field on the internet, someone is going to use it to dox someone else or otherwise cause harm. So we're going to need some mechanism for curation and suppression. There are two options we're kicking around.

First class suppression support

Text fields would be stored within JADE and a full suite of suppression tools would be made available. Integrations with MediaWiki would allow patrollers to see JADE comments appear in the RecentChanges feed on their local wiki.

Upside
Text is part of the system and any analysis users will want to do.
Downside
Way more work to implement and easy to get wrong.

Outsource to Flow

Flow already integrates with MediaWiki's RecentChanges feed and has the capacity for suppression events. If we store freetext comments in a Flow post, then we don't need to re-implement the same suppression mechanisms that Flow already has.

Upside
Much easier to implement
Downside
Analyses will need to join to Flow tables (??) to search text comments. A Flow post is also a little overboard for our goals.

See also en:User:Risker/Risker's_checklist_for_content-creation_extensions

EpochFail (talkcontribs)

I was just re-reading Risker's checklist. If we want first class suppression support, it looks like we'll want to have some direct MediaWiki integration so that judgements/endorsements show up in RecentChanges for the relevant wiki.

We should be able to also accept suppression actions from within that wiki. E.g. a user clicks on "delete" or "suppress" and the JADE API is hit with a request, the user's credentials/rights are checked, and if the check passes, JADE's API responds positively and the integrated MediaWiki UI signals success. None of this sounds too crazy to me.
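
The request flow described above could be sketched roughly as follows. This is a minimal illustration only: the `User` and `Judgement` types, the `handle_suppression_request` function, the response shape, and the mapping from actions to rights are all hypothetical and not part of any existing JADE or MediaWiki API.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from an action to the user right it requires.
# The real right names would come from the wiki's configuration.
REQUIRED_RIGHT = {
    "delete": "deleterevision",
    "suppress": "suppressrevision",
}


@dataclass
class User:
    name: str
    rights: set = field(default_factory=set)


@dataclass
class Judgement:
    id: int
    comment: str
    visibility: str = "public"  # "public", "deleted", or "suppressed"


def handle_suppression_request(user, action, judgement):
    """Check the user's rights, apply the action, and report the outcome."""
    required = REQUIRED_RIGHT.get(action)
    if required is None:
        return {"ok": False, "error": "unknown-action"}
    if required not in user.rights:
        # The integrated MediaWiki UI would surface this as a failure.
        return {"ok": False, "error": "permission-denied"}
    judgement.visibility = "deleted" if action == "delete" else "suppressed"
    return {"ok": True, "judgement": judgement.id, "visibility": judgement.visibility}
```

The point of the sketch is only that the rights check happens on the JADE side before any state changes, so a rejected request leaves the judgement untouched.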

Adamw (talkcontribs)

Implementing a first-class suppression system is a huge undertaking. I'd like to see how many times this has been done successfully. As far as I understand, this means that our JADE text will have to be a first-class wiki entity. Being text, there are already two ways to store the data that will enable existing suppression mechanisms: article or structured discussion post. I think that our only realistic options are:

  • Store the comments as one of these two types of entity, and allow JADE users to access as that native entity type publicly, i.e. edit using VisualEditor and linking to the "page" logs.
  • Store as an existing entity, but hide the fact that we've done this and only allow access through JADE APIs. Admins would be able to access as the native MediaWiki entity type.

The first option makes more sense to me. Either way, we do provide a thin layer on top of the native entities, to implement our basic API actions. If we go the latter route, I expect we'll eventually end up having to implement the entire suite of MediaWiki actions in JADE.

I agree about the requirement to make the text available for analysis. Maybe our schema includes both a URL to the native entity (e.g. Flow topic or post), and a primary key into the table used to store the text.
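
As a rough sketch of that idea, a record could carry both pointers side by side. The class and field names below are hypothetical, and the assumption that the text table has an integer primary key is mine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommentReference:
    """Points at judgement text that lives in a native MediaWiki entity."""
    judgement_id: int
    native_url: str                    # e.g. a link to the Flow topic or post
    text_row_id: Optional[int] = None  # primary key into the text table, if known


def can_join_for_analysis(ref: CommentReference) -> bool:
    """Analysts can only join to the text table when the key is recorded."""
    return ref.text_row_id is not None
```

The URL serves people who want to read or curate the comment in place, while the key serves analyses that need to join against the stored text.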

Here's some fun reading about search for Flow: https://phabricator.wikimedia.org/T104631

EpochFail (talkcontribs)

I don't think a page or a discussion post makes sense as a comment/summary. I don't see the difficulties involved in implementing a curation/suppression system if we can leverage recentchanges (and why not). Further, there are other things to suppress -- not just comments (like usernames and a judgement's content).

It seems to me that, given Risker's recommendations, we can integrate with recentchanges and be mostly done.

P.S., the question about Search for flow was in the "Flow integration" thread.

Alsee (talkcontribs)

A text content system is a big and heavily interconnected wheel that should not be reinvented. Discussion and similar content should be on wikipages any time it's possible.

It's not just recent changes. Will everything show up in user contribution history? Will a blocked user be able to take any actions / submit any content? Are you building a history for it? Non-public logs and tables for CheckUser and suppression? Will it run through abuse filter? Is anything going to fail or blow up if a username has to be suppressed or a user account is renamed, or whatever other corner cases exist?

Risker (talkcontribs)

I worry that the "variant" you are proposing is not all that far off from what we saw on other alternative editing systems. These were extremely problematic, in that they didn't show up in most of the logs, it wasn't real suppression but a jury-rigged system that inaccurately mimicked it. And perhaps most importantly, the volume of work was overwhelming to oversighters.

Keep in mind that enwiki has roughly 8-10x the number of oversighters of any other project (including many of the largest ones), and we have enough work right now, with about 350-500 suppressions a month. In the past, some extensions (since disabled) were running us up over 1000 suppressions a month, mostly because nobody really understood the potential for inappropriate comments. It wasn't that long ago that we had to have a whole extension/namespace shut down because it was just a nightmare waiting to happen.

People who've been suppressing material for a couple of years can tell you horror stories about the imaginative ways that people can manage to include inappropriate/private/potentially libelous/truly scary material when given the chance. (Example: credible bomb threat against a head of state visiting Canada - I had a lovely chat with the RCMP about that...) Sure, JADE should only be used by experienced users - but it may be difficult to prevent others from using it, including those with malicious intent. (And you'd be surprised at how often we wind up suppressing "accidental" edits by longtime users.)

So I keep going back to...will it show up in the user's contribs, will it be showing up in recent changes, will it show up in the Checkuser tables, will the normal suppression tools work on it, will they show up in the suppression logs, etc. If it is easily accessible and uses the standard interface, we can probably live with it. Outside of that...things get complicated. Incidentally, suppressing on Flow is an absolute nightmare.

EpochFail (talkcontribs)
"will it show up in the user's contribs, will it be showing up in recent changes, will it show up in the Checkuser tables, will the normal suppression tools work on it, will they show up in the suppression logs, etc."

Yes. That is part of the plan.

Ping JMatazzoni re. Flow's suppression nightmare. I don't think I understand what's bad about Flow's suppression system. Did they not do what you prescribed in the quoted block above?

Alsee (talkcontribs)

Why Flow? Most editors hate it. It's been uninstalled from EnWiki, uninstalled from Meta, and Commons currently has an RFC with most editors wanting it uninstalled. The WMF ran a global survey which was canvassed/votestacked with as many Flow-enthusiasts as possible and it still came out heavily against Flow.

Adamw (talkcontribs)

Hi @Alsee, I'm reading through some of the debates you mention here. I found the survey, https://meta.wikimedia.org/wiki/Collaboration/Flow_satisfaction_survey/Report, but cannot find a related Commons RFC at https://commons.wikimedia.org/wiki/Commons:Requests_for_comment nor https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2015/08#Flow. Can you post a link?

We're open to looking at alternatives to Flow. Our technical requirements are simple: that the platform we choose will allow for discussion and curation, that we can link to a discussion thread via URL, and that a discussion thread can include a URL linking to the JADE judgments about the wiki entity being discussed. And we're not going to implement a new structured discussion platform.

Alsee (talkcontribs)
JMatazzoni (WMF) (talkcontribs)
"Ping JMatazzoni re. Flow's suppression nightmare. I don't think I understand what's bad about Flow's suppression system. Did they not do what you prescribed in the quoted block above?"

We are scheduled to make improvements on the Structured Discussion (formerly Flow) moderation system this year. I'm going to ping @Mattflaschen-WMF, who understands these issues much more fully than I. Matt, can you comment on the state of the SD suppression system?

Roan Kattouw (WMF) (talkcontribs)

I know some of the things that are lacking in SD's suppression system, but I'd be interested to hear why @Risker says "suppressing in Flow is an absolute nightmare". Not because I disagree, but because she's an experienced oversighter and it's highly likely I'd learn something from it.

Adamw (talkcontribs)
Mattflaschen-WMF (talkcontribs)

There are (different) limitations in both the standard deletion/suppression system and in SD's. Some of the SD limitations are major. Note, the following three links are all security bugs. Adam, Aaron, Risker, I have added you. However, please do not discuss them in public channels. https://phabricator.wikimedia.org/T103616 , https://phabricator.wikimedia.org/T94779 , https://phabricator.wikimedia.org/T116301 .

However, SD mostly solves all of the items Risker mentioned:

  • "will it show up in the user's contribs" - Yes (Special:Contributions), though not in the API yet (T88753)
  • "will it be showing up in recent changes" - Yes, though again not in the API (T88753)
  • "will it show up in the Checkuser tables" - Yes, SD has CheckUser support.
  • "will the normal suppression tools work on it" - There is deletion and suppression support, though it doesn't have full feature-parity.
  • "will they show up in the suppression logs" - Yes
Mattflaschen-WMF (talkcontribs)
Adamw (talkcontribs)

Thanks @Mattflaschen-WMF, this is a sobering look at how complex it is to support suppression. It also led me to Manual:RevisionDelete, which is clear and will help us.

It's a bit off-topic, but the RevisionDelete page reveals one more place where we should be coercing our data structures to map 1:1 to wiki concepts. We're already mapping judgment->page content, and judgment author->editor username/IP address, but we could also map freeform justification->edit summary. Edit summaries will finally be freed from the 255-char limitation as part of multi-content revision work (https://phabricator.wikimedia.org/T6715), so technically such a mapping would work just fine for us.

@Halfak (WMF) this also makes me wonder how well MCR will be integrated with suppression, or whether we'll be confronted with edge cases for the next few years... We should probably do the drudge work of defining all of the suppression scenarios in Cucumber or something, to make it easier for editors to review in the pre-implementation phase.

Mattflaschen-WMF (talkcontribs)

Does it make sense to edit a free-form justification (e.g. to clarify it, or even change the justification in a meaningful way)? If so, edit summaries might not be a good fit (summaries are not editable).

Or would that always be considered an entirely new judgement?

Adamw (talkcontribs)

Oh, of course... Thanks for noticing. Edit summaries should be editable :-)

Also, we'll want to have VE for justifications, and render any wiki markup.

Halfak (WMF) (talkcontribs)

IMO, I do think it would be great if edit summaries are editable, but I don't think that's a blocker for us. It'll certainly be familiar to Wikipedians to have non-editable summaries. If further justification is necessary, one could employ talk pages or structured discussion (flow).

Alsee (talkcontribs)

Some months ago there was a discussion somewhere (I don't recall where), where someone suggested/requested making edit summaries editable. My recollection of the prevailing view was that it was undesirable due to the messy extra layer of meta-history it would entail.

Halfak (WMF) (talkcontribs)

After doing a bit of research and talking to the folks on the MW Platform Team, I think it makes the most sense to try to map JADE's structure of Judgements into page content and build a content handler for it.

  • A: A collection of pages with a sequence of revisions where the most recent is the current revision.
  • B: A page/revision like judgement schema. A collection of wiki entities has a sequence of judgements where the most recent is the current judgement.
  • C: A collection of wiki entities that contain a set of judgements that have a set of endorsing contributors -- one of which is set to be "preferred" and therefore current.

This diagram is a little bit backwards. After reviewing false positive reports and considering consensus discussion patterns, I settled on shape C as a good way to represent "current good" as well as "disagreement". A is how I see the shape of page/revisions. B is me trying to cram JADE into the page/revision shape.

There are a few patterns from C that would need to be abstracted on top of page histories. E.g. changing the "preference" in shape C would be a revert in shape B. Signalling support for a judgement is possible in shape C but not in shape B. Rather, signalling support would need to be done in an unstructured way.
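
To make the difference between the two shapes concrete, here is a minimal sketch. All class and function names are hypothetical; it only illustrates how a preference change in shape C could be projected onto a page-history-like shape B, and that endorsements have nowhere structured to go.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Judgement:
    label: str
    endorsers: Set[str] = field(default_factory=set)


@dataclass
class EntityC:
    """Shape C: a set of judgements, one of which is preferred."""
    judgements: List[Judgement] = field(default_factory=list)
    preferred: Optional[int] = None  # index into judgements

    def endorse(self, index: int, user: str) -> None:
        self.judgements[index].endorsers.add(user)

    def prefer(self, index: int) -> None:
        self.preferred = index


@dataclass
class EntityB:
    """Shape B: a revision-like sequence; the last entry is current."""
    history: List[str] = field(default_factory=list)

    @property
    def current(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def save(self, label: str) -> None:
        self.history.append(label)


def apply_preference(c: EntityC, b: EntityB, index: int) -> None:
    """Changing the preference in C becomes a new revision in B.

    When the label is already in B's history, this is effectively a revert.
    Endorsements have no counterpart in B and are not carried over.
    """
    c.prefer(index)
    b.save(c.judgements[index].label)
```

Switching the preference back to an earlier judgement appends that judgement's label to the history again, which is what a revert looks like in a page history.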

By seeking to formalize this via a page/revision structure, we also address a bit of the problem with *choosing* Flow vs. talk pages ahead of time. If we were to use the page/revision structure, we could set up a namespace for JADE and have the judgement for an entity_type/identifier exist as a content page. Either a talk page or a Flow board could be behind that content page. From JADE's point of view, I'm not sure it matters.

Mattflaschen-WMF (talkcontribs)

I think this is a good option. If you're just creating talk pages (old-style talk pages or StructuredDiscussion talk pages) in e.g. a Judgement_talk namespace, you have flexibility (you don't even need all talk pages in the namespace to be the same kind), and will benefit from existing work.

For the record, I do not agree with "I don't see the difficulties involved in implementing a curation/suppression system if we can leverage recentchanges (and why not)." I could go into more detail why, but it seems we're already on the same page.

Adamw (talkcontribs)

I have a two-part proposal for just the free-form text aspect of our MediaWiki integration:

  • We store judgment comments (justifications) as the edit summary on the revision where the judgment is added.
  • We store discussion as a single URL pointing to any type of talk thread, be it Wikitext or Structured Discussion.
Mattflaschen-WMF (talkcontribs)

@Adamw If there is an entirely separate talk page (e.g. Judgement_talk:123) for every judgement, this might not be an issue.

However, I wanted to note that for long-running old-style talk pages (e.g. Village_pump_(technical)), there is no way to permalink to an active thread. If you just link to a section, it will be archived away (e.g. to Wikipedia:Village_pump_(technical)/Archive_161). If you link to an oldid, you are not linking to the active version of a thread. You can only link to it reliably after it has been archived.

If the talk page is short, and will never be archived, this is not an issue.

SD solves this issue through the Topic namespace.

Adamw (talkcontribs)

My first point about using the edit summary is a bad idea, thanks for the feedback in the earlier thread.

Our working theory is that JADE discussions are not 1:1 with specific judgments, but are 1:1 with the wiki entity being discussed. Since a topic on the entity's talk page would be the most natural place to have such a discussion, I think your critique is correct, and the lack of permalinks might be fatal to using wikitext and the Talk namespace for discussions.

However, I've been thinking that we need a reverse link from the discussion pointing to the JADE context. It's wacky, but do you think we could use the "what links here" table to maintain a map from JADE-subject wiki entity to the discussions about judging it?

Mattflaschen-WMF (talkcontribs)

To answer the WhatLinksHere question, I think I need to understand the proposed setup better (e.g. to know if we're talking about cross-wiki). If it's within a single wiki, and judgements are stored in a restricted Judgement namespace (meaning there is no free-form text), it could work.

Are you envisioning something like (I'm assuming for all, the discussion would just be on the talk page of the mentioned page):

  1. Entity:Page:Earth on the regular wiki, e.g. en.wikipedia.org. And Entity:Revision:1234 ?
  2. Entity:Page:Earth on a special jade.wikimedia.org wiki
  3. Earth (the actual Earth article) on en.wikipedia.org, with the JADE discussion(s) on Talk:Earth with everything else. If so, what are you doing about revision entities?
  4. Something totally different

If I understand right, there would be one discussion page (e.g. Entity_talk:Page:Earth) for each entity, but there could be multiple topics within the page. Each judgement/justification could have its own topic then.

If more than one judgement shares the same topic, the reverse link(s) would have to go to multiple judgements.

If you're worried about people losing the context, another option is to embed the UI into JADE, so (regardless of where the discussion is technically stored), you see (and can even reply to) the discussion on the same UI page as the judgement. SD has robust APIs, so this is doable.

Adamw (talkcontribs)

Sorry that this is such a moving target! My current thoughts are,

  • Main:Earth on en.wikipedia.org is the article being judged.
  • Jade:Earth on en.wikipedia.org is the structured judgment, which has some kind of restrictions on editing, at least that you can't save if you broke the schema, and is normally edited via the JADE API and always through our custom ContentHandler.
  • Jade_talk:Earth is the free-form discussion about things on Jade:Earth.

I think that solves the context questions, since we can rely on the ordinary talk page relationship, and the Jade: namespace correlates to the Main: namespace in a similar way.
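
The "can't save if you broke the schema" restriction could look roughly like the check below. This is a Python sketch of the idea only; a real ContentHandler would live in MediaWiki's PHP code, and both the JSON storage format and the `judgements` field are assumptions made for illustration.

```python
import json

# Hypothetical minimal schema: required keys and their expected types.
REQUIRED_FIELDS = {"judgements": list}


def validate_jade_content(text):
    """Return a list of problems; an empty list means the save may proceed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in data:
            problems.append(f"missing field: {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"wrong type for field: {key}")
    return problems
```

A save path would call the validator first and refuse the edit whenever the returned list is non-empty, whether the edit came through the JADE API or a direct page edit.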

EpochFail (talkcontribs)

One minor note. We'll probably need to name that page something like: Jade:Page/Earth. Or better yet, Jade:Wikidata item/Q2 since judgments at the page level address the "concept" more than the content. Regardless, we'll need to save space for something like Jade:Revision/1234567 and Jade:Diff/1234567.

Adamw (talkcontribs)

Oops, that's a major oversight on my part! I think your suggestion of Jade:(Revision|Diff)/1234567 is the sanest way to go.
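
A small sketch of how titles in this scheme might be built and parsed, assuming the three entity types mentioned in this thread (Page, Revision, Diff). The function names are hypothetical, and real title handling would also need MediaWiki's own normalization rules.

```python
import re

# Entity types discussed in the thread; "Page" takes a title, the others an ID.
ID_TYPES = {"Revision", "Diff"}
TITLE_RE = re.compile(r"^Jade:(Page|Revision|Diff)/(.+)$")
DIGITS_RE = re.compile(r"[0-9]+")


def build_title(entity_type, identifier):
    """Return the Jade page title for an entity type and identifier."""
    identifier = str(identifier)
    if entity_type in ID_TYPES:
        if not DIGITS_RE.fullmatch(identifier):
            raise ValueError(f"{entity_type} needs a numeric ID")
    elif entity_type != "Page":
        raise ValueError(f"unknown entity type: {entity_type}")
    return f"Jade:{entity_type}/{identifier}"


def parse_title(title):
    """Split a Jade page title into (entity_type, identifier)."""
    match = TITLE_RE.match(title)
    if match is None:
        raise ValueError(f"not a Jade title: {title}")
    entity_type, identifier = match.groups()
    if entity_type in ID_TYPES:
        if not DIGITS_RE.fullmatch(identifier):
            raise ValueError(f"{entity_type} needs a numeric ID")
        return entity_type, int(identifier)
    return entity_type, identifier
```

Because the entity type always comes first, the parser never has to guess whether a path segment is part of an article name.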

If we decide to get fancy, Jade:Page/Earth could use the JADE API to generate an index of all judgments on Earth revisions, but then we're straying into UI territory, which we probably don't want to do. Unless you had something else in mind for what this URL would contain?

EpochFail (talkcontribs)

Jade:Page/Earth would be great for labeling the concept space of "Earth". E.g. geography, astronomy, etc. We do this for the draft topic model. We could apply it to the first revision of the article, but it really represents the concept as a whole.

Adamw (talkcontribs)

I like it! It does feel a bit asymmetrical to use page name rather than ID, since the revision and diff paths will be canonical by ID, but better for humans. If MediaWiki allows it, maybe we can think about providing a few aliases to be more friendly, e.g. Jade:Revision/Earth/1234567. I keep wanting to put the article name first, i.e. "Jade:Earth/Revision/123456", but that falls apart if Earth happens to have a subpage "Revision"...
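
The subpage problem with article-first titles can be shown with a small sketch. The `interpretations` function is hypothetical; it lists every way such a title could be read, assuming article names may themselves contain slashes.

```python
def interpretations(title):
    """List every way an article-first Jade title could be read."""
    prefix = "Jade:"
    if not title.startswith(prefix):
        return []
    path = title[len(prefix):]
    # The whole path may simply be the name of an article (with subpages).
    readings = [("page", path)]
    head, sep, tail = path.rpartition("/")
    if sep and tail.isascii() and tail.isdigit():
        article, sep2, kind = head.rpartition("/")
        if sep2 and kind in ("Revision", "Diff"):
            # Or it may name a revision/diff of a shorter article title.
            readings.append((kind.lower(), article, int(tail)))
    return readings
```

For `Jade:Earth/Revision/123456` the function returns two readings, one for an article literally named `Earth/Revision/123456` and one for revision 123456 of `Earth`, which is the ambiguity a type-first layout avoids.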
