Talk:JADE/Implementations

From mediawiki.org

Single source of truth

I drew a picture of what I think our event flow should look like. See this article about managing event-based systems for reference.

JADE event flow. Events are generated by human actions. These events are the primary truth and go directly towards being stored and being applied to JADE's own state store. An "executor" turns events into actions and a "translator" converts events to a new format.

EpochFail (talk) 18:42, 12 October 2017 (UTC)Reply

Missing from this is MediaWiki integration that will include the minting of RecentChanges rows and a minimal UI for suppression actions. The MediaWiki integration will likely also include the ability to submit judgments/endorsements. EpochFail (talk) 21:55, 13 October 2017 (UTC)Reply
I've been thinking about this. It's going to be hard to emit fully-formed events without direct interaction with the state store. The most difficult part is generating IDs for events that create something. I was thinking that we could get around this by having the JADE system emit a proto-event that corresponds to a strict protocol of "everything but IDs", and that the system *only* uses these proto-events to update its own internal state.
JADE state validator. The process of generating events is expanded to include a flow of proto-events and responses from a database that then results in emitting fully-formed events to the event log.
I made this diagram to capture what I'm thinking. The cool thing about this is that we can have the proto-event actually correspond to a state update action that might fail for good reasons while updating state. If that happens, we can then roll back state and not emit the event to the event log at all because, in effect, it didn't happen. EpochFail (talk) 15:27, 30 October 2017 (UTC)Reply
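The proto-event flow described in this comment could be sketched roughly as follows. This is only an illustration, not the JADE implementation: an in-memory SQLite database stands in for the state store, a Python list stands in for the event log, and the `judgment` table, its uniqueness constraint, and both function names are hypothetical.

```python
import sqlite3


def make_state_store():
    """Create a throwaway state store (SQLite standing in for the real one)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE judgment ("
        " id INTEGER PRIMARY KEY,"
        " rev_id INTEGER NOT NULL,"
        " schema_name TEXT NOT NULL,"
        " score TEXT NOT NULL,"
        " UNIQUE (rev_id, schema_name))"  # hypothetical constraint
    )
    return conn


def apply_proto_event(conn, event_log, proto_event):
    """Apply an ID-less proto-event; emit a full event only if state updates."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            cursor = conn.execute(
                "INSERT INTO judgment (rev_id, schema_name, score)"
                " VALUES (:rev_id, :schema_name, :score)",
                proto_event,
            )
            new_id = cursor.lastrowid  # the ID minted by the state store
    except sqlite3.IntegrityError:
        return None  # state was rolled back, so no event is emitted
    event = dict(proto_event, judgment_id=new_id)
    event_log.append(event)
    return event
```

Called twice with the same proto-event, the second call would hit the uniqueness constraint, roll back, and leave the log untouched. The log append still happens after the commit, so a crash between the two steps is not covered by this sketch.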
I just got talking to Aotto and Awight.
We identified a hole in this diagram where an event affects the state but doesn't get written to the log. We could move the event log into postgres (same as state) but then we wouldn't be absolutely sure an event made it out to the kafka bus. We talked about a few options.
1. Find a Change-data-capture system to connect postgres to kafka. Wrap writing in a transaction that would confirm that the event makes it to kafka if the transaction is committed.
2. Don't worry about it. Just expect that the event is written 99.9% of the time.
3. Send all events to Kafka and read valid state from Kafka. Expect that state is up to date 99.9% of the time while validating new events.
I like 1 because it could potentially get us all the things we want. I'm not sure how a CDC system can enforce the guarantees we want, but it *would* make me happy if it could. EpochFail (talk) 21:30, 30 October 2017 (UTC)Reply
I've been looking for a way out of the distributed transaction rabbit hole. Please take a look and see if this might be a solution:
The biggest sleight of hand is that `create_judgment` no longer requires a validity check on the page being judged, nor does it return an ID. The API sends an asynchronous event to create the judgment, and we simply return the <wiki>:Jade:<article> URL where we assume the judgment will eventually be created. Adamw (talk) 15:44, 9 January 2018 (UTC)Reply
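As a rough sketch of that idea, assuming a plain in-process queue in place of the real event bus (the function signature and event fields here are hypothetical):

```python
import queue


def create_judgment(event_queue, wiki, article, judgment):
    """Queue a judgment-creation event and return where it should appear."""
    event_queue.put(
        {
            "type": "create_judgment",
            "wiki": wiki,
            "article": article,
            "judgment": judgment,
        }
    )
    # No validity check and no ID: only the page we assume will exist later.
    return f"{wiki}:Jade:{article}"
```

The caller gets a location back immediately, and whatever consumes the queue is responsible for actually creating the judgment (or dropping the event if it turns out to be invalid).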

Direct integration with MediaWiki user identities

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I've been thinking about MediaWiki integration. We don't want to require every 3rd party tool to do an OAuth dance -- especially if a user is already logged in. I was talking to Ladsgroup about how we could borrow a user's logged-in status during MediaWiki integration and he suggested that we use a CSRF token.

This would make it so that we don't need to require users to do an OAuth dance every time they want to use JADE (submit a judgement, suppress a comment, etc.). EpochFail (talk) 16:58, 14 October 2017 (UTC)Reply

I've been doing some digging around and I think that JSON web tokens offer a good solution here. Essentially, they wrap up an encoded payload that can be decoded and signature-checked. The security model requires a shared key but does not require that key to be sent with any requests. I'm working on an example for the JADE repo right now. Will have a pull request for that soon. EpochFail (talk) 20:33, 18 October 2017 (UTC)Reply
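To illustrate the shared-key model described here, the following is a minimal HS256 sign-and-verify sketch using only the Python standard library. It is not the example mentioned for the JADE repo; the function names and the payload fields are assumptions, and a real deployment would more likely use a maintained JWT library and also check expiry and the header's algorithm.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(payload: dict, key: bytes) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256)
    return signing_input + "." + _b64url(signature.digest())


def verify_token(token: str, key: bytes) -> dict:
    """Return the payload if the signature matches; raise ValueError if not."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("signature check failed")
    return json.loads(_b64url_decode(payload_b64))
```

The key is used on both sides but never travels with the request: only the token does, and any change to its payload invalidates the signature.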
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Action --> Event

A user sends a request to the JADE API to perform an action (e.g. save a judgement or suppress a comment). What happens? I think it'll almost always be roughly the same sequence of operations.

  1. Check that the request is valid (e.g. CSRF token, OAuth status of user, etc.)
  2. Check that user has rights to perform action (and is not blocked)
  3. Check action's constraints (e.g. can't set the preference of a judgment that doesn't exist)
  4. Convert action into event and emit event to single source
  5. Process event in order to update state

Every action will do a similar dance. Step 1 will be common across all actions. Step 2 will be common for groups of actions (all non-banned users, oversighters, etc.). Steps 3, 4, and 5 will be unique to an action. Step 5 will happen "outside" of the API. EpochFail (talk) 17:10, 14 October 2017 (UTC)Reply
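Steps 1 through 4 could be laid out roughly like this. It is a sketch under assumptions: the request dictionary, the `jade-edit` right, the `SetPreference` action, and the event fields are all hypothetical, and a list stands in for the single source that events are emitted to. Step 5 is left out because it happens outside of the API.

```python
class ActionRejected(Exception):
    """Raised when any of the pre-event checks fails."""


class SetPreference:
    """Hypothetical action: mark one existing judgment as preferred."""

    required_right = "jade-edit"  # hypothetical right name

    def __init__(self, judgment_id):
        self.judgment_id = judgment_id

    def check_constraints(self, state):
        # Step 3: can't set the preference of a judgment that doesn't exist.
        if self.judgment_id not in state["judgments"]:
            raise ActionRejected("no such judgment")

    def to_event(self, user):
        # Step 4 (first half): convert the action into an event.
        return {
            "type": "preference_set",
            "judgment_id": self.judgment_id,
            "user": user["name"],
        }


def handle_action(request, action, state, event_log):
    # Step 1: common to all actions (token check reduced to a flag here).
    if not request["token_valid"]:
        raise ActionRejected("invalid request")
    # Step 2: common to groups of actions.
    user = request["user"]
    if user["blocked"] or action.required_right not in user["rights"]:
        raise ActionRejected("not permitted")
    # Step 3: specific to the action.
    action.check_constraints(state)
    # Step 4 (second half): emit the event to the single source.
    event = action.to_event(user)
    event_log.append(event)
    return event
```

Keeping steps 1 and 2 in the shared handler and steps 3 and 4 on the action object mirrors the split between common and action-specific work described above.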

EpochFail (talk) 17:59, 14 October 2017 (UTC)Reply

Notes about i18n

The schema blobs will have to include message keys for:

We can borrow a lot from: https://github.com/wiki-ai/wikilabels-wmflabs-deploy/blob/master/forms/i18n/damaging_and_goodfaith/en.json EpochFail (talk) 22:30, 7 December 2017 (UTC)Reply
Where should the i18n messages be supplied? Translatewiki makes it possible to host them anywhere. Being able to edit and override onwiki is nice, but not necessary. Adamw (talk) 22:37, 7 December 2017 (UTC)Reply
I'm a fan of translatewiki. I think it makes sense. That means we'll need to store i18n in a repository (as opposed to a database). New messages will go live with deployments. EpochFail (talk) 23:10, 7 December 2017 (UTC)Reply

Is a judgment with multiple schemas divisible or not?

For example, if a judgment includes both "damaging: true" and "goodfaith: true", and a comment, should we keep that information together wherever it's used? I feel like the comment is pertinent to both labels and it wouldn't make sense to either humans or machines to look at only one label and the comment.

I was thinking that this means we shouldn't encourage support/oppose votes, but now I see that users might still want to do that, since they can simply oppose a judgment if they disagree with one of the schema-scores, and make a new proposal with their alternative scores. Adamw (talk) 22:33, 7 December 2017 (UTC)Reply

It makes sense to machines to look at just one label. That's how ORES handles "damaging" and "goodfaith" now. I think that the "edit"/"revision"/"users"/etc. views offer a nice way to group these kinds of things together. In many contexts, we'll likely receive a judgement that does not account for all schemas in a view. (E.g. maybe huggle users only note whether or not something is "damaging", but don't want to comment on "goodfaith" status).
It seems like you are advocating that the schema is not divisible. That would mean we wouldn't be able to add anything to a view (e.g. adding "edit_type" to the "edit" view alongside "damaging" and "goodfaith") because the comment would not apply to "edit_type" and that would make the data itself invalid.
Re. "support/oppose votes", I agree that it makes sense to be able to express a situation like "Everyone agrees that this is damaging, but there's disagreement about whether or not the edit was saved in goodfaith. Still the current consensus is that it was saved in goodfaith." EpochFail (talk) 20:35, 8 December 2017 (UTC)Reply

Are judgments editable?

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Almost everything on-wiki can be edited by anyone. Do we let people edit one another's judgments? What would that even mean, data-wise?

I've been thinking that judgments can be amended by their creator, but what that does is mark the old judgment as rank=deprecated and create a new one. I want that data either way, but we can save the old history into a secondary store that cares about slowly changing dimensions if needed. Adamw (talk) 22:53, 7 December 2017 (UTC)Reply
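A small sketch of that amend-by-superseding idea, assuming judgments live in a dictionary keyed by integer ID. The field names, the `normal` rank, and the `supersedes` link are hypothetical and only there for illustration.

```python
def amend_judgment(judgments, judgment_id, user, new_data):
    """Deprecate the creator's old judgment and add a replacement."""
    old = judgments[judgment_id]
    if old["author"] != user:
        raise PermissionError("only the creator can amend a judgment")
    old["rank"] = "deprecated"  # the old judgment is kept, not overwritten
    new_id = max(judgments) + 1
    judgments[new_id] = {
        "author": user,
        "data": new_data,
        "rank": "normal",
        "supersedes": judgment_id,
    }
    return new_id
```

Because nothing is deleted, the old history stays available for export to a secondary store.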

That's a good question. I think only admins should be able to "make edits" by hiding endorsements. This is roughly similar to the pattern of revisions and suppression.
+1 for saving the data in a secondary store. I think it's a good idea to have this secondary store anyway. Any real contribution interface for JADE will want that history, but JADE itself only needs to know the current state to validate an action/event. EpochFail (talk) 21:50, 11 December 2017 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Prototype vs. MVP

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I refreshed my understanding of these two terms, and I think we should change our language to make it clear that the initial phase of our project is the prototype rather than an MVP. My theory is that there's an important semantic difference, which should help to manage expectations among our stakeholders.

Here are the Wikipedia articles and some quotes:

> A prototype typically simulates only a few aspects of, and may be completely different from, the final product.

> The purpose of a prototype is to allow users of the software to evaluate developers' proposals for the design of the eventual product by actually trying them out, rather than having to interpret and evaluate the design based on descriptions.

> A vertical prototype is a more complete elaboration of a single subsystem or function. [This is us, IMO.]

> The main goal when using evolutionary prototyping is to build a very robust prototype in a structured manner and constantly refine it. The reason for this approach is that the evolutionary prototype, when built, forms the heart of the new system, and the improvements and further requirements will then be built. [This is also us.]

> A minimum viable product has just those core features sufficient to deploy the product, and no more. Developers typically deploy the product to a subset of possible customers—such as early adopters thought to be more forgiving, more likely to give feedback, and able to grasp a product vision from an early prototype or marketing information.

> "The minimum viable product is that version of a new product a team uses to collect the maximum amount of validated learning about customers with the least effort." -Eric Ries

-----

What I'm seeing is that the features we're leaving out (see T176333), especially curation, are part of what would make the product deployable. When we first release to actual Wikimedia projects, that will be our MVP. Adamw (talk) 14:23, 20 December 2017 (UTC)Reply

The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Multi-Content Revisions

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


If we're going to cram JADE into MediaWiki (in order to better support curation and suppression) I think it might make sense to plan to take advantage of the new work on Multi-Content Revisions. I've made an example of what I think MCR JADE stuff would look like. See JADE/MCR_example. What do you think? EpochFail (talk) 17:02, 22 December 2017 (UTC)Reply

I like that we can separate Edit vs Revision judgments. It doesn't seem to solve any of our other dilemmas, unfortunately. There's only one slot per content type, per revision, so we're still looking at some weird denormalized set of judgments being crammed into a single content slot, AFAICT. It'll be challenging to do per-user suppression in that case, since suppressing a single judgment will probably require custom code to rewrite the set of judgments. Adamw (talk) 16:44, 2 January 2018 (UTC)Reply
In my example, each content type gets a different slot and is transcluded onto one page. I was imagining that each specific judgement would get its own slot. EpochFail (talk) 17:04, 2 January 2018 (UTC)Reply
I missed the transclusion aspect, thanks for pointing it out.
The way I'm reading Multi-Content Revisions/Content Meta-Data, there are a finite number of slots that are defined ahead of time, and each is named. On the other hand, I found this "maybe" feature which could be adapted to fit your scheme, Requests for comment/Multi-Content Revisions#Sub-slots. We'll have to ask @Duesentrieb or another MCR developer to clarify whether "sub-slots" will be supported, and if we can use a custom convention to enumerate sub-slots. Adamw (talk) 19:59, 2 January 2018 (UTC)Reply
Given your read of the docs, does it seem that "defined ahead of time" means "adding slots is a pain" or "adding slots is impossible"? EpochFail (talk) 22:22, 2 January 2018 (UTC)Reply
This answers some of our questions: Multi-Content Revisions/Database Schema
There is a finite (smallint) number of "slot roles". I'm sure we're allowed to define a few for our purposes, but we can't go around creating them dynamically. (revision, role) is the primary index, so we can't provide more than one content per slot.
I haven't found any answers about subslots; I'm thinking they aren't implemented yet :-/ Adamw (talk) 02:41, 3 January 2018 (UTC)Reply
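The effect of a (revision, role) primary index can be shown with a toy table. This is a simplified stand-in in SQLite, not the actual MediaWiki schema; the table, column, and function names are assumptions.

```python
import sqlite3


def make_slot_table():
    """Create a toy slot table keyed on (revision, role)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE slot ("
        " revision INTEGER NOT NULL,"
        " role INTEGER NOT NULL,"
        " content TEXT NOT NULL,"
        " PRIMARY KEY (revision, role))"
    )
    return conn


def store_content(conn, revision, role, content):
    """Return True if stored, False if that revision's slot is already taken."""
    try:
        with conn:  # commits on success, rolls back on a constraint error
            conn.execute(
                "INSERT INTO slot (revision, role, content) VALUES (?, ?, ?)",
                (revision, role, content),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A second judgment for the same revision and role is rejected by the key itself, which is why multiple judgments would have to share one slot's content.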
I think small N is how many slots we'll really need. E.g. we have "damaging", "goodfaith", and "edittype" for diffs and "wp10", "drafttopic", and "draftquality" for revisions. EpochFail (talk) 14:46, 3 January 2018 (UTC)Reply
Right, I'm pretty sure we can define finite slots for each type of score or whatever, but we can't store multiple judgments unless they're sharing these slots. Adamw (talk) 15:05, 3 January 2018 (UTC)Reply
Right. I don't think storing multiple judgements makes sense in this context. Instead, we'd just have a history of "current judgement". EpochFail (talk) 15:13, 3 January 2018 (UTC)Reply
This MCR proposal is growing on me. Just to make some details of your MCR_example more explicit, to see if we agree:
  • Each wiki entity (for now, revisions) may have a JADE namespace page which will contain the consensus judgment about that entity.
  • The JADE namespace will have a custom content handler that pulls together MCR slots and renders as a single page.
  • Each Jade: page has a dozen MCR slots available, one per ORES judgment schema.
  • When updating a judgment, the author will sometimes be overwriting a score in an MCR slot. Any justification will be stored in the edit summary.
My concerns:
  • Is MCR suppression going to be ready for prime-time, or will we be tied to something that makes editors unhappy?
  • This makes it slightly awkward to have lots of judgment scoring schemas, e.g. maintaining both the old and new versions of a schema. IMO we should consider whether we can safely migrate scores to newer schemas, and only keep the latest schema in MediaWiki. Adamw (talk) 16:09, 3 January 2018 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Suppression of the target wiki entity

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I'm wondering what we should do if the wiki revision is suppressed, or an article deleted? If it's a matter of title suppression, we would want to automatically purge the corresponding title from the Jade and Jade_talk namespaces. Hopefully there's a precedent for the talk namespaces, that we can follow.

If it's a simple revision suppression, I'm not sure if there's any harm in keeping judgments about that revision. These judgments may have been involved in the suppression decision. Adamw (talk) 15:50, 9 January 2018 (UTC)Reply

Generally, I think that judgements should stand regardless. In some cases, the judgements are extra important for deleted/suppressed content (e.g. vandalism, attack pages, spam, copyright violations, etc.) EpochFail (talk) 19:44, 17 January 2018 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Using MCR

MCR seems like a good fit for page-level judgments (and perhaps for users, too). Making it also work for revision level judgments or diffs would perhaps be possible, but a bit of a stretch.

But I see no reason that page level judgments couldn't use MCR to directly integrate with the page, while revision level stuff is stored elsewhere, be it as a separate page, or using an entirely different mechanism. The same ContentHandler could be used for all interaction with the content. Duesentrieb 21:50, 22 August 2018 (UTC)Reply

I wish that MediaWiki already had appropriate paths and storage for both human-generated (editable, versioned) and machine-generated (replaceable, maybe transient) metadata. MCR is a nice step towards that goal, and I can see why it's appealing to use it for... what it's intended for!
I'll sit with the idea for a while and try to come up with a thorough exploration, but so far a few issues jump out:
  • Bad idea to have a completely different storage path for one special case, because it complicates all JADE logic.
  • Clients will either have to face this same complexity (e.g. when making an edit to raw judgment page content), or we'll hide it in an API. That's fine, but there's no need for an extra API until the MCR twist is introduced.
  • JADE events can be emitted by watching the Judgment namespace, but with the page-judgment MCR proposal we would have to add a feature to changeprop to match on slot. Also okay, but it seems like a waste.
  • We already know that nothing else will fit into this mold, so it's really a one-off while the rest of JADE will grow to include additional entity types that don't benefit from MCR.
What would make me happy is if MCR grew to become appropriate for all JADE types, then it would be an obvious win. Adamw (talk) 22:31, 22 August 2018 (UTC)Reply
I'm warming to the idea of using MCR inside of the Judgment namespace, doing something like Judgment:Diff/123, with slots damaging and goodfaith. It's blocked on general stuff like T199121, T189220, and maybe adoption questions, but these will all be closed in a matter of time. What I like the most is that slots allow us to separate data from the dissimilar schemas, while keeping a single talk page for all judgments on an entity. We could have alternatively accomplished the first part by further normalizing by title, i.e. Judgment:Diff/123/Damaging, but then the talk pages would be split between "Damaging" and "Good faith", although this should clearly be a single conversation in simple cases.
Other characteristics of MCR slots are a great fit as well: serving the latest slot-revision content as current data, having history grouped by schema, and being a small and unchanging set of schemas. Even having separate ContentHandlers is intriguing.
One thing you could help me understand is how I might pull together data from all slots into a single page view in this scenario. I'm sure other MCR integrations have needed something equivalent, would you point me to an example? (Maybe I should follow the life and times of WikibaseMediaInfo's MediaInfoView class? I'm lost at `view-factory-callback`...)
Also, does it make sense that I wouldn't keep anything in the main slot? Could I even disable the main slot until we find a reason to define its semantics? Adamw (talk) 01:34, 23 August 2018 (UTC)Reply

> (Maybe I should follow the life and times of WikibaseMediaInfo's MediaInfoView class? I'm lost at `view-factory-callback`...)

Never mind about that, I see it's for Wikibase presentation and not relevant to my question. Adamw (talk) 01:44, 23 August 2018 (UTC)Reply