Topic on Talk:JADE

Thematic and quant analysis of judgements

Summary by Adamw

Judgments and comments should be available for analysis from Quarry, and joinable against wiki entities. Comments should be full-text searchable.

EpochFail (talk | contribs)

A common pattern when discussing ORES mistakes is to do an en:Grounded theory analysis (as seen here: it:Progetto:Patrolling/ORES), where misclassifications are grouped by what was going on in the edit. This is a really cool and useful bit of work because it helps me understand the trends in a machine learning model's mistakes and work to address them.

I'd like to see JADE support this pattern. Obviously, people should be able to link to judgments on wiki pages. But I think we can get more out of having judgments queryable via m:Quarry and other public analytics systems. So we should be able to use our event-based system to route judgment data to LabsDB (probably soon to be renamed to CloudDB or maybe ForgeDB) so that people can query it via Quarry's online interface.

I'd like to run queries like: "Give me all of ORES's false positives for 'damaging' where the editor is anonymous and the page was longer than 30KB before they saved their edit." This would involve joining judgments with MediaWiki's internal databases.
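To make the shape of that join concrete, here is a minimal sketch using an in-memory SQLite database. Everything about the schema is an assumption for illustration: the `judgment`, `ores_score`, and `revision` tables and their columns are hypothetical, simplified stand-ins, not the real MediaWiki or JADE tables, and "30KB" is taken to mean 30 * 1024 bytes. A false positive here means ORES predicted "damaging" while the human judgment says not damaging.

```python
import sqlite3


def find_false_positives(conn, min_bytes=30 * 1024):
    """Return rev_ids that ORES flagged as damaging but a judgment cleared,
    restricted to anonymous edits on pages larger than min_bytes beforehand."""
    query = """
        SELECT j.rev_id
        FROM judgment AS j
        JOIN ores_score AS s ON s.rev_id = j.rev_id
        JOIN revision AS r ON r.rev_id = j.rev_id
        -- The parent revision gives the page size before the edit was saved.
        JOIN revision AS parent ON parent.rev_id = r.parent_id
        WHERE s.predicted_damaging = 1   -- ORES said "damaging"
          AND j.damaging = 0             -- the human judgment disagreed
          AND r.user_id IS NULL          -- assumption: NULL means anonymous
          AND parent.len > ?
        ORDER BY j.rev_id
    """
    return [row[0] for row in conn.execute(query, (min_bytes,))]


def build_demo_db():
    """Create hypothetical, simplified tables with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY,
            parent_id INTEGER,
            user_id INTEGER,
            len INTEGER
        );
        CREATE TABLE ores_score (rev_id INTEGER, predicted_damaging INTEGER);
        CREATE TABLE judgment (rev_id INTEGER, damaging INTEGER);
    """)
    conn.executemany("INSERT INTO revision VALUES (?, ?, ?, ?)", [
        (1, None, 7, 40000),   # first revision of a large page
        (2, 1, None, 40100),   # anonymous edit to the large page
        (3, None, 7, 1000),    # first revision of a small page
        (4, 3, None, 1100),    # anonymous edit to the small page
        (5, 2, 9, 40200),      # registered edit to the large page
    ])
    conn.executemany("INSERT INTO ores_score VALUES (?, ?)",
                     [(2, 1), (4, 1), (5, 1)])
    conn.executemany("INSERT INTO judgment VALUES (?, ?)",
                     [(2, 0), (4, 0), (5, 0)])
    return conn


if __name__ == "__main__":
    print(find_false_positives(build_demo_db()))
```

In the sample data only revision 2 satisfies all of the conditions: revision 4 is on a small page and revision 5 was made by a registered user.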

Adamw (talk | contribs)

Yes! I think you solved this with EventBus, where we provide a Kafka consumer that ingests messages into a MediaWiki database.

This would mirror only the public data. For the type of analysis you're describing we probably want the full history of judgments, minus redactions. Seems like the discussion threads should be included as well, which is a scary challenge to our plan to offload this complexity. Maybe the MVP will just have aggregate discussion stats like number of posts, number of participants, and a link to the Flow thread?

EpochFail (talk | contribs)

Seems like we should store these basic stats in a table in MediaWiki so that people can query them, Flow data, etc. via Quarry.
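As a rough sketch of what one row of such a stats table might hold, the function below derives the aggregate numbers mentioned above (number of posts, number of participants, link to the thread) from a list of posts. The post structure with an `author` key and the `thread_url` parameter are hypothetical, chosen only for illustration; this is not Flow's actual data model.

```python
def discussion_stats(posts, thread_url):
    """Summarize a discussion thread as one flat, table-friendly record.

    `posts` is assumed to be a list of dicts with an "author" key
    (a hypothetical shape, not Flow's real schema).
    """
    participants = {post["author"] for post in posts}
    return {
        "post_count": len(posts),
        "participant_count": len(participants),
        "thread_url": thread_url,
    }


if __name__ == "__main__":
    sample_posts = [
        {"author": "EpochFail"},
        {"author": "Adamw"},
        {"author": "EpochFail"},
        {"author": "Adamw"},
    ]
    print(discussion_stats(sample_posts, "https://example.org/thread"))
```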

Adamw (talk | contribs)

Something that came up in IRC: Our current design materializes JADE data in MediaWiki MariaDB as JSON blobs under the JADE namespace. This is pretty unusable from Quarry, to my knowledge.
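One possible way around this is to flatten each blob into plain rows before it lands in a queryable table. The sketch below assumes a blob shape with a top-level `judgments` list whose entries carry a `schema` mapping and a `preferred` flag. That layout is an assumption made up for this example and may not match the real JADE format.

```python
import json


def flatten_judgment_blob(rev_id, blob_text):
    """Turn one JSON blob into (rev_id, judgment_index, schema_name, value,
    preferred) rows that an ordinary SQL table could store and join on.

    The blob layout is hypothetical, for illustration only.
    """
    blob = json.loads(blob_text)
    rows = []
    for index, judgment in enumerate(blob.get("judgments", [])):
        preferred = bool(judgment.get("preferred", False))
        # Sort schema names so the output order is deterministic.
        for schema_name, value in sorted(judgment.get("schema", {}).items()):
            rows.append((rev_id, index, schema_name, value, preferred))
    return rows


if __name__ == "__main__":
    example = json.dumps({
        "judgments": [
            {"schema": {"damaging": False, "goodfaith": True}, "preferred": True},
        ]
    })
    for row in flatten_judgment_blob(123, example):
        print(row)
```

With one row per judged schema, a filter such as "damaging is false" becomes a simple column comparison instead of a search inside JSON text.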
