Topic on Talk:ORES/Paper

EpochFail (talkcontribs)

This is largely adapted from Jmorgan's notes.

Wikipedia as a genre ecology. Unlike traditional mass-scale projects, Wikipedia's structure and processes are not centrally planned. Instead, Wikipedia functions as a heterogeneous assemblage of humans, practices, policies, and software. It is an open system, and its processes are dynamic, complex, and non-deterministic.

A theoretical framework that accounts for the totality of these factors and their relationships is essential to building a system-level understanding of state and change processes. Genre ecologies[1] give us such a framework. A genre ecology consists of “an interrelated group of genres (artifact types and the interpretive habits that have developed around them) used to jointly mediate the activities that allow people to accomplish complex objectives.”[2]

Morgan & Zachry (2010) used genre ecologies to characterize the relationships between Wikipedia’s official policies and essays -- unofficial rules, best practices, and editing advice documents that editors create in order to contextualize, clarify, and contradict policies. Their research demonstrated that on Wikipedia, essays and policies not only co-exist but interact. For example, the “proper” interpretation of Wikipedia’s official Civility policy[3] within a particular context is mediated by the guidance provided in the related essay No Angry Mastodons[4].

In genre ecology terms, performing the work of enforcing civil behavior on Wikipedia is mediated by a dynamic equilibrium between the guidance provided in the official policy and the guidance provided in any related essays, with the unofficial genres providing interpretive flexibility in the application of official rules to local circumstances as well as challenging and re-interpreting official ideologies and objectives.

Algorithmic systems clearly have a role in mediating policy, values, and rules in social spaces as well[5]. When looking at Wikipedia's articulation work through the genre ecology lens, it's clear that robots mediate the meaning of policies (cf. SineBot's enforcement of the signature policy[6]) and that human-computation software mediates the way Wikipedia enacts quality control (cf. Huggle's vision of quality work in Wikipedia as separating good from bad[7]).

Wikipedia's problems in automated mediation. Wikipedia has a long-standing problem with how quality control is enacted. In 2006, when Wikipedia was growing exponentially, the volunteers who managed quality control processes were overwhelmed, and they turned to software agents to make their process more efficient[8]. But the software they developed and appropriated focused only on reifying quality standards, not on good community management practices[9]. The result was a sudden decline in the retention of new editors and a threat to the core values of the project.

Past work has described these problems as systemic and related to dominant shared understandings embedded in policies, processes, and software agents[10]. Quality control itself is a distributed cognition system that emerged from community needs and volunteer priorities[11]. So where does change come from in such a system -- one where problematic assumptions have been embedded in the mediation of policy and the design of software for over a decade? Or, more generally, how does deep change take place in a genre ecology?

Making change is complicated by the distributed nature of the system. Since the publication of a seminal report about declining retention in Wikipedia, knowledge that Wikipedia's quality control practices are problematic -- and at the heart of an existential problem for the project -- has become widespread. Several initiatives have been started to improve socialization practices (e.g., the Teahouse, a question-and-answer space for newcomers[12], and outreach efforts like the Inspire Campaigns, which elicit ideas from contributors on the margins of the community). Such initiatives can show substantial gains under controlled experimentation[13].

However, the process of quality control itself has remained largely unchanged. This assemblage of mindsets, policies, practices, and software prioritizes quality and efficiency, and it does so effectively (cite: Levee paper and Snuggle paper). To move beyond the current state of quality control, we need alternatives to the existing mode of seeing and acting within Wikipedia.

While it’s tempting to conclude that we just need to fix quality control, it’s not at all apparent what better quality control would look like. Worse, even if we knew, how does one cause systemic change in a distributed system like Wikipedia? Harding and Haraway’s concept of successors[14][15] gives us insight into how we might think about the development of new software/process/policy components. Past work has explored developing a successor view that prioritizes the support of new editors in Wikipedia over the efficiency of quality control[16][17], but a single point rarely changes the direction of an entire conversation, so change remains elusive.

Given past efforts to improve the situation for newcomers[18] and the general interest among Wikipedia's quality control workers in improving socialization[19], we know there is broad interest in balancing quality/efficiency and diversity/welcomingness more effectively. So where are the designers who incorporate this expanded set of values? How do we help them bring forward their alternatives? How do we help them re-mediate Wikipedia’s policies and values through their lens? How do we support the development of more successors?

Expanding the margins of the ecology. Successors come from the margin -- they represent non-dominant values and engage in the re-mediation of articulation work. We believe history suggests that such successors are a primary means of change in an open ecology like Wikipedia. But for anyone looking to enact a new view of quality control in the design of a software system, there’s a high barrier to entry: the development of a realtime machine prediction model. Without exception, the critical, high-efficiency quality control systems that keep Wikipedia clean of vandalism and other damage employ a machine prediction model to highlight the edits that are most likely to be bad. For example, Huggle[20] and STiki[21] use machine prediction models to highlight likely damaging edits for human review, while ClueBot NG[22] uses a machine prediction model to automatically revert edits that are highly likely to be damaging. Together, these tools and their users form a multi-stage filter that quickly and efficiently addresses vandalism[23].
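To make those mechanics concrete, here is a toy sketch of the multi-stage filter in Python. The thresholds and helper functions are hypothetical and not drawn from any of these tools; each implements its own version of this logic:

    # Toy sketch of the multi-stage quality control filter described above.
    # Thresholds and helpers are hypothetical, not taken from any real tool.
    AUTO_REVERT_THRESHOLD = 0.95  # near-certain damage: bots revert outright
    REVIEW_THRESHOLD = 0.50       # plausible damage: queue for human review

    def revert(edit):
        print("reverted:", edit["id"])

    def enqueue_for_review(edit):
        print("queued for review:", edit["id"])

    def triage(edit, p_damaging):
        if p_damaging >= AUTO_REVERT_THRESHOLD:
            revert(edit)              # bot stage (e.g., ClueBot NG's role)
        elif p_damaging >= REVIEW_THRESHOLD:
            enqueue_for_review(edit)  # tool stage (e.g., Huggle, STiki)
        # otherwise the edit passes without consuming human attention

    triage({"id": 123}, 0.97)  # -> reverted
    triage({"id": 124}, 0.60)  # -> queued for review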

So, historically, the barrier to entry for participating in the mediation of quality control policy was a deep understanding of machine classification models. Without that understanding, it wasn't possible to enact an alternative view of how quality control should work while still accounting for efficiency and the need to scale. Notably, one of the key interventions in this area that did so was built by a computer scientist[24].

The result is the dominance of a certain type of individual -- a computer scientist (stereotypically, with an eye toward efficiency and less interest in messy human interaction). This high barrier to entry and peculiar in-group have narrowed the margin and entrenched the authority of quality control regimes that were largely developed in 2006 -- long before the social costs of efficient quality control were understood.

If the openness of this space to the development of successors (the re-mediation of quality control) is limited by a rare literacy, then we have two options for expanding the margins beyond the current authorities: (1) increase general literacy around machine classification techniques, or (2) remove the need to deeply understand practical machine learning in order to develop an effective quality control tool.

Through the development of ORES, we seek to reify the latter. By deploying a high-availability machine prediction service and engaging in basic outreach efforts, we intend to dramatically lower the barrier to the development of successors. We hope that by opening the margin to alternative visions of what quality control and newcomer socialization in Wikipedia should look like, we also open the door to the participation of alternative views in the genre ecology around quality control. If we’re successful, we’ll see new conversations about how algorithmic tools affect editing dynamics, and we’ll see new types of tools take advantage of these resources to implement alternative visions.
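To illustrate the lowered barrier: a tool developer can retrieve a damage prediction with a single HTTP request, with no machine learning expertise required. A minimal sketch, assuming the ORES v3 API's response format and an arbitrary revision ID:

    # Fetch a "damaging" score for one English Wikipedia revision from
    # the ORES web service. The revision ID is arbitrary/illustrative.
    import requests

    response = requests.get(
        "https://ores.wikimedia.org/v3/scores/enwiki/",
        params={"models": "damaging", "revids": "123456"},
    )
    scores = response.json()["enwiki"]["scores"]["123456"]
    print(scores["damaging"]["score"]["probability"]["true"])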

  1. ?
  2. (Spinuzzi & Zachry, 2000)
  3. en:WP:CIVIL
  4. en:WP:MASTODON
  5. Lessig's Code is Law
  6. Lives of bots
  7. Snuggle paper
  8. Snuggle paper
  9. R:The Rise and Decline paper
  10. Snuggle paper
  11. Banning of a vandal
  12. Teahouse CSCW paper
  13. Teahouse Opensym paper
  14. Haraway, D. 1988. “Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective.” Feminist Studies, Vol. 14, No.3. (Autumn, 1988), pp. 575-599.
  15. Harding, S. 1987. The Science Question in Feminism. Ithaca: Cornell University Press.
  16. Snuggle paper
  17. Geiger, R.S. (2014, October 22-24). Successor systems: the role of reflexive algorithms in enacting ideological critique. Paper presented at Internet Research 15: The 15th Annual Meeting of the Association of Internet Researchers. Daegu, Korea: AoIR. Retrieved from http://spir.aoir.org.
  18. Teahouse CSCW paper
  19. Snuggle paper
  20. en:WP:Snuggle
  21. en:WP:STiki
  22. en:User:ClueBot NG
  23. When the levee breaks
  24. Snuggle paper
Jmorgan (WMF) (talkcontribs)

@EpochFail This is excellent. I made two very small textual changes. There's one additional piece of argument that you might want to add. Starting in the 4th paragraph from the end, you start to describe barriers to participation in quality control. You discuss the technical/expertise barrier around implementing machine learning systems, and I agree that is very important. I think it would also be useful to discuss the ADDITIONAL barrier created by the systems and practices that have developed around the use of these models. Could you argue, for example, that the existing models prioritize recall over precision in vandalism detection, and ignore editor intent, and that this is because those design decisions reflect a particular set of values (or a mindset) related to quality control? People who don't share that mindset--people who are more interested in mentoring new editors, or who care about the negative impacts of being reverted on new editor retention--won't use these tools because they don't share the values and assumptions embedded in the tools. By creating alternative models that embed different values--through interpretability, adjustable thresholds, and "good faith" scores--you provide incentives for folks who were previously marginalized from participating in quality control. Thoughts?
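To illustrate what I mean about thresholds and intent, here's a toy sketch -- all numbers and fields are illustrative, not drawn from any real model:

    # Toy sketch: the same model outputs embed different values depending
    # on thresholds and on whether intent ("goodfaith") is consulted.
    edits = [
        {"id": 1, "damaging": 0.8, "goodfaith": 0.9},  # damaging but well-meant
        {"id": 2, "damaging": 0.8, "goodfaith": 0.1},  # damaging and bad-faith
        {"id": 3, "damaging": 0.2, "goodfaith": 0.9},  # probably fine
    ]

    # Recall-oriented patrolling: catch as much damage as possible,
    # tolerating false positives (and reverted good-faith newcomers).
    patrol_queue = [e for e in edits if e["damaging"] > 0.3]

    # Precision/intent-oriented triage: damaging but good-faith edits
    # become candidates for mentoring rather than reverting.
    mentor_queue = [e for e in edits
                    if e["damaging"] > 0.5 and e["goodfaith"] > 0.7]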

Adamw (talkcontribs)

I’m trying to catch up with the genre ecologies reading, and a first impression is that genre diagrams have a lot in common with data flow diagrams. The edges contain a process, and the nodes might contain multiple data stores. I appreciate that the genre theory is giving us a more zoomed-out perspective, in which human behaviors like habits and culture begin to emerge. From my quick browsing of the background work on genre ecology, I think you’re breaking ground by suggesting that machines mediate in this space as well, in other words considering the data flows which become invisible because they don’t generate genres. For example, editors will read the genre of ORES scores via a UI, and their administrative actions create a record of reverts, but we must account for the mostly automatic process of training an ML model on the reverts and updating scores, which changes the network topology into a feedback loop. I’d appreciate help freeing myself of my data flow interpretation of genre ecologies, at some point. If machine mediation is something new in genre ecologies, then I’m curious about what we gain by bringing in this theory.

Great to see the focus on effecting change!  I personally agree wholeheartedly that “successors come from the margin”, that we could design interventions all day and the results might even be quite positive, but that the most just, lasting, and visionary change will come from empowering our stakeholders to “let a hundred algorithms bloom”, and we may be able to catalyze this by creating space at the margins.

Not sure we need to present a stereotypical computer programmer who prefers determinism and logic to messy humans. It feels like a straw man, although I won’t deny I’ve heard those exact words at the lunch table… Maybe better to just point out how simplistic solutions are seductive, and are encouraged by techie culture.

I want to hear more about how we’re opening the margins.  So far, I’m left with the suggestion that JADE will allow patrollers to push our models in new directions without ML-expert mediation.  This won’t be the obvious conclusion for most readers, I’m guessing, and I’d love to see this conclusion expanded.

EpochFail (talkcontribs)

First, I'm not sure I can address your thoughts re: data flow diagrams. I'm personally not as interested in actually modeling out the ecology as in using the framework to communicate effectively about general dynamics. Maybe Jmorgan has some thoughts.

I love how you put this:

we could design interventions all day and the results might even be quite positive, but that the most just, lasting, and visionary change will come from empowering our stakeholders to “let a hundred algorithms bloom”, and we may be able to catalyze this by creating space at the margins.

When I'm thinking about margins, I'm imagining the vast space for re-mediation of the quality control process without pushing on the prediction models at all -- just making use of them in novel ways. One does not have to fully open the world in order for effective openness to happen at the margin. Still, I do think there's interesting future-work potential around making the prediction models more malleable. In the end, if there's a single shared model for "damaging", then that model will represent an authority, not a marginal perspective. We'd instead need to allow multiple damaging models if we were to support marginal activities at that level.
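As a toy sketch of what "multiple damaging models" could look like (everything here is hypothetical -- today a wiki gets a single shared damaging model):

    # Hypothetical registry of per-perspective "damaging" models. A real
    # deployment would train each scorer on labels gathered from the
    # corresponding community perspective; constants stand in here.
    from typing import Callable, Dict

    Edit = dict  # stand-in for an edit's feature data

    damaging_models: Dict[str, Callable[[Edit], float]] = {
        "patroller": lambda edit: 0.9,  # placeholder scorer
        "mentor": lambda edit: 0.4,     # placeholder scorer
    }

    def score(edit: Edit, perspective: str) -> float:
        # Each group opts into the model that reflects its values rather
        # than inheriting a single authoritative definition of "damage".
        return damaging_models[perspective](edit)

    print(score({"diff": "..."}, "patroller"))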

Reply to "Design rationale"