Requests for comment/Workflows editable on-wiki

This is a draft overview of a potential design for the workflow description system. It is written for an audience of MediaWiki system architects and workflow description authors.

Stakeholders

 * Editor community: Replace ad-hoc on-wiki processes with well-defined workflows, while leaving them customizable. Editors need to describe processes in a way that is readable and easy to change, and individual sites need to be able to customize workflows.
 * Extension authors: Tools like UploadWizard could be made customizable.
 * Fundraising: We need a formal and verifiable way of enforcing rules about how we handle donations.
 * Product designers: We need to have a consistent user experience across wikis, while accounting for necessary local differences in workflows.

Alternatives
Less intrusive tools could help with on-wiki process: for example, parser functions to set a 7-day reminder alarm, or queue managers to help with pages that list work items.

Considerations

 * This engineered approach might seriously damage wiki process discussions, by adding a layer of arcana that only tech wizards can manipulate.
 * The machine-executable state machine is rarely the simplest and most readable way to explain a process. It can be confusing as bloody hell, even when graphed as a picture.  Also, capturing process descriptions will introduce weird-looking artifacts like parallel subprocesses and extra states.
 * Documentation must be kept in sync with the implementation, especially with regard to the workflow description syntax.

Architecture decisions
Don't expose another Turing-complete DSL! Workflow customizability is entirely defined by the supporting code (engine and implementation). The idea is that with each workflow we are creating a small DSL (like in Forth ;), which we can use to write very concise descriptions of workflow solutions covering a small set of problems.

Complex workflows should always be decomposed into a set of smaller, self-contained workflows, which can be executed in parallel or in sequence.

No job will change state unless it's moving along a predefined transition. There will be no UI or admin tool which can set a job to an arbitrary state.
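As a sketch of this rule, a toy engine might refuse any state change that is not a declared transition. The names here (`TRANSITIONS`, `Job`) are illustrative, not part of any real interface:

```python
# Hypothetical sketch, not the real engine's API: a job can only move along
# transitions declared in its workflow description. There is deliberately no
# way to assign an arbitrary state.

TRANSITIONS = {
    ("Discussion", "extend"): "Discussion",
    ("Discussion", "keep"): "Keep",
    ("Discussion", "delete"): "Delete",
}

class Job:
    def __init__(self, state="Discussion"):
        self.state = state

    def signal(self, name):
        key = (self.state, name)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition {name!r} from state {self.state!r}")
        self.state = TRANSITIONS[key]

job = Job()
job.signal("extend")   # allowed: Discussion -> Discussion
job.signal("delete")   # allowed: Discussion -> Delete
```

Any signal without a matching entry in the transition table raises an error instead of moving the job, which is exactly the guarantee an admin tool would otherwise break.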

The unconditional processing done on entry to a state simplifies the state machine graph so that it can always be trivially transformed to a Petri net. This type of graph has the lovely properties of being both pretty and easy to understand.

Discuss: The configuration may be inlined with the spec, or stored as a separate file so that we can regulate user edit access independently.

State variables are strictly serializable so that jobs are recoverable, may be paged out, or migrated between servers.
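One way to enforce strict serializability, assuming JSON as the storage format (`save_job`/`load_job` are hypothetical helpers, not a planned API):

```python
import json

# Illustrative only: restricting state variables to JSON-serializable values
# means a job can always be written out, paged to storage, and rehydrated
# on another server.

def save_job(state, variables):
    # json.dumps raises TypeError for non-serializable values,
    # which enforces the "strictly serializable" rule at save time.
    return json.dumps({"state": state, "vars": variables})

def load_job(blob):
    data = json.loads(blob)
    return data["state"], data["vars"]

blob = save_job("PROD", {"revision": 12345, "grace_days": 7})
state, variables = load_job(blob)   # identical on any server
```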

Issue: Only one signal can be queued at a time; the last signal sent wins. This is a tricky call. Machines must be self-stimulating, but I think the only case for a deeper signal stack would be "continue 2"-style evile magick, something like "make the next default action something different than the ordinary default." Not much of a case. Also, queue vs. stack behavior is a mind-rending paradox to even consider. As for a signal overriding any previous one, I think this is the behavior desired from a state with multiple actions.
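The single-slot rule can be sketched in a few lines (`SignalSlot` is an illustrative name, not the engine's):

```python
# Sketch of the single-slot rule: a job holds at most one pending signal,
# and a later send simply overwrites an earlier one. No queue, no stack.

class SignalSlot:
    def __init__(self):
        self._pending = None

    def send(self, signal):
        self._pending = signal   # last signal wins

    def take(self):
        signal, self._pending = self._pending, None
        return signal

slot = SignalSlot()
slot.send("keep")
slot.send("delete")            # overrides "keep"
```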

Extensibility
The extension points are:
 * Writing libraries that define new actions
 * Calling these actions from a workflow description
 * When desired, the core of the engine can be reimplemented and used in parallel, as long as it complies with IStateMachine.
 * Hooks. Well, that's a TODO but also a must-have.
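The `IStateMachine` name comes from this document, but its methods do not; the contract below is a guess at its shape, for illustration only:

```python
from abc import ABC, abstractmethod

# Hypothetical rendering of the IStateMachine contract. Only the interface
# name appears in this RFC; the method signatures are illustrative guesses.

class IStateMachine(ABC):
    @abstractmethod
    def current_state(self):
        """Return the name of the job's current state."""

    @abstractmethod
    def signal(self, name):
        """Deliver a signal, following a predefined transition if one exists."""

    @abstractmethod
    def available_signals(self):
        """List the signal names valid from the current state."""
```

A parallel engine core would subclass this contract; the abstract base prevents instantiating an engine that leaves any of the methods unimplemented.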

Access Control
Workflows have rules regarding who can initiate them and who can interact with them. Access control could potentially vary per state or per signal. Libraries may also need to be whitelisted for use in workflow descriptions.

Exception handling
A specification may define global signals, which can be sent to a job in any state. This is a shortcut which expands to an implicit transition from every state in the workflow to a special exception state. An example would be a workflow in which the user can "cancel" at any step, firing cleanup processing and transitioning to the exit node.
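The expansion can be sketched as a pure table transformation (names illustrative):

```python
# Illustrative sketch of the expansion: global signals are folded into every
# state's transition table, with explicit transitions taking precedence.

def expand_global_signals(states, global_signals):
    expanded = {}
    for state, transitions in states.items():
        merged = dict(global_signals)   # implicit exception transitions
        merged.update(transitions)      # explicit transitions win on conflict
        expanded[state] = merged
    return expanded

states = {
    "Start": {"open": "Discussion"},
    "Discussion": {"extend": "Discussion"},
}
table = expand_global_signals(states, {"cancel": "Cleanup"})
# every state now accepts "cancel" and moves to the exception state
```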

If a job becomes uncompletable for any reason, it should be flagged as permanently frozen, and cleanup performed outside of the workflow system. There is no "universal" exception mechanism to catch unexpected errors.

Transactionality
The easiest way to understand atomicity in workflows is to look at the steady state: the point at which a job has arrived in a state and processing has stopped. Every transition between these steady states must be atomic; it cannot be paused, and it will be rolled back in case of error.

Transitions will be protected by a database transaction, and any state variables should be transactional as well.

Library actions are responsible for guaranteeing that any side-effects are rolled back if the transition fails. No idea what that interface will look like.
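Since that interface is undecided, here is one possible shape, under the assumption that each action supplies a compensating undo callback (all names hypothetical):

```python
# Sketch under assumptions: each library action supplies an undo callback, and
# a failed transition rolls back both the state change and any side-effects
# already performed. The real interface is undecided.

class TransitionFailed(Exception):
    pass

def run_transition(job, target, actions):
    undo_stack = []
    previous = job["state"]
    try:
        for action, undo in actions:
            action(job)
            undo_stack.append(undo)
        job["state"] = target
    except Exception as err:
        for undo in reversed(undo_stack):   # compensate completed side-effects
            undo(job)
        job["state"] = previous
        raise TransitionFailed(str(err)) from err
```

If the second of two actions fails, the first action's undo runs and the job stays in its original state, which is the atomicity guarantee described above.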

Asynchronous vs. synchronous states
States are asynchronous by default, meaning that the job is paused after entering the state and remains there until it receives a new signal. A synchronous state is one for which the implementation provides a callback that runs upon entering the state. This callback will usually perform processing and then send a signal to its own job. User-interactive steps must run synchronously, by self-stimulating and completing the user interaction.
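A toy machine illustrates the pass-through behavior (names are illustrative, not the engine's API):

```python
# Illustrative toy machine: a synchronous state's callback runs on entry and
# signals the same job, so control passes straight through without waiting.

class Machine:
    def __init__(self, transitions, callbacks):
        self.state = "Start"
        self.transitions = transitions   # (state, signal) -> next state
        self.callbacks = callbacks       # state -> on-entry callback (synchronous)

    def signal(self, name):
        self.state = self.transitions[(self.state, name)]
        callback = self.callbacks.get(self.state)
        if callback:
            callback(self)               # self-stimulation keeps the job moving

machine = Machine(
    {("Start", "open"): "Validate", ("Validate", "ok"): "Done"},
    {"Validate": lambda job: job.signal("ok")},   # synchronous state
)
machine.signal("open")   # passes through Validate without pausing
```

A state with no callback is asynchronous: the machine simply stops there until the next external signal arrives.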

Versioning
Modifications to a production workflow are tricky, because there may be jobs in the queue already. The base behavior is that the system caches each revision of a workflow description, and jobs are version-locked to the description used to initiate them. Job migrations are always explicit, even when they are a no-op.
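The version-locking behavior can be sketched as follows (`DescriptionStore` and `Job` are illustrative names only):

```python
# Sketch of version-locking: every saved revision of a workflow description is
# kept, each job records the revision it was started with, and migrating a job
# to a newer revision is always an explicit call.

class DescriptionStore:
    def __init__(self):
        self.revisions = []

    def save(self, description):
        self.revisions.append(description)
        return len(self.revisions) - 1   # revision id

class Job:
    def __init__(self, store, revision):
        self.store = store
        self.revision = revision         # version-locked at creation

    def description(self):
        return self.store.revisions[self.revision]

    def migrate(self, revision):
        self.revision = revision         # explicit, even when it is a no-op

store = DescriptionStore()
r0 = store.save({"grace_period": "7 days"})
job = Job(store, r0)
r1 = store.save({"grace_period": "10 days"})
# job keeps seeing revision r0 until explicitly migrated
```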

Diagnostics
Jobs will be logged as they move through a workflow, including any signals or actions. We could cache state variables in debug mode.

Control flow
These sequence diagrams are examples of how the workflow system can be driven from MediaWiki extensions, and how user interactions take place.

[sequence diagram]

Workflow state can be loaded directly and used like an ordinary variable during page render.

[sequence diagram]

When a user responds to

[sequence diagram]

The workflow system may schedule tasks with MediaWiki. During the cron run, jobs might receive signals and transition, and actions make calls back into MediaWiki.

Implementing a workflow
Components:
 * Libraries
 * Default description
 * Default configuration

Given time, this additional component follows:
 * Customized specification and configuration

Example: Articles for Deletion
Overview:

Specification
name: Articles for Deletion Queue
TODO: flesh out.

The AfD extension will hook on article save, and will check the article content for new deletion tags. If this condition is present, we instantiate a new AfD job with the new revision as its argument, and begin the workflow.

The workflow is split up into parallel and child workflows, a strategy that should be used liberally, everywhere. We use the same implementation for all specifications here out of laziness, but there are really three archetypes: discussion queue, provisional endorsement, and admin review.

Pages are wired to send the following signals to this workflow:
 * extend
 * keep
 * delete
libraries:
  # Provides redirect
  - WikiPages
  # Enables synchronous states
  - SelfStimulating
  # Tag pages, fork depending on existing tags
  - TaggedPage
  # Perform an action in the future
  - ScheduleJob
  # Provides limit_jeopardy, delete_in
  - ArticlesForDeletion

states:
  Start:
    initial: true
    actions:
      # Append this article to the AfD discussion page, then signal "open"
      add_to_afd_queue
      # This is a soft keep. If a child workflow later acts on "delete_in",
      # the expiration date and default outcome will be overridden.
      keep_in: normal_grace_period
      # Takes a map from deletion tag name to workflow specification title.
      # A child workflow is begun, which can signal back to this machine.
      fork_on_tag:
        PROD: Proposed Deletion
        BLP-PROD: Proposed Deletion, Biographies
        CSD: Speedy Deletion Queue
        Copyvio: Copyright investigation
    transitions:
      open: Discussion

  Discussion:
    transitions:
      # There is logic in here to limit total time open to maximum_discussion.
      extend: Discussion

  # Proposed Deletion
  PROD:
    actions:
      # Only allow PROD once per article. On successive invocations,
      # automatically send a "keep" signal and wait for admin review.
      limit_jeopardy: 1
      scan_
      # Sets an alarm to run
      delete_in: normal_grace_period
    transitions:
      # Signaled by the implementation when the template is removed from the article, or
      keep: Keep
      # delete: Delete

  Keep:
    actions:
      signal: review
    # No transitions, this is a final state

  Delete:
    actions:
      signal: review
    # No transitions, this is a final state

  Review:
    transitions:
      endorse: End
      reverse: End

exceptions:
  # Shoot us out of the state machine if premature renomination for deletion is demonstrated.
  early_renomination: Keep

configuration:
  # Constants to be customized
  normal_grace_period: 7 days
  longer_grace_period: 10 days
  maximum_discussion: 21 days
  afd_queue_page: "Wikipedia:Articles for deletion"
  deletion_review_queue_page: "Wikipedia:Deletion review"