Requests for comment/Json Config pages in wiki

Rationale
Many extensions require fairly complex configurations. So far I know of two teams, Zero and Logging, that have very successfully used the approach of storing JSON blobs as wikitext on Meta, e.g. m:Zero:250-99 and m:Schema:Echo. Both have developed custom code to edit, validate, store, and visualize that data, and the two code bases still have a lot in common. The Zero code was originally based on the event logging code, but the two have since diverged. Here, I propose that the common aspects of that code be extracted into a separate extension, JsonConfig.

Except for a few cases, most configuration information is stored as PHP or YAML files in multiple Git repositories. For some cases this works very well and should not be changed, but it proves very inconvenient where the configuration is complex and error-prone, yet requires frequent updates. Message localization even went as far as creating a separate translatewiki to simplify it. The legacy Zero approach also favored non-file-based configuration, storing everything as a single semi-structured wiki page.

A typical file-based configuration workflow involves
 * Editing in a text editor/IDE
 * Git and Gerrit commands to submit that file for review
 * Gerrit review with feedback, revisions, and finally +2
 * Manual deployment steps such as ssh tin && git pull && sync-file

Replacing this workflow with a wiki-based configuration has a number of pros and some cons:
 * Pros
   * In-browser editing removes the need to do any Git operations (everything from setup, cloning, and pulling to adding, committing, and submitting)
   * Configuration is interactively validated by the same code that will use it later. There is no need to set up an independent Jenkins task.
   * Configuration is easily accessible by PHP and JavaScript code without any additional steps, as well as through the MediaWiki API.
   * Configuration can be visualized in a custom, extension-defined way
   * Optionally, the review process could be done by the flagged revisions extension
   * Configuration becomes active the moment it is saved / marked as reviewed
   * Reverts in production are as fast as reverting a wiki page to an older revision
   * A generic or a custom form-based editing interface would simplify some editing workflows
 * Cons
   * Edit linearity - Git is much better suited for several people editing the same file independently and later merging their changes into a common master. Wiki page history is linear (although I have heard of some proposals to change that). Since most changes in settings are fairly minor, I do not think this would be a significant problem.
   * Gerrit offers a much better review system, with per-line and overall comments, multiple reviewers, better email notifications, -1s, etc. Wiki config only offers watchlist notifications and a talk page, which might suffice for some, but not all, cases.
   * Most complex configs would be broken up into multiple wiki pages, which makes complex (e.g. regex) searches harder and forces one-page-at-a-time editing. This could be fairly easily solved with a simple script or a wiki mass editing tool such as AWB, but searching and editing multiple files is surely easier.

General Usage
Let's say we decide to store the IPs of all legitimate proxies whose X-Forwarded-For header can be trusted. There will be two kinds of sites: the storage site (e.g. Meta), and the sites that use this info (all wiki projects and languages).

Wiki Page
Proxy settings will be stored on Meta as pages named Config:Proxy:***. This naming scheme lets us avoid adding more namespaces (to which many users have objected, probably because of how advanced search looks and how the list of namespaces is shown in all the special pages with filters). See the Configuration section for how to set this up. All sites will have identical configuration, except Meta, which will also have $wgJsonConfigEnableHosting = true.

A page Config:Proxy:Opera (for Opera Mini servers) might have this content:

Content class
All JSON is stored in a content class. We may choose free-form JSON, in which case we do not have to write any code and can let JsonConfig use the default JCContent for storage, or we may choose (preferably) to have data validation. The JsonConfig base content class \JsonConfig\JCContent does not offer any validation beyond JSON parsing, but you may override the validate( $data ) function to do custom validation, and getHtml for custom rendering. Alternatively, there is a \JsonConfig\JCValidatedContent class that offers a number of useful validation primitives.

\JsonConfig\JCValidatedContent treats JSON as a single-level key-value store, with each value validated by a callback function. The class supports defaults, so the code that consumes the configuration does not need to check whether a given value was provided by the user. Page rendering shows the JSON with all the defaults as grayed-out values, but only the values actually entered by the user are stored. User values that equal the defaults are also highlighted in a different color. When saving, the JSON is always reformatted to keep the order of the keys consistent, which makes version diffs easier to read. All unrecognized keys are placed at the end and highlighted.
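The behavior described above (per-key validation callbacks, defaults, canonical key order, unrecognized keys last) can be illustrated with a small language-neutral sketch. This is not the extension's PHP code: the `FIELDS` table, the key names, and the `validate` function are all hypothetical and exist only to show the idea.

```python
# Hypothetical field table: key -> (default value, validation callback).
# The keys and rules are invented for illustration only.
FIELDS = {
    "enabled": (True, lambda v: isinstance(v, bool)),
    "ips": ([], lambda v: isinstance(v, list) and all(isinstance(x, str) for x in v)),
    "comment": ("", lambda v: isinstance(v, str)),
}


def validate(raw):
    """Return (stored, effective, errors) for a single-level JSON object.

    stored    - only what the user entered, re-ordered into canonical key order
    effective - user values with defaults filled in, for consuming code
    errors    - keys whose value failed its validation callback
    """
    stored, effective, errors = {}, {}, []
    # Known keys first, always in the order the field table declares them.
    for key, (default, is_valid) in FIELDS.items():
        if key in raw:
            stored[key] = raw[key]
            if is_valid(raw[key]):
                effective[key] = raw[key]
            else:
                errors.append(key)
                effective[key] = default
        else:
            effective[key] = default
    # Unrecognized keys are kept, but moved to the end.
    for key, value in raw.items():
        if key not in FIELDS:
            stored[key] = value
            effective[key] = value
    return stored, effective, errors
```

Because `stored` always comes out in the same key order regardless of how the user typed the JSON, two revisions that differ in one value would differ in one line of the diff.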

Accessing Data - Internal
In order for the extension on any site to read the configuration, that site must have the same settings and must set the API URL: $wgJsonConfigApiUri='http://meta.wikimedia.org/w/api.php'; (this may need to be customizable per config - TBD).

Since $content is actually an instance of our class, it could have more specific functions to work with the stored data. When loaded this way, the content object validates the data in "light" mode: isSaving will be false, allowing the content object to bypass some expensive validation.
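The "light" versus full validation split can be sketched as follows. This is a conceptual illustration in Python, not the extension's PHP API: the `ProxyContent` class, its `is_saving` flag, and the `ips` key are hypothetical, and parsing each network address merely stands in for whatever check an extension considers expensive.

```python
import ipaddress


class ProxyContent:
    """Hypothetical content object; names are illustrative only."""

    def __init__(self, data, is_saving=False):
        self.data = data
        self.is_saving = is_saving
        self.errors = []
        self.validate()

    def validate(self):
        ips = self.data.get("ips")
        # Cheap structural check: runs on every load.
        if not isinstance(ips, list):
            self.errors.append("ips must be a list")
            return
        # Stand-in for an expensive check: runs only when the page is saved.
        if self.is_saving:
            for ip in ips:
                try:
                    ipaddress.ip_network(ip, strict=False)
                except (ValueError, TypeError):
                    self.errors.append(f"bad network: {ip!r}")
```

The design choice is that data already accepted at save time is trusted on later reads, so readers pay only for the cheap checks.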

Accessing Data - External
The stored configuration data may frequently be needed by some external agent such as JavaScript, a bot, or some other program. JsonConfig will provide its own generic method to extract the data from the storage wiki. JavaScript could either use JSONP to access the needed data, or we could develop a forwarding service. Extension authors may also choose to provide their own API modules for domain-specific information. The rvprop=jcddata API parameter would return the JSON data as part of the API result, not as the text blob that rvprop=content would return.

Simple case - get one configuration page:
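As a minimal sketch of how an external client such as a bot might build this request: the helper below assumes the standard MediaWiki `action=query` / `prop=revisions` request shape and uses the `rvprop=jcddata` value proposed in this RFC, which may not exist on a given wiki. The function name `config_query_url` is hypothetical.

```python
from urllib.parse import urlencode

# API endpoint taken from the $wgJsonConfigApiUri example in this document.
API_URL = "http://meta.wikimedia.org/w/api.php"


def config_query_url(title, api_url=API_URL):
    """Build a query URL for one configuration page (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "jcddata",  # proposed in this RFC; an assumption, not a confirmed parameter
        "titles": title,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


url = config_query_url("Config:Proxy:Opera")
```

The resulting URL can then be fetched with any HTTP client; the shape of the response is not specified here.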

More complex example to get all proxy pages:

Configuration
Extensions that use the JsonConfig extension may choose among several usage patterns:

Single page free-form configuration in the default namespace allows users to create just one page called Config:MyExtSettings. As long as the page is a non-empty JSON object, it will be accepted.

If your extension needs multiple similar settings pages, a sub-namespace can be used. This configuration allows any pages named Config:MyExt:...:

In some cases, an extension may choose to have its own top-level namespace instead of using a sub-namespace. Here we create namespaces called Zero:... and Zero praise:...:

Of course, at a certain point you would want a custom content class with its own defaults, validation, and HTML rendering. To set it up, specify a model ID and a class that derives from \JsonConfig\JCValidatedContent:

$wgJsonConfigEnableHosting
This variable (false by default) enables storage of the configs on the current wiki. Keeping it as false is useful when this wiki will only use config values from another wiki, not actually store them locally.

$wgJsonConfigs
This variable defines which pages are treated as configuration pages. $wgJsonConfigs is an indexed array of arrays, with each sub-array having zero or more of the following parameters.

$wgJsonConfigModels
This variable defines which custom content class will handle which model ID. More than one model ID may be handled by the same content class. All content classes must derive from the \JsonConfig\JCContent class. If the model ID is mapped to null, the default JCContent class will be used.

Example:

Implementation details
JsonConfig is implemented as two parts: storage/parsing and editing/visualizing. The editor/visualizer is only available when JsonConfig runs on Meta, and allows complex presentation of the wiki pages in the Config namespace. The storage/parsing part is available on all wikis, allowing quick access to the cache as well as validation and parsing of the cached JSON blob.

Implemented Features
These features have already been implemented in Zero and/or logging, and might be useful for other extensions:
 * Visualization shows JSON as an easy-to-view table rather than code, with some extra highlighting. For example, if a value is not provided and a default is used, it is shown in gray, and if the value is the same as the default, it is shown in purple.
 * Code Editor simplifies JSON editing
 * Custom Validation performs complex checks such as checking that the value is in the proper format or that a user ID exists.
 * Memcached caching stores JSON blobs in memcached under custom keys and expiration policies, and resets them on save.
 * Flagged Revisions support allows configurations to be marked as "reviewed" before going into production
 * Localization of most basic interface elements has been done in many languages, and it would reduce translation work if the most common messages were translated just once, in one place.
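The caching item in the list above (custom keys, expiration, reset on save) can be sketched conceptually. The code below uses an in-memory dictionary as a stand-in for memcached; the `BlobCache` class, the `jsonconfig:` key prefix, and the `load_config` / `save_config` helpers are hypothetical and are not taken from the Zero or logging code.

```python
import json
import time


class BlobCache:
    """In-memory stand-in for memcached, for illustration only."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, blob = entry
        if self.clock() >= expires_at:
            # Entry outlived its expiration policy: drop it.
            del self._items[key]
            return None
        return json.loads(blob)

    def set(self, key, data):
        # Store the serialized JSON blob together with its expiry time.
        self._items[key] = (self.clock() + self.ttl, json.dumps(data))

    def delete(self, key):
        self._items.pop(key, None)


def cache_key(title):
    return "jsonconfig:" + title  # hypothetical key scheme


def load_config(cache, title, fetch):
    """Return the cached config, calling fetch(title) only on a miss."""
    data = cache.get(cache_key(title))
    if data is None:
        data = fetch(title)
        cache.set(cache_key(title), data)
    return data


def save_config(cache, title, data, store):
    """Persist the page, then reset the cache entry so readers refetch."""
    store(title, data)
    cache.delete(cache_key(title))
```

Deleting the entry on save, rather than overwriting it, keeps the write path simple and lets the next reader repopulate the cache with whatever the storage wiki actually holds.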

Unimplemented Nice-To-Haves
These features would be desirable to more than one type of configs:
 * Schema validator - Validate against JSON Schema, as most extensions might not need complex validation rules, or might want to combine schema plus extra validation.
 * Custom editor - Zero team has been thinking about implementing a more complex editor, possibly based on JSON Schema.
 * API query support - Allow config pages to be returned as regular API results in all formats (json/xml/...) instead of text blobs:
   * api.php ? action=query & titles=Config:Proxy:Opera & prop=jsonconfig
 * Localization - it would be good to be able to show localized descriptions for each configuration key