Manual:Developing libraries

This is a request for comments (RFC) establishing guidelines for extracting, publishing and managing PHP (and JavaScript?) libraries based on code originally developed as part of MediaWiki or a related Wikimedia project.

There is a growing desire to separate useful libraries from the core MediaWiki application and make them usable in non-MediaWiki-based projects. This "librarization" of MediaWiki is thought to have several long-term advantages:
 * Make life better for new (and experienced) developers by organizing the code into simple components that can be easily understood.
 * Reverse the inertia toward an ever-expanding monolithic core by encouraging developers (in core) to develop their work as reusable modules with clearly defined interfaces.
 * Start making true unit testing of core viable by having individually testable units.
 * Provide an interim step toward a service-oriented architecture, in a way that is useful independently of that goal.
 * Encourage reuse and integration with the larger software ecosystem. Done correctly, this will expand our development capacity by recruiting library authors eager to showcase their work on a top-10 website.
 * Share our awesome libraries with others and encourage contributions from them even if they aren't particularly interested in making our sites better.

For this strategy to succeed, these libraries need to develop a life of their own, independent of MediaWiki. Library authors will therefore need some latitude and independence in making their library successful. The policies surrounding each library should largely be dictated by its primary maintainer, and the choices made may diverge from those of MediaWiki core. Note that the primary maintainer may not be the original author of most of the code (or even any of it), and will have latitude to make independent decisions about the library. The amount of latitude a maintainer gets is proportional to the commitment, credibility and hard work they have shown with respect to the library they maintain.

Repository hosting guidelines

 * Hosted in Gerrit with GitHub mirror
 * Hosted under the Wikimedia GitHub organization
 * Hosted under another GitHub organization
 * Hosted in Phabricator with GitHub mirror (?are we ready to try arc out with libraries?)

It is expected that the Wikimedia Foundation will, in the future, invest in tooling that transfers code review from GitHub to its internal review tools (Phabricator). Eventually this should eliminate the difference between hosting the primary git repository with the Wikimedia Foundation and hosting it on GitHub. In the near term, Gerrit should be considered the default hosting option, except where an effort is being made to attract a significant portion of contributions from external developers.

Hosting under an individual user's GitHub account is discouraged. It complicates pull-request-based code review for the repository owner and makes it difficult for a shared group to manage the repository. Note: this doesn't mean the project has to be hosted under the Wikimedia organization specifically; in fact, it can be more convenient to host it under a separate organization account dedicated to the project (as CSSJanus does).

Repository naming guidelines
Naming conventions will probably vary somewhat with the hosting location. Follow the local conventions as far as is reasonable and possible.

Do not:
 * Add "mediawiki-" or "wikimedia-" prefixes to GitHub hosted projects to match the Gerrit mirror naming. Chad will taunt you mercilessly if you do.

Issue tracking guidelines

 * Phabricator
 * GitHub

In most cases, primary issue tracking should follow the chosen git hosting. This reduces the friction of matching commits and pull requests to issues and vice versa. For Gerrit-hosted repositories this means using the Wikimedia Phabricator instance; for GitHub-hosted repositories, the built-in GitHub issue tracker.

IDEA: Can we enable GitHub authentication on our Phabricator instance and make a bot that copies issues created on GitHub into Phabricator and leaves a note explaining where to find the resulting discussion? Ideally, when the Phabricator task is resolved, the bot would post another message on the GitHub issue and change its status accordingly.

Code review guidelines
Project code review should use the tool most closely associated with the primary git hosting. Regardless of choice of hosting platform, pre-merge code review and unit testing are strongly encouraged.

If the primary hosting is on GitHub, changes should be proposed via pull requests rather than pushed directly to master. In most cases the pull requests should originate from a fork of the repository under the contributor's own GitHub account. Blatant self-merging should be seen as just as distasteful on GitHub as it generally is in Gerrit.

Code style guidelines
Prefer the MediaWiki coding conventions or PSR-2, but the most important things are clarity, consistency, and the best likelihood of adoption.

Automated style checks are STRONGLY encouraged for new libraries, to reduce bikeshedding in reviews and make it easier for new contributors to conform to the project's chosen style. The README should also point to a style guide if possible.

Automated testing guidelines
Libraries should use both pre-merge and post-merge testing. The testing should include basic linting, unit tests and coding convention checks.


 * Projects hosted with the WMF can use Jenkins.
 * FIXME: add a how-to for adding jobs for a Composer-managed project. I think we are going to settle on requiring a  script command and some conventions on output.
 * GitHub-hosted projects can use Travis CI.
 * See CSSJanus's .travis.yml and composer.json for an example.
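A minimal sketch of how these checks could be bundled behind one command is a "scripts" entry in the library's composer.json. It assumes a hypothetical layout with library code in src/ and PHPUnit tests in tests/; the script name "test", the version constraints and the choice of PSR-2 (whose ruleset ships with PHP_CodeSniffer) are illustrative assumptions. A library following the MediaWiki style could require mediawiki/mediawiki-codesniffer instead.

```json
{
	"require-dev": {
		"php-parallel-lint/php-parallel-lint": "^1.3",
		"phpunit/phpunit": "^9.5",
		"squizlabs/php_codesniffer": "^3.6"
	},
	"scripts": {
		"test": [
			"parallel-lint . --exclude vendor",
			"phpunit tests",
			"phpcs -p -s --standard=PSR2 src tests"
		]
	}
}
```

Composer puts vendor/bin on the PATH while running scripts and stops at the first command that fails, so a Jenkins job or a Travis configuration could simply run "composer install" followed by "composer test".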

Packagist guidelines
PHP libraries should provide a composer.json package manifest and be published on packagist.org.

Libraries hosted in Gerrit or under the Wikimedia GitHub organization should typically be published under the "wikimedia" namespace (e.g. "wikimedia/cdb", "wikimedia/simplei18n"). The "mediawiki" namespace should be reserved for extensions and other intrinsically MediaWiki-related components (bot frameworks, etc.). Projects hosted on GitHub under an independent organization are encouraged to adopt a similar convention, applying one organization namespace consistently across all the libraries the group publishes.

Packagist does not currently have a concept of organization accounts, but there are two shared-access accounts ("wikimedia" and "mediawiki") under the control of the WMF. These accounts can be added as co-maintainers for any package. It is highly recommended to add the account matching the library's namespace as a co-maintainer of its Packagist entry. The "wikimedia" account can be added as a co-maintainer of any package on Packagist, even one not published under the "wikimedia" namespace.
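For illustration, a minimal package manifest for a library published under the "wikimedia" namespace might look like the sketch below. Everything in it is a placeholder: the package name "wikimedia/example-lib", the author, the PHP version constraint and the "Wikimedia\ExampleLib" namespace mapped to src/ are hypothetical and do not describe any real Wikimedia library.

```json
{
	"name": "wikimedia/example-lib",
	"description": "Hypothetical library extracted from MediaWiki",
	"license": "GPL-2.0-or-later",
	"authors": [
		{
			"name": "Example Author",
			"email": "author@example.org"
		}
	],
	"require": {
		"php": ">=7.2"
	},
	"autoload": {
		"psr-4": {
			"Wikimedia\\ExampleLib\\": "src/"
		}
	}
}
```

Composer recommends SPDX identifiers such as "GPL-2.0-or-later" for the "license" field. Once the package has been submitted to Packagist, the shared "wikimedia" account could be added as a co-maintainer as described above.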

License guidelines
Almost anything extracted from MediaWiki will likely need to remain licensed under GPLv2 (or later). Changing the license to anything other than GPLv3 requires the agreement of all contributors. The license must remain clearly marked in the headers of the new library's code, and the full license file (typically called "LICENSE" or "COPYING") must be carried into the new project.

For a library consisting entirely of new code, any license complying with the Open Source Definition is likely to be acceptable, but the GPLv2+ and Apache License 2.0 (Apache2) licenses may be the easiest to adopt. Both include contributor copyright grant clauses, which are important for ensuring the integrity of the project's code base. The Apache2 license is seen as more permissive in that it allows derivative works to include a separate license for new contributions.

Documentation guidelines
Every library should have a README file that describes the project at a high level. This file should typically use Markdown syntax for headers and links, as Markdown has become a common standard among git repository browsers and remains very human-readable.

A good README will include:
 * A brief description of the primary use case the library solves
 * How to install the library
 * How to use the library (prose and brief code example if possible)
 * How to contribute
 * License name (GPL-2, ...)
 * Where to submit bugs
 * Where to submit patches
 * Link to coding standard
 * How to run tests
 * Where to see automated test results

Typically chosen combinations

 * Hosted in Gerrit and published to Packagist under the wikimedia namespace
 * Hosted in the Wikimedia GitHub organization and published to Packagist under the wikimedia namespace
 * Hosted in another GitHub organization and published to Packagist under a namespace other than "wikimedia" or "mediawiki"

When hosted in Gerrit, the project should run like any other "typical" Wikimedia-sponsored project, with Gerrit code review, Phabricator bug tracking and wiki documentation.

If hosted on GitHub, the project could reasonably choose to do code review via pull requests and use GitHub's bug tracker. This choice should be considered carefully on a project-by-project basis, as diverging from the code review and issue tracking tools used by the larger Wikimedia community has some disadvantages:
 * The Bugwrangler will not be expected to monitor your project for issues.
 * Members of the MediaWiki developer community will probably file bugs against your product in Phabricator anyway.
 * Moving a bug between your project and MediaWiki will be a more involved process.

These downsides may diminish over time if better bots can be created to integrate between the two environments.

Hosting under an independent GitHub organization makes sense for certain projects (CSSJanus, Wikidata, Semantic MediaWiki, ...) where an effort is being made to develop an independent and sustaining community for the project. It is especially reasonable in the case of a library like CSSJanus that is attempting to establish a cross-community and cross-language standard set of tools where only a portion of the tools overlap with the Wikimedia universe.

Transferring an existing GitHub repo to Wikimedia

 * File a ticket in the Phabricator Librarization project requesting transfer. (?do we have a better project for this?)
 * When contacted, add the responding Wikimedia GitHub administrator to the project as a "Collaborator".
 * The administrator will move the project to Wikimedia and give you access.

Tips for extracting a library
The details of extracting a library will vary depending on the code being extracted and how entangled it currently is with other MediaWiki-specific classes. In the best case, the code you want to extract is already contained in the  directory of mediawiki/core.git and is thus completely unencumbered. Code that is not yet in this state should be progressively updated and refactored until it is.

Once you get all the code into, things become a little more straightforward:
 * 1) Create a new project following the rest of the guidelines in this RFC.
 * 2) Import the code from   into your new project.
   * It may be possible to use  to extract a copy of the files with commit history preserved, but that is not currently considered a prerequisite for extraction. It should be sufficient to take the current head of the files into the new project and document the files' provenance in the new project's README.
 * 3) Create a proper   file that follows best practices for the project and publish it to Packagist.
 * 4) Tag the repository to create a stable release.
 * 5) Propose a change to   importing the stable release of your new project. See Manual:External libraries for additional details.
 * 6) Propose a change to   in   to require the stable release of your new project.

It may also be necessary to introduce shim classes  to provide a backwards-compatible bridge between your extracted library and the existing MediaWiki code base. The CDB library (https://github.com/wikimedia/cdb) did this to provide backwards-compatible class names which did not require the use of the new  namespace.
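One possible way to provide such a shim, sketched here with hypothetical names, is PHP's class_alias(), which makes an old global class name refer to the new namespaced class so existing callers keep working. The class "Wikimedia\ExampleLib\Widget", the old name "ExampleWidget" and the greet() method are invented for this example; in practice the library class would live in its own repository and be loaded by Composer, while the alias would live in MediaWiki core. This is not necessarily how the CDB library implemented its shims.

```php
<?php
/**
 * Hypothetical extracted library class plus a MediaWiki-side compatibility
 * shim, combined in one file only so the example is self-contained.
 *
 * @license GPL-2.0-or-later
 */

namespace Wikimedia\ExampleLib;

// The extracted library class, now living in its own namespace.
class Widget {
	public function greet( string $name ): string {
		return "Hello, $name";
	}
}

// The backwards-compatibility shim. class_alias() takes plain strings, and
// string class names are always treated as fully qualified, so this defines
// the global name \ExampleWidget even though we are inside a namespace.
class_alias( Widget::class, 'ExampleWidget' );

// Legacy code written against the old global name keeps working.
$legacy = new \ExampleWidget();
echo $legacy->greet( 'world' ), "\n";
```

An alternative is an empty subclass such as "class ExampleWidget extends \Wikimedia\ExampleLib\Widget {}". Unlike an alias, that creates a distinct class, so get_class() on objects created through the old name returns "ExampleWidget" rather than the namespaced name.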