Requests for comment/Opt-in site registration during installation

Description
This is an RFC to provide a MediaWiki API call that would be part of a statistics gathering extension. This extension would be installed on Wikimedia equipment (in WMF labs initially).

Rationale
MediaWiki currently has no built-in method for collecting statistics on wiki installations. This means that many wikis (probably the majority) get installed and, unless they are discovered some other way, tools that track wiki installations (like wikistats or WikiApiary) cannot find them.

This is a method that would enable us to better estimate the number of users of MediaWiki and understand the environment in which they operate. This would help us understand how our decisions regarding changes to MediaWiki would affect our users.

To provide an obvious benefit to the person installing MediaWiki, if an end user opts in to this notification, we could also use future checks to notify the administrator on-wiki of security updates that they should apply.

Installer implementation
A screen would be added to the installer that would tell the user what information is being sent to the ping server, what purpose it would serve, and request permission to send it. The information sent would be summarized in a human-readable form and access provided, if requested, to the actual JSON strings that would be sent to the ping server. This allows the user to verify, if they are so motivated, that the data we're sending is what we say it is.
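
The summary and the raw JSON described above could both be derived from a single payload object, so that what the user reads cannot drift from what is actually sent. A minimal Python sketch follows; the field names (`api_url`, `tagline`, `description`) and the helper functions are hypothetical, since this RFC does not fix the payload format.

```python
import json

def build_ping(api_url, tagline, description):
    """Collect the values the installer would send (hypothetical field names)."""
    return {"api_url": api_url, "tagline": tagline, "description": description}

def summarize(ping):
    """Human-readable summary for the installer screen, one field per line."""
    return "\n".join(
        f"{key.replace('_', ' ')}: {value}" for key, value in ping.items()
    )

def raw_json(ping):
    """Exact JSON string that would be sent, shown to the user on request."""
    return json.dumps(ping, sort_keys=True, indent=2)

ping = build_ping(
    "https://wiki.example.org/w/api.php",
    "A wiki about examples",
    "Example description",
)
print(summarize(ping))
print(raw_json(ping))
```

Because both views come from the same dictionary, a motivated user can compare the summary against the JSON field by field.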

The following additional bits of information would be collected:


 * A wiki logo. This could be uploaded to the wiki, which would be a good opportunity to verify that the wiki is correctly configured. If provided, the logo would be used to help identify the wiki.
 * Wiki tagline. This would be put in MediaWiki:Tagline and would also raise the visibility of this feature.
 * Wiki description. This could be used to seed Project:About.

These additional bits of information would be added to the siteinfo API call.

After the user has provided this information and agreed to ping the ping server, the following information would be sent to it:

Alternatively, send only the API URL and the properties that the siteinfo API does not provide (bug 54428). The ping server would then have to query the wiki itself, which also verifies that the wiki gives a valid response (see Spam the other way).

Some provision needs to be made for wikis that are not publicly available. There may also be a need to allow the owner of the wiki to keep particular bits private while revealing others.
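
One way to let the owner keep particular fields private is to filter the payload before it leaves the wiki. In the sketch below, `redact` and the field names are hypothetical; this RFC does not say which fields could be withheld.

```python
def redact(ping, private_fields):
    """Return a copy of the ping without the fields the owner marked private."""
    return {key: value for key, value in ping.items() if key not in private_fields}

ping = {
    "api_url": "https://wiki.example.org/w/api.php",
    "tagline": "A wiki about examples",
    "description": "Internal notes",
}
# The owner chose to withhold the description.
public_ping = redact(ping, {"description"})
print(public_ping)
```

A wiki that is not publicly reachable would need more than this, for example skipping the ping entirely, but that decision is outside what the sketch covers.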

Ping server implementation
The ping server would be a MediaWiki installation with an extension installed that would be capable of receiving the above pings. Each ping would result in a new row in a database table.
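
The "one row per ping" idea can be sketched as follows. The real ping server would be a MediaWiki extension; Python and SQLite are used here only for illustration, and the table name `ping`, its columns, and both functions are assumptions rather than a proposed schema.

```python
import json
import sqlite3
from datetime import datetime, timezone

def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) ping table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ping ("
        " id INTEGER PRIMARY KEY,"
        " received_at TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return conn

def record_ping(conn, raw_body):
    """Validate that the body is a JSON object and store it as one new row."""
    payload = json.loads(raw_body)  # raises ValueError on malformed JSON
    if not isinstance(payload, dict):
        raise ValueError("ping must be a JSON object")
    cur = conn.execute(
        "INSERT INTO ping (received_at, payload) VALUES (?, ?)",
        (datetime.now(timezone.utc).isoformat(), json.dumps(payload, sort_keys=True)),
    )
    conn.commit()
    return cur.lastrowid

conn = open_store()
first_id = record_ping(conn, '{"api_url": "https://wiki.example.org/w/api.php"}')
second_id = record_ping(conn, '{"api_url": "https://other.example.org/w/api.php"}')
```

Storing the payload as received, together with a timestamp, keeps the raw data available for later verification without committing to a fixed set of columns.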

Phabricator tasks

 * Provide an opt-in ability to register the user's MediaWiki installation
 * Create Ping server extension for MediaWiki
 * Create ping.wmflabs.org
 * Expose proposed ping info in the API
 * Ask for wiki logo during installation
 * Ask for oneline wiki tagline during installation
 * Ask for wiki description during installation

Considerations
Here are some considerations that must be dealt with properly before this can be implemented.

Privacy
In an effort to preserve privacy, this should be an opt-in feature. That is, the end user must purposefully check a box saying “ping Wikimedia” (and it must be made clear what information is being sent in the ping) before the ping is executed.

Spam
Spam is already a problem for many new wiki users. If there were a place to find new wiki installations shortly after they happened, spammers could bomb them almost instantaneously. For this reason, access to real-time data should be limited initially. If we later have better mechanisms for fighting spam, access to real-time data could perhaps be allowed.

Alternatively, some users may want us to advertise their wiki upgrade or installation, so perhaps there should be a “make this installation public” option with requisite warnings.

Spam the other way
It would be very easy for someone to spam this database by sending false "pings", either with completely bogus data or with data pointing to a (possibly non-wiki) spam site.

This is another reason not to publish all the pings received without at least verifying that each one comes from a MediaWiki site.
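
A basic check of this kind could query the pinged wiki's siteinfo API and inspect the `generator` value in the `general` block, which MediaWiki reports as a string such as "MediaWiki 1.39.0". The sketch below only builds the request URL and checks an already-parsed response; the function names are hypothetical, and fetching the URL is left out. A hostile site could fake this response, so the check only filters out trivially bogus pings.

```python
from urllib.parse import urlencode

def siteinfo_url(api_url):
    """Build the siteinfo query URL for a wiki's api.php endpoint."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "general",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)

def looks_like_mediawiki(response):
    """Return True if a parsed siteinfo response names MediaWiki as generator."""
    try:
        generator = response["query"]["general"]["generator"]
    except (KeyError, TypeError):
        return False
    return isinstance(generator, str) and generator.startswith("MediaWiki ")

print(siteinfo_url("https://wiki.example.org/w/api.php"))
print(looks_like_mediawiki({"query": {"general": {"generator": "MediaWiki 1.39.0"}}}))
```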

Discussion
See the talk page.