GDPR (General Data Protection Regulation) and MediaWiki software

The General Data Protection Regulation (GDPR) (EU) 2016/679 is a regulation in EU law on data protection and privacy for all individuals within the European Union and the European Economic Area. It also addresses the export of personal data outside the EU and EEA. The GDPR aims primarily to give control to citizens and residents over their personal data and to simplify the regulatory environment for international business by unifying the regulation within the EU.

The GDPR came into effect on May 24th 2016, with a two-year transition period ending on May 25th 2018. Compliance has been mandatory since then for all organizations collecting data from persons in the EU. Since the MediaWiki software is also used by, or offered to, users inside the EU, several questions were raised about how to make MediaWiki compliant with the GDPR.

'''Please note the following are recommendations collected by affected MediaWiki users who are not lawyers. Please proceed at your own risk.'''

This article is NOT about changing the MediaWiki software itself to make it GDPR compliant (if that is necessary at all); it is about websites built on MediaWiki software. It therefore focuses on extensions, scripts, and any other measures that may make such a website more GDPR compliant.

Absolute compliance is not possible on public MediaWiki websites, at least not for the website administrators, as most of the content is posted by the users. The aim is therefore to make a site “MORE” GDPR compliant.

Please try to suggest anything about general MediaWiki websites that can be a problem due to GDPR and all the possible ways to correct or at least circumvent this problem.

Is my MediaWiki affected at all?
If you have a publicly accessible and/or editable MediaWiki located on a server in the EU, or one which may be accessed by people from the EU, you need to be GDPR compliant, since even IP addresses are considered personal data protected by the regulation. Since a website cannot be accessed without the IP address being logged somewhere, your site is also affected. (Even if you yourself are not logging IP addresses on your site, your hoster will do so in log files you may not know about. In this case you may require a signed ODP [order data processing] from your hoster.)

EtherMan on Wikipedia, MediaWiki, and the GDPR fear - opinion?: "MediaWiki is... special in this regard... On one hand, it is already compliant really as a software. GDPR basically says that you have to agree to any data that is stored. Well that's easy enough, MediaWiki only stores what you specifically supply it, and every single edit you make, comes with the clause that you are placing whatever you entered under a license which enables reuse. And the second part is that you have to be able to on request of the owner to release everything that is stored on you, to you. Well that's simple enough through MediaWiki by searching for any edits you made because that in theory, should be all data on you.

Unfortunately, on the other hand... Wikipedia will never be compliant with it, because technically, it does require you to turn over ALL data stored ON you... Not BY you. Basically, any and all data they have ever received in regards to you. That would mean all the private arbcom mails that is in regards to any case you're involved in. It means any and all mentions of you on any talk or article page and so on. Heck it even technically covers any admins that have a stick it note on their monitor with a list of their most hated users if that involves you. This is ALL covered...

Point is, MediaWiki does from a technical standpoint allow a company to be compliant. But Wikipedia is never going to be because it's just fundamentally incompatible and it would be a MASSIVE workload to fix, and the whole structure is not set up to handle any sort of freedom of information on the processes which GDPR requires but that's a problem of Wikipedia, not MediaWiki specifically.(...)"

Using existing tools and means

 * Existing laws (like the cookie policy of the Privacy and Electronic Communications Directive 2002) still apply now that GDPR is in effect (as they did during the two-year GDPR transition period). So if your wiki's server is located in the EU and/or you are catering to EU-based users, you may want to use Extension:CookieWarning, if you aren't already. The text of the warning/explanation may have to be modified accordingly.
 * GDPR demands implementation of "appropriate, cost-effective controls" to protect the personal data of EU residents. It is still debated whether GDPR demands encryption per se. The text uses wording like "such as encryption", "may include encryption", "as appropriate (...) pseudonymisation" etc., so these are more suggestions than demands. For SEO reasons, and because Google Chrome will soon mark web pages without SSL/TLS certificates as "not secure", you may want to use an SSL certificate for your wiki anyway. Maybe your hosting provider can offer free Let's Encrypt certificates?
 * If your wiki's SQL database is breached, you have to inform the authorities and your users which stored information was affected: usually the IP address for all editors and, for logged-in users, the e-mail address and user name, plus first and last name if you request them from your users.
 * You may want to ask your users whether they are older than 16 years upon sign-up or if they have the consent by their parents.
 * Update the wiki privacy statement, if not done already, to state that IP addresses (for all editors) and e-mail addresses and user names (for registered users) are stored, that this is technically required for tracking and rolling back edits, that users agree to this upon sign-up or when editing your wiki, and whom they have to contact if they want to have their account deleted etc. A GDPR-conformant privacy statement generator may be used for private use, for example:
 * German: https://datenschutz-generator.de/ (unfortunately available only in German)
 * German, English and French: https://www.ratgeberrecht.eu/leistungen/muster-datenschutzerklaerung.html (menu and checkboxes in German)
 * please add privacy generators for other languages here
 * if you requested e-mail addresses upon sign-up, inform your users that they may unset their e-mail address in their user preferences. Since an e-mail address is not required for registering this may only affect wikis that use emailconfirmed for editing privileges (for example for Anti Spam measures, YMMV).
 * Add the name of an admin or moderator to the privacy article in case somebody wants information about their stored data. Consider also adding a link explaining how users can close their account.
 * Add which data protection authority is responsible for your country (for Germany, the responsible authority still has to be determined); this is also required by GDPR.
 * use Help:RevisionDelete for hiding IP addresses and revision texts that once included personal data from older versions

=Issues=

The account e-mail addresses are stored in MySQL databases unencrypted/in plain text
This becomes a problem if the SQL database is breached.

Possible solution (suggested by @Ciencia Al Poder) - this should not be a problem, as
 * the email addresses are accessed only by the system admin.
 * MediaWiki would have to be able to decrypt the email addresses even if they were stored encrypted, in order to use them for sending emails.
 * Anyone with access to the MediaWiki code or shell could therefore decrypt them using the software.
 * The users have control to see their email addresses and delete them if they want to.

Deleting user accounts

 * MediaWiki users can't delete their accounts themselves. There are extensions that allow merging accounts, and it is always possible to delete the entries in SQL, but in general deleting users is not recommended.
 * General warning: Do not make changes directly to the database unless you really know what you are doing. Also, removing rows from the user table can cause unexpected errors when viewing logs or page histories where the user appears.

Please add: overview for SQL commands to delete accounts or link to existing documentation


 * Extension:DisableAccount removes password and email. You can then use Extension:Renameuser to rename the account with random characters. No need to perform changes directly on the database.
 * If you feel you need to also hide the original user name from the log, it can be hidden if you have the  user right.

The problem of deleting a user's contributions
Extension:DeletePagesForGood has been recommended for this. You can also delete directly from the database (though this is a bit risky).
 * Deleting a user's contributions leaves behind the username (or the IP address if the user was not logged in) in the deletion log, so it is not a clean process. The record of the user's contributions in the revision history can be deleted using Manual:DeleteOldRevisions.php. Read this thread for more info: Topic:Tf2bj711f0x48dba.

A user wants to remove any references to their username

 * The username is attached to every edit made by the user. The user has no ability to remove any of those references.
 * Possible solution (also suggested by @Rocketpipe) - It might be sufficient if the user is told of this limitation during the account creation process.
 * As an alternative, the admin could set up a generic user called "Deleted user" (or something similar) and merge all edits from the user that wants to have the references removed with Extension:UserMerge to the generic "Deleted user".
 * This has the problem that there will be a log entry referring to that Merge -> Username, but the log entry details can be hidden.
 * User with  right may "suppress" an account, which will remove it from page histories, log entries and list of user accounts.

New revisions of a page that included personal data of a person have been created
Possible solution:
 * Remove information directly in SQL in all revisions?
 * Don't edit the database directly. Use Help:RevisionDelete.
 * Use Patrolling edits and/or Extension:Approved Revs or Extension:Moderation for avoiding that information is re-added?

A page that included personal data of a person has been moved (= published at another URL)
Not sure what the issue here is, moved as in information was copied to another wiki?

Google has cached old privacy related content from your wiki that was removed by now

 * Please refer to Google's Remove outdated content manual on how to remove cached content from Google Search results.

A user wants to delete or hand over their data
Possible solution (suggested by @TheDJ):
 * A user can delete their email address by removing it in the preferences. An email address is not required by MediaWiki.
 * A note: The requirement of an email address during account creation can be specified by the system admin using LocalSettings.php. So, the first point in the solution may not be applicable to all websites.
 * The user releases his/her contributions under the license mentioned in the specific website.
 * Database user can delete a specific contribution if needed.
 * Please add: overview for SQL commands to delete accounts or link to existing documentation
 * Please use RevisionDelete; don't edit the database yourself unless you're absolutely sure about what you are doing!
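
To illustrate what "handing over" stored data could look like, here is a minimal sketch in Python that collects a user's account details and their own edits into a JSON document. Everything in it is an assumption for illustration: the function name `build_export` and the field names (`name`, `real_name`, `email`, `author`, `page`, `timestamp`) are hypothetical and do not reflect the MediaWiki database schema or any existing extension.

```python
import json


def build_export(user: dict, edits: list) -> str:
    """Return a JSON document with the account data and the user's own edits.

    `user` and `edits` are plain dictionaries with hypothetical field names;
    they stand in for whatever an admin has already read from the wiki.
    """
    own_edits = [
        {"page": e["page"], "timestamp": e["timestamp"]}
        for e in edits
        if e["author"] == user["name"]  # only edits made BY this user
    ]
    export = {
        "user": {
            "name": user["name"],
            "real_name": user.get("real_name"),  # may be missing
            "email": user.get("email"),          # may be missing
        },
        "edits": own_edits,
    }
    return json.dumps(export, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    sample_user = {"name": "ExampleUser", "email": "user@example.org"}
    sample_edits = [
        {"author": "ExampleUser", "page": "Main Page", "timestamp": "2018-05-25T10:00:00Z"},
        {"author": "SomeoneElse", "page": "Main Page", "timestamp": "2018-05-26T11:00:00Z"},
    ]
    print(build_export(sample_user, sample_edits))
```

Note that this only covers data stored by the user; as discussed above, data stored about the user (for example mentions on talk pages) would need separate handling.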

Removing the "real name" field in the sign-up form

 * with $wgHiddenPrefs[] = 'realname'; in LocalSettings.php (see also Manual:$wgHiddenPrefs).
 * Checkbox: You can use Extension:NewSignupPage to do so.

Hiding the display of IP addresses for anonymous editing

 * workaround: disable anonymous contribution without a user account via $wgGroupPermissions['*']['edit'] = false; in LocalSettings.php
 * by hooking Manual:Hooks/HtmlPageLinkRendererBegin: when you detect that the link points to an IP, replace the IP with a generic user name like "Anonymous editor" or "Not logged in user"
 * replace the IP address with a hash to anonymise it, either during editing or on display.
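
The hashing idea from the last point can be sketched as follows. This is a Python illustration of the technique only, not MediaWiki code: the function name `pseudonymise_ip`, the `Anonymous-` prefix and the secret key are assumptions. A keyed hash (HMAC) is used instead of a plain hash because the IPv4 address space is small enough that unkeyed hashes of all addresses can simply be precomputed.

```python
import hashlib
import hmac


def pseudonymise_ip(address: str, secret_key: bytes, length: int = 12) -> str:
    """Replace an IP address with a stable pseudonym.

    The same address and key always give the same pseudonym, so edits by one
    address can still be grouped, but the address itself is not shown.
    """
    digest = hmac.new(
        secret_key, address.strip().encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Shorten the digest for display; `length` is an arbitrary choice.
    return f"Anonymous-{digest[:length]}"


if __name__ == "__main__":
    key = b"replace-with-a-long-random-secret"  # hypothetical key, keep it private
    print(pseudonymise_ip("192.168.1.5", key))
    print(pseudonymise_ip("2001:db8::1", key))
```

Whether a pseudonym of this kind counts as sufficient anonymisation is a legal question; anyone holding the key can still test whether a given address matches a pseudonym.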

Implied consent is no longer sufficient
"Consent must be given through a clear affirmative action, such as clicking an opt-in box or choosing settings or preferences on a settings menu. Simply visiting a site doesn’t count as consent." 

This is relevant in the following situations:
 * an extension to edit the sign-up form (for example to add a checkbox confirming that a user is older than 16 years) or for adding a link to the privacy statement
 * A checkbox on the sign-up form to accept the privacy policy (registration is rejected if the check is not ticked). The value of this checkbox should be recorded in database.
 * A checkbox presented on the edit form for unregistered users, and users that registered before adding the checkbox on the sign-up form, to accept the privacy policy. If the checkbox is not ticked when saving the edit, the edit is rejected. If a registered user is presented with that checkbox and submits the edit, this is recorded in database and shouldn't be presented anymore.
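
The checkbox rules described in the list above can be sketched as a small decision function. This is a Python illustration of the logic only: `ConsentStore`, `needs_checkbox` and `may_save_edit` are hypothetical names, and the in-memory dictionary stands in for the database record mentioned above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class ConsentStore:
    """In-memory stand-in for the database table that records consent."""
    accepted: Dict[str, datetime] = field(default_factory=dict)

    def has_accepted(self, username: str) -> bool:
        return username in self.accepted

    def record(self, username: str) -> None:
        self.accepted[username] = datetime.now(timezone.utc)


def needs_checkbox(store: ConsentStore, username: Optional[str]) -> bool:
    """Unregistered users (username None) always see the checkbox;
    registered users only until their consent has been recorded."""
    return username is None or not store.has_accepted(username)


def may_save_edit(store: ConsentStore, username: Optional[str], box_ticked: bool) -> bool:
    """Reject the edit unless consent is already recorded or given now."""
    if not needs_checkbox(store, username):
        return True
    if not box_ticked:
        return False
    if username is not None:
        store.record(username)  # registered users are not asked again
    return True


if __name__ == "__main__":
    store = ConsentStore()
    print(may_save_edit(store, None, box_ticked=False))
    print(may_save_edit(store, "OldUser", box_ticked=True))
    print(needs_checkbox(store, "OldUser"))
```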

List of solutions:
 * Extension:UserAgreement can be used to achieve compliance for registered users.
 * Partial solution: changes to registration page (by editing MediaWiki:Signupstart), anonymous editing (by editing MediaWiki:Anoneditwarning). This does not make the opt in explicit, but it improves matters.
 * Addition of a special page for existing users to accept the new privacy statement (requires a new extension).
 * Partial solution: use Extension:NewSignupPage to require explicit acceptance of privacy terms on registration. This does not address existing users or anonymous edits.
 * User:TimSC's fork of Extension:NewSignupPage available here: . It asks anonymous editors to opt in to the terms of service. It allows users to opt out of the terms of service, but the admin must still manually process data removal. It also prompts existing users who have not explicitly opted in to do so. All accept and reject actions are explicitly logged.

=List of existing extensions which might be useful for compliance=
 * Extension:Checkuser: you will want to limit the people who can access it to registered data controllers only.
 * Extension:UserMerge might come in handy if one merges the user to an existing generic one set up by the Wiki Admin.
 * Extension:CookieWarning will add a banner. It is unclear whether this meets the consent opt-in required by GDPR.
 * Extension:Comments or Extension:Flow/Extension:StructuredDiscussions to substitute Facebook and Disqus comments plugins/functions, see Topic:Uddcz0ah9i70au4o?
 * Extension:AccountInfo - shows users what private data is stored about them (partially)
 * Extension:Replace Text (comes bundled with MW 1.31+): may be used to hide real names in signed comments with
 * RevisionDelete - a MediaWiki feature (bundled since version 1.16, disabled by default) that lets certain users show and hide individual page revisions. It also adds a special page called Special:RevisionDelete. Useful for hiding edits and page versions that included personal information which has since been removed.
 * Extension:NewSignupPage to edit the Sign-up page for removing the (optional) field requesting the realname, as well as getting explicit opt-in for privacy statement.
 * Extension:Privacy: Requesting anonymization, requesting removal, retrieve all data stored about them in the system.
 * GDPR-compliant fork of Extension:EmbedVideo

=List of required/missing functions=
 * MediaWiki could use an extension that lets users export their stored data as an XML or JSON file, similar to WordPress. It is still to be defined what that would include besides the username, IP address etc.: what additional information was stored, and whether the changes/edits would have to be included. Given the nature of how wikis work, this might become a major headache: if an editor decides to have their personal data erased, which is their right, wiki admins would have to change the user name to some anonymized version like "deleted user", and do the same for edits made by not-logged-in users where the IP is shown.
 * Mask IP addresses of unregistered users by removing the last byte of an IPv4 address (maybe by setting it to 0, for example, 192.168.1.5 would become 192.168.1.0), or the last 80 bits in the case of an IPv6 address. This is what Google Analytics does for being compliant: IP anonymization on analytics. It may be done not only on "display", but also on the information stored in the database. A maintenance script could be made to anonymize existing data.
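
The masking rule from the last point can be sketched with Python's standard `ipaddress` module. This only illustrates the arithmetic (zeroing the last 8 bits of an IPv4 address or the last 80 bits of an IPv6 address); the function name `mask_ip` is an assumption, and a real maintenance script would still have to apply the rule to the stored data.

```python
import ipaddress


def mask_ip(address: str) -> str:
    """Zero the last 8 bits (IPv4) or the last 80 bits (IPv6) of an address."""
    ip = ipaddress.ip_address(address)
    masked_bits = 8 if ip.version == 4 else 80
    prefix_length = ip.max_prefixlen - masked_bits  # 24 for IPv4, 48 for IPv6
    # strict=False lets ip_network() accept an address with host bits set
    # and return the network that contains it.
    network = ipaddress.ip_network(f"{ip}/{prefix_length}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(mask_ip("192.168.1.5"))                           # 192.168.1.0
    print(mask_ip("2001:db8:85a3:8d3:1319:8a2e:370:7348"))  # 2001:db8:85a3::
```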

=Using Semantic MediaWiki as a tool for documenting GDPR compliance=
 * https://archive.org/details/emwcon2018-mediawiki_and_european_gdpr_datencockpit
 * see https://www.datencockpit.at for details (available in German only)

=Further reading=
 * Manual:Security
 * A note on our approach to privacy from Wikimedia
 * Thread in the MediaWiki Mailing list
 * Wikimedia Hackathon 2018 in Barcelona:
 * task at phabricator: https://phabricator.wikimedia.org/T194901
 * the documentation of the workshop:
 * Public YouTube stream on MediaWiki channel
 * Slides
 * Etherpad
 * Core Infrastructure Initiative Best Practices badge has some information that might be relevant to assessing MediaWiki

= See also =
 * Age classification (Germany) - (article is in German, applies to websites catering to visitors located in Germany)

=Support desk threads=
 * Hide user IP address in anonymous editing
 * Topic:Udnw29wn64if2m14
 * Topic:Ud7xbwbzxcgyfgzm
 * Topic:Ucy8sfl44i6n6i51