VERP


(Adding proper email bounce handling to MediaWiki (with VERP))

 * Public URL: (https://www.mediawiki.org/wiki/User:01tonythomas/Implementing_VERP)
 * Bugzilla report: (https://bugzilla.wikimedia.org/show_bug.cgi?id=46640)
 * Announcement: (https://www.mediawiki.org/wiki/Google_Summer_of_Code_2014#Adding_proper_email_bounce_handling_to_MediaWiki_.28with_VERP.29)

Name and contact information
 * Name: Tony Thomas
 * Email: 01tonythomas@gmail.com
 * IRC or IM networks/handle(s): tonythomas01
 * Location: Kerala, India
 * Timezone: Kolkata, India (UTC+5:30)
 * Typical working hours: 5pm to 12:30am (workdays), 10:00am to 9:30pm (weekends)

Project Summary
It is likely that many Wikipedia accounts have a validated email address that once worked but is now out of date. Wikipedia does not currently unsubscribe users who trigger multiple non-transient failures, and some addresses might be more than 10 years old. The wiki should not keep sending email that is just going to bounce: it is a waste of resources and might trigger spam heuristics. Two API calls need to be implemented:
 * 1) One to generate a VERP address to use when sending mail from MediaWiki.
 * 2) One that records a non-transient failure. That API call would record the current incident and, if a threshold has been met (e.g. at least 3 bounces with the oldest at least 7 days ago), un-confirm the user's address so that mail stops going to it.

For the second call, authentication will be needed so that fake bounces are not a DoS vector or a mechanism for hiding password reset requests. The reason for the threshold is that some failure scenarios resolve themselves (e.g. mailbox over quota), so we do not want to react to a single bounce; we want a history of consecutive mails bouncing. There would be a MediaWiki development component to this task, to build the API and to add VERP request calls wherever email is sent, and an Ops component, to route VERP bounces to a script (taking the mail on stdin, and optionally e.g. the email address as arguments) which can then call the (authenticated) MediaWiki API method to remove the mail address. Since the MediaWiki mail infrastructure is currently being moved to a new data center, this is the right time to implement VERP.
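
The threshold rule for the second API call (at least 3 bounces, with the oldest at least 7 days ago) can be sketched as follows. This is only an illustration in Python, not the proposed MediaWiki implementation: the names `MIN_BOUNCES`, `MIN_AGE` and `should_unconfirm` are hypothetical, and the list of bounce times is assumed to contain only the bounces recorded since the address was last confirmed.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, taken from the example figures in the text.
MIN_BOUNCES = 3
MIN_AGE = timedelta(days=7)


def should_unconfirm(bounce_times, now):
    """Return True when the recorded bounces meet the threshold.

    bounce_times: datetimes of the non-transient failures recorded for
    one address (assumed to cover only the period since the address was
    last confirmed).
    now: the time at which the check is made.
    """
    if len(bounce_times) < MIN_BOUNCES:
        return False
    # The oldest bounce must be at least MIN_AGE old, so that a short
    # burst of failures on a single day does not trigger the rule.
    return now - min(bounce_times) >= MIN_AGE
```

With these figures, three bounces spread over eight days would un-confirm the address, while three bounces within two days would not.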

VERP stands for Variable Envelope Return Path; when implemented, it alters the default envelope sender. For example, if an email needs to be sent to bob@example.com, VERP changes the envelope sender from wiki@wikimedia.org to a prefix/delim/hash address such as [bob][_][mdfkdjw6R4xGdiflfdfkQ]@wikimedia.org, so that a bounce can be traced back to the recipient it was meant for. The API would record the return address of the bounce and deduce that a mail to bob has failed. On consecutive failures, say at least 3 bounces with the oldest at least 7 days ago, the second API call un-confirms the user's address.

The return path address needs to be of the form prefix/delim/hash so as to prevent fake bounces from DoSing a user. The hash can be either a random, stored token or the output of a symmetric encryption function.
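
As a rough illustration of the prefix/delim/hash idea, the sketch below builds and checks such an address in Python. Note the assumptions: it uses a keyed hash (HMAC-SHA256) as the tag rather than a stored random token or symmetric encryption, which are the two options named above; `SECRET_KEY`, `make_verp_address` and `verify_verp_address` are hypothetical names; and the tag is truncated to 20 hex characters purely to keep the address short.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-private-key"  # hypothetical; must stay secret
DELIM = "_"
DOMAIN = "wikimedia.org"


def _tag(prefix):
    """Keyed hash of the prefix, truncated to 20 hex characters."""
    digest = hmac.new(SECRET_KEY, prefix.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:20]


def make_verp_address(prefix):
    """Build an envelope sender of the form prefix + delim + hash @ domain."""
    return f"{prefix}{DELIM}{_tag(prefix)}@{DOMAIN}"


def verify_verp_address(address):
    """Return the prefix if the address carries a valid tag, else None."""
    local, sep, domain = address.rpartition("@")
    if not sep or domain != DOMAIN:
        return None
    # The tag is hex and never contains the delimiter, so split on the
    # last delimiter; the prefix itself may contain underscores.
    prefix, sep, tag = local.rpartition(DELIM)
    if not sep or not prefix:
        return None
    expected = _tag(prefix)
    # Constant-time comparison on bytes avoids leaking how much matched.
    if hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return prefix
    return None
```

A bounce arriving at an address whose tag does not verify would simply be ignored, which is what keeps a third party from forging bounces for bob without knowing the key.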

Problem Background
When an email is sent, a message is injected into the local MTA on the wiki web server (e.g. mw1069) through a shell call made by the user that the Apache daemon runs under. MediaWiki uses the config variable $wgPasswordSender to set the envelope sender, and all messages are sent as the user 'wiki@wikimedia.org'. The MTA on mw1069 is configured to route all messages to the destination/remote server found from the MX records, and there are many ways this step can fail and create a bounce, such as:
 * DNS lookup failure (Permanent failure)
 * Network failure (Temporary failure)
 * Remote server could be overloaded (Temporary failure)
 * Remote server might have blacklisted wikimedia.org or wiki@wikimedia.org (Temporary failure)
 * Remote server could say example@gmail.com is a bad address (Permanent failure)
 * Remote server could say example@gmail.com is over quota (Temporary failure)

In each case, either Mchenry originates a bounce message or, in some cases, the remote server accepts the message, later decides it cannot complete the delivery, and originates a bounce message back. The bounce message goes back to the envelope sender, i.e. it is destined for wiki@wikimedia.org. Currently, Mchenry routes mail for wiki@wikimedia.org to /dev/null, so the final step is Mchenry dropping the bounce message on the floor.
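
The permanent/temporary distinction in the failure list above could be derived mechanically if the bounce carries an enhanced mail status code as defined in RFC 3463, where the first digit gives the class: 2 for success, 4 for a persistent transient failure and 5 for a permanent failure. The Python sketch below assumes such a code has already been extracted from the bounce; how the bounce handler obtains it is not covered by this proposal, and `classify_status` is a hypothetical helper.

```python
def classify_status(status):
    """Map an enhanced status code such as '5.1.1' to a failure class.

    Only the leading class digit is inspected:
    '5' -> permanent, '4' -> temporary, '2' -> delivered.
    Anything else is reported as 'unknown'.
    """
    klass = status.strip().split(".", 1)[0]
    if klass == "5":
        return "permanent"
    if klass == "4":
        return "temporary"
    if klass == "2":
        return "delivered"
    return "unknown"
```

For example, `classify_status("5.1.1")` (bad destination mailbox address) gives `"permanent"`, while `classify_status("4.2.2")` (mailbox full) gives `"temporary"`; only the permanent class would count towards the bounce threshold.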

Deliverables
Since the WMF is currently moving its servers to a new data center and the mail infrastructure is being rebuilt, this is the right time to implement the functionality. The final results should be:
 * All emails to a user bob from wikipedia.com should have their envelope sender changed from the default wiki@wikimedia.org to a VERP-generated envelope sender of the form prefix/delim/hash, say [bob][_][mdfkdjw6R4xGdiflfdfkQ]@wikimedia.org.
 * If mail delivery fails due to any of the problems discussed above, a return mail should reach Mchenry with the recipient [bob][_][mdfkdjw6R4xGdiflfdfkQ]@wikimedia.org, and an API running there should record the failure, check bob's past history of bounces in a database, and un-confirm the user if the threshold has been met.
 * The VERP-generated recipient address will be the output of a symmetric encryption function, so that fake bounces cannot be used to DoS a user.

Past open source experience
I have contributed to the Wikimedia project through various bug fixes in MediaWiki Core, Extensions and Browsertests.