VERP

Adding proper email bounce handling to MediaWiki (with VERP)

 * Public URL: (https://www.mediawiki.org/wiki/User:01tonythomas/Implementing_VERP)
 * Bugzilla report: (https://bugzilla.wikimedia.org/show_bug.cgi?id=46640)
 * Announcement: (https://www.mediawiki.org/wiki/Google_Summer_of_Code_2014#Adding_proper_email_bounce_handling_to_MediaWiki_.28with_VERP.29)

Name and contact information
 * Name: Tony Thomas
 * Email: 01tonythomas@gmail.com
 * IRC or IM networks/handle(s): tonythomas01
 * Location: Kerala, India
 * Timezone: Kolkata, India, UTC+5:30
 * Typical working hours: 5pm to 12:30am (workdays), 10:00am to 9:30pm (weekends)

Project Summary
It's likely that many Wikipedia accounts have a validated email address that once worked but is now out of date. Wikipedia does not currently unsubscribe users who trigger multiple non-transient failures, and some addresses might be 10+ years old. The wiki should not keep sending email that is just going to bounce: it is a waste of resources and might trigger spam heuristics.

Two API calls need to be implemented:
 * 1) One to generate a VERP address to use when sending mail from MediaWiki.
 * 2) One that records a non-transient failure. That API call would record the current incident and, if some threshold has been met (e.g. at least 3 bounces with the oldest at least 7 days ago), un-confirm the user's address so mail will stop going to it.

For the second call, authentication will be needed so that fake bounces are not a DoS vector or a mechanism for hiding password reset requests. The reason for the threshold is that some failure scenarios resolve themselves (e.g. a mailbox over quota), so we don't want to react to a single bounce; we want a history of consecutive mails bouncing.

There would be a MediaWiki development component to this task (building the API and adding VERP request calls wherever email is sent) and an Ops component to route VERP bounces to a script (taking the mail on stdin, and optionally e.g. the e-mail address as an argument), which can then call the authenticated MediaWiki API method to remove the mail address. Since the MediaWiki mail infrastructure is currently being moved to a new data center, this is the right time to implement VERP.
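The threshold rule for the second API call (at least 3 bounces with the oldest at least 7 days ago) can be sketched as follows. This is only an illustration under stated assumptions: the function name `should_unconfirm`, the constants, and the idea that the caller passes in the timestamps of the consecutive non-transient bounces recorded since the last successful delivery are all hypothetical, not part of MediaWiki.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds taken from the example figures in the proposal.
MIN_BOUNCES = 3
MIN_AGE = timedelta(days=7)


def should_unconfirm(bounce_times, now):
    """Decide whether a user's address should be un-confirmed.

    bounce_times: datetimes of the consecutive non-transient bounces
                  recorded for one user (assumed to be supplied by the caller).
    now:          the current time, passed in so the rule is easy to test.
    """
    if len(bounce_times) < MIN_BOUNCES:
        return False  # too few bounces to rule out a passing problem
    # The oldest bounce must be at least MIN_AGE old.
    return now - min(bounce_times) >= MIN_AGE


# Usage: three bounces, the oldest eight days ago.
now = datetime(2014, 3, 20)
history = [now - timedelta(days=d) for d in (8, 4, 1)]
print(should_unconfirm(history, now))
```

Requiring both a count and a minimum age means a burst of bounces during a short outage on the remote server is not enough on its own to un-confirm an address.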

VERP stands for Variable Envelope Return Path; when implemented, it alters the default envelope sender. For example, if an email needs to be sent to bob@example.com, VERP changes the envelope sender from wiki@wikimedia.org to a prefix/delim/hash address such as [bob][_][mdfkdjw6R4xGdiflfdfkQ]@wikimedia.org, so that a bounce can be traced back to the recipient it was meant for. The API would record the return address of the bounce and deduce that a mail to bob has failed. On consecutive failures, say at least 3 bounces with the oldest at least 7 days ago, the second API call un-confirms the user's address.

The return path address needs to have the form prefix/delim/hash in order to prevent fake bounces from DoSing a user. The hash can either be a random, stored token or the output of a symmetric encryption function.
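As a rough illustration of the prefix/delim/hash scheme, the sketch below builds a return-path address and later checks that a bounce address was really issued by the wiki. Note the substitution: it uses an HMAC (a keyed hash) in place of the stored token or symmetric encryption function mentioned above, because it gives the same anti-forgery property with only the Python standard library. The secret key, the function names, and the choice of a truncated hex tag are assumptions made for this example.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret
DELIM = "_"                                 # delimiter used in the example address
BOUNCE_DOMAIN = "wikimedia.org"


def _tag(user):
    """Keyed hash of the user name; hex output never contains DELIM."""
    digest = hmac.new(SECRET_KEY, user.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:32]


def make_verp_address(user):
    """Build the prefix/delim/hash envelope sender for one recipient."""
    return f"{user}{DELIM}{_tag(user)}@{BOUNCE_DOMAIN}"


def parse_verp_address(address):
    """Return the user name if the address carries a valid tag, else None."""
    local, at, domain = address.rpartition("@")
    if not at or domain != BOUNCE_DOMAIN:
        return None
    # Split on the LAST delimiter so user names containing "_" still work.
    user, sep, tag = local.rpartition(DELIM)
    if not sep or not user:
        return None
    expected = _tag(user)
    # Constant-time comparison avoids leaking the tag through timing.
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return None
    return user


address = make_verp_address("bob")
print(address)
print(parse_verp_address(address))                       # the original user
print(parse_verp_address("bob_forged@wikimedia.org"))    # rejected
```

Because the tag can only be computed with the secret key, someone who merely knows a user name cannot craft a bounce address that passes the check.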

Problem Background
When an email is sent, a message is injected on the wiki web server (e.g. mw1069) into the local MTA through a shell call made by the user the Apache daemon runs as. MediaWiki uses the config variable $wgPasswordSender to set the envelope sender, and all messages are sent as the user 'wiki@wikimedia.org'. The MTA on mw1069 is configured to route all messages to the destination/remote server from the MX records, and there are many ways this step can fail and create a bounce, such as:
 * DNS lookup failure (Permanent failure)
 * Network failure (Temporary failure)
 * Remote server could be overloaded (Temporary failure)
 * Remote server might have blacklisted wikimedia.org or wiki@wikimedia.org (Temporary failure)
 * Remote server could say example@gmail.com is a bad address (Permanent failure)
 * Remote server could say example@gmail.com is over quota (Temporary failure)

Each case can either cause Mchenry to originate a bounce message, or in some cases the remote server will accept the message, later decide it can't complete the delivery, and originate a bounce message back. The bounce message goes back to the envelope sender, wiki@wikimedia.org. Currently, Mchenry routes mail for wiki@wikimedia.org to /dev/null, so the final step is Mchenry dropping the bounce message on the floor.
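One possible way to tell the permanent and temporary failures listed above apart is the enhanced status code that many bounce messages carry (defined in RFC 3463), where the first digit is 2 for success, 4 for a persistent transient failure, and 5 for a permanent failure. The sketch below is a minimal classifier on that first digit; the function name and the "unknown" fallback are assumptions for the example, and real bounces may lack such a code entirely.

```python
import re

# Enhanced status codes have the form class.subject.detail, e.g. "5.1.1".
_STATUS_RE = re.compile(r"^([245])\.\d{1,3}\.\d{1,3}$")

_CLASSES = {"2": "success", "4": "temporary", "5": "permanent"}


def classify_status(status):
    """Map an enhanced status code string to a coarse failure class."""
    match = _STATUS_RE.match(status.strip())
    if match is None:
        return "unknown"  # missing or malformed code: do not guess
    return _CLASSES[match.group(1)]


print(classify_status("5.1.1"))  # bad destination mailbox address
print(classify_status("4.2.2"))  # mailbox full
```

Only results classified as permanent would count toward the bounce history; temporary ones are expected to resolve themselves.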

Deliverables
Since the WMF is moving its servers to a new data center and the mail infrastructure is being rebuilt, this is the right time to implement the functionality. The final results should be:
 * All emails sent to a user bob from Wikipedia should have their default envelope sender changed from wiki@wikimedia.org to a VERP-generated envelope sender of the form prefix/delim/hash, say [bob][_][mdfkdjw6R4xGdiflfdfkQ]@wikimedia.org.
 * If mail delivery fails due to any of the problems discussed above, a return mail should reach Mchenry with the recipient [bob][_][mdfkdjw6R4xGdiflfdfkQ]@wikimedia.org, and an API running there should record the failure, check the past history of bounces for bob in a database, and un-confirm the user if the threshold is met.
 * The VERP-generated recipient address will be the output of a symmetric encryption function, so that fake bounces cannot be used to DoS a user.
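The receiving side of these deliverables could look roughly like the sketch below: a handler that reads a bounce message from a stream (stdin in production), takes the VERP address from the To header, and splits its local part into prefix and hash. It assumes that the bounce is addressed to the VERP envelope sender in its To header. The name `handle_bounce` is hypothetical, and verifying the hash and calling the authenticated MediaWiki API are left out.

```python
import email
import io
from email.utils import parseaddr

DELIM = "_"  # delimiter used in the example address


def handle_bounce(stream):
    """Read one bounce message and return (prefix, token), or None.

    stream: a text file-like object holding the raw message,
            e.g. sys.stdin when the MTA pipes mail to the script.
    """
    message = email.message_from_string(stream.read())
    # Assumption: the bounce is addressed to the VERP envelope sender.
    _name, address = parseaddr(message.get("To", ""))
    local, at, _domain = address.rpartition("@")
    if not at:
        return None
    # Split on the last delimiter so prefixes containing "_" survive.
    prefix, sep, token = local.rpartition(DELIM)
    if not sep or not prefix or not token:
        return None  # not a VERP address, e.g. plain wiki@wikimedia.org
    return prefix, token


# Usage with an in-memory message instead of stdin.
raw = (
    "From: MAILER-DAEMON@example.com\n"
    "To: bob_mdfkdjw6R4xGdiflfdfkQ@wikimedia.org\n"
    "Subject: Undelivered Mail Returned to Sender\n"
    "\n"
    "The message could not be delivered.\n"
)
print(handle_bounce(io.StringIO(raw)))
```

A real handler would check the token before trusting the prefix, and only then report the failure for that user.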

Project Schedule
Will be updated in a while

About Me
I am a 19-year-old Computer Science Engineering student at Amrita Vishwa Vidhyapeetham, Kerala, India. I am an active member of the FOSS community here, FOSS@Amrita. The FOSS club lets me work with code even late at night in my college lab. I have been a consistent Linux user for the past three years. The spirit of Open Source is so compelling that you can never quit contributing to such projects, and I found the Wikimedia community to be one of them. My first contribution to Open Source was a bug fix to MediaWiki almost six months ago. Since then, I have been working with the codebase, looking for and fixing errors, and creating new ones. I have helped and mentored many of my FOSS club mates to contribute to MediaWiki; you can see the full list here. Along with academics, I get time to work in the lab from 5:00 pm to 11:00 am on all working days and 10:00am to 11:00pm on weekends.

I have chosen adding functionality to the Email component as my project because it involves both server-side and MediaWiki-side development. As one of my mentors mentioned, I am sure it is going to be a fun project, and it will help me apply all the networking lessons I learnt during an MCITP networking course I took earlier. The main aim is of course to study new things and understand huge code bases, but this project involves a hardware element too, which excites me. Coming from a remote village in Kerala, I think this would be a boost for me to spread Open Source in a very remote society that is still unaware of collaboration and free thinking.

Past open source experience
I have contributed to the Wikimedia project through various bug fixes in MediaWiki Core, Extensions and Browsertests.
 * Gerrit Changes: Gerrit owner: 01tonythomas
 * GitHub profile: tonythomas01
 * Co-authored an app for Firefox OS Marketplace - Daily Wallet