Extension:PronunciationRecording/GSoC 2013

=GSoC Project Idea: Pronunciation Recording Extension=

I have been engaged with MediaWiki for two months now, and since GSoC 2013 is coming up soon, I felt it was the right time to draft a rough proposal.

Identity
 * Name: Rahul Maliakkal
 * Email: rahul14m93@undefinedgmail.com
 * Project title: Pronunciation Recording Extension

Contact/working info
 * Timezone: IST (UTC +5:30)
 * Typical working hours: Very flexible. I can adjust my work hours to anytime between 14:30–21:30 UTC (20:00–03:00 IST) and can work an extra 5 hours on the weekends.
 * IRC or IM networks/handle(s): Rahul21 (Freenode)
 * Time constraints: I want to be clear up front that I have a few time constraints to work around. My 6th semester examinations run from 22 May to 31 May, so I'll be a little less active during that time. I've prepared my schedule so that the main part of this project will be complete before September 4.

Introduction



 * There is a thread in the mailing list requesting this extension.
 * In Wiktionary, many words have pronunciation audio files (.ogg) attached to them. These audio files tell the user how to pronounce a word in a specific language. The same word can be pronounced differently in different parts of the world. Example: the word "garage" is spoken differently around the world [Garage].
 * For example, the Wiktionary page for the word "behavior" has a pronunciation file attached to it.
 * The word "minute" is pronounced differently in the time sense than in the quantity sense. Such words are called heteronyms. The audio files attached to each of the etymologies make this difference clear.
 * However, several words do not have audio files attached to them. In a rough survey, I found that words used extensively in a particular discipline (e.g. medicine, mathematics) often lack audio files. Examples: aggravate, compendium.

Simple workflow

 * The workflow consists of 3 steps:
 * 1) A "Record Pronunciation" link is displayed on the Wiktionary page of a word that does not have a pronunciation file attached to it.
 * 2) When the user clicks on the "Record Pronunciation" link, a dialog box pops up. The dialog box consists of 4 parts:
 ** a) The Recording Toolbar: A user-friendly toolbar that helps the user record pronunciations. It consists of buttons such as "Record", "Stop", "Play" and "Reset", each of which is fairly self-explanatory. The toolbar itself is not shown in the mockup; the placeholder text "Recording Toolbar" will be replaced by a working toolbar. The user will get a maximum of 5 seconds in which to record the pronunciation.
 ** b) IPA: This section shows the IPA of the word that the user wants to record. It will assist the user in pronouncing the word correctly.
 ** c) Choosing a License: Uploading a file to Wikimedia Commons requires licensing. If the file the user wishes to upload is his/her own work, he/she can choose from a variety of licenses. When the user selects "This file is my work", the radio buttons for the 3 licenses are activated and the radio button corresponding to "This file is not my work" is deactivated. The reverse applies too.
 ** d) Upload Button: On clicking this button, the file is uploaded to Wikimedia Commons with a specific file name like en-minute.ogg. For a different etymology of the same word the file name will be en-minute-1.ogg, and for a different language the file name will be fr-minute-1.ogg.
 * 3) The Success and Thank You Note: After the user clicks the upload button, if the file is successfully uploaded to Commons, a dialog box confirming that the upload was successful will be displayed. This dialog box also contains a small thank-you note. When the user clicks the "Finish" button, the Wiktionary page automatically refreshes and the .ogg file is embedded into the page.
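The file naming rule from the upload step can be sketched as a small function. This is only an illustration, not part of the extension: the name `pronunciationFileName` and the convention that etymology 0 means "no numeric suffix" are my assumptions, chosen to reproduce the three example names given above.

```typescript
// Hypothetical helper (not an existing MediaWiki API): builds the Commons
// file name for a recording, following the examples in the workflow.
//   lang:      language code such as "en" or "fr"
//   word:      the headword being pronounced
//   etymology: 0 for the first (or only) etymology, 1, 2, ... for later ones
//              (this numbering is an assumption)
function pronunciationFileName(lang: string, word: string, etymology: number = 0): string {
  if (!Number.isInteger(etymology) || etymology < 0) {
    throw new RangeError("etymology must be a non-negative integer");
  }
  const base = `${lang}-${word}`;
  // Only later etymologies get a numeric suffix.
  const suffix = etymology === 0 ? "" : `-${etymology}`;
  return `${base}${suffix}.ogg`;
}

console.log(pronunciationFileName("en", "minute"));    // en-minute.ogg
console.log(pronunciationFileName("en", "minute", 1)); // en-minute-1.ogg
console.log(pronunciationFileName("fr", "minute", 1)); // fr-minute-1.ogg
```

A real implementation would probably also need to check whether a file with that name already exists on Commons before uploading.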


 * We will be using the word "aggravate" as a reference, since it does not have a pronunciation file attached to it. The workflow described above is illustrated through a UI mockup.


 * Images 2, 3, 4 and 5 in the above gallery are a visual representation of what I would like to see after my extension has been deployed. Please do not get confused: as of now, such a link does not exist.

Technical Implementation

 * I plan on recording the pronunciation using WebRTC and the Web Audio API, both part of the HTML5 platform.
 * Right now, Chrome m27 (Beta channel) supports audio recording through the microphone.
 * Google Chrome Canary has supported audio recording through the microphone since m23.
 * Firefox v20 has a small bug with audio recording, which is expected to be fixed soon.
 * I have had conversations with developers from Firefox and Chrome, and they told me how quickly WebRTC has taken off. Since their release cycles are fast and this tool will take about 6 months to be fully deployed, I do not expect browser compatibility issues by then.
 * When the audio is uploaded to Wikimedia Commons, the pronunciation file will be embedded into the Wiktionary page immediately using Template:audio.
 * If time permits, I plan on including a feature for mass uploading of pronunciations.
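As a rough sketch of how the 5-second limit from the workflow could be enforced on the captured audio: assuming the Web Audio API delivers the microphone input as a series of Float32Array chunks of PCM samples, the chunks can be joined into one buffer and cut off at the limit. The function name `capRecording` and the constant `MAX_SECONDS` are hypothetical, and wiring this up to the microphone and encoding the result to .ogg are left out.

```typescript
// Maximum recording length, taken from the workflow described earlier.
const MAX_SECONDS = 5;

// Hypothetical helper: joins captured PCM chunks into a single buffer and
// drops every sample beyond the time limit.
function capRecording(
  chunks: Float32Array[],
  sampleRate: number,
  maxSeconds: number = MAX_SECONDS
): Float32Array {
  const maxSamples = Math.floor(sampleRate * maxSeconds);
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Float32Array(Math.min(total, maxSamples));
  let offset = 0;
  for (const chunk of chunks) {
    if (offset >= out.length) break; // limit reached, ignore the rest
    const room = out.length - offset;
    // Copy the whole chunk, or only the part that still fits.
    const piece = chunk.length > room ? chunk.subarray(0, room) : chunk;
    out.set(piece, offset);
    offset += piece.length;
  }
  return out;
}
```

In practice the toolbar would more likely stop the recording with a timer once 5 seconds have passed; truncating the buffer as above is just a simple safety net.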

Benefits

 * By the time audio recording is fully supported by all browsers, the Wikimedia Foundation will already have a tool for recording pronunciations.
 * It will make life easier for a lot of students who are a little weak at basic linguistics.

Future improvements beyond the scope of the GSOC Period

 * A Rating Extension
 * An audio filter that trims the silence at the start and end of a recording (thanks for the suggestion)
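One simple way the silence trimming could work, as a sketch only: treat every sample whose absolute amplitude is below a threshold as silence and cut such samples from both ends. The function name `trimSilence` and the default threshold of 0.01 are assumptions for illustration; a real filter would probably need a tuned threshold and some tolerance for background noise.

```typescript
// Hypothetical helper: removes leading and trailing samples whose absolute
// amplitude is below `threshold`. Quiet samples in the middle are kept.
function trimSilence(samples: Float32Array, threshold: number = 0.01): Float32Array {
  let start = 0;
  while (start < samples.length && Math.abs(samples[start]) < threshold) {
    start++;
  }
  let end = samples.length;
  while (end > start && Math.abs(samples[end - 1]) < threshold) {
    end--;
  }
  // subarray returns a view on the same memory, so no samples are copied.
  return samples.subarray(start, end);
}

const trimmed = trimSilence(new Float32Array([0, 0, 0.5, -0.25, 0, 0.5, 0, 0]));
console.log(trimmed.length); // 4
```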

Timeline

 * To be uploaded soon

Feedback and Discussion

 * I am very happy to see the response from various Wikimedia Foundation communities to my project. Thanks a lot, and keep the feedback and suggestions pouring in.
 * I would like to thank Matthew Flaschen, Quim Gil and, above all, my mentor Michael Dale.