Extension:PronunciationRecording/GSoC 2013

=GSoC Project Idea: Pronunciation Recording Extension=

I have been involved with MediaWiki for two months now, and since GSoC 2013 is coming up soon, I felt it was the right time to draft a rough proposal.

Identity
 * Name: Rahul Maliakkal
 * Email: rahul14m93@gmail.com
 * Project title: Pronunciation Recording Extension

Contact/working info
 * Timezone: IST (UTC+5:30)
 * Typical working hours: Very flexible. I can work any time between 14:30 and 21:30 UTC (20:00–03:00 IST), and I can work 5 extra hours on weekends.
 * IRC or IM networks/handle(s): Rahul21 (Freenode)
 * Time constraints: I want to be clear up front that I have a few time constraints to work around. My 6th semester examinations run from 22nd May to 31st May, so I will be a little less active during that time. I have prepared my schedule so that the main part of this project will be complete before September 4.

Introduction



 * There is a thread on the mailing list requesting this extension.
 * On Wiktionary, many words have pronunciation audio files (.ogg) attached to them; these files tell the user how to pronounce a word in a specific language. The same word can be pronounced differently in different parts of the world. Example: the word "garage" is pronounced differently around the world [Garage]
 * The Wiktionary page for the word "behavior" has a pronunciation attached to it.
 * The word "minute" is pronounced differently in the time context than in the quantity context; such words are called heteronyms. The audio files attached to each of the etymologies clearly demonstrate this difference.
 * However, many words do not have audio files attached to them. In a rough survey I found that words used extensively in a particular discipline (e.g. medicine, mathematics) often lack audio files. Examples: aggravate, compendium

Required Deliverables

 * Since I plan to build my extension on top of the TimedMediaHandler (TMH) extension, I will first add .wav support to TMH.


 * Record 5-second audio pronunciations via HTML5, using the getUserMedia API to access the microphone and the Web Audio API to get access to the raw audio data.
 * Fix browser compatibility issues that come up during recording.
 * Implement the UI wizard flow for recording pronunciations.
 * Store the recordings in .wav format on Commons, then embed .ogg flavors of the recordings in the respective Wiktionary pages using Template:audio.
 * Customize the style of the UI via CSS classes for the various skins available, e.g. Vector, Monobook, etc.
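
The recording and storage deliverables above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the extension's actual code: it assumes the raw audio arrives as mono floating-point samples in the range [-1, 1] (the form the Web Audio API exposes), trims the clip to the 5-second limit, and packs it as 16-bit PCM behind a standard 44-byte WAV header. The names `MAX_SECONDS`, `writeTag` and `encodeWav` are hypothetical.

```typescript
// Hypothetical helpers for illustration; not part of MediaWiki or TMH.
const MAX_SECONDS = 5; // recording limit stated in this proposal

// Write an ASCII chunk tag such as "RIFF" into the header.
function writeTag(view: DataView, offset: number, tag: string): void {
  for (let i = 0; i < tag.length; i++) {
    view.setUint8(offset + i, tag.charCodeAt(i));
  }
}

// Encode mono samples in [-1, 1] as a 16-bit PCM WAV file,
// dropping anything recorded past MAX_SECONDS.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const maxSamples = Math.floor(MAX_SECONDS * sampleRate);
  const clipped = samples.subarray(0, Math.min(samples.length, maxSamples));
  const dataBytes = clipped.length * 2; // 2 bytes per 16-bit sample
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);

  writeTag(view, 0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // file size minus first 8 bytes
  writeTag(view, 8, "WAVE");
  writeTag(view, 12, "fmt ");
  view.setUint32(16, 16, true);             // size of the fmt chunk
  view.setUint16(20, 1, true);              // audio format 1 = PCM
  view.setUint16(22, 1, true);              // one channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeTag(view, 36, "data");
  view.setUint32(40, dataBytes, true);

  clipped.forEach((sample: number, i: number) => {
    const s = Math.max(-1, Math.min(1, sample)); // guard against overshoot
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  });
  return buffer;
}
```

In the browser, the returned buffer could be wrapped in a Blob and passed on to the upload step; how the file is actually handed to Commons is left open here.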

If time permits

 * Expand the idea to record spoken articles.
 * Implement a rating extension to evaluate the quality of the words recorded.

Improvements beyond the scope of GSoC

 * Expand the idea for mass uploading of words.
 * Implement an audio filter that applies noise reduction to make the pronunciations crystal clear.

Simple workflow

 * The workflow consists of 3 steps:
 * 1) A "Record Pronunciation" link is displayed on the Wiktionary page of a word that does not have a pronunciation file attached to it.
 * 2) When the user clicks on the "Record Pronunciation" link, a dialog box pops up. The dialog box consists of 4 parts:
 * 2a) The Recording Toolbar: A user-friendly toolbar that helps the user record pronunciations. It consists of buttons such as "Record", "Stop", "Play" and "Reset", each of which is self-explanatory. The toolbar is not shown in the mockup; the placeholder text "Recording Toolbar" will be replaced by a working toolbar. The user gets a maximum of 5 seconds in which to record the pronunciation.
 * 2b) IPA: This section shows the IPA of the word that the user wants to record. It will assist the user in pronouncing the word correctly.
 * 2c) Choosing a License: Uploading a file to Wikimedia Commons requires a license. If the file is the user's own work, they can choose from a variety of licenses. When the user selects "This file is my work", the radio buttons for the 3 licenses are activated and the radio button for "This file is not my work" is deactivated, and vice versa.
 * 2d) Upload Button: On clicking this button, the file is uploaded to Wikimedia Commons with a specific file name such as en-minute.ogg. For a different etymology of the same word the file name will be en-minute-1.ogg, and for a different language the file name will be fr-minute-1.ogg.
 * 3) The Success and Thank You Note: After the user clicks the upload button, if the file is successfully uploaded to Commons, a dialog box confirming the upload is displayed along with a small thank-you note. When the user clicks on the "Finish" button, the Wiktionary page automatically refreshes and the .ogg file is embedded into the page.
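
The file naming used by the upload step could be sketched as below. This is a minimal illustration assuming only the three example names given above; the function name `pronunciationFileName` and the convention that etymology 0 means "first etymology, no suffix" are hypothetical, and the sketch does no sanitizing of the word itself.

```typescript
// Hypothetical helper: builds the Commons file name for a recording.
// An etymology of 0 (or omitted) stands for the first etymology and gets no suffix.
function pronunciationFileName(
  lang: string,
  word: string,
  etymology: number = 0
): string {
  const base = `${lang}-${word}`;
  return etymology > 0 ? `${base}-${etymology}.ogg` : `${base}.ogg`;
}
```

For example, `pronunciationFileName("en", "minute")` gives `en-minute.ogg`, and `pronunciationFileName("fr", "minute", 1)` gives `fr-minute-1.ogg`.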


 * I am using the word "aggravate" as a reference, since it does not have a pronunciation file attached to it. The workflow described above is illustrated through a UI mockup.


 * Images 2, 3, 4 and 5 in the above gallery show what I would like the pages to look like after my extension has been deployed. To avoid confusion: as of now, no such link exists.

Project Schedule aka Timeline

 * Before May 27th - Familiarizing myself with the MediaWiki codebase.


 * May 27th to June 17th - Research my implementation idea thoroughly, gather all possible resources for the coding period, and create a Wikimedia Labs instance account.


 * June 17th to June 23rd (Week 1) - Add .wav support to the TMH extension.


 * June 24th to July 7th (Week 2,3) - Implement the Record API using HTML5. This is a bit tricky and will be time consuming.


 * July 8th to July 14th (Week 4) - Finish the Record API and solve browser compatibility issues.


 * July 15th to July 28th (Week 5,6) - Work on the UI implementation.


 * July 29th to August 11th (Week 7,8) - Work on the backend and on storing the recordings on Commons.


 * August 12th to August 18th (Week 9) - Customize the style of the UI via CSS classes for the various skins available, e.g. Vector, Monobook, etc.


 * August 19th to September 8th (Week 10,11,12) - Testing time: fix bugs and improve the documentation and the UI.


 * September 9th to September 22nd (Week 13,14) - Buffer period, in case I fall behind schedule. Otherwise, I will use it to improve the documentation and clean up the code.

Browser Compatibility

 * I plan to record the pronunciations using WebRTC and the Web Audio API, both part of the HTML5 platform, so browser compatibility is only a minor issue at the moment.
 * I have spoken with developers from Firefox and Chrome, who told me that WebRTC adoption is growing rapidly. Since both browsers have fast release cycles and this tool will take about 6 months to be fully deployed, I do not expect browser compatibility to be an issue by then.
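
Until support settles, the extension could check for the required APIs before showing the record link. The sketch below is one possible approach, not a settled design: it looks for the unprefixed and vendor-prefixed entry points (`webkitGetUserMedia`, `mozGetUserMedia`, `webkitAudioContext`) that browsers ship at the time of writing. The interfaces `NavigatorLike` and `WindowLike` and the function names are hypothetical; they take plain objects so the check can be exercised outside a browser.

```typescript
// Minimal shapes of the browser objects this check looks at (hypothetical types).
interface NavigatorLike {
  getUserMedia?: unknown;
  webkitGetUserMedia?: unknown;
  mozGetUserMedia?: unknown;
}

interface WindowLike {
  AudioContext?: unknown;
  webkitAudioContext?: unknown;
}

// True if any known getUserMedia entry point is present.
function canCaptureMicrophone(nav: NavigatorLike): boolean {
  return [nav.getUserMedia, nav.webkitGetUserMedia, nav.mozGetUserMedia]
    .some((fn) => typeof fn === "function");
}

// True if the Web Audio API constructor is present, prefixed or not.
function hasWebAudio(win: WindowLike): boolean {
  return [win.AudioContext, win.webkitAudioContext]
    .some((ctor) => typeof ctor === "function");
}

// Recording needs both microphone capture and access to the raw audio data.
function canRecord(nav: NavigatorLike, win: WindowLike): boolean {
  return canCaptureMicrophone(nav) && hasWebAudio(win);
}
```

In a page script this would be called as `canRecord(navigator, window)`, and the record link would simply be hidden when it returns false.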

Google Chrome

 * Right now, Chrome m27 (Beta channel) and Chrome m28 (Dev channel) support audio recording through the microphone.
 * Chrome Canary has supported audio recording through the microphone since m23, once the "Web Audio Input" flag is enabled via "chrome://flags".
 * Chrome Development Calendar

Firefox

 * Firefox v20 has a small bug with audio recording, which is expected to be fixed soon.


 * Firefox Release Dates

Internet Explorer

 * WebRTC support for Internet Explorer has been tested on Chrome Frame for Internet Explorer users in non-metro mode.

Benefits

 * By the time audio recording is fully supported by all browsers, the Wikimedia Foundation will already have a tool to record pronunciations.
 * It will make life easier for many students who are a little weak at basic linguistics.

Feedback and Discussion

 * I am very happy to see the response to my project from the various Wikimedia Foundation communities. Thanks a lot, and keep the feedback and suggestions pouring in.
 * I would like to thank Matthew Flaschen, Quim Gil and, above all, my mentor Michael Dale.