Extension talk:PronunciationRecording/GSoC 2013/Proposal

Thoughts
This looks like a good proposal. I had a couple thoughts. I'm not sure the CAPTCHA is necessary. I doubt that many people will go to the trouble of recording bogus pronunciations. I think it's simpler to have numeric codes. The first has no code or just -1. The second has -2, etc. Superm401 - Talk 20:25, 5 April 2013 (UTC)


 * Sure, I'll be removing the CAPTCHA; it seems redundant to me now. One small problem I see is that with WebRTC the pronunciation file that gets downloaded is in .wav format, so in order to upload it to Wikimedia Commons I need to convert it to .ogg, and I am a little stuck on this conversion. Thanks for replying. --Rahul21 (talk) 14:09, 8 April 2013 (UTC)
 * If you happened to use a CAPTCHA you should just integrate it via Extension:ConfirmEdit. --Nemo 07:55, 10 April 2013 (UTC)

Noise filtering, and trim start/end silence
Perhaps as a later phase of this project, optional noise filtering would be good. I always use the Noise Reduction command in GoldWave after recording a pronunciation, and it makes it a lot clearer. Also are you going to trim any periods of silence at the start and end? 86.164.200.121 15:47, 8 April 2013 (UTC)


 * Yes, it's a good feature that could be implemented as a spin-off extension. Suggestion noted. --Rahul21 (talk) 08:03, 9 April 2013 (UTC)
 * Maybe it should be tested to make sure it works with every language. I am not familiar with many of them, but perhaps the sound profile of some consonants in some languages could be damaged by noise removal… I suppose this doesn’t really matter for a start – at worst it could be made optional… --Eiku (talk) 21:30, 12 April 2013 (UTC)
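
As a rough illustration of the start/end trimming asked about above, here is a minimal Python sketch. It is not the extension's code (which would run in the browser) and it does no noise reduction. It assumes mono 16-bit PCM samples and a fixed amplitude threshold; the function name `trim_silence` and the default threshold of 500 are hypothetical choices for illustration, not tuned values.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[int], threshold: int = 500) -> List[int]:
    """Drop leading and trailing samples quieter than `threshold`.

    `samples` is assumed to be mono 16-bit PCM (-32768..32767).
    Quiet samples in the middle of the recording are kept untouched.
    """
    start = 0
    end = len(samples)
    # Advance past quiet samples at the beginning.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Step back over quiet samples at the end.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return list(samples[start:end])
```

A per-sample threshold like this is the simplest possible approach; a real recorder would more likely compare short windows of samples against a noise floor, which also bears on the concern that quiet consonants in some languages could be clipped.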

List
If I were to record pronunciations, I would record a list of words all at once. E.g., the user provides a list of words (or phrases), and the recorder then asks to record each item one after the other (the pronunciations can be saved progressively or at the end, I don't know what is the best option). This would be much easier than recording every word individually. To give a concrete example, I don't want to record all the forms of this verb separately (although they will end up in different files). Same for categories and other lists. Darkdadaah (talk) 18:07, 8 April 2013 (UTC)


 * Mass uploading of words is something I will implement during GSoC if time permits; if not, I will implement it afterwards. --Rahul21 (talk) 06:40, 12 April 2013 (UTC)


 * Ooh, I agree. For someone looking for a small way to contribute, a list of five or ten words in his or her native language would be a great, bite-sized task. – GorillaWarfare (talk) 12:02, 15 April 2013 (UTC)

Translations
Do not forget to make the tool so that it can be translated and used in other languages (cf Translatewiki.net). Darkdadaah (talk) 18:09, 8 April 2013 (UTC)


 * Translations won't be an issue, since the person recording will specify which language they are recording in. Example: if the user is recording in "English (US)", that will be specified in the metadata. --Rahul21 (talk) 08:09, 9 April 2013 (UTC)
 * He means localisation of the extension interface. --Nemo 07:55, 10 April 2013 (UTC)
 * Yes. The interface should be translatable so that the tool can be used in other projects than en.wikt. Darkdadaah (talk) 11:52, 10 April 2013 (UTC)

Words with different pronunciations
Do not forget also that in one language a word can be pronounced differently. E.g. en:read in English. The name of the file should reflect that. Darkdadaah (talk) 18:13, 8 April 2013 (UTC)


 * For such words I plan to save them as en-read-1.ogg, en-read-2.ogg, etc., so that different pronunciations of the same spelling are grouped under one word.
 * If the metadata collected will allow it, may I suggest a naming convention of word-language_code[-dialect_or_region]-0.ogg ? This would allow faster searches, plus allow the possibility of automated dialect templating. - Amgine (talk) 15:57, 16 April 2013 (UTC)
 * More like languagecode-word-0[-dialect_or_region].ogg, e.g. en-foo-2-US.ogg and en-foo-2-UK.ogg would be the pronunciation of the same word identified as "en-foo-2", different from "en-foo-1". Darkdadaah (talk) 19:30, 16 April 2013 (UTC)
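
To make the last naming suggestion concrete, here is a small Python sketch of the languagecode-word-N[-dialect_or_region].ogg pattern. The function name `pronunciation_filename` and its parameters are hypothetical, and nothing here reflects a decision actually made for the extension.

```python
from typing import Optional


def pronunciation_filename(lang: str, word: str, variant: int,
                           region: Optional[str] = None) -> str:
    """Build a file name such as 'en-foo-2-US.ogg'.

    `variant` numbers the distinct pronunciations of one spelling;
    `region` is an optional dialect or region tag.
    """
    parts = [lang, word, str(variant)]
    if region:
        parts.append(region)
    return "-".join(parts) + ".ogg"
```

For example, `pronunciation_filename("en", "foo", 2, "US")` gives `en-foo-2-US.ogg`. One caveat of any hyphen-separated scheme: words that themselves contain hyphens make the name ambiguous to parse back, so the metadata would still need to carry the individual fields.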

getUserMedia audio works?
Does getUserMedia for audio actually work already? I've not been able to get it to receive any audio on any browser. None of the demos have worked for me. Is this just on my end? --Yair rand (talk) 21:47, 8 April 2013 (UTC)


 * getUserMedia has very low browser support as of now, but since the extension will take 6-7 months to get deployed, it is quite possible that the compatibility issues will be solved by then. --Rahul21 (talk) 08:06, 9 April 2013 (UTC)


 * [https://code.google.com/p/chromium/issues/detail?id=112367#c178]--Rahul21 (talk) 08:13, 9 April 2013 (UTC)
 * Difficulty with browser support would be my only objection, but I hope it will be lifted soon. Your project sounds like a great improvement for Wiktionaries. --Eiku (talk)

wikt:WT:EDIT
wikt:WT:EDIT is a framework used on the English Wiktionary for conducting edits outside the edit screen. If the tool is also going to have a feature to add the uploaded audio to the entry immediately, I suggest using this, to simplify the workflow on the user side. --Yair rand (talk) 21:53, 8 April 2013 (UTC)


 * I haven't thought of this; I will take a look and get back to you. Thank you.
 * Note that this is only used on en.wikt. Don't make it dependent on that, otherwise the other projects will not be able to use this tool. Darkdadaah (talk) 11:51, 10 April 2013 (UTC)
 * The extension can include its own JavaScript which edits the page (to include the media markup) as it sees fit (which will probably involve calling the API). Superm401 - Talk 00:17, 12 April 2013 (UTC)
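
As a sketch of what "calling the API" could look like, the Python snippet below only builds the parameters for a MediaWiki `action=edit` request that appends markup to one section; it sends nothing. The helper name `build_append_request` is hypothetical, the extension itself would do this from JavaScript, and obtaining the edit token is assumed to happen elsewhere.

```python
from typing import Dict


def build_append_request(title: str, section: int, wikitext: str,
                         token: str) -> Dict[str, str]:
    """Return POST parameters that append `wikitext` to one section.

    Uses the standard edit-module parameters (title, section,
    appendtext, token). The token is assumed to be fetched separately.
    """
    return {
        "action": "edit",
        "format": "json",
        "title": title,
        "section": str(section),
        # Leading newline so the markup starts on its own line.
        "appendtext": "\n" + wikitext,
        "token": token,
    }
```

A call such as `build_append_request("read", 2, "[[File:en-read-1.ogg]]", token)` would target section 2 of the entry; the section number and the exact media markup here are illustrative only, since each wiki has its own pronunciation templates.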

immediate feedback
As I understand the proposal (and as would be necessary IMO), the submitted file would not be added to the Wiktionary entry. The submitter may be frustrated by that. Immediate feedback would be necessary in the entry indicating the file's been submitted. Preferably, the "record this word" prompt should be replaced by a "oops, recorded that wrong! delete and try again" prompt.—msh210@enwikt 04:24, 10 April 2013 (UTC)


 * In MediaWiki, when we save an article we get the message "Your edit was saved". If the file was not uploaded successfully, perhaps we could show a message such as "Upload not successful". Kudos --Rahul21 (talk) 09:55, 10 April 2013 (UTC)

Choosing language and way to include the tool in pages
I hear that an idea is making the user input in what language the word is being recorded. Ideally, this would not be needed, and the tool would already know it: to do this, however, the language should be fetched from somewhere, and it's not trivial because entries can have many languages, see e.g. go (and any language can have different words written in same way but pronounced differently, i.e. heteronyms). A solution may be: show the tool where called with an explicit function with the required info, e.g. for English which defaults to PAGENAME. This would also make the tool opt-in and less controversial; e.g. en.wikt would add this call to templates such as en-verb if they want it, and so on. --Nemo 07:55, 10 April 2013 (UTC)
 * On heteronyms: if the function also managed to change its effect after recording, i.e. to (stop offering recording and?) embed the recorded file on the page, the recording would just have an incremental numbering, but the tool would embed the correct audio file in the correct section of the page (the same one it came from). --Nemo 08:29, 10 April 2013 (UTC)
 * On the same section of the page, how will the script know which section number it started from? and, since stateless, there's the potential for collisions with edit conflicts. Low risk, perhaps; could be addressed after the PoC. - Amgine (talk) 16:02, 16 April 2013 (UTC)

Feasibility of encoding or uploading as .WAV
I really like this idea but many recorders will output formats unsuitable to upload to Commons. You could:
 * 1) Find another solution that outputs .oga
 * 2) Perform the format conversion on the user's side (firefogg? media.io?)
 * 3) Convince Commons to accept .wav uploads (this goes against the philosophy of the site, i.e. no proprietary file formats)
Now 3 seems like the easiest technically but the most difficult politically. Perhaps the conversion could be done on the server before the file is 'uploaded', then you would not need to convince the Commons community of very much, as the site would not be hosting a proprietary format. It may be the case that .wav is a less objectionable format than people think, and Commons may find it in its heart to treat it as it does .gif? Let me know what you think, I will pitch something on the Commons VP and see which way the wind is blowing. Moogsi (talk) 00:21, 11 April 2013 (UTC)

We have extensions like TimedMediaHandler which do server-side transcoding.
 * Well, I had a talk with many people regarding this. Since this extension will take 6-7 months to get deployed, in the meantime we will try to convince Commons to grant us permission for .wav. If we don't get permission by the end, we will resort to server-side transcoding.
 * Why do we need .wav exactly? Are ogg files not good enough? And if we really need lossless formats, is there no better format (e.g. flac)? Darkdadaah (talk) 08:18, 11 April 2013 (UTC)
 * Moogsi, what makes you say WAV is proprietary? Yes, it was developed by Microsoft and IBM, but as far as I can tell, there are no patents stopping free software from using it. See e.g. .  Superm401 - Talk 00:23, 12 April 2013 (UTC)
 * I'm not sure who convinced me there may be a policy objection to it, as when it was first pitched to me in IRC I wasn't even aware there was a problem with the format... as there are no technical barriers, that would be the only thing stopping it, really. That is why I suggest getting people's opinions at Commons to see if there's something else I don't know --Moogsi (talk) 14:15, 12 April 2013 (UTC)


 * Moogsi, browsers these days always prefer .wav. Between 5 seconds of .wav and 5 seconds of .ogg there will be a difference of approximately 200 KB (worst case), so I see no harm in the Commons community accepting .wav. --Rahul21 (talk) 06:34, 12 April 2013 (UTC)
 * Are we talking about the format of the file as it will be saved in Commons? Because in this case a recorded sound weighing 50 kB in ogg is much better than a 200 kB wav (with no obvious quality difference; we're recording voice, not HiFi music). Think about the people with limited bandwidth. Darkdadaah (talk) 20:01, 12 April 2013 (UTC)
 * The idea would be to auto-convert to Ogg for download using TMH, similar to how Ogg Theora files are transcoded to WebM on download depending on your browser. Bawolff (talk) 15:02, 13 April 2013 (UTC)


 * Yes, exactly :) We are also trying to convince the Commons community to support .wav, which is tough. --Rahul21 (talk) 17:38, 13 April 2013 (UTC)
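
For anyone wanting to check the file sizes being discussed, the size of an uncompressed PCM .wav file follows directly from its sample rate, bit depth, channel count and duration. The Python sketch below works through that arithmetic. The function name `wav_size_bytes` is hypothetical, and the defaults (44.1 kHz, 16-bit, mono) are assumptions; what a browser recorder actually produces may differ.

```python
def wav_size_bytes(seconds: float, sample_rate: int = 44100,
                   bits_per_sample: int = 16, channels: int = 1) -> int:
    """Estimate the size of an uncompressed PCM WAV file in bytes."""
    header = 44  # size of the minimal canonical PCM WAV header
    frames = int(seconds * sample_rate)
    # Each frame holds one sample per channel.
    data = frames * channels * (bits_per_sample // 8)
    return header + data
```

With these assumed defaults, `wav_size_bytes(5)` comes to 441044 bytes, and halving the sample rate to 22050 Hz gives 220544 bytes. The size of the corresponding .ogg depends on the encoder bitrate, so it cannot be derived the same way.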

getUserMedia popularity
You write that "Firefox v20 has a small bug with audio recording and is expected to be fixed soon." In general I'd appreciate specific links to bug reports (in this case I assume somewhere in bugzilla.mozilla.org) where progress can be followed. Provide sources for statements, please. :) --AKlapper (WMF) (talk) 13:19, 13 April 2013 (UTC)


 * https://bugzilla.mozilla.org/show_bug.cgi?id=803414 --Rahul21 (talk) 17:33, 13 April 2013 (UTC)

Recording link
Your mockup has the link sitting in the page content, in a list item. I think people might mistake it for actual page content and try to find its source, which is problematic because it has no source!

I'd suggest sticking it next to the section edit link for the pronunciation section. Alternatively, if there's actually a pronunciation object somewhere in the code - to me it looks like it might be <span class="IPA"> - you might append a "record pronunciation" link to the end of that, maybe in parentheses, so it's more closely tied to the actual thing you're modifying.

In the same class of suggestion, you should maybe look towards putting this tool into VisualEditor, either first or concurrently with the, so when the time comes to edit pronunciations visually, we can do it aurally too :) --MarkTraceur (talk) 16:34, 13 April 2013 (UTC)


 * And maybe you should add an option to the recording *after it's been added* so someone can upload a new version. Mark the old one as being unused (unless it got linked elsewhere) and put the new one in its place. If that makes sense, you could probably add it to step four of your mockup. Also, maybe name those mockup files something a little less generic :) --MarkTraceur (talk) 16:45, 13 April 2013 (UTC)


 * I'll specify that the link is what I plan for; thanks for bringing that to my notice. As for sticking it next to the section edit link, I'll have to look into that. --Rahul21 (talk) 17:44, 13 April 2013 (UTC)

GSoC project template
fyi Mentorship_programs/Application_template.--Qgil (talk) 17:46, 15 April 2013 (UTC)


 * Thanks for bringing that to my notice; it will be adopted in v2.1 soon :) --Rahul21 (talk) 18:39, 15 April 2013 (UTC)

Spelling and grammar
Please learn how to use commas, parentheses, colons, apostrophes and similar punctuation. In particular, you often place spaces wrongly. Sharihareswara (WMF) (talk) 19:28, 16 April 2013 (UTC)

Internet Explorer
You wrote:


 * I have had conversations with developers from Firefox and Chrome, they told me how webRTC has exploded into the scene and since their product release cycles are fast and since this tool will take about 6 months to get fully deployed , i see no issues with browser compatibility then.

What about Internet Explorer? Sharihareswara (WMF) (talk) 19:30, 16 April 2013 (UTC)


 * After a bit of research I found that for Internet Explorer the best option would be to use Chrome Frame (Google Chrome Frame is a free plug-in for Internet Explorer). Rahul21 (talk) 20:04, 17 April 2013 (UTC)
 * So, in the end, IE users will not be able to use this feature natively. From a user point of view, installing Google Chrome or Firefox would be better than installing a plug-in in Internet Explorer (easier and more secure). But it may already be too much to ask. Darkdadaah (talk) 08:34, 18 April 2013 (UTC)

Visual representation
It is important that the "box" blends in with the wiki. The style of the box and its content should be customizable (css classes). Also, do not forget to write the word itself in the box (surely more important than the IPA)! Darkdadaah (talk) 18:48, 27 April 2013 (UTC)

 * You mean for different CSS skins, i.e. Vector, Monobook, etc.? I was actually in a dilemma about whether to add the word or not, since the user will be on the Wiktionary page of the word and ought to know it. --59.97.213.175 04:19, 28 April 2013 (UTC)
 * I mean that there should be a default style (not one for every skin). Ideally it should be usable in at least vector, and maybe monobook (these are the two most used styles, and the first one is the default).
 * If a user is creating pronunciations for several words one after the other, it is always good to have the word he wants to pronounce right in front of him. It is bad if the user asks himself: "what was the word I wanted to record again?" and has to scan the page (which is partially hidden by the box) to find the word described in the page. Darkdadaah (talk) 11:46, 28 April 2013 (UTC)

Feedback on app from Amgine
- Amgine (talk) 21:15, 29 April 2013 (UTC)
 * User:Rahul21/Gsoc: Words are pronounced differently in different dialects, which may be regional or ethnic.
 * Differing pronunciations may be described using phonetic representations (such as IPA), but are more readily understood when native speakers record the word as they speak it naturally.
 * Here is my re-write of point 4:
 * Heteronyms such as 'minute' are two or more words which are spelt the same but have distinctly different meanings, and are made clear in the spoken language: very small (IPA: /maɪˈn(j)ut/) is distinct from a sixty-second measurement of time (IPA: /ˈmɪnɪt/).
 * User:Rahul21/Gsoc: The primary benefit is laying the groundwork for contributor-created audio to mediawiki sites in any current browser. Secondary benefit is to support the Wiktionary project's goals of being an education resource - especially for learners of a second language. Further benefits include creating a framework on which language learning can be based, collecting a corpus of tagged linguistic recordings suitable for research, and tools for creating oral histories.
 * User:Rahul21/Gsoc: I believe MDale and MFlaschen will want to have regularly scheduled progress meetings, hopefully via a tool like g+ hangouts or skype so you can get synchronous feedback on your project's development.
 * User:Rahul21/Gsoc: I would really like to see specific points at which you freeze features and focus on completion of a phase. Also, your specification is loose, but may be a bit rich (that is, it's potentially a lot of work); just an opinion.

Thanks, Amgine, for your suggestions. However, I would like you to shed more light on your suggestion regarding the timeline. --Rahul21 (talk) 12:01, 30 April 2013 (UTC)