Extension talk:PronunciationRecording/GSoC 2013/Proposal



This looks like a good proposal. I had a couple thoughts. I'm not sure the CAPTCHA is necessary. I doubt that many people will go to the trouble of recording bogus pronunciations. I think it's simpler to have numeric codes. The first has no code or just -1. The second has -2, etc. Superm401 - Talk 20:25, 5 April 2013 (UTC)

  • Sure, I'll be removing the CAPTCHA; it seems redundant to me now. The small problem I see is that, using WebRTC, the pronunciation file that will be downloaded is in .wav format. So in order to upload it to Wikimedia Commons I need to convert it to .ogg, and I am a little stuck on this conversion. Thanks for replying. --Rahul21 (talk) 14:09, 8 April 2013 (UTC)

Noise filtering, and trim start/end silence

Perhaps as a later phase of this project, optional noise filtering would be good. I always use the Noise Reduction command in GoldWave after recording a pronunciation, and it makes it a lot clearer. Also, are you going to trim any periods of silence at the start and end? 15:47, 8 April 2013 (UTC)

Maybe it should be tested to make sure it works with every language. I am not familiar with many of them, but perhaps the sound profile of some consonants in some languages could be damaged by noise removal… I suppose this doesn’t really matter for a start – at worst it could be made optional… --Eiku (talk) 21:30, 12 April 2013 (UTC)
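
The start/end trimming asked about above can be sketched as follows. This is a minimal illustration in Python, not code from the proposal: it assumes the recording is already available as a list of signed 16-bit mono PCM samples, and the function name `trim_silence` and the `threshold` value are hypothetical.

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples whose magnitude is below threshold.

    samples: list of signed integer PCM samples (assumed 16-bit mono).
    threshold: hypothetical amplitude cutoff; not a value from the proposal.
    """
    start = 0
    end = len(samples)
    # Advance past quiet samples at the start.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Step back over quiet samples at the end.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


recording = [0, 10, -20, 900, -1200, 30, 800, 5, 0]
print(trim_silence(recording))  # [900, -1200, 30, 800]
```

Only the two ends are cut, so quiet samples inside the word are left alone. A threshold set too high could still clip a soft consonant at the very start or end of a word, which is one reason to keep the step optional.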


If I were to record pronunciations, I would record a list of words all at once. E.g., the user provides a list of words (or phrases), and the recorder then asks to record each item one after the other (the pronunciations can be saved progressively or at the end; I don't know which is the best option). This would be much easier than recording every word individually. To give a concrete example, I don't want to record all the forms of this verb separately (although they will end up in different files). Same for categories and other lists. Darkdadaah (talk) 18:07, 8 April 2013 (UTC)

Ooh, I agree. For someone looking for a small way to contribute, a list of five or ten words in his or her native language would be a great, bite-sized task. GorillaWarfare (talk) 12:02, 15 April 2013 (UTC)


Do not forget to make the tool so that it can be translated and used in other languages (cf. translatewiki.net). Darkdadaah (talk) 18:09, 8 April 2013 (UTC)

Words with different pronunciations

Do not forget also that a word can have several pronunciations within one language, e.g. en:read in English. The name of the file should reflect that. Darkdadaah (talk) 18:13, 8 April 2013 (UTC)

  • For such words I plan to save them as en-read-1.ogg, en-read-2.ogg, etc. That way, under the umbrella of one word, the different pronunciations of the same spelling can all be saved.
    If the metadata collected will allow it, may I suggest a naming convention of word-language_code[-dialect_or_region]-0.ogg? This would allow faster searches, plus allow the possibility of automated dialect templating. - Amgine (talk) 15:57, 16 April 2013 (UTC)
    More like languagecode-word-0[-dialect_or_region].ogg, e.g. en-foo-2-US.ogg and en-foo-2-UK.ogg would be the pronunciation of the same word identified as "en-foo-2", different from "en-foo-1". Darkdadaah (talk) 19:30, 16 April 2013 (UTC)
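
The last convention in this thread, languagecode-word-N[-dialect_or_region].ogg, could be built along these lines. This is an illustrative Python sketch only; the function name `pronunciation_filename` and its parameters are hypothetical and are not part of the proposal.

```python
def pronunciation_filename(lang, word, index, region=None):
    """Build languagecode-word-N[-region].ogg as suggested in the thread.

    All parameter names are illustrative. The word is used as-is; words that
    themselves contain hyphens or spaces are not handled by this sketch.
    """
    parts = [lang, word, str(index)]
    if region:
        # The dialect or region suffix is optional and goes last.
        parts.append(region)
    return "-".join(parts) + ".ogg"


print(pronunciation_filename("en", "read", 1))       # en-read-1.ogg
print(pronunciation_filename("en", "foo", 2, "US"))  # en-foo-2-US.ogg
```

Keeping the language code first groups all recordings of one language under a common prefix, while the number distinguishes heteronyms before the optional region is appended.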

getUserMedia audio works?

Does getUserMedia for audio actually work already? I've not been able to get it to receive any audio on any browser. None of the demos have worked for me. Is this just on my end? --Yair rand (talk) 21:47, 8 April 2013 (UTC)

  • getUserMedia() has very low browser support as of now, but since the extension will take 6-7 months to get deployed, it's quite possible that the compatibility issues will be solved by then. --Rahul21 (talk) 08:06, 9 April 2013 (UTC)
Difficulty with browser support would be my only objection, but I hope it will be lifted soon. Your project sounds like a great improvement for Wiktionaries. --Eiku (talk)


wikt:WT:EDIT is a framework used on the English Wiktionary for conducting edits outside the edit screen. If the tool is also going to have a feature to add the uploaded audio to the entry immediately, I suggest using this, to simplify the workflow on the user side. --Yair rand (talk) 21:53, 8 April 2013 (UTC)

immediate feedback

As I understand the proposal (and as would be necessary IMO), the submitted file would not be added to the Wiktionary entry. The submitter may be frustrated by that. Immediate feedback would be necessary in the entry indicating the file's been submitted. Preferably, the "record this word" prompt should be replaced by an "oops, recorded that wrong! delete and try again" prompt. --msh210@enwikt 04:24, 10 April 2013 (UTC)

Choosing language and way to include the tool in pages

I hear that one idea is to make the user input the language in which the word is being recorded. Ideally, this would not be needed, and the tool would already know it: to do this, however, the language should be fetched from somewhere, and it's not trivial because entries can have many languages, see e.g. wikt:go (and any language can have different words written the same way but pronounced differently, i.e. heteronyms). A solution may be: show the tool only where it is called by an explicit function carrying the required info, e.g. {{#record:|en}} for English, which defaults to PAGENAME. This would also make the tool opt-in and less controversial; e.g. en.wikt would add this call to templates such as wikt:en:en-verb if they want it, and so on. --Nemo 07:55, 10 April 2013 (UTC)

On heteronyms: if the function also managed to change its effect after recording, i.e. to (stop offering recording and?) embed the recorded file on the page, the recording would just have an incremental numbering, but the tool would embed the correct audio file in the correct section of the page (the same one it came from). --Nemo 08:29, 10 April 2013 (UTC)
On the same section of the page, how will the script know which section number it started from? And, since it is stateless, there's the potential for collisions with edit conflicts. Low risk, perhaps; could be addressed after the PoC. - Amgine (talk) 16:02, 16 April 2013 (UTC)
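
The incremental numbering mentioned here could work roughly as below. This is a hedged Python sketch that assumes the languagecode-word-N[-region].ogg naming discussed on this page and assumes numbering starts at 1; the function name `next_recording_index` and the idea of passing in a list of existing file names are hypothetical.

```python
import re


def next_recording_index(existing_names, lang, word):
    """Return the next unused number for lang-word-N[-region].ogg.

    existing_names is assumed to be a list of file names already uploaded;
    how that list would be obtained is outside this sketch.
    """
    pattern = re.compile(
        r"^%s-%s-(\d+)(?:-[^.]+)?\.ogg$" % (re.escape(lang), re.escape(word))
    )
    used = []
    for name in existing_names:
        match = pattern.match(name)
        if match:
            used.append(int(match.group(1)))
    # Start at 1 when nothing exists yet (an assumption, not a decision
    # recorded in this discussion).
    return max(used, default=0) + 1


files = ["en-read-1.ogg", "en-read-2-US.ogg", "en-reader-1.ogg"]
print(next_recording_index(files, "en", "read"))  # 3
```

Two contributors who compute the number at the same moment would get the same result, which is the collision risk raised above; a real implementation would need a check at upload time.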

Feasibility of encoding or uploading as .WAV

I really like this idea but many recorders will output formats unsuitable to upload to Commons. You could:

  1. Find another solution that outputs .oga
  2. Perform the format conversion on the user's side (firefogg? media.io?)
  3. Convince Commons to accept .wav uploads (this goes against the philosophy of the site, i.e. no proprietary file formats)

Now 3 seems like the easiest technically but the most difficult politically. Perhaps the conversion could be done on the server before the file is 'uploaded'; then you would not need to convince the Commons community of very much, as the site would not be hosting a proprietary format. It may be the case that .wav is a less objectionable format than people think, and Commons may find it in its heart to treat it as it does .gif? Let me know what you think; I will pitch something on the Commons VP and see which way the wind is blowing. Moogsi (talk) 00:21, 11 April 2013 (UTC)

  • Well, I talked with many people about this. Since this extension will take 6-7 months to get deployed, in the meantime we will be trying to convince Commons to grant us permission for .wav. If we don't get permission by the end, we will resort to server-side transcoding.

We have extensions like TimedMediaHandler which do server-side transcoding.

Why do we need .wav exactly? Are ogg files not good enough? And if we really need lossless formats, is there no better format (e.g. flac)? Darkdadaah (talk) 08:18, 11 April 2013 (UTC)
Moogsi, what makes you say WAV is proprietary? Yes, it was developed by Microsoft and IBM, but as far as I can tell, there are no patents stopping free software from using it. See e.g. [1]. Superm401 - Talk 00:23, 12 April 2013 (UTC)
I'm not sure who convinced me there may be a policy objection to it, as when it was first pitched to me in IRC I wasn't even aware there was a problem with the format... as there are no technical barriers, that would be the only thing stopping it, really. That is why I suggest getting people's opinions at Commons to see if there's something else I don't know --Moogsi (talk) 14:15, 12 April 2013 (UTC)
Moogsi, browsers these days always prefer .wav. Between 5 seconds of .wav and 5 seconds of .ogg there will be a difference of about 200 KB (worst case). So I see no harm in the Commons community accepting .wav. --Rahul21 (talk) 06:34, 12 April 2013 (UTC)
Are we talking about the format of the file as it will be saved in Commons? Because in this case a recorded sound weighing 50 kB in ogg is much better than a 200 kB wav (with no obvious quality difference; we're recording voice, not HiFi music). Think about the people with limited bandwidth. Darkdadaah (talk) 20:01, 12 April 2013 (UTC)
The idea would be to auto-convert to ogg for download using TMH, in a similar way to how Ogg Theora files are transcoded to WebM on download depending on your browser. Bawolff (talk) 15:02, 13 April 2013 (UTC)
Yes, exactly :) and we are also trying to convince the Commons community to support .wav, which is tough. --Rahul21 (talk) 17:38, 13 April 2013 (UTC)
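
For a sense of scale, the size of an uncompressed .wav file follows directly from its recording parameters. The Python sketch below is a worked calculation, not a measurement: the sample rate, bit depth and channel count are assumptions (the thread does not say what the recorder produces), and the function name `wav_size_bytes` is hypothetical.

```python
WAV_HEADER_BYTES = 44  # size of the canonical PCM WAV header


def wav_size_bytes(seconds, sample_rate=22050, bits_per_sample=16, channels=1):
    """Size in bytes of an uncompressed PCM WAV file of the given duration.

    The default parameters are assumptions chosen for illustration only.
    """
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return WAV_HEADER_BYTES + seconds * bytes_per_second


print(wav_size_bytes(5))                     # 220544 bytes, about 220 kB
print(wav_size_bytes(5, sample_rate=44100))  # 441044 bytes, about 441 kB
```

Under the 22.05 kHz, 16-bit, mono assumption, a five-second clip lands close to the 200 kB figure quoted above. The size of the corresponding .ogg depends on the encoder's bitrate and is not computed here.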

getUserMedia popularity

You write that "Firefox v20 has a small bug with audio recording and is expected to be fixed soon." In general I'd appreciate specific links to bug reports (in this case I assume somewhere in bugzilla.mozilla.org) where progress can be followed. Provide sources for statements, please. :) --AKlapper (WMF) (talk) 13:19, 13 April 2013 (UTC)

Recording link

Your mockup has the link sitting in the page content, in a list item. I think people might mistake it for actual page content and try to find its source, which is problematic because it has no source!

I'd suggest sticking it next to the section edit link for the pronunciation section. Alternatively, if there's actually a pronunciation object somewhere in the code - to me it looks like it might be <span class="IPA"> - you might append a "record pronunciation" link to the end of that, maybe in parentheses, so it's more closely tied to the actual thing you're modifying.

In the same class of suggestion, you should maybe look towards putting this tool into VisualEditor, either first or concurrently with the , so when the time comes to edit pronunciations visually, we can do it aurally too :) --MarkTraceur (talk) 16:34, 13 April 2013 (UTC)

And maybe you should add an option to the recording *after it's been added* so someone can upload a new version. Mark the old one as being unused (unless it got linked elsewhere) and put the new one in its place. If that makes sense, you could probably add it to step four of your mockup. Also, maybe name those mockup files something a little less generic :) --MarkTraceur (talk) 16:45, 13 April 2013 (UTC)
I'll specify that the link is what I plan for; thanks for bringing that to my notice. As for sticking it next to the section edit link, umm, I'll have to look into that. --Rahul21 (talk) 17:44, 13 April 2013 (UTC)

GSoC project template

FYI: Mentorship_programs/Application_template. --Qgil (talk) 17:46, 15 April 2013 (UTC)

Spelling and grammar

Please learn how to use commas, parentheses, colons, apostrophes and similar punctuation. In particular, you often place spaces wrongly. Sharihareswara (WMF) (talk) 19:28, 16 April 2013 (UTC)

Internet Explorer

You wrote:

I have had conversations with developers from Firefox and Chrome , they told me how webRTC has exploded into the scene and since their product release cycles are fast and since this tool will take about 6 months to get fully deployed , i see no issues with browser compatibility then.

What about Internet Explorer? Sharihareswara (WMF) (talk) 19:30, 16 April 2013 (UTC)

  • After a bit of research I found out that for Internet Explorer the best option would be to use Chrome Frame (Google Chrome Frame is a free plug-in for Internet Explorer). Rahul21 (talk) 20:04, 17 April 2013 (UTC)
    So, in the end, IE users will not be able to use this feature natively. From a user point of view, installing Google Chrome or Firefox would be better than installing a plug-in in Internet Explorer (easier and more secure). But it may already be too much to ask. Darkdadaah (talk) 08:34, 18 April 2013 (UTC)

Visual representation

It is important that the "box" blends in with the wiki. The style of the box and its content should be customizable (css classes). Also, do not forget to write the word itself in the box (surely more important than the IPA)! Darkdadaah (talk) 18:48, 27 April 2013 (UTC)

-You mean for different CSS skins, i.e. Vector, MonoBook, etc.? I was actually in a dilemma about whether to add the word or not. Since the user will be on the Wiktionary page of the word, he ought to know it. -- 04:19, 28 April 2013 (UTC)

I mean that there should be a default style (not one for every skin). Ideally it should be usable in at least vector, and maybe monobook (these are the two most used styles, and the first one is the default).
If a user is creating pronunciations for several words one after the other, it is always good to have the word he wants to pronounce right in front of him. It is bad if the user asks himself: "what was the word I wanted to record again?" and has to scan the page (which is partially hidden by the box) to find the word described in the page. Darkdadaah (talk) 11:46, 28 April 2013 (UTC)

Feedback on app from Amgine

  • User:Rahul21/Gsoc#Introduction: Words are pronounced differently in different dialects, which may be regional or ethnic.
    • Differing pronunciations may be described using phonetic representations (such as IPA), but are more readily understood when native speakers record the word as they speak it naturally.
    • Here is my re-write of point 4:
      Heteronyms such as 'minute' are two or more words which are spelt the same but have distinctly different meanings, and are made clear in the spoken language: very small (IPA: /maɪˈn(j)ut/) is distinct from a sixty-second measurement of time (IPA: /ˈmɪnɪt/).
  • User:Rahul21/Gsoc#Benefits: The primary benefit is laying the groundwork for contributor-created audio to mediawiki sites in any current browser. Secondary benefit is to support the Wiktionary project's goals of being an education resource - especially for learners of a second language. Further benefits include creating a framework on which language learning can be based, collecting a corpus of tagged linguistic recordings suitable for research, and tools for creating oral histories.
  • User:Rahul21/Gsoc#Participation: I believe MDale and MFlaschen will want to have regularly scheduled progress meetings, hopefully via a tool like g+ hangouts or skype so you can get synchronous feedback on your project's development.
  • User:Rahul21/Gsoc#Project Schedule aka Timeline: I would really like to see specific points at which you freeze features and focus on completion of a phase. Also, your specification is loose, but may be a bit rich (that is, it's potentially a lot of work); just an opinion.

- Amgine (talk) 21:15, 29 April 2013 (UTC)

Thanks, Amgine, for your suggestions; however, I would like you to shed more light on your suggestion regarding the timeline. --Rahul21 (talk) 12:01, 30 April 2013 (UTC)

When working on code, usually one is fixing bugs or adding features to the software. It is often useful to set a specific date when you will stop enhancing the software and focus on the quality assurance phase; this is sometimes called feature freeze, because you are no longer adding features, at least for a short while. - Amgine (talk) 16:07, 30 April 2013 (UTC)

Time for code review and deployment

You may want to explicitly put some more time (two or three weeks?) at the end for code review and deployment. This will require tweaking some of the prior weeks of course. For parts where you are modifying an existing extension (i.e. TimedMediaHandler) it's best to do the code review as you go (you should explicitly mention this, though) rather than one big chunk. But for your own extension, if you want to try to get it deployed by the end of the summer, you should specifically add time for this. Even if you have been using Gerrit for your own extension the whole time, to actually get it deployed, there will be additional reviews.

See this post, particularly "I think smaller scoping would have helped, so in the future, we should simply not accept proposals that do not allot at least a third of the summer to code review and response to code review." Superm401 - Talk 04:15, 3 May 2013 (UTC)

Dutch pronunciation project

An impressive ongoing project: [2]. --Nemo 22:14, 29 October 2013 (UTC)