Extension:Wikispeech/Adding new voice or language

From mediawiki.org

At the time of writing, Wikispeech is still under development, and introducing new voices and languages is not easy. This page is not a step-by-step guide, but it explains enough to give you an idea of what might be required.

Adding and enabling a voice in Speechoid

Speechoid is the text-to-speech (TTS) backend of Wikispeech. It consists of a number of services that are controlled by the so-called Wikispeech server. To fully introduce a new language to Speechoid, you'll need to provide both a pronunciation lexicon and a voice synthesizer for that language, and register them with the Wikispeech server.

Pronunciation lexicon support for a language is not required but can greatly improve synthesized voices.

Wikispeech Speech Data Collector is an upcoming project for recording speech read from manuscripts. The recordings will be fed to a TTS backend that will be able to compile a new voice for any given language. This data also paves the way for future implementations of speech-to-text backends.

If the TTS engine you wish to connect to Speechoid is not yet supported, you'll also need to implement a new adapter for the Wikispeech server. Adapters normalize the input and output between the TTS engine and the pre- and post-processing within the Wikispeech server.

Furthermore, you might have to implement your own text tokenizer for the Wikispeech extension if the written language you want to support is very different from those that Wikispeech already supports. For example, Thai might be problematic to split into segments due to the lack of punctuation.
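To illustrate why punctuation matters for segmentation, here is a minimal Python sketch of a naive splitter. It is not the tokenizer used by the Wikispeech extension; the function name split_segments and the rule "split after ., ! or ? followed by whitespace" are assumptions made for this example only.

```python
import re

# Hypothetical helper, not part of the Wikispeech extension: split text
# after sentence-final punctuation that is followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_segments(text):
    """Return the non-empty segments of text, split after '.', '!' or '?'."""
    return [part for part in SENTENCE_END.split(text.strip()) if part]


english = "Wikispeech reads articles aloud. It splits them into segments first."
thai = "สวัสดีครับ ยินดีต้อนรับสู่วิกิพีเดีย"

print(len(split_segments(english)))  # 2
print(len(split_segments(thai)))     # 1, no punctuation to split on
```

A rule like this leaves the Thai sample as a single long segment, which is the kind of case where a language-specific tokenizer would be needed.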

The easiest way to introduce a new language or voice to Speechoid is currently to use the existing Mary TTS components within Speechoid, as there already is an adapter available. Take a look at the Wikispeech server configuration to understand how to add a language.

For example, to add Norwegian Bokmål you'd have to add something along the lines of:

{
	"textprocessor_configs": [
		{
			"name": "test_textproc_nb",
			"lang": "nb",
			"components": [
				{
					"module": "adapters.marytts_adapter",
					"call": "marytts_preproc",
					"mapper": {
						"from": "nb-no_ws-sampa",
						"to": "nb-no_sampa_mary"
					}
				},
				{
					"module": "adapters.lexicon_client",
					"call": "lexLookup",
					"lexicon": "wikispeech_lexserver_demo:nb"
				}
			]
		}
	],
	"voice_configs": [
		{
			"lang": "nb",
			"name": "stts_no_nst-hsmm",
			"engine": "marytts",
			"adapter": "adapters.marytts_adapter",
			"marytts_locale": "no",
			"mapper": {
				"from": "nb-no_ws-sampa",
				"to": "nb-no_sampa_mary"
			}
		}
	]
}
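Before restarting the Wikispeech server it can help to sanity-check an edited configuration. The Python sketch below parses a trimmed copy of the example and reports voices whose language has no text processor entry. The function name voices_without_textprocessor is hypothetical, and the idea that every voice should have a text processor for the same language is an assumption drawn from the example above, not a documented requirement.

```python
import json

# Trimmed copy of the example above; only the fields the check reads are kept.
CONFIG_TEXT = """
{
  "textprocessor_configs": [
    {"name": "test_textproc_nb", "lang": "nb", "components": []}
  ],
  "voice_configs": [
    {"lang": "nb", "name": "stts_no_nst-hsmm", "engine": "marytts"}
  ]
}
"""


def voices_without_textprocessor(config):
    """Return the names of voices whose language has no text processor entry."""
    textproc_langs = {tp["lang"] for tp in config.get("textprocessor_configs", [])}
    return [
        voice["name"]
        for voice in config.get("voice_configs", [])
        if voice["lang"] not in textproc_langs
    ]


# json.loads also raises an error on syntax mistakes such as a trailing comma.
config = json.loads(CONFIG_TEXT)
print(voices_without_textprocessor(config))  # []
```

An empty list means every voice in the file has a matching text processor language.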

Configure voice and language in Wikispeech

The extension configuration file needs to be aware of the language and voice. They are added in the WikispeechVoices section. The value of this parameter is an object with the language code as key and an array of voice names as the value.

For instance, to add a Norwegian Bokmål voice to the Wikispeech extension you'll need to add something along the lines of:

...
  "WikispeechVoices": {
    "description": "Registered voices per language.",
    "value": {
      "nb": [ "stts_no_nst-hsmm" ],
...
  "DefaultUserOptions": {
    "wikispeechVoiceNb": "",
...
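The two entries above follow a simple pattern that can be sketched in Python. The helper register_voice is hypothetical and works on the two objects in isolation, since the surrounding parts of the file are elided here. The rule for building the user option key (the prefix "wikispeechVoice" plus the language code with its first letter capitalized) is an assumption inferred from "wikispeechVoiceNb" and may not hold for every language code.

```python
def register_voice(voices_entry, default_user_options, lang, voice):
    """Add voice under lang and make sure a matching user option exists.

    voices_entry is the "WikispeechVoices" object and default_user_options
    is the "DefaultUserOptions" object. Both are modified in place.
    """
    lang_voices = voices_entry.setdefault("value", {}).setdefault(lang, [])
    if voice not in lang_voices:
        lang_voices.append(voice)
    # Assumed naming rule, inferred from "wikispeechVoiceNb" in the example.
    option_key = "wikispeechVoice" + lang[:1].upper() + lang[1:]
    # An empty string is used as the default, as in the example.
    default_user_options.setdefault(option_key, "")
    return option_key


voices_entry = {"description": "Registered voices per language.", "value": {}}
default_user_options = {}
key = register_voice(voices_entry, default_user_options, "nb", "stts_no_nst-hsmm")
print(key)  # wikispeechVoiceNb
```

Calling the helper twice with the same arguments leaves both objects unchanged, so it is safe to rerun.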

Go to the Wikispeech section of the Special:Preferences page of your wiki to make sure that the language and voice have been made available.