User:Sebastian Berlin (WMSE)/tmp

Integration with TTS-server
Main documentation

The extension depends on a server for generating utterances. The main server contains an API for various functions. When preparing an utterance, a request is sent to the TTS server with the utterance as a string and parameters such as the language to use. The TTS server processes this and returns the URL to the synthesized audio along with some other information. Tokens with timestamps are among the return values; they are needed for skipping and highlighting to work.
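
The round trip described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the request parameter names (`lang`, `input`) and the response field names (`audio`, `tokens`, `orth`, `endtime`) are assumptions made for the example and should be checked against the server's API. No network call is made; the sketch only builds a request body, parses a response, and uses the token timestamps to find which token is being spoken at a given playback time, which is the kind of lookup highlighting relies on.

```python
import json
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple
from urllib.parse import urlencode


@dataclass
class Token:
    text: str
    end_time: float  # seconds from the start of the audio (assumed unit)


def build_request_body(utterance: str, language: str) -> str:
    """Encode the utterance and language as a form body.

    The parameter names "lang" and "input" are assumptions for this sketch.
    """
    return urlencode({"lang": language, "input": utterance})


def parse_response(body: str) -> Tuple[str, List[Token]]:
    """Extract the audio URL and the timestamped tokens from a JSON response.

    The field names "audio", "tokens", "orth" and "endtime" are assumptions.
    """
    data = json.loads(body)
    tokens = [Token(t["orth"], float(t["endtime"])) for t in data["tokens"]]
    return data["audio"], tokens


def token_index_at(tokens: List[Token], seconds: float) -> Optional[int]:
    """Return the index of the token spoken at `seconds`.

    Returns None once playback is at or past the end of the last token.
    """
    end_times = [t.end_time for t in tokens]
    index = bisect_right(end_times, seconds)
    return index if index < len(tokens) else None
```

With a response holding two tokens that end at 0.4 and 0.9 seconds, `token_index_at(tokens, 0.5)` selects the second token. Skipping to a token works the other way around: the playback position is set to the end time of the preceding token.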

The extension uses a service for TTS operations, such as creating audio for utterances. This TTS server consists of a main API, a lexicon server, TTS engines and any additional components that may be required for certain languages.

The extension sends requests to the server when an utterance is prepared to be read. The server generates the utterance as audio and responds with the URL to an audio file and other information. This is then used for

The API has support for requesting

Main Wikispeech Server
Repo

API for various operations. Includes an endpoint for generating synthesized speech.

Pronlex
Repo

Lexicon server with its own API. Holds information about lexicon entries and allows manipulation of them. Used internally to look up the pronunciation of the words to synthesize.

TTS engines
The server supports multiple TTS engines. Which engine is used depends on the voice selected for synthesis.
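
As a rough illustration of choosing an engine by voice, the sketch below keeps a voice-to-engine table and looks the engine up when a voice is requested. The voice names, engine names and the `engine_for_voice` helper are all hypothetical; the server's real configuration format is not described here.

```python
from typing import Dict

# Hypothetical registry: each voice is served by exactly one engine.
VOICE_ENGINES: Dict[str, str] = {
    "example-voice-a": "marytts",
    "example-voice-b": "other-engine",
}


def engine_for_voice(voice: str, registry: Dict[str, str] = VOICE_ENGINES) -> str:
    """Return the name of the engine that synthesizes the given voice."""
    try:
        return registry[voice]
    except KeyError:
        # Fail loudly rather than silently falling back to another engine.
        raise ValueError(f"No TTS engine registered for voice {voice!r}") from None
```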

MaryTTS
Repo

Comes with support for Arabic, English and Swedish.