Extension:Wikispeech/Installing Speechoid

Speechoid is the text-to-speech backend of Wikispeech. It consists of a number of services controlled via Wikispeech-server. The Speechoid build pipeline uses Blubber to create Docker images, which are meant to be deployed on, for instance, a Kubernetes cluster. There is also a Compose project meant for running locally, on development servers and on small installations.

Please report any errors you encounter on the discussion page of this wiki page.

= Prerequisites =

This guide is based on the setup used by the Wikispeech development team, which uses Ubuntu (18, 19 and 20) on workstations and Debian (10) on servers.

Docker
https://docs.docker.com/engine/install/

Docker compose
https://docs.docker.com/compose/install/

Blubber
https://wikitech.wikimedia.org/wiki/Blubber/Download
We strongly recommend using the prebuilt binary.

Make sure it's available in your $PATH, e.g. by copying it to /usr/local/bin.
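A quick way to confirm that the prerequisites are in place is to check that each command resolves in $PATH. The following is a minimal sketch, not part of the Speechoid tooling; the helper name `missing_tools` is hypothetical, and it assumes the binaries are named `docker`, `docker-compose` and `blubber`.

```shell
#!/bin/sh
# Print the name of every listed command that is not found in $PATH.
# missing_tools is a hypothetical helper, not part of Speechoid.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report which of the three prerequisites still need installing.
# The binary names are assumptions based on this guide.
missing=$(missing_tools docker docker-compose blubber)
if [ -n "$missing" ]; then
    echo "Missing from PATH:" $missing
else
    echo "All prerequisites found."
fi
```

If anything is reported as missing, install it using the links above before continuing.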

= Building services =

Images are automatically built by WMF Jenkins and deployed to the WMF Docker registry. The docker-compose project pulls its images from there. If you want to build your own images, you will have to update the image locations in the docker-compose.yml file.

Using the build script
The docker-compose project comes with a script that will download and build all services for you.

git clone https://github.com/Wikimedia-Sverige/wikispeech-speechoid-docker-compose.git
cd wikispeech-speechoid-docker-compose
./create-all-images.sh

Manually
Each service lives in its own git repository at gerrit.wikimedia.org:

git clone "https://gerrit.wikimedia.org/r/mediawiki/services/wikispeech/mary-tts"
git clone "https://gerrit.wikimedia.org/r/mediawiki/services/wikispeech/mishkal"
git clone "https://gerrit.wikimedia.org/r/mediawiki/services/wikispeech/pronlex"
git clone "https://gerrit.wikimedia.org/r/mediawiki/services/wikispeech/symbolset"
git clone "https://gerrit.wikimedia.org/r/mediawiki/services/wikispeech/wikispeech-server"

Each service contains a Blubber helper script that prepares the image.

cd some-service
./blubber-build.sh
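The manual steps above repeat the same two actions for every service, so they can be sketched as a loop. This is an illustration only: the function name `build_commands` and the `RUN` switch are hypothetical, and the script prints the commands by default rather than executing them.

```shell
#!/bin/sh
# Service repositories listed above, all under the same Gerrit prefix.
GERRIT_BASE="https://gerrit.wikimedia.org/r/mediawiki/services/wikispeech"
SERVICES="mary-tts mishkal pronlex symbolset wikispeech-server"

# Print the commands needed to clone and build one service.
# build_commands is a hypothetical helper name.
build_commands() {
    printf 'git clone "%s/%s"\n' "$GERRIT_BASE" "$1"
    printf '(cd %s && ./blubber-build.sh)\n' "$1"
}

# Dry run by default; set RUN=1 (an assumed switch) to execute
# the printed commands instead of only showing them.
for service in $SERVICES; do
    if [ "${RUN:-0}" = "1" ]; then
        build_commands "$service" | sh -e
    else
        build_commands "$service"
    fi
done
```

Reviewing the dry-run output first makes it easy to spot a wrong repository name before any build starts.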

= Starting Speechoid =

Using compose
The easiest way is to spin it up using docker-compose:

git clone https://github.com/karlwettin/wikispeech-docker-compose.git
cd wikispeech-docker-compose
./run.sh

After a little while Wikispeech server should be available as an HTTP service on port 10000.
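Because the services take a while to start, it can help to poll the port instead of guessing when it is ready. The sketch below assumes `curl` is installed and that any HTTP response on port 10000 means the server is up; `wait_for_http` is a hypothetical helper, and the root path `/` is an assumption rather than a documented health endpoint.

```shell
#!/bin/sh
# Poll a URL until it answers or the attempts run out.
# Usage: wait_for_http URL ATTEMPTS DELAY_SECONDS
# wait_for_http is a hypothetical helper, not part of Speechoid.
wait_for_http() {
    url=$1; attempts=$2; delay=$3
    while [ "$attempts" -gt 0 ]; do
        # --max-time bounds each probe; without --fail, curl succeeds
        # on any HTTP response, so even an error page counts as "up".
        if curl --silent --output /dev/null --max-time 5 "$url"; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}

# Example (assumed URL): wait_for_http http://localhost:10000/ 60 5
```

The function returns 0 as soon as the server answers and 1 if every attempt fails, so it can gate later steps in a script.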

Manually
At the time of writing, there is no ready-to-go pipeline for starting Speechoid in other environments. When ready for production, Speechoid will be deployed on a Kubernetes cluster. Expect documentation about that here.

= Setting up on WMF Cloud VPS =

See also the instructions for preparing your Cloud VPS instance.

Clone and prepare Speechoid docker-compose
sudo su
mkdir /etc/docker-compose
cd /etc/docker-compose
git clone https://github.com/Wikimedia-Sverige/wikispeech-speechoid-docker-compose.git speechoid
cd speechoid
mkdir volumes
mkdir volumes/wikispeech_mockup_tmp
chmod a+rwx volumes/wikispeech_mockup_tmp

Set up as systemd service
The following will automatically start Speechoid on boot.

sudo su
cat << EOF > /etc/systemd/system/docker-compose\@.service
[Unit]
Description=docker-compose %i service
Requires=docker.service network-online.target
After=docker.service network-online.target

[Service]
WorkingDirectory=/etc/docker-compose/%i
Type=simple
TimeoutStartSec=15min
Restart=always
ExecStart=/usr/bin/docker-compose up
ExecStop=/usr/bin/docker-compose down

[Install]
WantedBy=multi-user.target
EOF

systemctl enable docker-compose@speechoid
systemctl start docker-compose@speechoid
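The unit above is a systemd template: the text after `@` in the unit name is the instance name, which systemd substitutes for `%i`. The sketch below mimics that substitution for plain instance names, to show why the clone directory had to be called `speechoid`. Both function names are hypothetical, and systemd's escaping of special characters is not reproduced.

```shell
#!/bin/sh
# Mimic how systemd derives the instance name for %i:
# "docker-compose@speechoid" has the instance name "speechoid".
# Hypothetical helper; special-character escaping is not handled.
instance_name() {
    unit=${1%.service}           # drop an optional .service suffix
    printf '%s\n' "${unit#*@}"   # keep what follows the first "@"
}

# Directory the unit uses as WorkingDirectory for a given unit name.
compose_workdir() {
    printf '/etc/docker-compose/%s\n' "$(instance_name "$1")"
}

compose_workdir docker-compose@speechoid
# prints /etc/docker-compose/speechoid
```

Once the service is enabled, `systemctl status docker-compose@speechoid` and `journalctl -u docker-compose@speechoid` show whether it started and what it logged.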

Set up web proxy
You will now need to set up an HTTP proxy pass to Speechoid on port 10000. Alternatively, you could set up a floating IP, but that is not covered by this guide.

Go to https://horizon.wikimedia.org/project/proxy/ and click the Create Proxy button. Fill in the information about your VPS and use backend port 10000. Speechoid should now be publicly available over HTTP and HTTPS, on ports 80 and 443.

= Pronlex on MariaDB =

If you don't have MariaDB installed:

sudo apt -y update
sudo apt -y install software-properties-common gnupg2
sudo apt -y upgrade
sudo reboot

Set up your favorite mirror. (This example uses a mirror in Kenya, chosen for no particular reason. You might want one that is closer or that you trust more.)

sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 0xF1656F24C74CD1D8
sudo add-apt-repository 'deb [arch=amd64] http://mariadb.mirror.liquidtelecom.com/repo/10.4/debian buster main'
sudo apt update

Install MariaDB.

sudo apt install mariadb-server mariadb-client

Secure the installation, but don't set a root password. Pronlex appears to expect an empty root password, and will create its own user and databases in the steps below.

sudo mysql_secure_installation

Install and set up GoLang.

sudo apt install build-essential gcc
cd ~
mkdir opt
wget https://dl.google.com/go/go1.13.linux-amd64.tar.gz -O /tmp/go1.13.linux-amd64.tar.gz
cd opt
tar xvfz /tmp/go1.13.linux-amd64.tar.gz
export GOROOT=~/opt/go
export GOPATH=~/opt/goProjects
export PATH=${GOPATH}/bin:${GOROOT}/bin:${PATH}

Install and build Pronlex.

cd ~
git clone https://github.com/wikimedia/mediawiki-services-wikispeech-pronlex.git
cd ~/mediawiki-services-wikispeech-pronlex
go build ./...

Populate MariaDB with Lexdata.

cd ~
git clone https://github.com/stts-se/wikispeech-lexdata.git
cd ~/mediawiki-services-wikispeech-pronlex
/bin/bash scripts/import.sh -a ~/appdir -e mariadb -l 'speechoid:@tcp(127.0.0.1:3306)' -f ~/wikispeech-lexdata

Profit!

(You might want to set a password for the MariaDB user speechoid at this point.)

(You might also want to create a speechoid MariaDB user for remote access at this point, with the same password and grants.)

(You can now delete all GoLang, Pronlex and Lexdata folders if you want.)
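The password and remote-access notes above could be carried out with statements along the following lines. This is a sketch under assumptions: the local account is taken to be `'speechoid'@'localhost'`, `'%'` (any host) is taken to be an acceptable host pattern for the remote account, and `speechoid_user_sql` is a hypothetical helper that only prints SQL for review. It does not handle passwords containing single quotes.

```shell
#!/bin/sh
# Print SQL for securing the speechoid account. Review before running.
# Assumptions: the local account is 'speechoid'@'localhost', and '%'
# (any host) is an acceptable host pattern for the remote account.
# The password must not contain single quotes.
speechoid_user_sql() {
    password=$1
    cat <<EOF
ALTER USER 'speechoid'@'localhost' IDENTIFIED BY '$password';
CREATE USER 'speechoid'@'%' IDENTIFIED BY '$password';
SHOW GRANTS FOR 'speechoid'@'localhost';
EOF
}

# 'change-me' is a placeholder password.
speechoid_user_sql 'change-me'
```

The output can be piped into `sudo mysql`. The final statement lists the grants of the local account, which can then be repeated for the remote account.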

Start Pronlex in your Speechoid, pointing it at this MariaDB instance. Something like:

/bin/bash scripts/start_server.sh -a /srv/appdir -e mariadb -l 'speechoid:password@tcp(wikispeech-tts-pronlex:3306)'
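The value passed to `-l` in the commands above has the shape `user:password@tcp(host:port)`, with the password left empty during the import. A small sketch of assembling it from its parts follows; `pronlex_dsn` is a hypothetical helper, and defaulting the port to 3306 is an assumption based on the examples in this guide.

```shell
#!/bin/sh
# Build a connection string of the form user:password@tcp(host:port),
# the shape used by the -l option in the commands above.
# Usage: pronlex_dsn USER PASSWORD HOST [PORT]  (port defaults to 3306)
pronlex_dsn() {
    printf '%s:%s@tcp(%s:%s)\n' "$1" "$2" "$3" "${4:-3306}"
}

pronlex_dsn speechoid '' 127.0.0.1
# prints speechoid:@tcp(127.0.0.1:3306)
pronlex_dsn speechoid password wikispeech-tts-pronlex 3306
# prints speechoid:password@tcp(wikispeech-tts-pronlex:3306)
```

Quote the result when passing it on the command line, since the parentheses are special to the shell.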