Text-to-speech: Tortoise, an alternative to Microsoft’s VALL-E

Tortoise is an AI model that can take up to a minute to produce a single spoken sentence. It may seem slow at first glance, but it is effective and accessible, which is more than can be said for Microsoft’s VALL-E speech-synthesis model.

Since Le Monde Informatique began covering the rise of various AI applications, such as image generation – especially with Stable Horde – it has come across code repositories on GitHub and links shared on Reddit. Some of them lead to commercial sites that develop their own algorithms or adapt others published as open source. A good example of an existing audio AI site is Uberduck.ai, which offers hundreds of ready-made voice models: simply type text into a field and have it read aloud by Elon Musk, Bill Gates, Daffy Duck or even Siri.

To train an AI to reproduce speech, you need to upload clear voice samples. The model learns how a speaker combines sounds, then refines those relationships and imitates the result. Normally, putting together a good voice model takes some work, with long samples showing how a particular person speaks. But recently something new has emerged: Microsoft’s VALL-E, a research paper (with real examples) on voice synthesis that needs only a few seconds of source audio to create a fully programmable voice. Naturally, researchers and other AI enthusiasts wanted to know whether the VALL-E model would ever be released to the public. The answer is no. In the meantime, if you want, you can play with another model called Tortoise. (Its author says it is called Tortoise because it is slow, which is true, but it works.)

Overview of VALL-E. (Credit: VALL-E / Microsoft)

Train your own AI voice with Tortoise

What makes Tortoise interesting is that anyone can train the model on any voice they want just by uploading a few audio clips. The project’s GitHub page says it takes several clips of about ten seconds each, saved as .WAV files at a specific quality. How does it work? Through a cloud service you may not know: Google Colab (or “Colaboratory”). It lets you write and run Python code in your browser with no configuration required, free access to GPUs, and easy sharing. The code you (or someone else) write can be saved in a notebook and shared with other users who have a Google account. Here is the shared Tortoise notebook.
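To give an idea of the expected format, here is a minimal sketch, not part of the notebook, of how you might prepare your clips locally before uploading them. It assumes the 22,050 Hz floating-point .WAV format and ten-second clips described in the tortoise-tts repository; the file names, the “myvoice” folder and the choice of the librosa and soundfile libraries are purely illustrative.

from pathlib import Path

import librosa          # pip install librosa
import soundfile as sf  # pip install soundfile

SOURCE_CLIPS = ["clip1.mp3", "clip2.mp3", "clip3.mp3"]    # your own recordings (hypothetical names)
VOICE_DIR = Path("tortoise-tts/tortoise/voices/myvoice")  # assumed repo layout; "myvoice" is a placeholder
TARGET_SR = 22050                                         # sample rate the project asks for

VOICE_DIR.mkdir(parents=True, exist_ok=True)

for i, src in enumerate(SOURCE_CLIPS, start=1):
    # Load as mono and resample to 22,050 Hz
    audio, _ = librosa.load(src, sr=TARGET_SR, mono=True)
    # Keep roughly the first ten seconds, as the GitHub page recommends
    audio = audio[: TARGET_SR * 10]
    # Write a floating-point .WAV file into the voice folder
    sf.write(VOICE_DIR / f"{i}.wav", audio, TARGET_SR, subtype="FLOAT")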

The interface looks intimidating, but it’s not that bad. Sign in with a Google account, then click the “Connect” button in the upper right corner. This particular notebook does not appear to save anything to your Google Drive, though others might; the audio files it creates are kept in the browser session but can be downloaded to your computer. One small but important detail: when you run code written by someone else, you may get error messages, either because of bad input or because of an underlying problem, such as Google not having a GPU available. It’s all a bit experimental.
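If you suspect the GPU problem mentioned above, a quick diagnostic helps. The following cell is not part of the Tortoise notebook; it is just a small check you can paste into Colab to see whether a GPU has actually been assigned to your session.

import torch

if torch.cuda.is_available():
    print("GPU assigned:", torch.cuda.get_device_name(0))
else:
    print("No GPU assigned. Try Runtime > Change runtime type, select GPU, then reconnect.")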

The Tortoise Colab notebook. Click “Connect” to get started, then click the small “play” icon next to each code block in turn. (Credit: Mark Hachman/IDG)

Each block of code has a small “play” icon that appears when you hover your mouse over it. You’ll need to click play on each block in turn, waiting for it to finish before moving on to the next.

(Credit: Google Colab)

Without going into detail, note that the text shown in red can be changed by the user, for example the sentence you want the model to pronounce. After about seven blocks, you will have the option to train the model, name it, and upload your audio files. Once that is done, select your voice model in the fourth block, run the code, then set the text in the third block and run that block as well. If everything goes as expected, the result is a short audio clip in the sampled voice. It works quite well, even surprisingly lifelike.
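For readers who prefer code to notebook cells, here is a rough sketch of what the generation step boils down to, based on the example usage published in the tortoise-tts repository. The voice name “myvoice”, the sample sentence and the output file name are placeholders; the notebook exposes the same values through its editable fields.

import torchaudio
from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voice

# Downloads the pretrained models the first time it runs
tts = TextToSpeech()

# Load the reference clips uploaded for the custom voice ("myvoice" is a placeholder)
voice_samples, conditioning_latents = load_voice("myvoice")

# Generate speech for the sentence typed into the notebook's text field
gen = tts.tts_with_preset(
    "This is Tortoise reading the sentence I typed into the notebook.",
    voice_samples=voice_samples,
    conditioning_latents=conditioning_latents,
    preset="fast",  # "high_quality" sounds better but is even slower
)

# Tortoise produces 24 kHz audio; save it as a .WAV you can download
torchaudio.save("generated.wav", gen.squeeze(0).cpu(), 24000)

Even with the “fast” preset, expect each sentence to take a while on a free Colab GPU, which is exactly where the model gets its name.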
