
Make Your Robot Speak – Text to Speech

28 Apr, 2016

Speech synthesis is the production of the human voice by non-human objects, and reproducing human speech has been one of the more popular pursuits throughout computer history. The basic idea dates back to the 18th century, when professor Christian Kratzenstein, working at the Russian Academy of Sciences, created an apparatus based on the human vocal tract to demonstrate the physiological differences involved in the production of five long vowel sounds.

Today, speech synthesis has found its way into many areas of life. Public transport stations and airports, for example, use this technology to generate announcements for passengers. It can also give a voice to people who cannot speak themselves, the most famous example being Stephen Hawking.

 


 

Introduction

Speech synthesis is a branch of computer science, and there are two main types of implementation – software and hardware. Both types follow the same working logic:

  1. Text normalization – converting anything that is not plain text (numbers, abbreviations, symbols) into its equivalent written-out word representation.
  2. Assigning phonetic transcriptions to each word, then dividing and marking the text into prosodic units such as phrases and sentences.
  3. Sound generation.

From the point of view of the end user listening to the speech, the difference between the two implementations does not matter. From the point of view of a developer, the difference is obvious. A software implementation requires developing a solution hosted on a PC, which can bring many dependencies on the host OS; the popular online text-to-speech sites are an example of this approach. With a hardware solution, the TextToSpeech click can be hosted by your MCU while the processor on the click board does the heavy lifting.


Text to Speech Click

The TextToSpeech click board carries the S1V30120, a speech synthesis IC that provides a solution for adding Text To Speech ( TTS ) and ADPCM speech processing to a range of portable devices. The IC is powered by the Fonix DECtalk® v5 speech synthesis engine, which can make your robot or portable device talk in US English, Castilian Spanish or Latin American Spanish, in one of nine pre-defined voices.

The S1V30120 contains all the required analogue codecs, memory, and EPSON-supplied embedded algorithms. All applications are controlled over a single SPI interface, allowing control from a wide range of hosts. Our click board supports both 3.3 V and 5 V power supplies.

As a cost-effective solution, the click board does not carry its own MCU, so you need to be careful when selecting the host MCU. Choose an MCU with at least 45 KB of flash, because the initialization data has to be uploaded to the click board every time it is powered on.

 


 

DECtalk is the world’s most intelligible TTS synthesizer with the most natural-sounding voice, and it has the smallest memory footprint in the industry for a full-featured, multi-language voice synthesizer. That makes it an excellent embedded solution.

Feel free to explore the links provided in the references – you will find several tutorials on how to make your robot’s voice sound more natural, as well as ready-made songs written for DECtalk.

Library

In this section we will build a simple example and explain how to use our library. The library covers all functionalities of the S1V30120. The datasheet and the provided documentation can be helpful if you want to explore the functionality of the library and the click board in more depth. The library, packaged for our compilers, can be found on Libstock.

First of all, as many times before, we assign the proper pins to our click board, initialize the SPI bus for communication between the click board and the MCU, and call tts_init(), which must be executed before any other library function.
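A minimal sketch of this step might look like the snippet below, assuming a MikroC PRO for PIC host. Only tts_init() comes from the click library; the header name, the pin mapping and the SPI initializer are assumptions that depend on your MCU and the mikroBUS socket you use.

    #include "text_to_speech.h"             // assumed name of the click library header

    // Hypothetical pin mapping - adjust to your board and socket
    sbit TTS_RST            at LATC1_bit;   // hardware reset line
    sbit TTS_CS             at LATC2_bit;   // SPI chip select
    sbit TTS_RST_Direction  at TRISC1_bit;
    sbit TTS_CS_Direction   at TRISC2_bit;

    void system_init( void )
    {
        TTS_RST_Direction = 0;              // reset and chip select are outputs
        TTS_CS_Direction  = 0;

        // Initialize the SPI module routed to the click socket - check the
        // S1V30120 datasheet and your compiler's SPIx_Init_Advanced()
        // constants for the exact clock mode and speed.
        SPI1_Init();

        tts_init();                         // must run before any other tts_* call
    }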

One more thing that is not a must, but can be helpful, is assigning callbacks to the library. There are three callbacks that can be added. The first one is executed every time we receive a message block response, which is usually returned when the system controller identifies that there are insufficient system resources to service the request.

The second and third ones are more important because they indicate errors. The errors reported by the device can be divided into two types – fatal and non-fatal. In the case of a fatal error, every further message sent to the device will cause the error response again, so the best solution when that kind of error occurs is a hardware reset of the device.
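As an illustration, registering the callbacks could look like the sketch below. The setter names and handler signatures here are placeholders, not the library’s actual API – check the library documentation for the real registration functions.

    // Hypothetical handler and setter names, for illustration only
    static void msg_block_handler( void )
    {
        // Insufficient system resources - wait and retry the request
    }

    static void fatal_error_handler( void )
    {
        // Every further message would fail - hardware reset is the safest way out
        tts_hw_reset();                     // assumed reset helper
    }

    static void error_handler( void )
    {
        // Non-fatal error - log it and carry on
    }

    void callbacks_setup( void )
    {
        tts_msg_block_callback( msg_block_handler );
        tts_fatal_error_callback( fatal_error_handler );
        tts_error_callback( error_handler );
    }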

The device expects a few rules to be followed when booting up the firmware and entering the main mode. For this procedure we will create a function tts_setup() that takes care of uploading the initialization data and entering the main mode.

After power-on the device enters boot mode, which is used to upload the initialization data. The first thing you should do in this mode is send a firmware version request, just to check whether the device has actually entered boot mode. If the device responds positively, we can start the upload of the initialization data.

The data upload is executed in transfer sequences, each of them at most BOOT_MESSAGE_MAX bytes long. If you haven’t noticed, there is a header file provided with this library named text_to_speech_img.h. That file is actually the binary data that will be uploaded. If your MCU does not have enough flash space for this header, you can upload the data from an external EEPROM or flash instead. After uploading the init data we can test the device with tts_interface_test(); this test confirms the upload and switches the device to the main working mode.
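The chunking described above boils down to the loop sketched below. The image array name and the block-write helper are assumptions used for illustration; the library’s own tts_setup() already performs this sequence internally.

    // Assumed names for the image array declared in text_to_speech_img.h
    extern const unsigned char TTS_INIT_DATA[];
    extern const unsigned long TTS_INIT_DATA_SIZE;

    static void boot_image_upload( void )
    {
        unsigned long sent = 0;

        while ( sent < TTS_INIT_DATA_SIZE )
        {
            unsigned int chunk = ( unsigned int )( TTS_INIT_DATA_SIZE - sent );

            if ( chunk > BOOT_MESSAGE_MAX )         // one transfer sequence at a time
                chunk = BOOT_MESSAGE_MAX;

            // boot_block_write() stands in for the library's low-level SPI
            // transfer of a single boot sequence.
            boot_block_write( &TTS_INIT_DATA[ sent ], chunk );
            sent += chunk;
        }
    }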

Once we have entered the main working mode we can start configuring the device. The library has built-in functions for the default configuration, which we are going to use in this example.

Alongside this default configuration there are functions that allow you to configure the device the way you want. The most important one is tts_config, where you set the most vital parameters of the device: language, speech rate, voice type and Epson parser usage. Audio configuration has its own function for setting the audio parameters. The power and codec configurations have no settable parameters, but it is recommended to write the needed values into those registers anyway.
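A sketch of such a configuration call is shown below. The article only names the parameters (language, speech rate, voice type and Epson parser usage), so the argument order and the constant values here are placeholders – check the library header for the real signature.

    // Placeholder values, for illustration only
    #define TTS_LANG_US_ENGLISH     0x00
    #define TTS_VOICE_DEFAULT       0x00    // one of the nine DECtalk voices
    #define TTS_SPEECH_RATE_WPM     200

    void tts_configure_example( void )
    {
        // Assumed-shape call: language, Epson parser on/off, voice type, speech rate
        tts_config( TTS_LANG_US_ENGLISH, 1, TTS_VOICE_DEFAULT, TTS_SPEECH_RATE_WPM );
    }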

After the setup we can send the first string and hear our TTS for the first time. This is an example of how to execute a simple TTS conversion, so feel free to explore the library further.
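For example, the very first conversion could be as simple as the calls below; tts_default_config() and tts_speak() are assumed names standing in for the library’s default-configuration and speak functions.

    // After system_init() and tts_setup() have completed:
    tts_default_config();                   // assumed default-configuration call
    tts_speak( "Hello world. Text to speech click is ready." );    // assumed speak call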

Example

After a successful test we can implement something more complex than a linear application that just speaks text provided before compiling.

The idea is to make an application that speaks text provided through the serial port. For that kind of job MikroC has tools and built-in functions inside the UART library that can be very helpful. First we have the UART Terminal, the tool that will be used for writing the text we want to convert to speech. The second very helpful thing is the function UART_Read_Text, which has a delimiter argument that tells it when to stop reading the UART and hand the text over to the TextToSpeech; a sketch of such a loop follows below.
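This minimal sketch of the main loop assumes a 9600 baud USB UART link and the same assumed tts_speak() call as before. UART1_Init(), UART1_Data_Ready() and UART1_Read_Text() are the standard MikroC UART library functions.

    char text[ 256 ];

    void main()
    {
        system_init();                      // pins, SPI bus, tts_init()
        tts_setup();                        // initialization data upload, main mode entry
        UART1_Init( 9600 );                 // USB UART link to the terminal
        Delay_ms( 100 );

        while ( 1 )
        {
            if ( UART1_Data_Ready() )
            {
                // Read until the new line character sent by the terminal
                // ("Append New Line" option), then speak what was received.
                UART1_Read_Text( text, "\n", 255 );
                tts_speak( text );          // assumed name of the speak call
            }
        }
    }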

Now you can compile the application, start the terminal via Tools > USART Terminal and connect to the proper USB UART port. Don’t forget to check the Append New Line option, because our application expects the new line character to stop reading the UART.

 

Summary

This device has one more advantage – there are many tools that will help you build your own dictionary for this click board; you can find them in the manufacturer’s documentation. With our library and tools you can make your robot say exactly what you wish in a very short amount of time. You will have more time to concentrate on building the brain for your HAL 9000 – the solution for its voice is already here.

References:

Manual for TextToSpeech click 2016

Libstock Libraries for TextToSpeech click 2016

Speech synthesis Wikipedia 2016

Christian Gottlieb Kratzenstein Wikipedia 2016

S1V30120 Epson Text to Speech 2016

DECtalk Guide 2016

Songs for DECtalk 2016


By  
Firmware developer in MikroElektronika with a passion for telemetry in the field of IoT. Low level is for the true modern day warriors.



11 Responses

  1. Jean-Luc Deladrière says:

    Excellent !
    Ordering one ! Will you also develop some example for Atmel AVR ?
    An Arduino library would be even better !
    what is the purpose of text_to_speech_img.h / what does it initialize ?
    Thx in advance

    • Milos Vidojevic says:

      Thanks Jean-Luc,

      The library can be used with our compiler for AVR MCUs that have enough memory space. The only difference between the AVR examples and the examples on the learn page is the SPI bus and pin initialization ( system_init ). The header file text_to_speech_img.h contains the initialization data that must be uploaded to the S1V30120 after every reset ( see the fourth page of the PDF ). There is no information about how the S1V30120 uses that data.

      Regards,
      Milos

  2. Jean-Luc Deladrière says:

    Hi Milos,
    Thank you for your reply,
    I want to run it on Arduino Zero which has 256KB of flash and 32 KB of RAM so I guess it should fit.
    Great to read that the specifics are ‘only’ the pins initialization and the SPI communication.
    Is there any example where I can understand how to translate the SPI communication from mikroC style into Arduino style?
    Cheers,
    Jean-Luc

  3. Milos Vidojevic says:

    Hi Jean-Luc,

    Inside the HAL you have to adapt all functions. Also, _write_req and _read_rsp should be adapted or removed, depending on how you adapt the HAL functions. Arduino’s SPI.beginTransaction() corresponds to tts_hal_cs_low(), and SPI.transfer() to write/read. There are also MikroC conversion functions, in the tts_version_boot function for example; you can replace them with sprinti or sprintl.

    Good Luck,
    Milos

  4. Ibrahim says:

    Hello,

    I just bought the Text to Speech module. How can I interface it on the EasyPIC v7?
    Would be glad if I can get a clue on this.
    Thanks

    • Ibrahim says:

      Can I use either the clicker 2 for PIC32MX or the PIC32MX clicker board with the Text-to-Speech module? Will it support it conveniently? I wanted to know before buying the clicker boards.

      Thanks

      • Milos Vidojevic says:

        Of course, the PIC32MX has enough RAM to work with the full buffer size ( 2116 ). The library has been improved and now works on all our compilers, even on small MCUs. All you have to do is change the buffer size inside the hw header before compiling. There is more about that in the help documentation provided.

      • Milos Vidojevic says:

        Yes, you can. It’s tested with PIC32MX, works fine with full buffer size 2116.

    • Milos Vidojevic says:

      The library has been improved and now works on all our compilers, even on small MCUs. All you have to do is change the buffer size inside the hw header before compiling. There is more about that in the help documentation provided.

      • Ibrahim says:

        Can I get sample code to work with the PIC18F45K22 microcontroller?
        I tried downloading the library for PIC and could not get it to work.

        I will be glad to see a sample code.
        Thanks

        • Milos Vidojevic says:

          The P18F45K22 has only 1.5 kB of RAM, which makes it a great solution for sensors and other non-complex devices. This module has a robust and large data footprint, so running a click board of this complexity on it is almost impossible. We have tested it on the P18F87K22, which has 3.5 kB of RAM, and it works fine, so you can try that one.
