Amazon Polly TTS Engine

TTS Engine Selection

Select Amazon Polly from the TTS drop-down list of available engines in the online Plum DEV environment.

Granular Polly voice tag attribute details are available here.

Polly & Plum DEV Best Practices

Plum Voice is pleased to offer Polly as a TTS option for Plum DEV. However, users should be aware of the nuances of Polly before switching to it from one of the other TTS engines DEV supports. We compiled some best practices and potential pitfalls to keep in mind when changing the TTS engine for your IVR applications.

Polly is much stricter about certain VXML elements than the other TTS engines DEV supports. As a result, moving an existing application to Polly typically doesn’t involve large, wholesale changes to the call flow. Instead, the transition likely requires numerous small updates throughout the application.

Plum engineers continue to build out Polly functionality and this documentation will be updated once new features are developed, tested, and deployed.

Using <speak> and <voice> tags

  • Polly supports using individual <prompt> tags, but with limited functionality.

  • Polly requires <speak> and <voice> tags in order to use different voices and languages.

  • The accepted syntax for the xml:lang property differs slightly in Polly and is strictly enforced.

    • Example: Non-Polly engines use the format en_us. In Polly, the same language property is written en-US: the language code in lower case, followed by a dash, followed by the dialect code in upper case.

  • If using multiple languages within the same application, it is necessary to write the text for each prompt in the desired language.

    • Example: If you have an initial greeting such as "Hello, thank you for calling" and you have a Spanish language option, then the greeting needs to be written in English and in Spanish, e.g. "Hola, gracias por llamar."

Don’t use unsupported tag properties

  • VXML defines “type” as an attribute of the <say-as> tag. Polly doesn’t support this attribute and returns an error if it is used. In the IVR, this results in no audio.

    • To get the same functionality, replace <say-as type="amount"> with <say-as interpret-as="unit">.
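As a sketch of that migration, the fragment below shows a prompt after the change, with the old form kept only in a comment for comparison. The $100 value and the Joanna voice are borrowed from other examples on this page, and the "Your balance is" wording is made up for illustration.

```xml
<prompt>
  <speak xml:lang="en-US">
    <voice name="Joanna">
      <!-- Other DEV engines: <say-as type="amount">$100</say-as> -->
      <!-- Polly: replace the type attribute with interpret-as -->
      Your balance is <say-as interpret-as="unit">$100</say-as>.
    </voice>
  </speak>
</prompt>
```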

Names

Polly doesn't have any special traits for names. Polly reads a person's name as it is input in the code. For example:

<prompt>
    <speak>
      <voice name="Joanna">
        Jones, Davy
      </voice>
    </speak>      
</prompt>

Currency

For currency, set interpret-as to "unit" and include the currency symbol with the value. The following example places the dollar sign before the value 100.

<say-as interpret-as="unit">$100</say-as>

Alphanumerics

Polly has some difficulty reading back alphanumeric values. For the best results, separate the string into its individual letters and digits. This is easily done inline in VXML using JavaScript, so you don't have to compute a separate variable.

<var name="id" expr='"AG35E"'/>
<block>
  <prompt>
    <speak>
      <voice name="Lupe">
        <value expr="id.split('').join(', ')"/>
      </voice>
    </speak>      
  </prompt>
</block>

In this example, the TTS engine reads back the value as "A, G, 3, 5, E". You may also consider combining this with the <prosody> tag to adjust the caller experience. For more information on how Polly handles the prosody tag, see the Polly documentation.
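One way to combine the two is sketched below. It assumes that <value> can be nested inside <prosody> in the same way it is nested inside <voice> above. The rate="slow" setting is an illustrative choice rather than a recommended value; check the Polly documentation for which prosody attributes your chosen voice supports.

```xml
<var name="id" expr='"AG35E"'/>
<block>
  <prompt>
    <speak>
      <voice name="Lupe">
        <!-- Slow the read-back so each character is easier to catch -->
        <prosody rate="slow">
          <value expr="id.split('').join(', ')"/>
        </prosody>
      </voice>
    </speak>
  </prompt>
</block>
```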

Voice Selection

If the Amazon Polly TTS is selected and neither a <speak> nor a <voice> tag is specified, Plum Voice defaults to Joanna, the standard female en-US voice.

If another voice is desired, it should be specified using the <speak> and <voice> tags within the prompt block, as follows:

<?xml version="1.0"?>
<vxml version="2.0">
 <form>
  <block>
   <prompt>
    <speak xml:lang="es-MX">
    <voice name="Mia" variant="1">
     Hola, gracias por llamar a Plum Voice.
    </voice>
    </speak>
   </prompt>
  </block>
 </form>
</vxml>

To sequentially use multiple languages and voices within a <prompt> block, use multiple <speak> and <voice> blocks. For example:

<?xml version="1.0"?>
<vxml version="2.0">
 <form>
  <block>
   <prompt>
    <speak xml:lang="en-US">
    <voice name="Joanna" variant="2">
     Press one to continue in English.
    </voice>
    </speak>
    <speak xml:lang="es-US">
    <voice name="Lupe" variant="2">
     Presione dos para continuar en español.
    </voice>
    </speak> 
    <speak xml:lang="fr-FR">
    <voice name="Celine" variant="1">
     Appuyez sur trois pour continuer en français.
    </voice>
    </speak>
   </prompt>
  </block>
 </form>
</vxml>

Prompt text must be written in the selected language. Polly will not translate English-language prompts to another language.

<speak>

The <speak> tag should be used to specify the desired language through the attribute xml:lang="lg-CN", where lg-CN is the language-country pair specified in the Language column of the table of supported languages here: https://docs.aws.amazon.com/polly/latest/dg/voicelist.html.

Please note that each voice has an associated language. Selecting a language that is not associated with the voice will result in unpredictable behavior; however, in many cases, you will hear the language the text was written in accented by that voice’s associated language.

<voice>

The <voice> tag should be used to specify the desired voice through the attribute name="name", where name is the voice specified in the Name/ID column of the table of supported voices here: https://docs.aws.amazon.com/polly/latest/dg/voicelist.html.

Within the <voice> tag, use the attribute variant="number" to specify whether a standard voice (variant="1") or a neural voice (variant="2") is desired. Note that only certain voices support the neural option. See https://docs.aws.amazon.com/polly/latest/dg/voicelist.html for a list of voices that support neural, and https://docs.aws.amazon.com/polly/latest/dg/NTTS-main.html for the differences between standard and neural voices.

Supported SSML Tags

The text to be spoken within the <voice> block can be further marked up with the following SSML tags, which Plum Voice supports for Amazon Polly.

Tag         Description                                         Availability with Neural Voices
<emphasis>  Emphasizing Words                                   No
<phoneme>   Using Phonetic Pronunciation                        Yes
<prosody>   Controlling Volume, Speaking Rate, and Pitch        Partial
<say-as>    Controlling How Special Types of Words Are Spoken   Partial
<sub>       Pronouncing Acronyms and Abbreviations              Yes

For more details on these SSML tags, please visit: https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html

Not all tags listed on Amazon's website are currently supported by Plum Voice.
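
As an illustration, the fragment below uses <sub> and <phoneme>, the two tags listed above as fully available with neural voices. The alias and ph attributes follow the form shown in Amazon's SSML documentation. The prompt wording is made up for this sketch, and it assumes Plum passes these tags through to Polly unchanged.

```xml
<prompt>
  <speak xml:lang="en-US">
    <voice name="Joanna" variant="2">
      <!-- sub: speak the alias instead of the written abbreviation -->
      Thank you for calling the <sub alias="interactive voice response">IVR</sub> help line.
      <!-- phoneme: force a specific pronunciation using IPA -->
      Today's special is <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme> pie.
    </voice>
  </speak>
</prompt>
```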
