Amazon Polly TTS Engine
TTS Engine Selection
Select the Amazon Polly TTS from the list of engines in the TTS drop-down menu in our online Plum DEV environment.
Granular Polly voice tag attribute details are available here.
Polly & Plum DEV Best Practices
Plum Voice is pleased to offer Polly as a TTS option for Plum DEV. However, users should be aware of the nuances of Polly before switching to it from one of the other TTS engines DEV supports. We compiled some best practices and potential pitfalls to keep in mind when changing the TTS engine for your IVR applications.
Polly is much stricter about certain VXML elements than other DEV TTS engines. Updating an existing application to use Polly typically does not involve large, wholesale changes to a call-flow. Instead, the transition likely requires numerous small updates throughout the application.
Plum engineers continue to build out Polly functionality and this documentation will be updated once new features are developed, tested, and deployed.
Using <speak> and <voice> tags
Polly supports using individual <prompt> tags, but with limited functionality.
Polly requires <speak> and <voice> tags in order to use different voices and languages.
The accepted syntax for the xml:lang property is slightly different in Polly, and it is very strict.
Example: Non-Polly engines use the format en_us. In Polly, the same language property is written as en-US.
In Polly, the language code is lower-case, followed by a dash, followed by the dialect code in upper-case.
If using multiple languages within the same application it is necessary to write text for each prompt in the desired language.
Example: If you have an initial greeting such as "Hello, thank you for calling" and you have a Spanish language option, then the greeting needs to be written in English and in Spanish, e.g. "Hola, gracias por llamar."
Don’t use unsupported tag properties
VXML defines "type" as an attribute of the <acronym> tag. Polly doesn't support this attribute and, if it is used, returns an error. In the IVR, this results in no audio.
To get the same functionality, replace the tag <say-as type="amount"> with <say-as interpret-as="unit">.
Names
Polly doesn't have any special traits for names. Polly reads a person's name as it is input in the code. For example:
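A minimal sketch of such a prompt, assuming the default voice is in use. The name and wording are placeholders. No name-specific markup is applied, so Polly pronounces the name as it is spelled in the text.

```xml
<!-- Hypothetical prompt: the name is plain text with no special markup. -->
<prompt>
  Hello, John Smith. Thank you for calling.
</prompt>
```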
Currency
For currency, use the "unit" type and include the currency symbol with the value. The following example simply places the dollar sign before the '100' value.
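A sketch consistent with this description, assuming the <say-as> tag sits directly inside a <prompt>. The surrounding sentence is a placeholder.

```xml
<prompt>
  Your current balance is
  <!-- The currency symbol is included with the value, inside the "unit" say-as. -->
  <say-as interpret-as="unit">$100</say-as>.
</prompt>
```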
Alphanumerics
Polly has some challenges reading back alphanumeric values. For the best results, separate the string into its individual numbers and letters. This is easily done in VXML with an inline JavaScript expression, so you don't have to create a separate variable for the spaced-out value.
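One way to sketch this, assuming the value is held in a variable (here the hypothetical `confirmation`, set to "AG35E") and split inline with standard JavaScript string methods:

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <!-- Hypothetical variable holding the alphanumeric value to read back. -->
      <var name="confirmation" expr="'AG35E'"/>
      <prompt>
        Your confirmation code is
        <!-- split('') breaks the string into single characters;
             join(', ') puts a comma and a space between them. -->
        <value expr="confirmation.split('').join(', ')"/>
      </prompt>
    </block>
  </form>
</vxml>
```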
In this example, the TTS engine reads back the value of "A, G, 3, 5, E". You may also consider using this in combination with the <prosody> tag to adjust the caller experience. For more information on how Polly handles the prosody tag, see the Polly documentation.
Voice Selection
If the Amazon Polly TTS is selected and neither a <speak> nor a <voice> tag is specified, Plum Voice defaults to Joanna, the en-US standard female voice.
If another voice is desired, it should be specified using the <speak> and <voice> tags as follows within the prompt block:
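A minimal sketch, assuming <voice> nests inside <speak> and that the prompt sits inside a <block> or <field>. The voice (Matthew, an en-US voice in the AWS voice list) and the prompt text are illustrative choices.

```xml
<prompt>
  <!-- xml:lang uses the Polly format: lower-case language, dash, upper-case dialect. -->
  <speak xml:lang="en-US">
    <!-- variant="1" selects the standard voice; see the <voice> section. -->
    <voice name="Matthew" variant="1">
      Hello, thank you for calling.
    </voice>
  </speak>
</prompt>
```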
To sequentially use multiple languages and voices within a <prompt> block, use multiple <speak> and <voice> blocks. For example:
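A sketch of an English greeting followed by its Spanish counterpart, assuming each <speak> block wraps a single <voice> block. The voices chosen (Joanna for en-US, Lupe for es-US) are illustrative picks from the AWS voice list.

```xml
<prompt>
  <speak xml:lang="en-US">
    <voice name="Joanna" variant="1">
      Hello, thank you for calling.
    </voice>
  </speak>
  <!-- The Spanish text is written out by hand; Polly does not translate. -->
  <speak xml:lang="es-US">
    <voice name="Lupe" variant="1">
      Hola, gracias por llamar.
    </voice>
  </speak>
</prompt>
```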
Prompt text must be written in the selected language. Polly will not translate English-language prompts to another language.
<speak>
The <speak> tag should be used to specify the desired language through the attribute xml:lang="lg-CN", where lg-CN is the language-country pair listed in the Language column of the table of supported languages here: https://docs.aws.amazon.com/polly/latest/dg/voicelist.html.
Please note that each voice has an associated language. Selecting a language that is not associated with the voice will result in unpredictable behavior; however, in many cases, you will hear the language the text was written in accented by that voice’s associated language.
<voice>
The <voice> tag should be used to specify the desired voice through the attribute name="name", where name is the voice listed in the Name/ID column of the table of supported voices here: https://docs.aws.amazon.com/polly/latest/dg/voicelist.html.
Within the <voice> tag, the attribute variant="number" should be used to specify whether a standard voice (variant=1) or a neural voice (variant=2) is desired.
Note that only certain voices support the neural option. See: https://docs.aws.amazon.com/polly/latest/dg/voicelist.html for a list of voices that support neural, and https://docs.aws.amazon.com/polly/latest/dg/NTTS-main.html for the differences between standard and neural voices.
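As an illustration of the variant attribute, the following sketch requests the neural version of Joanna, which the AWS voice list shows as available in both standard and neural form. The prompt text is a placeholder, and the nesting of <voice> inside <speak> is assumed.

```xml
<prompt>
  <speak xml:lang="en-US">
    <!-- variant="2" requests the neural voice; variant="1" would request standard. -->
    <voice name="Joanna" variant="2">
      Your appointment has been confirmed.
    </voice>
  </speak>
</prompt>
```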
Supported SSML Tags
The text to be spoken within the <voice> block can be marked up further with the following SSML tags, which Plum Voice supports for Amazon Polly.
| Tag | Description | Availability with Neural Voices |
| --- | --- | --- |
| <emphasis> | Emphasizing Words | No |
| <phoneme> | Using Phonetic Pronunciation | Yes |
| <prosody> | Controlling Volume, Speaking Rate, and Pitch | Partial |
| <say-as> | Controlling How Special Types of Words Are Spoken | Partial |
| <sub> | Pronouncing Acronyms and Abbreviations | Yes |
For more details on these SSML tags, please visit: https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html
Not all tags listed on Amazon's website are currently supported by Plum Voice.
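A short sketch combining two of the supported tags inside a <voice> block. The prompt text, alias, and speaking rate are illustrative. Which attribute values work with a given voice should be checked against the AWS documentation linked above.

```xml
<prompt>
  <speak xml:lang="en-US">
    <voice name="Joanna" variant="1">
      <!-- <sub> speaks the alias in place of the written abbreviation. -->
      Welcome to the <sub alias="Interactive Voice Response">IVR</sub> demo.
      <!-- <prosody> slows the read-back of the spaced-out code. -->
      Your code is <prosody rate="slow">A, G, 3, 5, E</prosody>.
    </voice>
  </speak>
</prompt>
```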