Select the Amazon Polly TTS from the list of engines in the TTS drop-down menu in our online Plum DEV environment.
Plum Voice is pleased to offer Polly as a TTS option for Plum DEV. However, users should be aware of the nuances of Polly before switching to it from one of the other TTS engines DEV supports. We compiled some best practices and potential pitfalls to keep in mind when changing the TTS engine for your IVR applications.
Polly is much stricter about certain VXML elements than other DEV TTS engines. As a result, moving an existing application to Polly doesn’t typically involve large, wholesale changes to a call-flow. Instead, the transition likely requires numerous small updates throughout the application.
Plum engineers continue to build out Polly functionality and this documentation will be updated once new features are developed, tested, and deployed.
Polly supports individual <prompt> tags, but with limited functionality. Use <voice> tags to select different voices and languages.
The acceptable syntax for the xml:lang property is slightly different in Polly, and it is very strict.
Example: Non-Polly engines use the format en_us. In Polly, the same language property is written en-US: the language code in lower-case, followed by a dash, followed by the dialect code in upper-case.
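As a minimal sketch of the Polly format, the document below sets xml:lang on a <speak> element in the same way as the examples later in this page. The prompt text and the choice of the Joanna voice are illustrative assumptions.

```xml
<?xml version="1.0"?>
<!-- Other DEV engines accept a value such as en_us. -->
<!-- Polly expects lower-case language, a dash, then upper-case dialect. -->
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        <speak xml:lang="en-US">
          <voice name="Joanna" variant="1">Hello, thank you for calling.</voice>
        </speak>
      </prompt>
    </block>
  </form>
</vxml>
```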
If using multiple languages within the same application it is necessary to write text for each prompt in the desired language.
Example: If you have an initial greeting such as "Hello, thank you for calling" and you have a Spanish language option, then the greeting needs to be written in English and in Spanish, e.g. "Hola, gracias por llamar."
VXML defines "type" as an attribute of the <acronym> tag. Polly doesn’t support this attribute and returns an error if it is used. In the IVR, this results in no audio.
To get the same functionality, replace the tag <say-as type="amount"> with <say-as interpret-as="unit">.
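The replacement might look like the sketch below. The sentence and the value 1/2inch are illustrative assumptions; the value format (a number or fraction immediately followed by a unit) follows Amazon's SSML documentation for interpret-as="unit", and nesting <say-as> inside <voice> assumes the markup is passed through to Polly as described later in this page.

```xml
<?xml version="1.0"?>
<!-- interpret-as is used in place of the unsupported type attribute. -->
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        <speak xml:lang="en-US">
          <voice name="Joanna" variant="1">The shelf is <say-as interpret-as="unit">1/2inch</say-as> thick.</voice>
        </speak>
      </prompt>
    </block>
  </form>
</vxml>
```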
If the Amazon Polly TTS is selected and neither a <speak> nor a <voice> tag is specified, Plum Voice defaults to Joanna, the standard female en-US voice.
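For illustration, a prompt that relies on this default could be as simple as the following sketch. The prompt text is an assumption; no voice or language markup is present, so the default voice described above would apply.

```xml
<?xml version="1.0"?>
<!-- No <speak> or <voice> tag: the default Joanna voice is used. -->
<vxml version="2.0">
  <form>
    <block>
      <prompt>Hello, thank you for calling Plum Voice.</prompt>
    </block>
  </form>
</vxml>
```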
If another voice is desired, it should be specified using <voice> tags within the prompt block, as follows:
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        <speak xml:lang="es-MX">
          <voice name="Mia" variant="1">Hello, thank you for calling Plum Voice.</voice>
        </speak>
      </prompt>
    </block>
  </form>
</vxml>
To use multiple languages and voices in sequence within a <prompt> block, use multiple <voice> blocks. For example:
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        <speak xml:lang="en-US">
          <voice name="Joanna" variant="2">Press one to continue in English.</voice>
        </speak>
        <speak xml:lang="es-US">
          <voice name="Lupe" variant="2">Presione dos para continuar en español.</voice>
        </speak>
        <speak xml:lang="fr-FR">
          <voice name="Celine" variant="1">Appuyez sur trois pour continuer en français.</voice>
        </speak>
      </prompt>
    </block>
  </form>
</vxml>
The <speak> tag should be used to specify the desired language through the attribute xml:lang="lg-CN", where lg-CN is the language-country pair specified in the Language column of the table of supported languages here: https://docs.aws.amazon.com/polly/latest/dg/voicelist.html.
Please note that each voice has an associated language. Selecting a language that is not associated with the voice will result in unpredictable behavior; however, in many cases, you will hear the language the text was written in accented by that voice’s associated language.
The <voice> tag should be used to specify the desired voice through the attribute name="name", where name is the voice specified in the Name/ID column of the table of supported voices here: https://docs.aws.amazon.com/polly/latest/dg/voicelist.html.
Within the <voice> tag, the attribute variant="number" should be used to specify whether a standard voice (variant="1") or a neural voice (variant="2") is desired.
Note that only certain voices support the neural option. See: https://docs.aws.amazon.com/polly/latest/dg/voicelist.html for a list of voices that support neural, and https://docs.aws.amazon.com/polly/latest/dg/NTTS-main.html for the differences between standard and neural voices.
The text to be spoken within the <voice> block can be marked up further with the following SSML tags, which Plum Voice supports for Amazon Polly.
Availability with neural voices varies by tag; see the Amazon documentation linked below.
Using Phonetic Pronunciation: <phoneme>
Controlling Volume, Speaking Rate, and Pitch: <prosody>
Controlling How Special Types of Words Are Spoken: <say-as>
Pronouncing Acronyms and Abbreviations: <sub>
For more details on these SSML tags, please visit: https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html
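The sketch below combines these tags in one prompt, assuming the four categories listed above correspond to Amazon's <phoneme>, <prosody>, <say-as>, and <sub> tags. The sentences and attribute values are illustrative and taken from the style of Amazon's SSML documentation; a standard voice (variant="1") is used because neural voices do not support every tag option.

```xml
<?xml version="1.0"?>
<!-- Illustrative only: one sentence per supported SSML tag. -->
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        <speak xml:lang="en-US">
          <voice name="Joanna" variant="1">
            You say <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>.
            <prosody rate="slow" volume="loud">Please listen carefully.</prosody>
            Your code is <say-as interpret-as="digits">1234</say-as>.
            This standard is published by the <sub alias="World Wide Web Consortium">W3C</sub>.
          </voice>
        </speak>
      </prompt>
    </block>
  </form>
</vxml>
```

If a prompt produces no audio after adding markup like this, removing the tags one at a time is a simple way to find the one Polly rejects.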