
Amazon Polly TTS Engine


Last updated 4 years ago

TTS Engine Selection

Select Amazon Polly from the list of engines in the TTS drop-down menu of the online Plum DEV environment.

Granular details on Polly <voice> tag attributes are available on the Polly Voice Tag Attribute Details page.

Polly & Plum DEV Best Practices

Plum Voice is pleased to offer Polly as a TTS option for Plum DEV. However, users should be aware of Polly's nuances before switching to it from one of the other TTS engines that DEV supports. We have compiled some best practices and potential pitfalls to keep in mind when changing the TTS engine for your IVR applications.

Because Polly is much stricter about certain VXML elements than the other DEV TTS engines, moving an existing application to Polly doesn't typically involve large, wholesale changes to the call flow. Instead, the transition will likely require numerous small updates throughout the application.

Plum engineers continue to build out Polly functionality and this documentation will be updated once new features are developed, tested, and deployed.

Using <speak> and <voice> tags

  • Polly supports using <prompt> tags on their own, but this limits the available functionality.

  • Polly requires <speak> and <voice> tags in order to use different voices and languages.

  • Polly's accepted syntax for the xml:lang attribute differs slightly from that of the other engines, and Polly enforces it strictly.

    • Example: Non-Polly engines use the format en_us. In Polly, the same language value is written en-US: the lower-case language code, followed by a dash, followed by the upper-case region code.

  • If your application uses multiple languages, you must write the text of each prompt in each desired language.

    • Example: If you have an initial greeting such as "Hello, thank you for calling" and you offer a Spanish-language option, then the greeting must be written in both English and Spanish, e.g. "Hola, gracias por llamar."
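The xml:lang rule above is mechanical, so it can be captured in a small helper. The following is a minimal sketch in JavaScript (the ECMAScript that VoiceXML uses in expr attributes and <script> elements); the function name toPollyLang is hypothetical, and it assumes every code has exactly two parts, a language and a region, so codes without a region are rejected.

```javascript
// Hypothetical helper: convert a non-Polly language code such as "en_us"
// into the form Polly expects ("en-US"): lower-case language code, a dash,
// then the upper-case region code.
function toPollyLang(code) {
  // Accept either "_" or "-" as the separator in the input.
  var parts = String(code).split(/[_-]/);
  if (parts.length !== 2 || !parts[0] || !parts[1]) {
    throw new Error("Expected a language-region code such as en_us, got: " + code);
  }
  return parts[0].toLowerCase() + "-" + parts[1].toUpperCase();
}
```

For example, toPollyLang("en_us") returns "en-US" and toPollyLang("es_MX") returns "es-MX".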

Don’t use unsupported tag properties

  • VXML defines "type" as an attribute of the <say-as> tag. Polly doesn't support this attribute and, if it is used, returns an error. In the IVR, this results in no audio.

    • To get the same functionality, replace <say-as type="amount"> with <say-as interpret-as="unit">.
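When migrating many documents, the mechanical part of this replacement can be scripted. The sketch below is a hypothetical helper you might run over VXML source text (for example, with Node.js) before testing; it is not part of Plum DEV. It only rewrites the type="amount" case described above and lists any other say-as type attributes for manual review.

```javascript
// Hypothetical migration helper: rewrite <say-as type="amount"> into
// <say-as interpret-as="unit">. This is a plain regular-expression rewrite,
// so it assumes the attribute is written exactly as type="amount" or
// type='amount' directly after the tag name.
function migrateSayAs(vxml) {
  return vxml.replace(/<say-as(\s+)type=(["'])amount\2/g,
    "<say-as$1interpret-as=$2unit$2");
}

// List any <say-as> tags that still carry a "type" attribute, so they can
// be converted by hand.
function findLegacySayAs(vxml) {
  return vxml.match(/<say-as\s+[^>]*\btype=/g) || [];
}
```

For example, migrateSayAs('<say-as type="amount">$100</say-as>') returns '<say-as interpret-as="unit">$100</say-as>'.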

Names

Polly has no special handling for names: it reads a person's name exactly as it is written in the code. For example, the following prompt is spoken in last-name, first-name order, just as written:

<prompt>
    <speak>
      <voice name="Joanna">
        Jones, Davy
      </voice>
    </speak>      
</prompt>
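Because Polly reads the name in the order it is written, a name stored as "Last, First" is spoken that way. If your data is stored in that order, a helper such as the hypothetical toSpokenName below could reorder it before it reaches the prompt. This sketch assumes a single comma separates the last and first names; anything else is passed through unchanged.

```javascript
// Hypothetical helper: reorder "Last, First" into "First Last" so that
// Polly, which reads names exactly as written, speaks them naturally.
function toSpokenName(stored) {
  var parts = String(stored).split(",");
  if (parts.length !== 2) {
    // Not in "Last, First" form: leave the value as it is.
    return String(stored).trim();
  }
  return (parts[1].trim() + " " + parts[0].trim()).trim();
}
```

Declared in a <script> element, it could be called from a prompt as <value expr="toSpokenName(customerName)"/>, where customerName is a hypothetical variable.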

Currency

For currency, use interpret-as="unit" and include the currency symbol with the value. The following example places the dollar sign directly before the value 100:

<say-as interpret-as="unit">$100</say-as>
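When the amount comes from a variable rather than a literal, it first has to be turned into the same "$100" style string. The sketch below is a hypothetical formatDollars helper; it assumes non-negative US dollar amounts, and it assumes Polly reads a two-decimal value such as "$12.50" as dollars and cents, which you should confirm by listening to the result.

```javascript
// Hypothetical helper: format a numeric amount as "$100" or "$12.50" for use
// inside <say-as interpret-as="unit">. Whole amounts are written without
// cents; other amounts are rounded to two decimal places.
function formatDollars(amount) {
  if (typeof amount !== "number" || !isFinite(amount) || amount < 0) {
    throw new Error("Expected a non-negative number, got: " + amount);
  }
  var text = (amount % 1 === 0) ? String(amount) : amount.toFixed(2);
  return "$" + text;
}
```

In VXML this might be used as <say-as interpret-as="unit"><value expr="formatDollars(balance)"/></say-as>, assuming your interpreter accepts <value> inside <say-as>; balance is a hypothetical variable. If the function is placed in a <script> element, wrap it in <![CDATA[ ... ]]> so the < comparison is not parsed as markup.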

Alphanumerics

Polly has some difficulty reading back alphanumeric values. For best results, separate the string into its individual letters and digits. This is easily done inline in VXML with JavaScript, so you don't have to compute a separate variable.

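<?xml version="1.0"?>
<vxml version="2.0">
 <form>
  <!-- Form-level variable holding the alphanumeric ID -->
  <var name="id" expr='"AG35E"'/>
  <block>
   <prompt>
    <speak>
     <voice name="Lupe">
      <!-- Splits "AG35E" into "A, G, 3, 5, E" -->
      <value expr="id.split('').join(', ')"/>
     </voice>
    </speak>
   </prompt>
  </block>
 </form>
</vxml>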
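If IDs can contain spaces or dashes, the inline expression above also turns those characters into separate list items. A hypothetical variant that drops anything other than letters and digits before splitting is sketched below; whether such characters should be dropped is an assumption about your data.

```javascript
// Hypothetical helper: same idea as id.split('').join(', '), but characters
// other than letters and digits (spaces, dashes) are removed first.
function spellOut(id) {
  return String(id).replace(/[^A-Za-z0-9]/g, "").split("").join(", ");
}
```

Declared in a <script> element, it would replace the inline expression as <value expr="spellOut(id)"/>; spellOut("AG-35 E") returns "A, G, 3, 5, E".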

Voice Selection

If Amazon Polly is selected as the TTS engine and neither a <speak> nor a <voice> tag is specified, Plum Voice uses the standard en-US female voice Joanna by default.

To use another voice, specify it with the <speak> and <voice> tags inside the <prompt> block, as follows:

<?xml version="1.0"?>
<vxml version="2.0">
 <form>
  <block>
   <prompt>
    <speak xml:lang="es-MX">
     <voice name="Mia" variant="1">
      Hola, gracias por llamar a Plum Voice.
     </voice>
    </speak>
   </prompt>
  </block>
 </form>
</vxml>
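If your VXML is generated by a server rather than written by hand (an assumption about your setup), a small builder keeps these attributes consistent: straight quotes, a Polly-style xml:lang value, variant 1 or 2, and XML-escaped prompt text. The sketch below is hypothetical and only accepts the common two-part language form (such as es-MX or cmn-CN).

```javascript
// Escape characters that may not appear literally in XML text or attributes.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical builder for one <speak>/<voice> pair.
// variant must be 1 (standard) or 2 (neural).
function buildSpeakBlock(lang, voice, variant, text) {
  if (!/^[a-z]{2,3}-[A-Z]{2}$/.test(lang)) {
    throw new Error("xml:lang must look like es-MX, got: " + lang);
  }
  if (variant !== 1 && variant !== 2) {
    throw new Error("variant must be 1 (standard) or 2 (neural), got: " + variant);
  }
  return '<speak xml:lang="' + lang + '">' +
    '<voice name="' + escapeXml(voice) + '" variant="' + variant + '">' +
    escapeXml(text) +
    "</voice></speak>";
}
```

For example, buildSpeakBlock("es-MX", "Mia", 1, "Hola, gracias por llamar a Plum Voice.") returns a one-line <speak>/<voice> block for the Mia voice, ready to place inside a <prompt>.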

To use multiple languages and voices in sequence within one <prompt> block, use multiple <speak> and <voice> blocks. For example:

<?xml version="1.0"?>
<vxml version="2.0">
 <form>
  <block>
   <prompt>
    <speak xml:lang="en-US">
     <voice name="Joanna" variant="2">
      Press one to continue in English.
     </voice>
    </speak>
    <speak xml:lang="es-US">
     <voice name="Lupe" variant="2">
      Presione dos para continuar en español.
     </voice>
    </speak>
    <speak xml:lang="fr-FR">
     <!-- variant takes 1 (standard) or 2 (neural), not a word -->
     <voice name="Celine" variant="1">
      Appuyez sur trois pour continuer en français.
     </voice>
    </speak>
   </prompt>
  </block>
 </form>
</vxml>

Prompt text must be written in the selected language. Polly will not translate English-language prompts to another language.
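One way to keep every prompt available in every language is to store the texts in a table keyed by language and fail loudly when a translation is missing, rather than silently falling back to English. The sketch below is hypothetical; the table simply reuses the greeting text from this page.

```javascript
// Hypothetical prompt table: each prompt is written out in every language
// the application supports, since Polly does not translate prompt text.
var PROMPTS = {
  "en-US": { greeting: "Hello, thank you for calling." },
  "es-US": { greeting: "Hola, gracias por llamar." }
};

// Return the prompt text for a language; throw if the translation is missing.
function promptFor(lang, key) {
  var table = PROMPTS[lang];
  if (!table || !Object.prototype.hasOwnProperty.call(table, key)) {
    throw new Error("No " + lang + " text for prompt '" + key + "'");
  }
  return table[key];
}
```

The returned text would then go inside the matching <speak xml:lang="..."> and <voice> block.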

<speak>

Note that each voice has an associated language. Selecting a language that is not associated with the voice results in unpredictable behavior; in many cases, you will hear the text spoken in the language it was written in, but with the accent of the voice's associated language.
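A simple guard against such mismatches is to record each voice's language and compare it with the xml:lang value before building a prompt. The hypothetical table below covers only the voices used on this page (Joanna, Lupe, Mia, and Celine); it would need to be extended from Amazon's voice list for any other voice.

```javascript
// Languages of the Polly voices used in the examples on this page.
var VOICE_LANGUAGE = {
  Joanna: "en-US",
  Lupe: "es-US",
  Mia: "es-MX",
  Celine: "fr-FR"
};

// True when the <speak> xml:lang matches the voice's own language.
// Unknown voices return false so they get checked by hand.
function voiceMatchesLang(voice, lang) {
  return Object.prototype.hasOwnProperty.call(VOICE_LANGUAGE, voice) &&
    VOICE_LANGUAGE[voice] === lang;
}
```

If this is placed in a VXML <script> element, wrap it in <![CDATA[ ... ]]>, because the && operator would otherwise be parsed as XML entity markup.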

<voice>

Supported SSML Tags

The text to be spoken inside the <voice> block can be further marked up with the following SSML tags, which Plum Voice supports for Amazon Polly.

| Tag | Description | Availability with Neural Voices |
| --- | --- | --- |
| <emphasis> | Emphasizing Words | No |
| <phoneme> | Using Phonetic Pronunciation | Yes |
| <prosody> | Controlling Volume, Speaking Rate, and Pitch | Partial |
| <say-as> | Controlling How Special Types of Words Are Spoken | Partial |
| <sub> | Pronouncing Acronyms and Abbreviations | Yes |
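The table of supported tags can also be encoded as data, to review which tags in a prompt may cause trouble with neural voices. The sketch below is a hypothetical helper; it takes the names of tags used inside a <voice> block (not <speak> or <voice> themselves) and sorts them into tags not listed in the table, tags unavailable with neural voices, and tags only partially available.

```javascript
// Neural-voice availability of each SSML tag Plum Voice supports with Polly.
var NEURAL_SUPPORT = {
  emphasis: "No",
  phoneme: "Yes",
  prosody: "Partial",
  "say-as": "Partial",
  sub: "Yes"
};

// Sort tag names into: not listed in the table, unavailable with neural
// voices, and only partially available. Fully available tags are omitted.
function checkNeuralTags(tags) {
  var result = { notListed: [], notNeural: [], partial: [] };
  for (var i = 0; i < tags.length; i++) {
    var tag = tags[i];
    if (!Object.prototype.hasOwnProperty.call(NEURAL_SUPPORT, tag)) {
      result.notListed.push(tag);
    } else if (NEURAL_SUPPORT[tag] === "No") {
      result.notNeural.push(tag);
    } else if (NEURAL_SUPPORT[tag] === "Partial") {
      result.partial.push(tag);
    }
  }
  return result;
}
```

For example, checkNeuralTags(["emphasis", "prosody", "sub", "break"]) reports "break" as not listed, "emphasis" as unavailable with neural voices, and "prosody" as partial.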

Not all tags listed on Amazon's website are currently supported by Plum Voice. For more details on these SSML tags, please visit: https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html

In the Alphanumerics example, the TTS engine reads back the value as "A, G, 3, 5, E". You may also consider using this technique in combination with the <prosody> tag to adjust the caller experience. For more information on how Polly handles the <prosody> tag, see the Polly documentation.

The <speak> tag should be used to specify the desired language through the attribute xml:lang="lg-CN", where lg-CN is the language-country pair (for example, es-MX) listed in the Language column of the table of supported languages here: https://docs.aws.amazon.com/polly/latest/dg/voicelist.html

The <voice> tag should be used to specify the desired voice through the attribute name="name", where name is the voice listed in the Name/ID column of the table of supported voices here: https://docs.aws.amazon.com/polly/latest/dg/voicelist.html

Within the <voice> tag, the attribute variant="number" should be used to specify whether a standard voice (variant="1") or a neural voice (variant="2") is desired. Note that only certain voices support the neural option. See https://docs.aws.amazon.com/polly/latest/dg/voicelist.html for a list of voices that support neural, and https://docs.aws.amazon.com/polly/latest/dg/NTTS-main.html for the differences between standard and neural voices.
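Because variant takes numbers rather than words, a value such as variant="standard" does not follow the 1/2 scheme for standard and neural voices. The hypothetical helper below maps a voice type to its variant value and rejects anything else; it does not check whether a given voice actually offers a neural version, which still has to be looked up in Amazon's voice list.

```javascript
// Hypothetical helper: map a voice type to the value of the variant
// attribute (1 = standard, 2 = neural). Anything else is rejected.
function variantFor(voiceType) {
  if (voiceType === "standard") return "1";
  if (voiceType === "neural") return "2";
  throw new Error('Expected "standard" or "neural", got: ' + voiceType);
}
```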