Vocalizer 7: <voice> tag and SSML Support
For Vocalizer 7
BETA FEATURE:
The Vocalizer 7 TTS engine is currently available as a beta feature. Further testing and updates may take place while Vocalizer 7 is in beta.
We encourage you to share feedback at beta@plumgroup.com.
IMPORTANT! Treat the terms "tags" and "elements" as the same.
The term "tags" used here means the same thing as the term "elements" used in other VXML-related documentation (e.g., the W3C SSML 1.0 Recommendation).
Summary
This page covers the <voice> tag in Vocalizer 7 and other SSML tag and attribute features.
<voice>
NOTE: Each <voice> attribute is optional on its own. However, an error will occur if the <voice> tag is used with no attributes at all.
Use the name attribute of the <voice> tag to specify the desired voice.
Example: <voice name="Allison">
See the list of available voices in the name section below.
Key <voice> attributes
This section covers the following <voice> attributes: age, name, variant, and xml:lang.
age
This attribute is supported.
Notes: The age attribute is only useful when a set of custom voices of varying ages is installed for the same language and gender.
name
This attribute is supported.
Notes: See the following language tables for available names:
American English (en-US) (9 total)
Allison
Ava
Ava-ml
Evan
Samantha
Susan
Tom
Zoe
Zoe-ml
British English (en-GB) (6 total)
Daniel
Kate
Malcolm
Oliver
Ruby
Serena
Spanish (es-ES) (4 total)
Angelica
Javier
Paulina
Paulina-ml
German (de-DE) (5 total)
Anna
Anna-ml
Markus
Petra
Viktor
Canadian French (fr-CA) (3 total)
Amelie
Amelie-ml
Nicolas
French (fr-FR) (3 total)
Audrey
Audrey-ml
Thomas
Brazilian Portuguese (pt-BR) (3 total)
Felipe
Fernanda
Luciana
Portuguese (pt-PT) (3 total)
Catarina
Juana
Joaquim
Cantonese (zh-HK) (2 total)
Aasing-ml
Sinji-ml
Mandarin (zh-CN) (2 total)
Binbin-ml
Lili-ml
Multilingual voices
If you select a multilingual voice, you can specify one of the voice's other languages as follows:
1. Specify the voice with the name attribute of the <voice> tag.
2. Specify one of the voice's other languages with the xml:lang attribute of the <speak> tag.
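As an illustration of these steps, the following Python sketch builds an SSML document that selects a multilingual voice by name on <voice> and sets the language with xml:lang on <speak>. The pairing of Ava-ml with es-ES is a hypothetical placeholder (this page does not list each multilingual voice's other languages), and the helper name multilingual_prompt is illustrative only, not part of any Plum or Vocalizer API.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
# xml:lang as ElementTree reports it after parsing.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def multilingual_prompt(voice_name, lang, text):
    """Select the voice on <voice name=...> and the language on <speak xml:lang=...>.

    xml:lang is deliberately NOT placed on <voice>, which Vocalizer 7 does not accept.
    """
    return (f'<speak version="1.0" xmlns={quoteattr(SSML_NS)} xml:lang={quoteattr(lang)}>'
            f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
            "</speak>")


# Hypothetical pairing: assumes Ava-ml can speak es-ES.
ssml = multilingual_prompt("Ava-ml", "es-ES", "Hola, gracias por llamar.")
print(ssml)

# Sanity check: well-formed, language set on <speak>, none on <voice>.
root = ET.fromstring(ssml)
voice = root.find(f"{{{SSML_NS}}}voice")
assert root.get(XML_LANG) == "es-ES"
assert voice.get(XML_LANG) is None
```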
About xml:lang
Vocalizer 7 does not accept the xml:lang attribute on the <voice> tag. If it is used, the TTS engine will use a default voice instead of the one specified.
We recommend using the xml:lang attribute on the <speak> or <vxml> tags instead. See Recommended best practices for more info.
variant
This attribute is not supported.
xml:lang
This attribute is not supported.
Notes: Avoid using the xml:lang attribute in the <voice> tag when setting the TTS language. Vocalizer 7 does not accept it, and it will cause the TTS engine to use the default voice instead of the one specified.
Instead, use the xml:lang attribute of the <speak> or <vxml> tags. See Recommended best practices for more info.
Additional SSML tag details
Except where noted below, Vocalizer 7 supports all SSML tags and attributes as described in the W3C SSML 1.0 Recommendation.
This section focuses on exceptions and additional details specific to Vocalizer, including the following:
Vocalizer-specific tag and/or attribute behavior.
Which tags and/or attributes are unsupported.
Additional Vocalizer-specific features, such as SSML extensions.
Built-in, unique SSML extensions
Vocalizer 7 features the following added extensions to SSML:
<audio>: Supports four additional attributes to manage internet fetching:
  fetchtimeout: Time to attempt to open and read the audio document.
  maxage: Value for the HTTP 1.1 cache-control max-age directive.
  maxstale: Value for the HTTP 1.1 cache-control max-stale directive.
  fetchhint: Specify "prefetch" to allow prefetching the audio content, or "safe" (the default) to follow HTTP 1.1 caching semantics.
<phoneme>: Supports specifying L&H+ phoneme strings and phoneme strings in the IPA alphabet.
<prompt>: Supports specifying ActivePrompt IDs.
For more details on these extensions, see the relevant tag's sections below.
Vocalizer SSML tag reference table
The following table summarizes details about SSML tags described in this section.
Tag | Vocalizer 7 details
<audio> | Has four Vocalizer-specific attributes. See the <audio> section for details.
<break> | See the <break> section for details on handling its strength attribute.
<emphasis> | The level attribute is supported, but the level="none" setting is not.
<lexicon> | See the <lexicon> section for details on handling this tag and its attributes.
<meta> | The http-equiv attribute is not supported.
<p> | Supports an optional ssft-domaintype attribute. See the <p> section for details.
<phoneme> | Supports specifying phoneme strings. See the <phoneme> section for details.
<prompt> | Supports specifying ActivePrompt IDs. See the <prompt> section for details.
<prosody> | Vocalizer ignores the duration, pitch, pitch-range, and contour attributes. See the <prosody> section for details on handling the volume attribute.
<s> | Supports an optional ssft-domaintype attribute. See the <s> section for details.
<speak> | Supports an optional ssft-domaintype attribute. See the <speak> section for details.
<audio>
Vocalizer 7 SSML extensions: Supports four Vocalizer-added attributes to manage internet fetching:
fetchtimeout
The time to attempt to open and read the audio document. The value must be an unsigned integer with a mandatory suffix: "s" for seconds or "ms" for milliseconds.
Examples: "3s", "400ms"
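The suffix rule above can be checked before a document is sent to the engine. This Python sketch accepts only an unsigned integer followed by "s" or "ms" and converts the value to milliseconds; fetchtimeout_to_ms is a hypothetical helper, not a Vocalizer function.

```python
import re

# An unsigned integer followed by the mandatory unit suffix ("s" or "ms").
_FETCHTIMEOUT = re.compile(r"([0-9]+)(ms|s)")


def fetchtimeout_to_ms(value: str) -> int:
    """Convert a fetchtimeout value such as "3s" or "400ms" to milliseconds.

    Raises ValueError for values without a suffix ("3"), with a sign or a
    decimal point ("-3s", "1.5s"), or with embedded spaces ("3 s").
    """
    match = _FETCHTIMEOUT.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid fetchtimeout value: {value!r}")
    number, unit = int(match.group(1)), match.group(2)
    return number * 1000 if unit == "s" else number


print(fetchtimeout_to_ms("3s"))     # 3000
print(fetchtimeout_to_ms("400ms"))  # 400
```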
maxage
Value for the HTTP 1.1 cache-control max-age directive. This specifies that the application is willing to accept a cached copy of the audio document no older than this value.
In most cases, this attribute should not be present, which allows the origin server to control cache expiration.
The value must be an unsigned integer specifying a number of seconds, with no suffix (e.g., no "s" or "ms" included). A value of 0 may be used to force revalidation of the cached copy with the origin server.
maxstale
Value for the HTTP 1.1 cache-control max-stale directive. This specifies that the client is willing to accept a cached copy that has expired by up to this many seconds past the expiration time specified by the origin server.
The value must be an unsigned integer specifying a number of seconds, with no suffix (e.g., no "s" or "ms" included).
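The following Python sketch shows a simplified reading of how maxage and maxstale constrain a cached copy. It follows standard HTTP 1.1 cache-control semantics rather than any documented Vocalizer internals, all times are in seconds, and parse_cache_seconds and cached_copy_acceptable are hypothetical helpers.

```python
def parse_cache_seconds(value: str) -> int:
    """Parse a maxage/maxstale value: a bare unsigned integer of seconds.

    Suffixed values such as "60s" or "60ms" are rejected.
    """
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"expected unsigned integer seconds, got {value!r}")
    return int(value)


def cached_copy_acceptable(age, freshness_lifetime, maxage=None, maxstale=None):
    """Simplified HTTP 1.1 check of whether a cached copy may be used.

    age: seconds since the cached copy was fetched or revalidated.
    freshness_lifetime: seconds the origin server said the copy stays fresh.
    """
    # maxage: reject copies older than this, whether fresh or not.
    if maxage is not None and age > maxage:
        return False
    # A copy is fresh while its age is below its freshness lifetime.
    if age < freshness_lifetime:
        return True
    # maxstale: accept a copy that expired at most this many seconds ago.
    return maxstale is not None and age - freshness_lifetime <= maxstale


print(cached_copy_acceptable(90, 60))                                      # False (expired)
print(cached_copy_acceptable(90, 60, maxstale=parse_cache_seconds("60")))  # True
print(cached_copy_acceptable(10, 60, maxage=parse_cache_seconds("0")))     # False (forces revalidation)
```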
fetchhint
Specify "prefetch" to allow prefetching the audio content, or "safe" (the default) to follow HTTP 1.1 caching semantics.
Vocalizer accepts this attribute, but it currently does not behave differently in "prefetch" mode.
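Putting the four extension attributes together, the Python sketch below builds an <audio> element with xml.etree.ElementTree. The URL and fallback text are placeholders, audio_element is a hypothetical helper, and attributes passed as None are simply omitted so the defaults described above apply.

```python
import xml.etree.ElementTree as ET


def audio_element(src, fallback_text, fetchtimeout=None, maxage=None,
                  maxstale=None, fetchhint=None):
    """Build an SSML <audio> element with Vocalizer's fetch attributes.

    Attributes passed as None are omitted so the defaults apply.
    """
    if fetchhint not in (None, "safe", "prefetch"):
        raise ValueError(f"fetchhint must be 'safe' or 'prefetch', got {fetchhint!r}")
    attrs = {"src": src}
    for name, value in (("fetchtimeout", fetchtimeout), ("maxage", maxage),
                        ("maxstale", maxstale), ("fetchhint", fetchhint)):
        if value is not None:
            attrs[name] = str(value)
    element = ET.Element("audio", attrs)
    # Per SSML, the element's content is rendered if the audio cannot be played.
    element.text = fallback_text
    return element


# Placeholder URL; maxage is left unset so the origin server controls expiry.
audio = audio_element("https://example.com/prompts/welcome.wav",
                      "Welcome to our service.",
                      fetchtimeout="3s", maxstale=30, fetchhint="safe")
print(ET.tostring(audio, encoding="unicode"))
```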
<break>
strength
This attribute is supported.
Notes: The following table describes the pause duration for each strength value:
strength | Pause duration
x-weak | 100 ms
weak | 200 ms
medium | 400 ms
strong | 700 ms
x-strong | 1200 ms
none | 0 ms*
*The <break strength="none"> setting only has an audible effect where the TTS engine would otherwise have inserted a sentence break without an explicit tag; in that case, it suppresses the break.
When both the time and strength attributes are used, the time attribute takes precedence.
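The duration rules above can be summarized in a small Python sketch. break_pause_ms is a hypothetical helper; it assumes the SSML default strength of "medium" when neither attribute is given, and it cannot model the context-dependent effect of strength="none", which it simply reports as 0 ms.

```python
import re

# Nominal pause durations for each strength value, from the table above.
STRENGTH_MS = {"none": 0, "x-weak": 100, "weak": 200, "medium": 400,
               "strong": 700, "x-strong": 1200}
# SSML time values: a non-negative number with an "s" or "ms" suffix.
_TIME = re.compile(r"([0-9]+(?:\.[0-9]+)?)(ms|s)")


def break_pause_ms(strength=None, time=None):
    """Return the nominal pause in milliseconds for <break strength=... time=...>."""
    if time is not None:  # time takes precedence over strength
        match = _TIME.fullmatch(time)
        if match is None:
            raise ValueError(f"invalid time value: {time!r}")
        value = float(match.group(1))
        return value * 1000 if match.group(2) == "s" else value
    # Assumed SSML default when no strength is given.
    key = "medium" if strength is None else strength
    if key not in STRENGTH_MS:
        raise ValueError(f"unknown strength: {strength!r}")
    return float(STRENGTH_MS[key])


print(break_pause_ms(strength="strong"))               # 700.0
print(break_pause_ms(strength="x-weak", time="1.5s"))  # 1500.0
```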
<emphasis>
level
This attribute is supported.
Notes: The level="none" setting is not supported. Vocalizer may ignore it in order to produce optimal, natural-sounding speech output.
<lexicon>
Notes:
Vocalizer supports loading user dictionaries, user rulesets, and ActivePrompt databases through this element.
Vocalizer parses all <lexicon> elements and loads their tuning data before starting text-to-speech conversion. This tuning data is unloaded when the last sample buffer is generated or when the TTS process is stopped, so <lexicon> elements only affect the current synthesis request.
Refer to the W3C SSML 1.0 specification for more information.
type
This attribute is supported and optional.
Notes: If used, the type attribute overrides the MIME content type returned by the web server (for HTTP access) or by extension mapping rules (for local file access).
Valid type values are as follows:
application/edct-bin-dictionary: a Vocalizer binary-format user dictionary.
application/x-vocalizer-rettt+text: a text user ruleset.
application/x-vocalizer-rettt+bin: a binary user ruleset.
application/x-vocalizer-activeprompt-db: an ActivePrompt database, optionally with ":mode=automatic" appended to override its default matching mode to automatic.
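As a sketch of how the type values and the optional ":mode=automatic" suffix fit together, the Python below validates a type string and builds a <lexicon> element. The URI is a placeholder, and parse_lexicon_type and lexicon_element are hypothetical helpers, not part of Vocalizer. Because type is optional, lexicon_element omits it when none is given, leaving the server's MIME type or the extension mapping rules in effect.

```python
import xml.etree.ElementTree as ET

ACTIVEPROMPT_DB = "application/x-vocalizer-activeprompt-db"
LEXICON_TYPES = {
    "application/edct-bin-dictionary": "binary user dictionary",
    "application/x-vocalizer-rettt+text": "text user ruleset",
    "application/x-vocalizer-rettt+bin": "binary user ruleset",
    ACTIVEPROMPT_DB: "ActivePrompt database",
}


def parse_lexicon_type(type_value):
    """Return (base_type, mode); mode is "automatic" only when ":mode=automatic"
    is appended to the ActivePrompt database type."""
    base, sep, suffix = type_value.partition(":")
    if base not in LEXICON_TYPES:
        raise ValueError(f"unknown lexicon type: {base!r}")
    if not sep:
        return base, None
    if base != ACTIVEPROMPT_DB or suffix != "mode=automatic":
        raise ValueError(f"unsupported suffix on {base!r}: {suffix!r}")
    return base, "automatic"


def lexicon_element(uri, type_value=None):
    attrs = {"uri": uri}
    if type_value is not None:
        parse_lexicon_type(type_value)  # validate before building the element
        attrs["type"] = type_value
    return ET.Element("lexicon", attrs)


# Placeholder URI for an ActivePrompt database in automatic matching mode.
lex = lexicon_element("https://example.com/tuning/prompts.dat",
                      ACTIVEPROMPT_DB + ":mode=automatic")
print(ET.tostring(lex, encoding="unicode"))
```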
<meta>
http-equiv
This attribute is not supported.
<p>
Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain, equivalent to the <ESC>\domain\native control sequence. The attribute value is the ActivePrompt domain name.
<phoneme>
Vocalizer 7 SSML extensions: This tag supports specifying L&H+ phoneme strings when the alphabet attribute is set to "x-l&h+", and phoneme strings in the IPA alphabet when the alphabet attribute is set to "ipa".
Note that the ampersand is a reserved XML character, so in an SSML document the L&H+ alphabet must be specified as alphabet="x-l&amp;h+". Phoneme strings in the IPA alphabet should likewise use the necessary escapes (such as XML numeric character references) for characters that cannot be expressed otherwise.
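Serializing SSML with an XML library avoids hand-escaping mistakes. In this Python sketch, xml.etree.ElementTree escapes the ampersand in "x-l&h+" automatically, and serializing as US-ASCII turns IPA symbols into numeric character references. The L&H+ ph value is a placeholder (this page gives no real L&H+ transcription), and phoneme_element is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def phoneme_element(text, ph, alphabet):
    """Build a <phoneme> element; ElementTree escapes attribute values on output."""
    element = ET.Element("phoneme", {"alphabet": alphabet, "ph": ph})
    element.text = text
    return element


# L&H+ alphabet: "&" is written out as "&amp;". The ph value is a placeholder.
lh = phoneme_element("tomato", "PLACEHOLDER", "x-l&h+")
lh_xml = ET.tostring(lh, encoding="unicode")
print(lh_xml)  # <phoneme alphabet="x-l&amp;h+" ph="PLACEHOLDER">tomato</phoneme>

# IPA: with a US-ASCII encoding, non-ASCII symbols become references such as &#601;.
ipa = phoneme_element("tomato", "təˈmeɪtoʊ", "ipa")
ipa_xml = ET.tostring(ipa, encoding="us-ascii")
print(ipa_xml)
```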
<prompt>
Notes: When using multiple <prompt> tags with the bargein attribute, prompt queueing and playback will function differently depending on whether bargein is allowed or disallowed.
See Audio Formats and Prompts for details.
Vocalizer 7 SSML extensions: This tag supports specifying ActivePrompt IDs, equivalent to the <ESC>\domain\native control sequence. The "id" attribute is required and specifies the ActivePrompt in <domain>:<prompt> format.
The content of the element specifies fallback text that is only spoken if the ActivePrompt cannot be found (similar to SSML <audio>).
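The following Python sketch builds a <prompt> extension element with the required id in <domain>:<prompt> format and fallback text as its content. The domain and prompt names are hypothetical, and treating everything before the first colon as the domain (in split_prompt_id) is an assumption about how the id is parsed.

```python
import xml.etree.ElementTree as ET


def split_prompt_id(prompt_id):
    """Split an ActivePrompt id of the form <domain>:<prompt>.

    Assumes the domain is everything before the first colon.
    """
    domain, sep, prompt = prompt_id.partition(":")
    if not (sep and domain and prompt):
        raise ValueError(f"expected <domain>:<prompt>, got {prompt_id!r}")
    return domain, prompt


def activeprompt_element(prompt_id, fallback_text):
    split_prompt_id(prompt_id)  # id is required and must be well-formed
    element = ET.Element("prompt", {"id": prompt_id})
    # Fallback text: spoken only if the ActivePrompt cannot be found.
    element.text = fallback_text
    return element


el = activeprompt_element("banking:greeting", "Thank you for calling.")
print(ET.tostring(el, encoding="unicode"))
# <prompt id="banking:greeting">Thank you for calling.</prompt>
```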
<prosody>
Notes: The duration, pitch, pitch-range, and contour attributes are ignored.
volume
This attribute is supported.
Notes: The default value for this attribute is 100. The scale is amplitude-linear. Although SSML specifies a range of 0-100, Vocalizer extends this range to 200, with values above 100 reached via relative changes or the values "loud" and "x-loud".
The following table maps SSML symbolic values to SSML volume values and actual output.
SSML volume | Symbolic value | Amplitude | Output gain
0 | silent | 0.00 | -∞ dB
18 | x-soft | 0.18 | -15.0 dB
50 | soft | 0.50 | -6.0 dB
100 | medium | 1.00 | 0.0 dB
(141) | loud | 1.41 | +3.0 dB
(200) | x-loud | 2.00 | +6.0 dB
This second, broader table maps SSML volume values to Vocalizer volume values:
SSML volume | Amplitude | Output gain | Vocalizer volume
0 | 0.00 | -∞ dB | 0
10 | 0.10 | -20.0 dB | 13
20 | 0.20 | -14.0 dB | 33
30 | 0.30 | -10.5 dB | 45
40 | 0.40 | -8.0 dB | 53
50 | 0.50 | -6.0 dB | 60
60 | 0.60 | -4.4 dB | 65
70 | 0.70 | -3.1 dB | 70
80 | 0.80 | -1.9 dB | 74
90 | 0.90 | -0.9 dB | 77
100 | 1.00 | 0.0 dB | 80
(141) | 1.41 | +3.0 dB | 90
(200) | 2.00 | +6.0 dB | 100
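The amplitude and gain columns follow directly from the amplitude-linear scale: gain in dB = 20 · log10(volume / 100). The Vocalizer volume column is not given by a formula on this page; the Python sketch below uses an empirical fit to the table (80 at 0 dB plus roughly 10/3 Vocalizer units per dB, clamped to 0-100) that reproduces the rows shown, but treat it as an approximation rather than documented engine behavior. Both function names are hypothetical.

```python
import math


def ssml_volume_to_gain_db(ssml_volume):
    """Gain in dB for an SSML volume value (amplitude-linear, 100 = 0.0 dB)."""
    amplitude = ssml_volume / 100.0
    if amplitude == 0:
        return float("-inf")  # silent
    return 20.0 * math.log10(amplitude)


def ssml_volume_to_vocalizer(ssml_volume):
    """Approximate Vocalizer volume for an SSML volume value.

    Empirical fit to the table above (80 at 0 dB, about 10/3 units per dB,
    clamped to 0-100); not a documented Vocalizer formula.
    """
    gain_db = ssml_volume_to_gain_db(ssml_volume)
    if math.isinf(gain_db):
        return 0
    return max(0, min(100, round(80 + gain_db * 10.0 / 3.0)))


for volume in (10, 50, 100, 141, 200):
    print(volume, round(ssml_volume_to_gain_db(volume), 1), ssml_volume_to_vocalizer(volume))
```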
<s>
Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain, equivalent to the <ESC>\domain\native control sequence. The attribute value is the ActivePrompt domain name.
<speak>
Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain, equivalent to the <ESC>\domain\native control sequence. The attribute value is the ActivePrompt domain name.
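The ssft-domaintype extension is available on <speak>, <p>, and <s> alike. This Python sketch shows where the attribute goes on <p> and <s>; the speak helper accepts it the same way. The domain names "banking" and "closing" are hypothetical ActivePrompt domains, the helper functions are illustrative only, and since this page does not describe how nested domain settings are resolved, the example does not rely on any particular precedence.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def _domain_attr(domain):
    # ssft-domaintype is optional; omit it when no domain is requested.
    return f" ssft-domaintype={quoteattr(domain)}" if domain else ""


def sentence(text, domain=None):
    return f"<s{_domain_attr(domain)}>{escape(text)}</s>"


def paragraph(sentences, domain=None):
    return f"<p{_domain_attr(domain)}>{''.join(sentences)}</p>"


def speak(paragraphs, lang="en-US", domain=None):
    return (f'<speak version="1.0" xmlns={quoteattr(SSML_NS)} '
            f"xml:lang={quoteattr(lang)}{_domain_attr(domain)}>"
            f"{''.join(paragraphs)}</speak>")


# Hypothetical ActivePrompt domains: "banking" on <p>, "closing" on one <s>.
doc = speak([paragraph([sentence("Your balance is 42 dollars."),
                        sentence("Thank you for calling.", domain="closing")],
                       domain="banking")])
print(doc)

root = ET.fromstring(doc)  # raises if the markup is not well-formed
para = root.find(f"{{{SSML_NS}}}p")
print(para.get("ssft-domaintype"))  # banking
```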