Vocalizer 7: <voice> tag and SSML Support
For Vocalizer 7
Last updated
BETA FEATURE:
The Vocalizer 7 TTS engine is currently available as a beta feature. Further testing and updates may take place while Vocalizer 7 is in beta.
We encourage you to share feedback at beta@plumgroup.com.
This page covers the <voice> tag in Vocalizer 7 and other SSML tag and attribute features.
<voice>
The <voice> tag should be used to specify the desired voice through the name attribute.
Example: <voice name="Allison">
See the list of available voices in the section below.
<voice> attributes
age
name
variant
xml:lang
age
This attribute is supported.
Notes: The age attribute is only useful when a set of custom voices is installed that vary in age within the same language and gender.
name
This attribute is supported.
Notes: See the following language tables for available names:
Allison
Ava
Ava-ml
Evan
Samantha
Susan
Tom
Zoe
Zoe-ml
Daniel
Kate
Malcolm
Oliver
Ruby
Serena
Angelica
Javier
Paulina
Paulina-ml
Anna
Anna-ml
Markus
Petra
Viktor
Amelie
Amelie-ml
Nicolas
Audrey
Audrey-ml
Thomas
Felipe
Fernanda
Luciana
Catarina
Juana
Joaquim
Aasing-ml
Sinji-ml
Binbin-ml
Lili-ml
If selecting a multilingual voice, you can specify one of the voice's other languages as follows:
1. Specify a voice with the <voice> tag and its name attribute.
2. Specify one of the voice's other languages with the <speak> tag and its xml:lang attribute.
Example:
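As a minimal sketch of the two steps above, the following Python builds such a document with the standard library. The voice name Ava-ml is taken from the list above; the language code fr-FR, the sample text, and the helper name multilingual_ssml are assumptions for illustration, and the languages a given multilingual voice actually supports should be confirmed separately.

```python
import xml.etree.ElementTree as ET

# Namespace that the reserved "xml:" prefix is bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def multilingual_ssml(voice_name: str, lang: str, text: str) -> str:
    """Build a <speak> fragment: language on <speak>, voice on <voice>.

    Hypothetical helper; xml:lang is deliberately NOT set on <voice>,
    because Vocalizer 7 does not accept it there.
    """
    speak = ET.Element("speak", {f"{{{XML_NS}}}lang": lang})
    voice = ET.SubElement(speak, "voice", {"name": voice_name})
    voice.text = text
    return ET.tostring(speak, encoding="unicode")


# "fr-FR" is an assumed secondary language for this voice.
ssml = multilingual_ssml("Ava-ml", "fr-FR", "Bonjour")
print(ssml)
```

The printed fragment carries xml:lang only on the <speak> element, so the voice named in <voice> is kept.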
About xml:lang
Vocalizer 7 does not accept the xml:lang attribute for the <voice> tag. If used, the TTS engine will use a default voice instead of the one specified.
variant
This attribute is not supported.
xml:lang
This attribute is not supported.
Notes: Avoid using the xml:lang attribute in the <voice> tag when setting the TTS language. Vocalizer 7 does not accept it. It will cause the TTS engine to use the default voice instead of the one specified.
This section focuses on exceptions and additional details specific to Vocalizer, including the following:
Vocalizer-specific tag and attribute behavior.
Tags and attributes that are unsupported.
Additional Vocalizer-specific features such as SSML extensions.
Vocalizer 7 features the following added extensions to SSML:
fetchtimeout: Time to attempt to open and read the audio document.
maxage: Value for the HTTP 1.1 cache-control max-age directive.
maxstale: Value for the HTTP 1.1 cache-control max-stale directive.
fetchhint: Specify "prefetch" to allow prefetching the audio content or "safe" (the default) to follow HTTP 1.1 caching semantics.
For more details on these extensions, see the relevant tag's sections below.
The following table summarizes details about SSML tags described in this section.
<audio>: Has four Vocalizer-specific attributes. See the <audio> section for details.
<break>: See the <break> section for details on handling its strength attribute.
<emphasis>: The level attribute is supported, but the level="none" setting is not.
<lexicon>: See the <lexicon> section for details on handling this tag and its attributes.
<meta>: The http-equiv attribute is not supported.
<p>: Supports an optional ssft-domaintype attribute. See the <p> section for details.
<phoneme>: Supports specifying phoneme strings. See the <phoneme> section for details.
<prompt>: Supports specifying ActivePrompt IDs. See the <prompt> section for details.
<prosody>: Vocalizer ignores the duration, pitch, pitch-range, and contour attributes. See the <prosody> section for details on handling the volume attribute.
<s>: Supports an optional ssft-domaintype attribute. See the <s> section for details.
<speak>: Supports an optional ssft-domaintype attribute. See the <speak> section for details.
<audio>
Vocalizer 7 SSML extensions: Supports four Vocalizer-added attributes to manage internet fetching:
fetchtimeout
The time to attempt to open and read the audio document. The value must be an unsigned integer with a mandatory suffix: "s" for seconds or "ms" for milliseconds.
Example: "3s", "400ms"
maxage
Value for the HTTP 1.1 cache-control max-age directive. This specifies that the application is willing to accept a cached copy of the audio document no older than this value.
In most cases, this attribute should not be present, thus allowing the origin server to control cache expiration.
The value must be an unsigned integer specifying the number of seconds. The value must have no suffix (e.g., no "s" or "ms" included). A value of 0 may be used to force re-validating the cached copy with the origin server.
maxstale
Value for the HTTP 1.1 cache-control max-stale directive. This specifies that the client is willing to accept a cached copy that has expired by up to this value past the expiration time specified by the origin server.
The value must be an unsigned integer specifying the number of seconds. The value must have no suffix (e.g., no "s" or "ms" included).
fetchhint
Specify "prefetch" to allow prefetching the audio content, or "safe" (the default) to follow HTTP 1.1 caching semantics.
Vocalizer allows this attribute, but currently does not behave differently for "prefetch" mode.
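The value formats above are easy to get wrong (a suffix is mandatory for fetchtimeout but forbidden for maxage and maxstale), so a small pre-flight check can help. The sketch below is one possible way to validate the values before emitting an <audio> tag; the helper name audio_tag and the example URL are hypothetical, and the checks mirror only the rules stated on this page.

```python
import re
from xml.sax.saxutils import quoteattr

# fetchtimeout: unsigned integer with a mandatory "s" or "ms" suffix.
FETCHTIMEOUT_RE = re.compile(r"[0-9]+(?:ms|s)")
# maxage / maxstale: unsigned integer number of seconds, no suffix.
SECONDS_RE = re.compile(r"[0-9]+")


def audio_tag(src, fetchtimeout=None, maxage=None, maxstale=None, fetchhint=None):
    """Return an <audio> tag string after validating the fetch attributes."""
    attrs = {"src": src}
    if fetchtimeout is not None:
        if not FETCHTIMEOUT_RE.fullmatch(fetchtimeout):
            raise ValueError(f"fetchtimeout needs an 's' or 'ms' suffix: {fetchtimeout!r}")
        attrs["fetchtimeout"] = fetchtimeout
    for name, value in (("maxage", maxage), ("maxstale", maxstale)):
        if value is not None:
            if not SECONDS_RE.fullmatch(str(value)):
                raise ValueError(f"{name} must be an unsigned integer without a suffix: {value!r}")
            attrs[name] = str(value)
    if fetchhint is not None:
        if fetchhint not in ("prefetch", "safe"):
            raise ValueError(f"fetchhint must be 'prefetch' or 'safe': {fetchhint!r}")
        attrs["fetchhint"] = fetchhint
    rendered = " ".join(f"{key}={quoteattr(val)}" for key, val in attrs.items())
    return f"<audio {rendered}/>"


# Hypothetical URL; maxage=0 forces re-validation with the origin server.
print(audio_tag("http://example.com/greeting.wav", fetchtimeout="3s", maxage=0))
```

Passing fetchtimeout="3" or maxage="10s" raises ValueError, which surfaces the mistake before the document reaches the TTS engine.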
<break>
strength
This attribute is supported.
Notes:
The following table describes the pause duration for each strength value:
Strength    Pause duration
x-weak      100 ms
weak        200 ms
medium      400 ms
strong      700 ms
x-strong    1200 ms
none        0 ms*
*The <break strength="none"> setting only has an audible effect when the TTS engine would have inserted a sentence break without an explicit tag.
When both the time and the strength attributes are used, the time attribute takes precedence.
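The precedence rule and the strength table can be summarized in a few lines of Python. This is only an illustrative sketch: the function name effective_pause_ms is hypothetical, and it accepts only whole-number time values with an "s" or "ms" suffix, which is a simplifying assumption.

```python
import re

# Pause durations in milliseconds, copied from the strength table above.
PAUSE_MS = {
    "none": 0,  # only audible where a sentence break would otherwise occur
    "x-weak": 100,
    "weak": 200,
    "medium": 400,
    "strong": 700,
    "x-strong": 1200,
}

# Assumption: integer amounts only, e.g. "250ms" or "2s".
TIME_RE = re.compile(r"([0-9]+)(ms|s)")


def effective_pause_ms(strength=None, time=None):
    """Return the pause in ms for a <break>, letting time override strength."""
    if time is not None:
        match = TIME_RE.fullmatch(time)
        if match is None:
            raise ValueError(f"unsupported time value: {time!r}")
        amount, unit = int(match.group(1)), match.group(2)
        return amount * 1000 if unit == "s" else amount
    if strength is not None:
        return PAUSE_MS[strength]
    return None  # neither attribute given; the default is not modeled here


print(effective_pause_ms(strength="strong"))
print(effective_pause_ms(strength="strong", time="250ms"))
```

In the second call the time attribute wins, so the strength value has no effect on the result.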
<emphasis>
level
This attribute is supported.
Notes: The level="none" setting is not supported. Vocalizer may ignore it to produce optimal natural speech output.
<lexicon>
Notes:
Vocalizer supports loading user dictionaries, user rulesets, and ActivePrompt databases through this element.
Vocalizer parses all <lexicon> elements and loads the tuning data before starting text-to-speech conversion. This tuning data is unloaded when the last sample buffer is generated or when the TTS process is stopped, so <lexicon> elements only affect the current synthesis request.
type
This attribute is supported and optional.
Notes: If used, the type attribute overrides the MIME content type returned by the web server (for HTTP access) or by the extension mapping rules (for local file access).
Valid type values are as follows:
application/edct-bin-dictionary for a Vocalizer binary format user dictionary.
application/x-vocalizer-rettt+text for a text user ruleset.
application/x-vocalizer-rettt+bin for a binary user ruleset.
application/x-vocalizer-activeprompt-db for an ActivePrompt database, optionally with ":mode=automatic" appended to override its default matching mode to automatic.
<meta>
http-equiv
This attribute is not supported.
<p>
Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain, equivalent to the <ESC>\domain\native control sequence. The attribute value is the ActivePrompt domain name.
<phoneme>
Vocalizer 7 SSML extensions: This tag supports specifying L&H+ phoneme strings when the alphabet attribute is set to "x-l&h+", and phoneme strings in the IPA alphabet when the alphabet attribute is set to "ipa".
Note that the ampersand is a reserved XML character, so in an SSML document the L&H+ alphabet needs to be written in escaped form as alphabet="x-l&amp;h+". Phoneme strings in the IPA alphabet should also use the necessary escape characters, as they cannot be expressed otherwise.
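One way to avoid hand-escaping is to let an XML library serialize the tag. The sketch below uses Python's standard xml.etree.ElementTree: the ampersand in the alphabet name is escaped automatically, and serializing to ASCII turns IPA symbols into numeric character references. The helper name phoneme_tag is hypothetical, the ph value in the first call is a placeholder rather than a real L&H+ phoneme string, and the IPA transcription of "tomato" is included only as an illustration.

```python
import xml.etree.ElementTree as ET


def phoneme_tag(text, ph, alphabet="x-l&h+"):
    """Serialize a <phoneme> tag with XML-safe attribute values."""
    element = ET.Element("phoneme", {"alphabet": alphabet, "ph": ph})
    element.text = text
    # ASCII output: "&" becomes "&amp;" and non-ASCII characters
    # become numeric character references such as "&#601;".
    return ET.tostring(element, encoding="us-ascii").decode("ascii")


# Placeholder ph value; substitute a real L&H+ phoneme string.
lh_markup = phoneme_tag("example", "PLACEHOLDER")
# IPA transcription used for illustration only.
ipa_markup = phoneme_tag("tomato", "təˈmeɪtoʊ", alphabet="ipa")

print(lh_markup)
print(ipa_markup)
```

Parsing either string back with an XML parser yields the original, unescaped attribute values.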
<prompt>
Notes: When using multiple <prompt> tags with the bargein attribute, prompt queueing and playback will function differently depending on whether bargein is allowed or disallowed.
Vocalizer 7 SSML extensions: This tag supports specifying ActivePrompt IDs, equivalent to the <ESC>\domain\native control sequence. The "id" attribute is required and specifies the ActivePrompt in <domain>:<prompt> format.
The content of the element specifies fallback text that is spoken only if the ActivePrompt cannot be found (similar to SSML <audio>).
<prosody>
Notes: The duration, pitch, pitch-range, and contour attributes are ignored.
volume
This attribute is supported.
Notes: The default value for this attribute is 100. The scale is amplitude linear. Although SSML specifies a range of 0-100, Vocalizer extends this range to 200, with the values above 100 reached via relative changes or the values "loud" and "x-loud".
The following table maps SSML symbolic values to SSML volume values and actual output.
SSML volume    Symbolic value    Amplitude    Gain
0              silent            0.00         -∞ dB
18             x-soft            0.18         -15.0 dB
50             soft              0.50         -6.0 dB
100            medium            1.00         0.0 dB
(141)          loud              1.41         +3.0 dB
(200)          x-loud            2.00         +6.0 dB
This second, broader table maps SSML volume values to Vocalizer volume values:
SSML volume    Amplitude    Gain        Vocalizer volume
0              0.00         -∞ dB       0
10             0.10         -20.0 dB    13
20             0.20         -14.0 dB    33
30             0.30         -10.5 dB    45
40             0.40         -8.0 dB     53
50             0.50         -6.0 dB     60
60             0.60         -4.4 dB     65
70             0.70         -3.1 dB     70
80             0.80         -1.9 dB     74
90             0.90         -0.9 dB     77
100            1.00         0.0 dB      80
(141)          1.41         +3.0 dB     90
(200)          2.00         +6.0 dB     100
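Because the scale is amplitude linear, the decibel figures in the tables follow from the standard amplitude-to-decibel conversion, 20 · log10(volume / 100). The short Python sketch below reproduces them as a sanity check; the function name gain_db is hypothetical, and the mapping to Vocalizer volume values in the last column is not modeled because this page does not give its formula.

```python
import math


def gain_db(ssml_volume: float) -> float:
    """Convert an amplitude-linear SSML volume (100 = unity) to decibels."""
    amplitude = ssml_volume / 100.0
    if amplitude <= 0:
        return float("-inf")  # silence
    return 20.0 * math.log10(amplitude)


for volume in (0, 10, 50, 100, 141, 200):
    print(f"{volume:>3} -> {gain_db(volume):+.1f} dB")
```

The computed values agree with the tables to within about 0.1 dB; the small differences (for example at a volume of 18) appear to come from rounding in the published figures.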
<s>
Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain, equivalent to the <ESC>\domain\native control sequence. The attribute value is the ActivePrompt domain name.
<speak>
Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain, equivalent to the <ESC>\domain\native control sequence. The attribute value is the ActivePrompt domain name.
To set the TTS language, we recommend using the xml:lang attribute of the <speak> or <vxml> tag instead of the xml:lang attribute of the <voice> tag.
Except where noted on this page, Vocalizer 7 supports all standard SSML tags and attributes. The Vocalizer 7 extensions are as follows:
<audio>: Supports four additional attributes to manage internet fetching.
<phoneme>: Supports specifying L&H+ phoneme strings and phoneme strings in the IPA alphabet.
<p>, <s>, and <speak>: Each tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain.
<prompt>: Supports specifying ActivePrompt IDs.