Vocalizer 7: <voice> tag and SSML Support
For Vocalizer 7
Last updated
BETA FEATURE:
The Vocalizer 7 TTS engine is currently available as a beta feature. Further testing and updates may take place while Vocalizer 7 is in beta.
We encourage you to share feedback at beta@plumgroup.com.
IMPORTANT! Treat the terms "tags" and "elements" as the same.
The term "tags" used here means the same thing as the term "elements" used in other VXML-related documentation (e.g., the W3C SSML 1.0 Recommendation).
This page covers the <voice> tag in Vocalizer 7, as well as Vocalizer 7's support for other SSML tags and attributes.
<voice>
NOTE: All <voice> attributes are optional. However, an error occurs if the <voice> tag is used without any attribute.
Use the name attribute of the <voice> tag to specify the desired voice.
Example: <voice name="Allison">
For the list of available voices, see the notes for the name attribute below.
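A minimal sketch of a complete SSML document using the tag above. It assumes that the Allison voice from the example is installed on your deployment and that the surrounding language is US English (en-US):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- <voice> must carry at least one attribute; a bare <voice> raises an error. -->
  <voice name="Allison">
    Thank you for calling. How can I help you today?
  </voice>
</speak>
```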
<voice> attributes
age
This attribute is supported.
Notes: The age attribute is only useful if you have installed a set of custom voices that vary in age within the same language and gender.
name
This attribute is supported.
Notes: See the following language tables for available names:
If you select a multilingual voice, you can have it speak one of its other languages. First, specify the voice with the name attribute of the <voice> tag. Then, specify one of the voice's other languages with the xml:lang attribute of the <speak> tag.
Example:
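A minimal sketch, assuming a hypothetical multilingual voice named YourMultilingualVoice whose additional languages include Mexican Spanish (es-MX). Substitute a real voice name and language from the language tables:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical: "YourMultilingualVoice" stands in for a real multilingual voice,
     and es-MX is assumed to be one of its additional languages. -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="es-MX">
  <!-- The language comes from <speak>; <voice> selects the voice by name only. -->
  <voice name="YourMultilingualVoice">
    Gracias por llamar.
  </voice>
</speak>
```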
About xml:lang
Vocalizer 7 does not accept the xml:lang attribute for the <voice> tag. If it is used, the TTS engine uses a default voice instead of the one specified.
We recommend using the xml:lang attribute of the <speak> or <vxml> tag instead. See Recommended best practices for more info.
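The difference can be sketched as two alternative SSML documents. They are shown together for comparison, but each would be sent on its own. The voice name Allison and the language en-US are assumptions carried over from the earlier example:

```xml
<!-- Avoid: xml:lang on <voice> makes Vocalizer 7 fall back to a default voice. -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="Allison" xml:lang="en-US">Hello.</voice>
</speak>

<!-- Prefer: set the language on <speak> and only the name on <voice>. -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="Allison">Hello.</voice>
</speak>
```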
variant
This attribute is not supported.
xml:lang
This attribute is not supported.
Notes: Avoid using the xml:lang attribute in the <voice> tag to set the TTS language. Vocalizer 7 does not accept it, and it causes the TTS engine to use the default voice instead of the one specified.
Instead, use the xml:lang attribute of the <speak> or <vxml> tag. See Recommended best practices for more info.
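In a VoiceXML application, the same practice means setting xml:lang once on the root <vxml> element and naming the voice inside the prompt. The following is a minimal sketch that assumes VoiceXML 2.1, US English, and the Allison voice. The form id is arbitrary:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="greeting">
    <block>
      <prompt>
        <!-- The language is inherited from <vxml>; <voice> sets only the name. -->
        <voice name="Allison">Welcome to our service.</voice>
      </prompt>
    </block>
  </form>
</vxml>
```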
Except where noted below, Vocalizer 7 supports all SSML tags and attributes as described in the W3C SSML 1.0 Recommendation.
This section focuses on exceptions and additional details specific to Vocalizer, including the following:
Vocalizer-specific tag and attribute behavior
Unsupported tags and attributes
Additional Vocalizer-specific features, such as SSML extensions
Vocalizer 7 adds the following extensions to SSML:
<audio>: Supports four additional attributes to manage internet fetching:
fetchtimeout: Time to attempt to open and read the audio document.
maxage: Value for the HTTP 1.1 cache-control max-age directive.
maxstale: Value for the HTTP 1.1 cache-control max-stale directive.
fetchhint: Specify "prefetch" to allow prefetching the audio content, or "safe" (the default) to follow HTTP 1.1 caching semantics.
<phoneme>: Supports specifying L&H+ phoneme strings and phoneme strings in the IPA alphabet.
<prompt>: Supports specifying ActivePrompt IDs.
For more details on these extensions, see the relevant tag's sections below.
The following table summarizes details about SSML tags described in this section.
<audio>
Vocalizer 7 SSML extensions: Supports four Vocalizer-added attributes to manage internet fetching:
fetchtimeout
The time to attempt to open and read the audio document. The value must be an unsigned integer with a mandatory suffix: "s" for seconds or "ms" for milliseconds.
Example: "3s", "400ms"
maxage
Value for the HTTP 1.1 cache-control max-age directive. It specifies that the application is willing to accept a cached copy of the audio document that is no older than this value.
In most cases, omit this attribute so that the origin server controls cache expiration.
The value must be an unsigned integer specifying the number of seconds, with no suffix (e.g., no "s" or "ms"). A value of 0 may be used to force revalidation of the cached copy with the origin server.
maxstale
Value for the HTTP 1.1 cache-control max-stale directive. It specifies that the client is willing to accept a cached copy that has expired by up to this value past the expiration time specified by the origin server.
The value must be an unsigned integer specifying the number of seconds, with no suffix (e.g., no "s" or "ms").
fetchhint
"prefetch" to allow prefetching the audio content, or "safe" (the default) to follow HTTP 1.1 caching semantics.
Vocalizer accepts this attribute but currently does not behave differently in "prefetch" mode.
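A minimal sketch of an <audio> element that uses these extension attributes. The URL is hypothetical, and the attribute values are only illustrations of the required formats. Per the notes above, maxage is normally best left out, but it is included here to show its no-suffix format:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Hypothetical URL.
       fetchtimeout: unsigned integer with a mandatory "s" or "ms" suffix.
       maxage:       seconds, no suffix; "0" forces revalidation with the origin server.
       maxstale:     seconds, no suffix (not used here).
       fetchhint:    "safe" (default) or "prefetch". -->
  <audio src="https://example.com/prompts/greeting.wav"
         fetchtimeout="3s"
         maxage="0"
         fetchhint="safe">
    <!-- Fallback text, spoken if the audio cannot be fetched or played. -->
    Thank you for calling.
  </audio>
</speak>
```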
<break>
strength
This attribute is supported.
Notes:
The following table describes the pause duration for each strength value:
*The <break strength="none"> setting only has an audible effect where the TTS engine would otherwise have inserted a sentence break.
When both the time and strength attributes are specified, the time attribute takes precedence.
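A short sketch of the precedence rule. The wording of the prompts is illustrative only:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  Please hold. <break strength="strong"/> Your call is important to us.
  <!-- Both attributes are present, so time wins: the pause lasts 750 ms
       rather than the duration associated with "x-weak". -->
  To reach sales, press one. <break time="750ms" strength="x-weak"/> For support, press two.
</speak>
```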
<emphasis>
level
This attribute is supported.
Notes: The level="none" setting is not supported. Vocalizer may ignore it in order to produce the most natural-sounding speech output.
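A minimal sketch using a supported emphasis level. As noted above, Vocalizer may still adjust how the emphasis is rendered to keep the speech natural:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  Your payment is due <emphasis level="strong">tomorrow</emphasis>.
  <!-- Avoid level="none": Vocalizer does not support it and may ignore it. -->
</speak>
```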
<lexicon>
Notes:
Vocalizer supports loading user dictionaries, user rulesets, and ActivePrompt databases through this element.
Vocalizer parses all <lexicon> elements and loads their tuning data before starting text-to-speech conversion. The tuning data is unloaded when the last sample buffer is generated or when the TTS process is stopped, so <lexicon> elements only affect the current synthesis request.
Refer to the W3C SSML 1.0 specification for more information.
type
This attribute is supported and optional.
Notes: If used, the type attribute overrides the MIME content type returned by the web server (for HTTP access) or determined by extension mapping rules (for local file access).
Valid type values are as follows:
application/edct-bin-dictionary for a Vocalizer binary format user dictionary.
application/x-vocalizer-rettt+text for a text user ruleset.
application/x-vocalizer-rettt+bin for a binary user ruleset.
application/x-vocalizer-activeprompt-db for an ActivePrompt database, optionally with ":mode=automatic" appended to override its default matching mode to automatic.
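A minimal sketch of loading tuning data for one synthesis request. The URIs and file names are hypothetical. The optional ":mode=automatic" suffix is not shown:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Hypothetical URIs. SSML 1.0 requires <lexicon> elements to come before
       all other elements and text inside <speak>. Each type value overrides the
       server's MIME type (HTTP) or the extension mapping rules (local files). -->
  <lexicon uri="https://example.com/tuning/names.dct"
           type="application/edct-bin-dictionary"/>
  <lexicon uri="https://example.com/tuning/rules.txt"
           type="application/x-vocalizer-rettt+text"/>
  <lexicon uri="https://example.com/tuning/prompts.db"
           type="application/x-vocalizer-activeprompt-db"/>
  Welcome back.
</speak>
```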
<meta>
http-equiv
This attribute is not supported.
<p>
Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain, equivalent to the <ESC>\domain\native control sequence. The attribute value is the ActivePrompt domain name.
<phoneme>
Vocalizer 7 SSML extensions: This tag supports L&H+ phoneme strings when the alphabet attribute is set to "x-l&h+", and phoneme strings in the IPA alphabet when the alphabet attribute is set to "ipa".
Note that the ampersand is a reserved XML character, so in an SSML document the L&H+ alphabet must be specified as alphabet="x-l&amp;h+". Phoneme strings in the IPA alphabet should also use the necessary escapes (for example, XML numeric character references) for characters that cannot be expressed otherwise.
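A minimal sketch of an IPA phoneme string written entirely with numeric character references. The transcription of "tomato" is an illustrative choice. L&H+ notation itself is not shown, only the escaped alphabet value:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- IPA for "tomato" (təˈmeɪtoʊ) using numeric character references:
       &#x259; = ə, &#x2C8; = ˈ (primary stress), &#x26A; = ɪ, &#x28A; = ʊ -->
  I say <phoneme alphabet="ipa" ph="t&#x259;&#x2C8;me&#x26A;to&#x28A;">tomato</phoneme>.
  <!-- An L&H+ string would instead use alphabet="x-l&amp;h+". -->
</speak>
```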
<prompt>
Notes: When using multiple <prompt> tags with the bargein attribute, prompt queueing and playback function differently depending on whether bargein is allowed or disallowed.
See Audio Formats and Prompts for details.
Vocalizer 7 SSML extensions: This tag supports specifying ActivePrompt IDs, equivalent to the <ESC>\domain\native control sequence. The "id" attribute is required and specifies the ActivePrompt in <domain>:<prompt> format.
The content of the element specifies fallback text that is only spoken if the ActivePrompt cannot be found (similar to SSML <audio>).
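A minimal sketch of the ActivePrompt extension in an SSML document passed to Vocalizer. The domain "banking" and prompt "welcome" are hypothetical. This sketch also assumes the ActivePrompt database containing them has already been loaded, for example through a <lexicon> element:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Hypothetical id in the required <domain>:<prompt> format.
       The text content is spoken only if the ActivePrompt cannot be found. -->
  <prompt id="banking:welcome">Welcome to online banking.</prompt>
</speak>
```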
<prosody>
Notes: The duration, pitch, pitch-range, and contour attributes are ignored.
volume
This attribute is supported.
Notes: The default value for this attribute is 100, and the scale is linear in amplitude. Although SSML specifies a range of 0-100, Vocalizer extends this range to 200. Values above 100 are reached through relative changes or the symbolic values "loud" and "x-loud".
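A minimal sketch of the three ways to set volume described above. The arithmetic in the comment assumes the relative change starts from the default of 100:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <prosody volume="50">This is quieter than the default.</prosody>
  <prosody volume="x-loud">This uses a symbolic value above 100.</prosody>
  <!-- Relative change: "+50" raises the current volume (100 by default) to 150,
       which only Vocalizer's extended range allows. -->
  <prosody volume="+50">This is louder than the default.</prosody>
</speak>
```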
The following table maps SSML symbolic values to SSML volume values and actual output.
This second, broader table maps SSML volume values to Vocalizer volume values:
<s>
Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain, equivalent to the <ESC>\domain\native control sequence. The attribute value is the ActivePrompt domain name.
<speak>
Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain, equivalent to the <ESC>\domain\native control sequence. The attribute value is the ActivePrompt domain name.
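A minimal sketch showing where ssft-domaintype can be placed on <speak>, <p>, and <s>. The domain names "banking" and "weather" are hypothetical. This sketch only illustrates attribute placement, not how Vocalizer resolves nested domains:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"
       ssft-domaintype="banking">
  <!-- Hypothetical ActivePrompt domain names. -->
  <p>Your current balance is forty dollars.</p>
  <p ssft-domaintype="weather">Today will be sunny.</p>
  <s ssft-domaintype="banking">Thank you for banking with us.</s>
</speak>
```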
Voice name | Gender | Multilingual? | Additional languages |
---|---|---|---|
Voice name | Gender | Multilingual? |
---|---|---|
Voice name | Gender | Multilingual? | Additional languages |
---|---|---|---|
Voice name | Gender | Multilingual? | Additional languages |
---|---|---|---|
Voice name | Gender | Multilingual? | Additional languages |
---|---|---|---|
Voice name | Gender | Multilingual? | Additional languages |
---|---|---|---|
Voice name | Gender | Multilingual? |
---|---|---|
Voice name | Gender | Multilingual? |
---|---|---|
Voice name | Gender | Multilingual? | Additional languages |
---|---|---|---|
Voice name | Gender | Multilingual? | Additional languages |
---|---|---|---|
<Tag> name | Vocalizer 7 support | Notes |
---|---|---|
Value | Pause duration (milliseconds) |
---|---|
SSML volume value | Symbolic value | Amplitude amplification factor | Loudness in dB |
---|---|---|---|
SSML volume value | Amplitude amplification factor | Loudness in dB | Vocalizer volume value |
---|---|---|---|