Vocalizer 7: <voice> tag and SSML Support

For Vocalizer 7

BETA FEATURE:

The Vocalizer 7 TTS engine is currently available as a beta feature. Further testing and updates may take place while Vocalizer 7 is in beta.

We encourage you to share feedback at [email protected].

IMPORTANT! Treat the terms "tags" and "elements" as the same.

The term "tags" used here means the same thing as the term "elements" used in other VXML-related documentation (e.g., the W3C SSML 1.0 Recommendation).

Summary

This page covers the <voice> tag in Vocalizer 7 and other SSML tag and attribute features.

`<voice>`

NOTE: Each <voice> attribute is optional on its own, but you must specify at least one. A <voice> tag with no attributes causes an error.

Use the name attribute of the <voice> tag to specify the desired voice.

Example: <voice name="Allison">

See the list of available voices in the name section below.
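If you generate prompts in code, a small guard can catch an empty <voice> tag before it reaches the engine. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree. The helper name voice_element is hypothetical, not part of any Vocalizer API. It rejects only the attributes this page marks as unsupported (variant and xml:lang).

```python
import xml.etree.ElementTree as ET

# Attributes this page lists as NOT supported on <voice> in Vocalizer 7.
UNSUPPORTED_VOICE_ATTRS = {"variant", "xml:lang"}


def voice_element(text: str, **attrs: str) -> ET.Element:
    """Build a <voice> element (hypothetical helper, not a Vocalizer API)."""
    if not attrs:
        # A <voice> tag with no attributes causes an error in Vocalizer 7.
        raise ValueError("<voice> needs at least one attribute, e.g. name")
    rejected = set(attrs) & UNSUPPORTED_VOICE_ATTRS
    if rejected:
        raise ValueError(f"unsupported <voice> attribute(s): {sorted(rejected)}")
    elem = ET.Element("voice", attrs)
    elem.text = text
    return elem


print(ET.tostring(voice_element("Hello!", name="Allison"), encoding="unicode"))
# <voice name="Allison">Hello!</voice>
```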

Key `<voice>` attributes

| Attribute | Supported? |
|---|---|
| age | Yes |
| name | Yes |
| variant | No |
| xml:lang | No |

`age`

This attribute is supported.

Notes: The age attribute is only useful if you have installed a set of custom voices that vary in age within the same language and gender.

`name`

This attribute is supported.

Notes: See the following language tables for available names:

American English (en-US) (9 total)

British English (en-GB) (6 total)

Spanish (es-ES) (4 total)

| Voice name | Gender | Multilingual? | Additional languages |
|---|---|---|---|
| Angelica | Female | No | |
| Javier | Male | No | |
| Paulina | Female | No | |
| Paulina-ml | Female | Yes | en-US, fr-CA |

German (de-DE) (5 total)

Canadian French (fr-CA) (3 total)

| Voice name | Gender | Multilingual? | Additional languages |
|---|---|---|---|
| Amelie | Female | No | |
| Amelie-ml | Female | Yes | en-US |
| Nicolas | Male | No | |

French (fr-FR) (3 total)

| Voice name | Gender | Multilingual? | Additional languages |
|---|---|---|---|
| Audrey | Female | No | |
| Audrey-ml | Female | Yes | en-GB, es-ES, de-DE, it-IT |
| Thomas | Male | No | |

Brazilian Portuguese (pt-BR) (3 total)

| Voice name | Gender | Multilingual? |
|---|---|---|
| Felipe | Male | No |
| Fernanda | Female | No |
| Luciana | Female | No |

Portuguese (pt-PT) (3 total)

| Voice name | Gender | Multilingual? |
|---|---|---|
| Catarina | Female | No |
| Juana | Female | No |
| Joaquim | Male | No |

Cantonese (zh-HK) (2 total)

| Voice name | Gender | Multilingual? | Additional languages |
|---|---|---|---|
| Aasing-ml | Male | Yes | en-GB |
| Sinji-ml | Female | Yes | en-GB |

Mandarin (zh-CN) (2 total)

| Voice name | Gender | Multilingual? | Additional languages |
|---|---|---|---|
| Binbin-ml | Male | Yes | en-US |
| Lili-ml | Female | Yes | en-US |

Multilingual voices

If you select a multilingual voice, you can have it speak one of its additional languages as follows:

1. Specify the voice with the name attribute of the <voice> tag.
2. Specify one of the voice's additional languages with the xml:lang attribute of the <speak> tag.

Example:

<speak xml:lang="fr-CA">
<voice name="Ava-ml">Bonjour et bienvenue.</voice>
</speak>

About xml:lang

Vocalizer 7 does not accept the xml:lang attribute for the <voice> tag. If used, the TTS engine will use a default voice instead of the one specified.

We recommend using the <speak> or <vxml> tags with the xml:lang attribute instead. See Recommended best practices for more info.
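The same rule can be applied in code: put xml:lang on <speak>, never on <voice>, and check the requested language against the voice's additional languages. This is a hypothetical Python helper built on xml.etree.ElementTree. The ML_VOICES lookup covers only a few voices copied from the tables above and is an assumption about how you might store that data; voices missing from it are passed through unchecked.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace URI with the reserved "xml" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Additional languages of some multilingual voices (from the voice tables).
ML_VOICES = {
    "Paulina-ml": {"en-US", "fr-CA"},
    "Amelie-ml": {"en-US"},
    "Audrey-ml": {"en-GB", "es-ES", "de-DE", "it-IT"},
}


def multilingual_prompt(voice: str, lang: str, text: str) -> str:
    """Set xml:lang on <speak> (not <voice>) and select the voice by name."""
    extra = ML_VOICES.get(voice)
    if extra is not None and lang not in extra:
        raise ValueError(f"{voice} does not list {lang} as an additional language")
    speak = ET.Element("speak", {XML_LANG: lang})
    voice_el = ET.SubElement(speak, "voice", {"name": voice})
    voice_el.text = text
    return ET.tostring(speak, encoding="unicode")


print(multilingual_prompt("Paulina-ml", "fr-CA", "Bonjour et bienvenue."))
# <speak xml:lang="fr-CA"><voice name="Paulina-ml">Bonjour et bienvenue.</voice></speak>
```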

`variant`

This attribute is not supported.

`xml:lang`

This attribute is not supported.

Notes: Do not use the xml:lang attribute in the <voice> tag to set the TTS language. Vocalizer 7 does not accept it, and using it causes the TTS engine to use the default voice instead of the one specified.

Instead, use the xml:lang attribute of the <speak> or <vxml> tags. See Recommended best practices for more info.

Additional SSML tag details

Except where noted below, Vocalizer 7 supports all SSML tags and attributes as described in the W3C SSML 1.0 Recommendation.

This section focuses on exceptions and additional details specific to Vocalizer, including the following:

- Vocalizer-specific tag and attribute behavior
- Unsupported tags and attributes
- Additional Vocalizer-specific features, such as SSML extensions

Built-in, unique SSML extensions

Vocalizer 7 adds the following extensions to SSML:

- <audio>: Supports four additional attributes to manage Internet fetching:
  - fetchtimeout: Time to attempt to open and read the audio document.
  - maxage: Value for the HTTP 1.1 cache-control max-age directive.
  - maxstale: Value for the HTTP 1.1 cache-control max-stale directive.
  - fetchhint: Specify "prefetch" to allow prefetching the audio content or "safe" (the default) to follow HTTP 1.1 caching semantics.
- <phoneme>: Supports specifying L&H+ phoneme strings and phoneme strings in the IPA alphabet.
- <speak>, <s>, and <p>: Each tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain.
- <prompt>: Supports specifying ActivePrompt IDs.

For more details on these extensions, see each tag's section below.

Vocalizer SSML tag reference table

The following table summarizes details about SSML tags described in this section.

| Tag | Vocalizer 7 support | Notes |
|---|---|---|
| <audio> | Supported (has SSML extensions) | Has four Vocalizer-specific attributes. See the <audio> section for details. |
| <break> | Supported | See the <break> section for details on handling its strength attribute. |
| <emphasis> | Partial support | The level attribute is supported, but the level="none" setting is not. |
| <lexicon> | Supported | See the <lexicon> section for details on handling this tag and its attributes. |
| <meta> | Partial support | The http-equiv attribute is not supported. |
| <p> | Supported (has SSML extensions) | Supports an optional ssft-domaintype attribute. See the <p> section for details. |
| <phoneme> | Supported (has SSML extensions) | Supports specifying L&H+ and IPA phoneme strings. See the <phoneme> section for details. |
| <prompt> | Supported (has SSML extensions) | Supports specifying ActivePrompt IDs. See the <prompt> section for details. |
| <prosody> | Partial support | Vocalizer ignores the duration, pitch, pitch-range, and contour attributes. See the <prosody> section for details on handling the volume attribute. |
| <s> | Supported (has SSML extensions) | Supports an optional ssft-domaintype attribute. See the <s> section for details. |
| <speak> | Supported (has SSML extensions) | Supports an optional ssft-domaintype attribute. See the <speak> section for details. |

`<audio>`

Vocalizer 7 SSML extensions: Supports four Vocalizer-added attributes to manage internet fetching:

`fetchtimeout`

The time to attempt to open and read the audio document. The value must be an unsigned integer with a mandatory suffix: "s" for seconds, "ms" for milliseconds.

Examples: "3s", "400ms"
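As a rough illustration of the suffix rule, this Python sketch converts a fetchtimeout value to milliseconds and rejects values without a suffix. The function name is hypothetical, and Vocalizer's own parser may handle edge cases differently.

```python
import re

# Unsigned integer followed by a mandatory "s" or "ms" suffix, e.g. "3s", "400ms".
_FETCHTIMEOUT_RE = re.compile(r"(\d+)(ms|s)")


def fetchtimeout_to_ms(value: str) -> int:
    """Convert a fetchtimeout value to milliseconds (illustrative helper)."""
    match = _FETCHTIMEOUT_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid fetchtimeout {value!r}: need digits plus 's' or 'ms'")
    number, suffix = int(match.group(1)), match.group(2)
    return number * 1000 if suffix == "s" else number


print(fetchtimeout_to_ms("3s"), fetchtimeout_to_ms("400ms"))  # 3000 400
```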

`maxage`

Value for the HTTP 1.1 cache-control max-age directive. It specifies that the application is willing to accept a cached copy of the audio document that is no older than this value.

In most cases, this attribute should not be present, thus allowing the origin server to control cache expiration.

The value must be an unsigned integer to specify the number of seconds. The value must have no suffix (e.g., no "s" or "ms" included). A value of 0 may be used to force re-validating the cached copy with the origin server.

`maxstale`

Value for the HTTP 1.1 cache-control max-stale directive. It specifies that the client is willing to accept a cached copy that has expired, as long as it is no more than this many seconds past the expiration time specified by the origin server.

The value must be an unsigned integer to specify the number of seconds. The value must have no suffix (e.g., no "s" or "ms" included).
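To make the two directives concrete, here is a simplified Python model of how a cache might decide whether a stored copy of the audio document is still acceptable under HTTP/1.1 max-age and max-stale semantics. It ignores all other cache directives and heuristics, and the function and parameter names are assumptions, not Vocalizer settings.

```python
from typing import Optional


def cached_copy_acceptable(
    age_s: int,
    freshness_lifetime_s: int,
    maxage: Optional[int] = None,
    maxstale: Optional[int] = None,
) -> bool:
    """Simplified HTTP/1.1 check: may a cached copy be used without refetching?

    age_s: how old the cached copy is, in seconds.
    freshness_lifetime_s: how long the origin server said the copy stays fresh.
    maxage, maxstale: the <audio> attribute values in seconds, or None if absent.
    """
    for name, value in (("maxage", maxage), ("maxstale", maxstale)):
        if value is not None and value < 0:
            raise ValueError(f"{name} must be an unsigned integer")
    # maxage: reject any copy older than this; 0 rejects every copy with a
    # nonzero age, which in practice forces revalidation.
    if maxage is not None and age_s > maxage:
        return False
    staleness = age_s - freshness_lifetime_s
    if staleness <= 0:
        return True  # still fresh according to the origin server
    # maxstale: accept a copy expired by at most this many seconds.
    return maxstale is not None and staleness <= maxstale


print(cached_copy_acceptable(age_s=120, freshness_lifetime_s=60, maxstale=90))  # True
```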

`fetchhint`

Specify "prefetch" to allow prefetching the audio content, or "safe" (the default) to follow HTTP 1.1 caching semantics.

Vocalizer accepts this attribute but currently behaves no differently in "prefetch" mode.
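Putting the four attributes together, here is a hedged Python sketch that builds an <audio> element with xml.etree.ElementTree and checks each value against the rules above. The helper, the example URL, and the fallback text are illustrative assumptions. The fallback text relies on standard SSML 1.0 behavior, where the content of <audio> is rendered if the audio cannot be played.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def audio_element(
    src: str,
    fallback_text: str,
    fetchtimeout: Optional[str] = None,
    maxage: Optional[int] = None,
    maxstale: Optional[int] = None,
    fetchhint: Optional[str] = None,
) -> ET.Element:
    """Build an <audio> element using Vocalizer's fetch-control attributes."""
    attrs = {"src": src}
    if fetchtimeout is not None:
        # fetchtimeout requires an "s" or "ms" suffix.
        if not re.fullmatch(r"\d+(ms|s)", fetchtimeout):
            raise ValueError("fetchtimeout needs an 's' or 'ms' suffix, e.g. '3s'")
        attrs["fetchtimeout"] = fetchtimeout
    for name, value in (("maxage", maxage), ("maxstale", maxstale)):
        if value is not None:
            if value < 0:
                raise ValueError(f"{name} must be an unsigned integer")
            attrs[name] = str(value)  # plain seconds, no suffix
    if fetchhint is not None:
        if fetchhint not in ("prefetch", "safe"):
            raise ValueError("fetchhint must be 'prefetch' or 'safe'")
        attrs["fetchhint"] = fetchhint
    elem = ET.Element("audio", attrs)
    elem.text = fallback_text  # rendered if the audio cannot be played
    return elem


audio = audio_element("https://example.com/welcome.wav", "Welcome.", fetchtimeout="5s", maxage=0)
print(ET.tostring(audio, encoding="unicode"))
# <audio src="https://example.com/welcome.wav" fetchtimeout="5s" maxage="0">Welcome.</audio>
```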

`<break>`

`strength`

This attribute is supported.

Notes:

The following table lists the pause duration for each strength value:

| Value | Pause duration |
|---|---|
| x-weak | 100 ms |
| weak | 200 ms |
| medium | 400 ms |
| strong | 700 ms |
| x-strong | 1200 ms |
| none | 0 ms* |

*The <break strength="none"> setting only has an audible effect where the TTS engine would otherwise have inserted a sentence break on its own.

When both the time and strength attributes are used, the time attribute takes precedence.
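The table and the precedence rule can be summarized in a short Python sketch. The function name is hypothetical. It assumes SSML 1.0 time values (a number followed by "s" or "ms") and falls back to "medium" when neither attribute is given, which is the SSML 1.0 default strength.

```python
import re
from typing import Optional

# Pause durations from the strength table, in milliseconds.
STRENGTH_MS = {
    "none": 0,  # only audible where the engine would otherwise insert a break
    "x-weak": 100,
    "weak": 200,
    "medium": 400,
    "strong": 700,
    "x-strong": 1200,
}


def break_duration_ms(strength: Optional[str] = None, time: Optional[str] = None) -> int:
    """Return the pause a <break> should produce; time wins over strength."""
    if time is not None:
        match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s)", time)
        if match is None:
            raise ValueError(f"invalid break time {time!r}")
        amount = float(match.group(1))
        return round(amount * 1000 if match.group(2) == "s" else amount)
    # SSML 1.0 gives strength a default of "medium".
    return STRENGTH_MS[strength or "medium"]


print(break_duration_ms(strength="weak", time="1s"))  # 1000
```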

`<emphasis>`

`level`

This attribute is supported.

Notes: The level="none" setting is not supported. Vocalizer may ignore it to produce optimal natural speech output.

`<lexicon>`

Notes:

- Vocalizer supports loading user dictionaries, user rulesets, and ActivePrompt databases through this element.
- Vocalizer parses all <lexicon> elements and loads their tuning data before starting text-to-speech conversion. This tuning data is unloaded when the last sample buffer is generated or when the TTS process is stopped, so <lexicon> elements only affect the current synthesis request.
- Refer to the W3C SSML 1.0 specification for more information.

`type`

This attribute is supported and optional.

Notes: If used, the type attribute overrides the MIME content type returned by the web server (for HTTP access) or determined by extension mapping rules (for local file access).

Valid type values are as follows:

- application/edct-bin-dictionary for a Vocalizer binary-format user dictionary.
- application/x-vocalizer-rettt+text for a text user ruleset.
- application/x-vocalizer-rettt+bin for a binary user ruleset.
- application/x-vocalizer-activeprompt-db for an ActivePrompt database, optionally with ":mode=automatic" appended to override its default matching mode with automatic mode.
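A minimal Python sketch of choosing one of these type values when building a <lexicon> element. The MIME strings come from the list above; the short keys ("dictionary", "ruleset-text", and so on), the helper name, and the example URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

# MIME types Vocalizer accepts in the <lexicon> type attribute.
LEXICON_TYPES = {
    "dictionary": "application/edct-bin-dictionary",
    "ruleset-text": "application/x-vocalizer-rettt+text",
    "ruleset-binary": "application/x-vocalizer-rettt+bin",
    "activeprompt": "application/x-vocalizer-activeprompt-db",
}


def lexicon_element(uri: str, kind: str, automatic: bool = False) -> ET.Element:
    """Build a <lexicon> element whose type overrides server or extension detection."""
    mime = LEXICON_TYPES[kind]
    if automatic:
        if kind != "activeprompt":
            raise ValueError(":mode=automatic applies only to ActivePrompt databases")
        mime += ":mode=automatic"
    return ET.Element("lexicon", {"uri": uri, "type": mime})


elem = lexicon_element("https://example.com/tuning/prompts.dat", "activeprompt", automatic=True)
print(ET.tostring(elem, encoding="unicode"))
```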

`<meta>`

`http-equiv`

This attribute is not supported.

`<p>`

Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain, equivalent to the <ESC>\domain\native control sequence. The attribute value is the ActivePrompt domain name.
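For illustration, this Python sketch wraps text in a tag carrying ssft-domaintype. The domain name "banking" and the helper are hypothetical; use the name of an ActivePrompt domain that actually exists in your loaded database.

```python
import xml.etree.ElementTree as ET

# Tags this page lists as accepting the ssft-domaintype extension.
DOMAIN_TAGS = ("speak", "p", "s")


def with_domain(tag: str, domain: str, text: str) -> ET.Element:
    """Wrap text in <speak>, <p>, or <s> with ssft-domaintype set to a domain name."""
    if tag not in DOMAIN_TAGS:
        raise ValueError(f"ssft-domaintype is only documented for {DOMAIN_TAGS}")
    elem = ET.Element(tag, {"ssft-domaintype": domain})
    elem.text = text
    return elem


para = with_domain("p", "banking", "Your account balance is ready.")
print(ET.tostring(para, encoding="unicode"))
# <p ssft-domaintype="banking">Your account balance is ready.</p>
```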

`<phoneme>`

Vocalizer 7 SSML extensions: This tag supports specifying L&H+ phoneme strings when the alphabet attribute is set to "x-l&h+", and phoneme strings in the IPA alphabet when the alphabet attribute is set to "ipa".

Because the ampersand is a reserved XML character, an SSML document must specify the L&H+ alphabet as alphabet="x-l&amp;h+". Phoneme strings in the IPA alphabet should likewise use escape sequences (such as XML numeric character references) for any characters that cannot be expressed otherwise.
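If you build SSML with an XML library rather than by hand, the escaping is done for you. This Python sketch uses xml.etree.ElementTree: the "&" in x-l&h+ is written as "&amp;", and serializing to ASCII turns non-ASCII IPA symbols into numeric character references. The helper name and the "PH_STRING" placeholder (standing in for a real L&H+ phoneme string) are hypothetical.

```python
import xml.etree.ElementTree as ET


def phoneme_element(alphabet: str, ph: str, text: str) -> ET.Element:
    """Build a <phoneme> element; ElementTree handles XML escaping on output."""
    if alphabet not in ("ipa", "x-l&h+"):
        raise ValueError("alphabets described here: 'ipa' or 'x-l&h+'")
    elem = ET.Element("phoneme", {"alphabet": alphabet, "ph": ph})
    elem.text = text
    return elem


lh = phoneme_element("x-l&h+", "PH_STRING", "tomato")
print(ET.tostring(lh, encoding="unicode"))
# <phoneme alphabet="x-l&amp;h+" ph="PH_STRING">tomato</phoneme>

ipa = phoneme_element("ipa", "təˈmɑːtəʊ", "tomato")
# Encoding to ASCII writes non-ASCII IPA symbols as character references.
print(ET.tostring(ipa, encoding="us-ascii").decode("ascii"))
```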

`<prompt>`

Notes: When using multiple <prompt> tags with the bargein attribute, prompt queueing and playback will function differently depending on whether bargein is allowed or disallowed.

See Audio Formats and Prompts for details.

Vocalizer 7 SSML extensions: This tag supports specifying ActivePrompt IDs, equivalent to the <ESC>\domain\native control sequence. The "id" attribute is required, and specifies the ActivePrompt in <domain>:<prompt> format.

The content of the element specifies fallback text that is only spoken if the ActivePrompt cannot be found (similar to SSML <audio>).
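A hypothetical Python helper that builds such a <prompt> element with fallback text. The domain and prompt names are placeholders. Rejecting colons inside either part, so that the id splits unambiguously into <domain>:<prompt>, is an assumption rather than a documented Vocalizer rule.

```python
import xml.etree.ElementTree as ET


def activeprompt_element(domain: str, prompt_name: str, fallback_text: str) -> ET.Element:
    """Build <prompt id="domain:prompt"> whose content is the fallback text."""
    for part in (domain, prompt_name):
        # Assumed restriction: keep the <domain>:<prompt> split unambiguous.
        if not part or ":" in part:
            raise ValueError("domain and prompt name must be non-empty and contain no ':'")
    elem = ET.Element("prompt", {"id": f"{domain}:{prompt_name}"})
    elem.text = fallback_text  # spoken only if the ActivePrompt cannot be found
    return elem


print(ET.tostring(activeprompt_element("banking", "greeting", "Welcome to the bank."), encoding="unicode"))
# <prompt id="banking:greeting">Welcome to the bank.</prompt>
```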

`<prosody>`

Notes: The duration, pitch, pitch-range, and contour attributes are ignored.

`volume`

This attribute is supported.

Notes: The default value for this attribute is 100, and the scale is linear in amplitude. Although SSML specifies a range of 0 to 100, Vocalizer extends this range to 200; values above 100 are reached through relative changes or the symbolic values "loud" and "x-loud".

The following table maps SSML symbolic values to SSML volume values and actual output.

| SSML volume value | Symbolic value | Amplitude amplification factor | Loudness in dB |
|---|---|---|---|
| | silent | 0.00 | -∞ dB |
| | x-soft | 0.18 | -15.0 dB |
| | soft | 0.50 | -6.0 dB |
| 100 | medium | 1.00 | 0.0 dB |
| (141) | loud | 1.41 | +3.0 dB |
| (200) | x-loud | 2.00 | +6.0 dB |

This second, broader table maps SSML volume values to Vocalizer volume values:

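| SSML volume value | Amplitude amplification factor | Loudness in dB | Vocalizer volume value |
|---|---|---|---|
| 0 | 0.00 | -∞ dB | |
| 10 | 0.10 | -20.0 dB | |
| 20 | 0.20 | -14.0 dB | |
| 30 | 0.30 | -10.5 dB | |
| 40 | 0.40 | -8.0 dB | |
| 50 | 0.50 | -6.0 dB | |
| 60 | 0.60 | -4.4 dB | |
| 70 | 0.70 | -3.1 dB | |
| 80 | 0.80 | -1.9 dB | |
| 90 | 0.90 | -0.9 dB | |
| 100 | 1.00 | 0.0 dB | |
| (141) | 1.41 | +3.0 dB | |
| (200) | 2.00 | +6.0 dB | 100 |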
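The loudness column follows from the linear amplitude scale: an SSML volume v corresponds to an amplification factor of v/100, and the change in dB is 20·log10(factor). The Python sketch below reproduces several rows of the tables above. The function names are hypothetical, and clamping relative changes to the 0 to 200 range is an assumption based on the extended range described above.

```python
import math


def ssml_volume_to_factor(volume: float) -> float:
    """SSML volume is linear in amplitude: 100 maps to a factor of 1.00."""
    if not 0 <= volume <= 200:
        # Vocalizer extends the SSML 0-100 range up to 200.
        raise ValueError("volume must be between 0 and 200")
    return volume / 100


def factor_to_db(factor: float) -> float:
    """Loudness change in dB for an amplitude factor: 20 * log10(factor)."""
    if factor == 0:
        return float("-inf")  # "silent"
    return 20 * math.log10(factor)


def apply_relative(current: float, change: str) -> float:
    """Apply a relative change such as "+41" or "-10" (assumed clamping to 0..200)."""
    return min(200.0, max(0.0, current + float(change)))


for volume in (10, 50, 100, 141, 200):
    print(volume, round(factor_to_db(ssml_volume_to_factor(volume)), 1))
# 10 -20.0
# 50 -6.0
# 100 0.0
# 141 3.0
# 200 6.0
```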

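`<s>`

Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain. The attribute value is the ActivePrompt domain name.

`<speak>`

Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain. The attribute value is the ActivePrompt domain name.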

