Vocalizer 7: <voice> tag and SSML Support

For Vocalizer 7

BETA FEATURE:

The Vocalizer 7 TTS engine is currently available as a beta feature. Further testing and updates may take place while Vocalizer 7 is in beta.

We encourage you to share feedback at beta@plumgroup.com.

IMPORTANT! Treat the terms "tags" and "elements" as the same.

The term "tags" used here means the same thing as the term "elements" used in other VXML-related documentation (e.g., the W3C SSML 1.0 Recommendation).

Summary

This page covers the <voice> tag in Vocalizer 7 and other SSML tag and attribute features.

<voice>

NOTE: Each <voice> attribute is individually optional, but at least one attribute must be specified. Using the <voice> tag with no attributes causes an error.

The <voice> tag should be used to specify the desired voice through the name attribute.

Example: <voice name="Allison">

See the list of available voices in the name section below.

Key <voice> attributes

age

This attribute is supported.

Notes: The age attribute is useful only when a set of custom voices is installed that differ in age but share the same language and gender.

name

This attribute is supported.

Notes: See the following language tables for available names:

American English (en-US) (9 total)

British English (en-GB) (6 total)

Spanish (es-ES) (4 total)

German (de-DE) (5 total)

Canadian French (fr-CA) (3 total)

French (fr-FR) (3 total)

Brazilian Portuguese (pt-BR) (3 total)

Portuguese (pt-PT) (3 total)

Cantonese (zh-HK) (2 total)

Mandarin (zh-CN) (2 total)

Multilingual voices

If selecting a multilingual voice, you can specify one of the voice's other languages as follows:

  1. Specify a voice with <voice> and name, then

  2. Specify one of the voice's other languages with the xml:lang attribute of the <speak> tag.

Example:

<speak xml:lang="fr-CA">
  <voice name="Ava-ml">Bonjour.</voice>
</speak>

About xml:lang

Vocalizer 7 does not accept the xml:lang attribute for the <voice> tag. If used, the TTS engine will use a default voice instead of the one specified.

We recommend using the <speak> or <vxml> tags with the xml:lang attribute instead. See Recommended best practices for more info.

variant

This attribute is not supported.

xml:lang

This attribute is not supported.

Notes: Avoid using the xml:lang attribute in the <voice> tag when setting the TTS language. Vocalizer 7 does not accept it. It will cause the TTS engine to use the default voice instead of the one specified.

Instead, use the xml:lang attribute of the <speak> or <vxml> tags. See Recommended best practices for more info.
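As a rough sketch of the <vxml> variant of this recommendation, the document below sets the language once on the root tag and selects the voice by name only. The form layout and prompt text are illustrative assumptions; the voice name and language are taken from the multilingual example above.

```xml
<?xml version="1.0"?>
<!-- The language is set on <vxml>; <voice> carries only the name attribute. -->
<vxml version="2.0" xml:lang="fr-CA">
  <form>
    <block>
      <prompt>
        <voice name="Ava-ml">Bonjour et bienvenue.</voice>
      </prompt>
    </block>
  </form>
</vxml>
```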

Additional SSML tag details

Except where noted below, Vocalizer 7 supports all SSML tags and attributes as described in the W3C SSML 1.0 Recommendation.

This section focuses on exceptions and additional details specific to Vocalizer, including the following:

  • Vocalizer-specific tag and attribute behavior.

  • Unsupported tags and attributes.

  • Additional Vocalizer-specific features, such as SSML extensions.

Built-in, unique SSML extensions

Vocalizer 7 features the following added extensions to SSML:

  • <audio>: Supports four (4) additional attributes to manage internet fetching:

    • fetchtimeout: Time to attempt to open and read the audio document.

    • maxage: Value for the HTTP 1.1 cache-control max-age directive.

    • maxstale: Value for the HTTP 1.1 cache-control max-stale directive.

    • fetchhint: Specify "prefetch" to allow prefetching the audio content or "safe" (the default) to follow HTTP 1.1 caching semantics.

  • <phoneme>: Supports specifying L&H+ phoneme strings and phoneme strings in the IPA alphabet.

  • <speak>, <s>, and <p>: Each tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain.

  • <prompt>: Supports specifying ActivePrompt IDs.

For more details on these extensions, see the relevant tag's sections below.

Vocalizer SSML tag reference table

The following table summarizes details about SSML tags described in this section.

<audio>

Vocalizer 7 SSML extensions: Supports four Vocalizer-added attributes to manage internet fetching:

fetchtimeout

The time to attempt to open and read the audio document. The value must be an unsigned integer with a mandatory suffix: "s" for seconds, "ms" for milliseconds.

Example: "3s", "400ms"

maxage

Value for the HTTP 1.1 cache-control max-age directive. This specifies that the application is willing to accept a cached copy of the audio document only if it is no older than this value.

In most cases, this attribute should not be present, thus allowing the origin server to control cache expiration.

The value must be an unsigned integer to specify the number of seconds. The value must have no suffix (e.g., no "s" or "ms" included). A value of 0 may be used to force re-validating the cached copy with the origin server.

maxstale

Value for the HTTP 1.1 cache-control max-stale directive. This specifies that the client is willing to accept a cached copy that has expired, as long as it is no more than this value past the expiration time specified by the origin server.

The value must be an unsigned integer to specify the number of seconds. The value must have no suffix (e.g., no "s" or "ms" included).

fetchhint

Specify "prefetch" to allow prefetching the audio content, or "safe" (the default) to follow HTTP 1.1 caching semantics.

Vocalizer allows this attribute, but currently does not behave differently for "prefetch" mode.
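The four fetch attributes described above might be combined as in the following sketch. The audio URL and fallback text are hypothetical, and the values are illustrative rather than recommended settings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Hypothetical URL. Allow 3 seconds for the fetch, accept a cached copy
       up to 60 seconds old, and accept no copies past their expiration. -->
  <audio src="http://example.com/prompts/welcome.wav"
         fetchtimeout="3s" maxage="60" maxstale="0" fetchhint="safe">
    Welcome.
  </audio>
</speak>
```

Note that fetchtimeout carries a unit suffix while maxage and maxstale do not. In most deployments maxage would be omitted so that the origin server controls cache expiration.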

<break>

strength

This attribute is supported.

Notes:

  • The following table describes the pause duration for each strength value:

  • *The <break strength="none"> setting only has an audible effect when the TTS engine would have inserted a sentence break without an explicit tag.

  • When both the time and strength attributes are used, the time attribute takes precedence.
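A minimal sketch of both cases follows, with illustrative prompt text. The first break relies on strength alone; the second sets both attributes, so the explicit time is the one expected to apply.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  Please hold.
  <!-- strength only: the pause length follows from the strength value -->
  <break strength="strong"/>
  Connecting your call.
  <!-- both attributes: time takes precedence over strength -->
  <break strength="x-weak" time="500ms"/>
  Thank you for waiting.
</speak>
```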

<emphasis>

level

This attribute is supported.

Notes: The level="none" setting is not supported. Vocalizer may ignore it to produce optimal natural speech output.
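For illustration, the sketch below uses two of the standard SSML level values other than "none". The sentence text is an assumption made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  Your payment is due <emphasis level="strong">today</emphasis>.
  Please have your <emphasis level="moderate">account number</emphasis> ready.
</speak>
```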

<lexicon>

Notes:

  • Vocalizer supports loading user dictionaries, user rulesets, and ActivePrompt databases through this element.

  • Vocalizer parses all <lexicon> elements and loads tuning data before starting text-to-speech conversion. This tuning data is unloaded when the last sample buffer is generated, or when the TTS process is stopped, so <lexicon> elements only affect the current synthesis request.

  • Refer to the W3C SSML 1.0 specification for more information.

type

This attribute is supported and optional.

Notes: If used, the type attribute overrides the MIME content type returned by the web server (for HTTP access) or the extension mapping rules (for local file access).

Valid type values are as follows:

  • application/edct-bin-dictionary for a Vocalizer binary format user dictionary.

  • application/x-vocalizer-rettt+text for a text user ruleset.

  • application/x-vocalizer-rettt+bin for a binary user ruleset.

  • application/x-vocalizer-activeprompt-db for an ActivePrompt database, optionally with ":mode=automatic" appended to override its default matching mode to automatic.
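The following sketch loads a binary user dictionary and a text ruleset with explicit type values. The URIs and file names are hypothetical. Per the SSML 1.0 Recommendation, <lexicon> elements are immediate children of <speak> and appear before any other content.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Hypothetical URIs; type overrides whatever the server or extension implies. -->
  <lexicon uri="http://example.com/tts/names.dcb"
           type="application/edct-bin-dictionary"/>
  <lexicon uri="http://example.com/tts/rules.txt"
           type="application/x-vocalizer-rettt+text"/>
  Thank you for calling.
</speak>
```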

<meta>

http-equiv

This attribute is not supported.

<p>

Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain, equivalent to the <ESC>\domain\native control sequence. The attribute value is the ActivePrompt domain name.

<phoneme>

Vocalizer 7 SSML extensions: This tag supports specifying L&H+ phoneme strings when the alphabet attribute is set to "x-l&h+", and phoneme strings in the IPA alphabet when the alphabet attribute is set to "ipa".

Note that the ampersand is a reserved XML character, so in an SSML document, the L&H+ alphabet needs to be specified with alphabet="x-l&amp;h+". Likewise, IPA characters that cannot be typed directly in the document should be written as XML character references.
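As a sketch of the IPA case, the example below writes each non-ASCII IPA symbol as a numeric character reference. The word and its transcription are illustrative. An L&H+ string would use alphabet="x-l&amp;h+" in the same position; no L&H+ string is shown here because its symbols are not covered on this page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- &#x259; schwa, &#x2C8; primary stress, &#x26A; and &#x28A; lax vowels -->
  You say <phoneme alphabet="ipa" ph="t&#x259;&#x2C8;me&#x26A;to&#x28A;">tomato</phoneme>.
</speak>
```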

<prompt>

Notes: When using multiple <prompt> tags with the bargein attribute, prompt queueing and playback will function differently depending on whether bargein is allowed or disallowed.

See Audio Formats and Prompts for details.

Vocalizer 7 SSML extensions: This tag supports specifying ActivePrompt IDs, equivalent to the <ESC>\domain\native control sequence. The "id" attribute is required, and specifies the ActivePrompt in <domain>:<prompt> format.

The content of the element specifies fallback text that is only spoken if the ActivePrompt cannot be found (similar to SSML <audio>).
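A minimal sketch of the extension follows, assuming an ActivePrompt domain named "banking" that contains a prompt named "welcome". Both names are hypothetical. The element content is the fallback text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- id uses the domain:prompt format; both names here are made up. -->
  <prompt id="banking:welcome">Welcome to the bank.</prompt>
</speak>
```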

<prosody>

Notes: The duration, pitch, pitch-range, and contour attributes are ignored.

volume

This attribute is supported.

Notes: The default value for this attribute is 100, and the scale is linear in amplitude. Although SSML specifies a range of 0-100, Vocalizer extends this range to 200. Values above 100 are reached through relative changes or the symbolic values "loud" and "x-loud".
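The sketch below shows the two ways of going above the default that are mentioned here: a symbolic value and a relative change. The sentence text and the size of the relative change are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- symbolic value -->
  <prosody volume="loud">This sentence uses a symbolic volume.</prosody>
  <!-- relative change, applied to the current volume -->
  <prosody volume="+20">This sentence uses a relative volume change.</prosody>
</speak>
```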

The following table maps SSML symbolic values to SSML volume values and actual output.

This second, broader table maps SSML volume values to Vocalizer volume values:

<s>

Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain, equivalent to the <ESC>\domain\native control sequence. The attribute value is the ActivePrompt domain name.

<speak>

Vocalizer 7 SSML extensions: This tag supports an optional ssft-domaintype attribute for activating an ActivePrompt domain, equivalent to the <ESC>\domain\native control sequence. The attribute value is the ActivePrompt domain name.
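As an illustration of the ssft-domaintype attribute, the sketch below activates a domain for the whole document. The domain name "banking" is hypothetical. The same attribute can be placed on a <p> or <s> tag instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "banking" is a made-up ActivePrompt domain name. -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US" ssft-domaintype="banking">
  <p>
    <s>Your balance is available now.</s>
  </p>
</speak>
```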
