Tutorial
VoiceXML 2.0 is the World Wide Web Consortium (W3C) standard for scripting voice applications. In this tutorial, we construct a VoiceXML interactive voice response (IVR) application for a customer service center. Some parts of the tutorial assume you have your own web server, which is the recommended configuration for a full production-level application. Starting from a simple “Hello World” application, we build a telephony application that includes:
  • dynamic response driven by touch tone or speech input
  • advanced text-to-speech (TTS) speech synthesis and automatic speech recognition (ASR)
  • system integration with enterprise databases

Introduction to VoiceXML

We begin with nearly the simplest complete VoiceXML application. The application here is analogous to an answering machine set to play an announcement only.
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        Welcome to Plum Voice.
      </prompt>
    </block>
  </form>
</vxml>
In this example, the user would hear a synthesized voice say, “Welcome to Plum Voice.” Then the system would simply hang up. The <form> defines the basic unit of interaction in VoiceXML. This form includes only a single <block> of executable content which in turn includes a single <prompt> to the user. By default, any plain text within a prompt is passed to the system's text-to-speech (TTS) synthesis engine to be generated as audio.
Also, as the <?xml?> declaration indicates, every VoiceXML document is an XML document. The basic structure should be familiar to anyone who has looked at HTML web documents. Tags are set off by angle brackets, as in <form>, and are closed with a forward slash, as in </form>. VoiceXML documents must adhere strictly to the XML standard: the document must begin with the <?xml?> declaration, and the rest of the document is enclosed within the <vxml></vxml> tags. Unlike HTML, all tags must be closed, and certain special characters must be escaped. For example, a less-than sign < that does not open a tag must be written as &lt;, and a literal ampersand must be written as &amp;.
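As a small illustration of the escaping rule, the sketch below speaks a sentence that contains both a less-than sign and an ampersand. The prompt wording and the company name are made up for this example; only the use of the &lt; and &amp; entities matters.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        <!-- "&lt;" is read by the parser as a literal less-than sign,
             "&amp;" as a literal ampersand -->
        Orders with a total &lt; 10 dollars carry a shipping fee.
        Thank you for calling Smith &amp; Sons.
      </prompt>
    </block>
  </form>
</vxml>
```

Writing a bare < or & in the prompt text would make the document ill-formed, and the platform would typically refuse to load it.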
For static prompts such as this welcome message, we'll probably want to use a human announcer instead of TTS. TTS has come a long way, but there's still no substitute for the real thing. For recorded prompts, we use the <audio> tag.
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        <audio src="wav/welcome.wav">
          Welcome to Plum Voice.
        </audio>
      </prompt>
    </block>
  </form>
</vxml>
In this case, the source (“src”) reference is relative to the VXML document URL in which it appears. WAV files are a generic container type. WAV files include a header which indicates the actual audio sample size, encoding, and rate used. Supported formats vary by VoiceXML implementation and not all possible WAV file formats are supported. Plum DEV supports 8 kHz audio files in 16 bit linear, 8 bit µ-law (u-law), or 8 bit A-law encoding in WAV files or headerless files.
The text within the audio tag is not required. We could have included no content: <audio src="wav/welcome.wav"/>
which is equivalent to <audio src="wav/welcome.wav"></audio>
The text included within the audio tag in the example above is something like the ALT text for images in HTML. If the platform is unable to open or play the source (“src”) file in the audio tag, it falls back on generating TTS from the included text.
It is good practice to store your audio files on the same server as your application script. For example, suppose the files folder on our server contains test.php, the script that references welcome.wav, together with a wav subfolder that holds welcome.wav itself. The source (“src”) reference in our audio tag is then:
<audio src="wav/welcome.wav">
  Welcome to Plum Voice.
</audio>
The benefit of storing audio files on your local server as opposed to the audio repository is that it allows for easier file management. Suppose you wanted to change the name of one of your audio files. If this file is stored locally on your server, you could just go in and rename the file yourself. However, with the audio repository, you are not able to manage these files. For example, if you deleted a recording in the audio repository (in this case, let's call it 12.wav) and uploaded a replacement file, the replacement file would not take the deleted recording's old name. It would take the next highest number available out of your recordings (in this case, let's say it got named 21.wav).
If you are concerned about loading times for audio files from your own server, note that once these audio files have been cached, they load as quickly as files stored in our audio repository. See the documentation on caching for more information.

User Interaction with DTMF

Grammars are used by speech recognizers to determine what the recognizer should listen for, and so describe the utterances a user may say. Starting with VoiceXML Version 2.0, the W3C requires that all VoiceXML platforms must support at least one common format, the XML Form of the W3C Speech Recognition Grammar Specification (SRGS). Plum implements the SRGS+XML grammar format for both Voice and DTMF grammars as well as JSpeech Grammar Format (JSGF). Refer to the W3C Speech Recognition Grammar Specification or the JSGF Specification for further detail.
To control user input, we can explicitly create input fields and specify allowable grammars for user input. We do this by explicitly using the <grammar> tag for each <field> inside a <form>. Please note that the id attribute of the <form> does not allow for any white space. The <grammar> element is used to provide a speech (or DTMF) grammar that:
  • Specifies a set of utterances or DTMF key presses that a user may speak or type to perform an action or supply information.
  • Returns a corresponding semantic interpretation for a matching input.
The following example shows how to set up a grammar for DTMF input from the user:
<?xml version="1.0"?>
<vxml version="2.0">
  <form id="mainmenu">
    <field name="menuchoice">
      <grammar type="application/x-jsgf" mode="dtmf">
        1|2|3
      </grammar>
      <prompt>
        For sales, press 1.
        For tech support, press 2.
        For company directory, press 3.
      </prompt>
      <filled>
        <if cond="menuchoice==1">
          Welcome to sales.
        <elseif cond="menuchoice==2"/>
          Welcome to tech support.
        <elseif cond="menuchoice==3"/>
          Welcome to the company directory.
        </if>
      </filled>
    </field>
  </form>
</vxml>
Here we specify a grammar for the field using JSGF (JSpeech Grammar Format) syntax, which is the default grammar syntax for Plum DEV. The same example in SRGS+XML format looks like this:
<?xml version="1.0"?>
<vxml version="2.0">
  <form id="mainmenu">
    <field name="menuchoice">
      <grammar type="application/srgs+xml" root="ROOT" mode="dtmf">
        <rule id="ROOT">
          <one-of>
            <item>1</item>
            <item>2</item>
            <item>3</item>
          </one-of>
        </rule>
      </grammar>
      <prompt>
        For sales, press 1.
        For tech support, press 2.
        For company directory, press 3.
      </prompt>
      <filled>
        <if cond="menuchoice==1">
          Welcome to sales.
        <elseif cond="menuchoice==2"/>
          Welcome to tech support.
        <elseif cond="menuchoice==3"/>
          Welcome to the company directory.
        </if>
      </filled>
    </field>
  </form>
</vxml>
Notice that the SRGS+XML grammar is longer than the JSGF grammar in the example before it. For numeric input, JSGF is often the shorter alternative.

User Interaction with Speech

Up to this point, we've restricted our discussion to the use of touch tone (DTMF) input. One of the most compelling reasons to use VoiceXML is the ability to integrate advanced speech recognition technologies simply and portably. Let's use speech instead of DTMF for the JSGF example from the previous section.
<?xml version="1.0"?>
<vxml version="2.0">
  <form id="mainmenu">
    <field name="menuchoice">
      <grammar type="application/x-jsgf" mode="voice">
        one|two|three
      </grammar>
      <prompt>
        For sales, say 1.
        For tech support, say 2.
        For company directory, say 3.
      </prompt>
      <filled>
        <if cond="menuchoice=='one'">
          Welcome to sales.
        <elseif cond="menuchoice=='two'"/>
          Welcome to tech support.
        <elseif cond="menuchoice=='three'"/>
          Welcome to the company directory.
        </if>
      </filled>
    </field>
  </form>
</vxml>
Notice that we set the “mode” attribute in the <grammar> tag to “voice” instead of “dtmf”. Also, note that we have to spell out the numbers “one”, “two”, and “three” in the speech grammar instead of using the numerals 1, 2, and 3 as we did in the DTMF example. We also have to spell them out inside the <if> and <elseif> tags and place single quotes around them, since they are strings.

Built-in Grammars

To simplify development, several base grammars are built into the system. They can be referenced by name in the “type” attribute of the <field> tag, for example type="boolean".
You can use the boolean built-in grammar when expecting an affirmative phrase (such as “yes”) or a negative phrase (such as “no”) in your application. You can also use DTMF for this grammar, where DTMF-1 is affirmative and DTMF-2 is negative. The six other built-in grammars are date, digits, currency, number, phone, and time. Note that phone and time work only for Nuance OSR engines.
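A field that uses the boolean built-in grammar might look like the following sketch. The field name “wanthours” and the prompt wording, including the business hours, are placeholders invented for this example. The VoiceXML 2.0 specification says the boolean built-in fills the field with an ECMAScript true or false, and the condition below assumes that behavior; confirm how your platform reports the result before relying on it.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field name="wanthours" type="boolean">
      <prompt>
        Would you like to hear our business hours?
        Say yes or press 1. Say no or press 2.
      </prompt>
      <filled>
        <!-- assumes the field holds an ECMAScript true or false -->
        <if cond="wanthours">
          We are open Monday through Friday, nine to five.
        <else/>
          Okay. Goodbye.
        </if>
      </filled>
    </field>
  </form>
</vxml>
```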
Below is an example of how we can use a built-in grammar inside of a <field> tag.
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field name="id" type="digits">
      <prompt>
        Please say or enter your customer identification number.
      </prompt>
      <filled>
        You entered <value expr="id"/>.
        <!-- transfer to premium support -->
      </filled>
    </field>
  </form>
</vxml>
From this example, in the field “id”, we use the built-in grammar “digits” to allow the user to say or enter any number of digits for the customer identification number. However, if we wanted to require a specific number of digits for the customer identification number, we could use the digits?length=n parameter.
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field name="id" type="digits?length=7">
      <prompt>
        Enter your seven digit customer
        identification number.
      </prompt>
      <filled>
        You entered <value expr="id"/>.
        <!-- transfer to premium support -->
      </filled>
    </field>
  </form>
</vxml>
Here, the user has to enter 7 digits for their customer identification number due to the parameter. If the user does not enter exactly 7 digits, the system will respond with, “Sorry, I didn't understand you” and re-prompt “Enter your seven digit customer identification number” back to the user. We will find out more about error handling in the next section of the tutorial.

Standard Events

Plum DEV already takes care of trapping and handling some exception conditions such as when the user enters no input or the user enters input not defined in the grammar for an input field. In the next example, we want to collect a seven digit identification number in a field.
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- declare customerid before assigning to it -->
    <var name="customerid"/>
    <field name="id" type="digits?length=7">
      <prompt>
        Enter your customer identification number.
      </prompt>
      <filled>
        <assign name="customerid" expr="id"/>
        <prompt>
          You entered <value expr="id"/>.
        </prompt>
        <!-- transfer to premium support -->
      </filled>
    </field>
  </form>
</vxml>
If the user enters nothing within the timeout interval (default is 3 seconds), Plum DEV's built-in exception handling will play the default no input message: “Sorry, I didn't hear you” and then re-prompt the user to enter their customer identification number.
If the user enters input that does not match the defined grammar (in this case 7 digits), Plum DEV's built-in exception handling will play the default no match message: “Sorry, I didn't understand you” and re-prompt the user to enter their customer identification number.
The following example mimics the default behavior of the system. This example behaves identically to the previous example.
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- declare customerid before assigning to it -->
    <var name="customerid"/>
    <field name="id" type="digits?length=7">
      <prompt>
        Enter your customer identification number.
      </prompt>
      <filled>
        <assign name="customerid" expr="id"/>
        <prompt>
          You entered <value expr="id"/>.
        </prompt>
        <!-- transfer to premium support -->
      </filled>
      <noinput>
        <prompt>
          Sorry, I didn't hear you.
        </prompt>
        <reprompt/>
      </noinput>
      <nomatch>
        <prompt>
          Sorry, I didn't understand you.
        </prompt>
        <reprompt/>
      </nomatch>
    </field>
  </form>
</vxml>
By defining your own actions for no match and no input events you can greatly increase the control you have over your code. For instance, you could choose to offer a more helpful error message, to not play the original prompt again by omitting the <reprompt/> tag, to play custom messages for the specific occurrence of an exception event, to execute script code, or to abandon the effort altogether by moving on to a new form using <goto>.
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- declare variables before assigning to them -->
    <var name="customerid"/>
    <var name="badid"/>
    <field name="id" type="digits?length=7">
      <prompt>
        Enter your customer identification number.
      </prompt>
      <filled>
        <assign name="customerid" expr="id"/>
        <prompt>
          You entered <value expr="id"/>.
        </prompt>
        <!-- transfer to premium support -->
      </filled>
      <noinput count="1">
        <!-- this code executes for count 1 and 2: the FIA looks at all of the matching <catch> elements and picks the highest count value that is <= the current count -->
        <prompt>
          Your identification number is the seven digit number on the front of your membership card.
        </prompt>
        <reprompt/>
      </noinput>
      <noinput count="3">
        <prompt>
          It seems you are having difficulty with your identification number, we will transfer you to customer service.
        </prompt>
        <!-- transfer caller to customer service -->
      </noinput>
      <nomatch count="1">
        <!-- note: the field is not filled after a nomatch, so id is still undefined here -->
        <assign name="badid" expr="id"/>
        <prompt>
          Your identification number must be seven digits. Please try again.
        </prompt>
      </nomatch>
      <nomatch count="3">
        <prompt>
          It seems you are having difficulty with your identification number, we will transfer you to customer service.
        </prompt>
        <!-- transfer caller to customer service -->
      </nomatch>
    </field>
  </form>
</vxml>
In the above example, we define custom handlers for the first two occurrences of the nomatch and noinput events, as well as separate handlers for the third occurrence of each event.
It is good practice to include the “count” attribute when defining event handlers, both to avoid infinite loops and to improve the caller's experience, for instance by transferring the user to customer service when they are having difficulty entering input. Remember that count="1" executes for both the first and the second occurrence because the Form Interpretation Algorithm (FIA) looks at all of the matching <catch> elements and selects the one with the highest count value that is less than or equal to the current count.
If you want a different message for each invalid try, set the counts to consecutive numbers, as shown in the code below:
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- declare customerid before assigning to it -->
    <var name="customerid"/>
    <field name="id" type="digits?length=7">
      <prompt count="1">
        Enter your customer identification number.
      </prompt>
      <prompt count="2">
        Enter your seven digit customer identification
        number.
      </prompt>
      <prompt count="3">
        Your customer identification number can be
        found on the front of your membership card.
        Enter your seven digit customer identification
        number.
      </prompt>
      <filled>
        <assign name="customerid" expr="id"/>
        <prompt>
          You entered <value expr="customerid"/>.
        </prompt>
        <!-- transfer to premium support -->
      </filled>
      <catch event="nomatch noinput" count="1">
        <prompt>
          Your input was not valid.
        </prompt>
        <reprompt/>
      </catch>
      <catch event="nomatch noinput" count="2">
        <prompt>
          Your input was not valid, please try one last time.
        </prompt>
        <reprompt/>
      </catch>
      <catch event="nomatch noinput" count="3">
        <prompt>
          It seems you are having difficulty with your identification number, we will transfer you to customer service.
        </prompt>
        <!-- transfer caller to customer service -->
      </catch>
    </field>
  </form>
</vxml>
Note that if the event counter exceeds the highest count among your defined handlers, the handler with the highest count continues to be used. In the previous example, we would keep hitting the count="3" handler once there have been more than three nomatch or noinput events.
The <nomatch> and <noinput> tags are shorthand for the generic <catch> event handler. You may have noticed that the earlier example with separate <noinput> and <nomatch> handlers defined the same actions for the third occurrence of both events. We can consolidate that example to use the same actions for both nomatch and noinput events, as follows:
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- declare customerid before assigning to it -->
    <var name="customerid"/>
    <field name="id" type="digits?length=7">
      <prompt>
        Enter your customer identification number.
      </prompt>
      <filled>
        <assign name="customerid" expr="id"/>
        <prompt>
          You entered <value expr="id"/>.
        </prompt>
        <!-- transfer to premium support -->
      </filled>
      <catch event="nomatch noinput" count="1">
        <!-- this code executes for count 1 and 2: the FIA looks at all of the matching <catch> elements and picks the highest count value that is <= the current count -->
        <prompt>
          Your input was not valid. Your identification number is the seven digit number on the front of your membership card.
        </prompt>
        <reprompt/>
      </catch>
      <catch event="nomatch noinput" count="3">
        <prompt>
          It seems you are having difficulty with your identification number, we will transfer you to customer service.
        </prompt>
        <!-- transfer caller to customer service -->
      </catch>
    </field>
  </form>
</vxml>
Also, as we have just done with the <catch> tag, rather than simply repeating the same prompts to the user, we can offer increasingly detailed prompt messages by using the prompt “count” attribute.
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- declare customerid before assigning to it -->
    <var name="customerid"/>
    <field name="id" type="digits?length=7">
      <prompt count="1">
        Enter your customer identification number.
      </prompt>
      <prompt count="2">
        Enter your seven digit customer identification
        number.
      </prompt>
      <prompt count="3">
        Your customer identification number can be
        found on the front of your membership card.
        Enter your seven digit customer identification
        number.
      </prompt>
      <filled>
        <assign name="customerid" expr="id"/>
        <prompt>
          You entered <value expr="customerid"/>.
        </prompt>
        <!-- transfer to premium support -->
      </filled>
      <catch event="nomatch noinput" count="1">
        <!-- this code executes for count 1 and 2: the FIA looks at all of the matching <catch> elements and picks the highest count value that is <= the current count -->
        <prompt>
          Your input was not valid.
        </prompt>
        <reprompt/>
      </catch>
      <catch event="nomatch noinput" count="3">
        <prompt>
          It seems you are having difficulty with your identification number, we will transfer you to customer service.
        </prompt>
        <!-- transfer caller to customer service -->
      </catch>
    </field>
  </form>
</vxml>
The user interaction might sound like this:
C: Enter your customer identification number. <prompt counter = 1>
H: <enters 1 2 3>
C: Your input was not valid. Enter your seven digit customer identification number. <prompt counter = 2>
H: <enters 1 4 5>
C: Your input was not valid. Your customer identification number can be found on the front of your membership card. Enter your seven digit customer identification number. <prompt counter = 3>
H: <enters 1 2 3 4 5 6>
C: It seems you are having difficulty with your identification number, we will transfer you to customer service. <transfer user to customer service>
See the Plum DEV Reference Manual for other standard VoiceXML events.

ECMAScript Input Validation

Plum DEV has a fully functioning JavaScript engine, similar to a standard web browser. This allows us to define functions and make use of all of the features that the JavaScript language has to offer.
Now, suppose we have a field that checks the length of a customer identification number.
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- declare customerid before assigning to it -->
    <var name="customerid"/>
    <field name="id" type="digits">
      <prompt>
        Please enter your customer identification number.
      </prompt>
      <filled>
        <if cond="id.length == 7">
          <assign name="customerid" expr="id"/>
          You entered <value expr="id"/>.
          <!-- transfer to premium support -->
        <else/>
          Invalid ID number. Please check the number
          and try again.
          <clear namelist="id"/>
          <reprompt/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
In this example, “id” is an ECMAScript variable that is set when the caller enters a customer identification number. In the if conditional of the filled block, the contents of the “cond” attribute, “id.length == 7”, are evaluated as an ECMAScript expression. If the user entered a 7-digit number, the if conditional would be true and a new variable, “customerid”, would be assigned the value of “id”. Finally, the <value expr/> expression converts the contents of “id” into a string for playback and the application states what the user entered for their “id”. If the user did not enter 7 digits, the if conditional would be false and go to the else conditional. The application would say to the user, “Invalid ID number. Please check the number and try again.” The “id” variable would be cleared with the <clear> tag and the application would re-prompt the user for their customer identification number.
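Because the platform provides a JavaScript engine, the same check can be moved into a named function. The sketch below is one way this might look; the function name isValidId and the regular expression are assumptions made for illustration and are not part of Plum DEV.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <script>
    <![CDATA[
      // Hypothetical helper: true only for a string of exactly seven digits.
      function isValidId(value) {
        return /^[0-9]{7}$/.test(String(value));
      }
    ]]>
  </script>
  <form>
    <field name="id" type="digits">
      <prompt>
        Please enter your customer identification number.
      </prompt>
      <filled>
        <!-- the cond attribute can call any function defined in a script -->
        <if cond="isValidId(id)">
          You entered <value expr="id"/>.
        <else/>
          Invalid ID number. Please check the number and try again.
          <clear namelist="id"/>
          <reprompt/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Wrapping the script in a CDATA section avoids having to escape characters such as < and & inside the JavaScript code.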

Navigation

To navigate your way through forms and fields in your application, you can use the <goto> tag. The next example demonstrates how we can use the <goto> tag to navigate our way through an application:
<?xml version="1.0"?>
<vxml version="2.0">
  <form id="firstform">
    <block>
      <prompt>
        Going to the next form.
      </prompt>
      <!-- A "#" symbol followed by an identifier specifies a -->
      <!-- form or menu ID to jump to. -->
      <goto next="#nextform"/>
    </block>
  </form>
  <form id="nextform">
    <block>
      <prompt>
        Welcome to the next form. Goodbye.
      </prompt>
    </block>
  </form>
</vxml>
Here, the “next” attribute in the <goto> tag brings you to the form “nextform”. If your application had multiple forms, you could use the “next” attribute in the <goto> tag to bring you to any one of these forms, as long as you specify the “id” for that form.
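To jump to an item inside the current form rather than to another form, VoiceXML 2.0 also defines a “nextitem” attribute on <goto>. The following is a minimal sketch under assumed names: the form ID “lookup” and the block names “confirm” and “giveup” are invented for this example.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form id="lookup">
    <field name="id" type="digits?length=7">
      <prompt>
        Enter your seven digit customer identification number.
      </prompt>
      <catch event="nomatch noinput" count="3">
        <!-- after three failed attempts, skip ahead to the "giveup" block -->
        <goto nextitem="giveup"/>
      </catch>
    </field>
    <block name="confirm">
      You entered <value expr="id"/>.
      <exit/>
    </block>
    <block name="giveup">
      We will transfer you to customer service.
      <!-- transfer caller to customer service -->
      <exit/>
    </block>
  </form>
</vxml>
```

Each block ends with <exit/> so that the form does not continue on to the remaining items once the block has run.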
The <menu> tag is a convenient way for you to create a single anonymous field in your application that prompts the user to make a choice and then transitions to a different place in your application based on the user's choice. Let's look at an example that navigates through the application using the <menu> and <choice> tag.
<?xml version="1.0"?>
<vxml version="2.0">

  <form>
    <block>
      <prompt>
        Welcome to Plum Voice.
      </prompt>
      <goto next="#mainmenu"/>
    </block>
  </form>

  <menu id="mainmenu">
    <prompt>
      For sales, press 1 or say sales.
      For tech support, press 2 or say tech support.
      For company directory, press 3 or say company directory.
    </prompt>
    <!-- the text of each choice is the phrase the caller can say -->
    <choice dtmf="1" next="#sales">Sales</choice>
    <choice dtmf="2" next="#support">Tech Support</choice>
    <choice dtmf="3" next="#directory">Company Directory</choice>
  </menu>

  <form id="sales">
    <block>
      Please hold for the next available sales
      representative.
      <!-- transfer to sales -->
    </block>
  </form>

  <form id="support">
    <block>
      <!-- transfer to tech support -->
    </block>
  </form>

  <form id="directory">
    <block>
      <!-- transfer to company directory -->
    </block>
  </form>

</vxml>
From this example, note that in the first <form> block, the “next” attribute of the <goto> tag points to “mainmenu”, which is specified as the “id” of the <menu> tag. Inside this <menu> tag, there are 3 <choice> tags that bring the user to their specified choice based on their input of 1, 2, or 3. Notice that these <choice> tags also use the “next” attribute to point to an “id” of a <form>. So, if the user enters DTMF-1, the application will go to the <form> block with an “id” of “sales”. If the user enters DTMF-2, the application will go to the <form> block with an “id” of “support”. If the user enters DTMF-3, the application will go to the <form> block with an “id” of “directory”.
Not only can you use the <goto> tag to navigate within your document, you can also use it to navigate through multiple VoiceXML documents.
firstdocument.vxml:

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        Hello World!
      </prompt>
      <goto next="nextdocument.vxml"/>
    </block>
  </form>
</vxml>
nextdocument.vxml:

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        Goodbye World!
      </prompt>
    </block>
  </form>
</vxml>
From this example, the application first greets the user with “Hello World!” and then uses the <goto> tag to transition to the VoiceXML document, “nextdocument.vxml”. In “nextdocument.vxml”, the application says “Goodbye World!” and then the application ends. Keep in mind that when you use the <goto> tag to transition to another VoiceXML document, data that was collected in your original document is lost when the new document is executed.

Sending Data

To exchange data between the VoiceXML platform and an application server, you can use the <submit>, <subdialog>, or <data> tag. The interaction between the platform and the application server is a series of HTTP GET or POST requests, which the application server processes and answers with valid VoiceXML (or, for the <data> tag, with XML).
The <submit> tag performs a GET or POST that triggers a page transition. Once the request is complete, the new document is parsed and executed and the previous VoiceXML document is discarded. For example, you can collect information in field variables and send those variables to your application server script with the <submit> tag.
collectinfo.vxml:

<?xml version="1.0"?>
<vxml version="2.0">

  <form>
    <field name="customerid" type="digits">
      <prompt>
        Please enter your customer identification number using your keypad.
      </prompt>
    </field>

    <field name="age" type="digits?minlength=1;maxlength=2">
      <prompt>
        Please enter your age using your keypad.
      </prompt>
    </field>

    <block>
      <prompt>
        Please wait while we process your information.
      </prompt>
      <submit namelist="customerid age" next="http://mightyserver.com/submit.php"/>
    </block>
  </form>

</vxml>
submit.php:

<?php
header("Content-type: text/xml");
echo("<?xml version=\"1.0\"?>\n");

// Array keys must be quoted strings; escape the values so they are safe inside XML.
$customerid = htmlspecialchars($_GET['customerid'] ?? '');
$age = htmlspecialchars($_GET['age'] ?? '');
?>

<vxml version="2.0">
  <form>
    <block>
      <prompt>
        Your customer identification number is <?php echo($customerid)?>.
      </prompt>
      <prompt>
        Your age is <?php echo($age)?>.
      </prompt>
    </block>
  </form>
</vxml>
From this example, we gather information from the user through the variables “customerid” and “age”. We then send these variables to “submit.php” by using the <submit> tag. This then causes a page transition from “collectinfo.vxml” to “submit.php”. In “submit.php”, we use $_GET to assign the variables, $customerid and $age, the values of “customerid” and “age” from “collectinfo.vxml”. From here, the application states the customer identification number and age back to the user.
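The example above relies on the default request method, which is GET. As a sketch of the POST variant, the <submit> tag also accepts a “method” attribute; the abbreviated form below is otherwise the same as “collectinfo.vxml”. On the server side, a PHP script would then read the values from $_POST rather than $_GET.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field name="customerid" type="digits">
      <prompt>
        Please enter your customer identification number using your keypad.
      </prompt>
    </field>
    <block>
      <!-- method defaults to "get"; "post" sends the fields in the request body -->
      <submit namelist="customerid" method="post"
              next="http://mightyserver.com/submit.php"/>
    </block>
  </form>
</vxml>
```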
Similarly, you can also use the <subdialog> tag to exchange data between Plum DEV and application server. The <subdialog> tag is similar to a function call executed by a GET or POST. It allows you to execute a new VoiceXML document in a new context, but once the <subdialog> is complete, control will be returned to the parent document at the same location that the subdialog was called.
subdialog.vxml:

<?xml version="1.0"?>
<vxml version="2.1">
  <form>
    <subdialog name="info" src="http://mightyserver.com/collectinfo.vxml"/>
    <block>
      <prompt>
        Your customer identification number is <value expr="info.customerid"/>.
      </prompt>
      <prompt>
        Your age is <value expr="info.age"/>.
      </prompt>
    </block>
  </form>
</vxml>
collectinfo.vxml:

<?xml version="1.0"?>
<vxml version="2.0">

  <form>
    <field name="customerid" type="digits">
      <prompt>
        Please enter your customer identification number using your keypad.
      </prompt>
    </field>

    <field name="age" type="digits?minlength=1;maxlength=2">
      <prompt>
        Please enter your age using your keypad.
      </prompt>
    </field>

    <block>
      <prompt>
        Please wait while we process your information.
      </prompt>
      <return namelist="customerid age"/>
    </block>
  </form>

</vxml>
From this example, notice how we use the <subdialog> tag to transition to “collectinfo.vxml”. Once we're in “collectinfo.vxml”, we collect the “customerid” and “age” from the user and return these field variables to “subdialog.vxml” by using the <return> tag. Once we're back in “subdialog.vxml”, we can refer to “customerid” and “age” by adding a “.” and the name of the variable after “info”. So, the customer identification number is referred to as “info.customerid” and the age as “info.age” in the original document.
The <data> tag differs from the <subdialog> tag in that it does not execute a remote VoiceXML document; instead, it expects the remote application to return an XML result. This XML document is then mapped directly into an ECMAScript DOM object that Plum DEV can reference as a variable.
collectinfo.vxml:

<?xml version="1.0"?>
<!-- the data tag was introduced in VoiceXML 2.1 -->
<vxml version="2.1">

  <form>
    <field name="agentid" type="digits">
      <prompt>
        Please enter your agent number using your keypad.
      </prompt>
    </field>

    <field name="age" type="digits?minlength=1;maxlength=2">
      <prompt>
        Please enter your age using your keypad.
      </prompt>
    </field>

    <block>
      <data name="verification" namelist="agentid age"
            src="http://mightyserver.com/verification.xml"/>
      <prompt>
        Welcome, <value expr="verification.documentElement.firstChild.toString()"/>.
      </prompt>
      <prompt>
        The agent number you entered is <value expr="agentid"/>.
      </prompt>
      <prompt>
        The age you entered is <value expr="age"/>.
      </prompt>
    </block>
  </form>

</vxml>
verification.xml:

<?xml version="1.0"?>
<name>Mister Bond</name>
From this example, the <data> tag references “verification.xml”, which returns valid XML to Plum DEV. (A small note on XML: you can give a tag any name you want, as long as the tag is closed properly within the XML document.) To reference this XML in “collectinfo.vxml”, add “.documentElement.firstChild.toString()” to the end of the name that you specified in the <data> tag. So, in this case, to reference “Mister Bond” from “verification.xml”, you would refer to it as “verification.documentElement.firstChild.toString()” in “collectinfo.vxml”.

Recording User Input

To record audio from the user, you would use the <record> tag. The <record> tag is an input item that collects a recording from the user. A reference to the recorded audio is stored in the input item variable, which can be played back (using the expr attribute for the <value> tag) or submitted to a server (using the <submit> tag).
The following is a short example that demonstrates how to use the <record> tag:
record.vxml:

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <record name="myrecording" type="audio/x-wav" beep="true">
      <prompt>
        Please record a message after the beep.
      </prompt>

      <filled>
        You just recorded the following message:
        <value expr="myrecording"/>
        <submit next="submitrecording.php" namelist="myrecording"
                method="post" enctype="multipart/form-data"/>
      </filled>
    </record>
  </form>
</vxml>
submitrecording.php:

<?php
header("Content-type: text/xml");
echo("<?xml version=\"1.0\"?>\n");
?>

<vxml version="2.0">
  <form>
    <block>

<?php
// Report success only if the upload arrived and was actually moved into place.
if (isset($_FILES['myrecording'])
    && is_uploaded_file($_FILES['myrecording']['tmp_name'])
    && move_uploaded_file($_FILES['myrecording']['tmp_name'], "message.wav")) {
  echo "<prompt bargein=\"false\">Audio saved.</prompt>";
} else {
  echo "<prompt bargein=\"false\">Audio not saved.</prompt>";
}
?>

    </block>
  </form>
</vxml>
In this example, the user is prompted to record a message after the beep. Notice that the “beep” attribute is set to true so that the user hears the beep. Once the user finishes recording, the application plays the recording back to the user and then submits the recording, “myrecording”, to submitrecording.php through the “namelist” attribute of the <submit> tag. In submitrecording.php, the uploaded file is stored with the name “message.wav”. If this succeeds, the user hears, “Audio saved.” If it does not, the user hears, “Audio not saved.” Note that the directory in which the audio recording file is being saved must have the appropriate permissions set to allow the creation of this new audio file.
There are also numerous attributes for the <record> tag that can help you adjust recording settings for the user. For example, to adjust the maximum duration of the recording, use the “maxtime” attribute. This maximum duration has a limit of 1 hour. To adjust how long the user can remain silent before the recording stops, use the “finalsilence” attribute. The maximum “finalsilence” time is 5 minutes. If you want the user to be able to terminate the recording by pressing any DTMF key, use the “dtmfterm” attribute.
The following example demonstrates how to use these attributes in your application:
recordattributes.vxml:

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <record name="myrecording" maxtime="300s"
            finalsilence="30s" dtmfterm="true">
      <prompt>
        Please say any comments you might have about the class.
        Press any DTMF key when you are finished recording.
      </prompt>

      <filled>
        You just recorded the following:
        <value expr="myrecording"/>
      </filled>
    </record>
  </form>
</vxml>
From this example, the user has a maximum time of 5 minutes to record a message. If the user stops speaking for a moment during the recording, “finalsilence” gives the user 30 seconds to say something to continue the recording; otherwise, the recording is terminated. Finally, by setting “dtmfterm” to true in this example, the user can press any DTMF key to end the recording.
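The VoiceXML 2.0 specification also defines shadow variables on a <record> item, such as name$.duration (the length of the recording in milliseconds) and name$.maxtime (true if the recording ended because “maxtime” was reached). The following is a minimal sketch, assuming the platform supports these standard shadow variables; the wording of the prompts is illustrative only.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <record name="myrecording" maxtime="300s"
            finalsilence="30s" dtmfterm="true">
      <prompt>
        Please say any comments you might have about the class.
      </prompt>

      <filled>
        <!-- myrecording$.maxtime is true when the maxtime limit ended the recording. -->
        <if cond="myrecording$.maxtime">
          <prompt>
            Your recording reached the five minute limit and was cut off.
          </prompt>
        <else/>
          <!-- myrecording$.duration is reported in milliseconds. -->
          <prompt>
            Your recording is
            <value expr="Math.round(myrecording$.duration / 1000)"/>
            seconds long.
          </prompt>
        </if>
      </filled>
    </record>
  </form>
</vxml>
```

Checking name$.maxtime lets the application tell a caller who ran out of time apart from one who finished normally.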
You can also set the audio file type of the recording by using the “type” attribute. The audio format can be set to one of three options: audio/basic (which is 8 kHz 8-bit u-law encoded headerless (*.ul)), audio/x-alaw-basic (which is 8 kHz 8-bit a-law encoded headerless (*.al)), or audio/x-wav (which is 8 kHz 8-bit u-law WAV (*.wav)). If you need an audio format other than these three, you can use an audio conversion tool such as SoX or Adobe Audition to convert your files afterwards.
Finally, keep in mind that when you use the <record> tag, the recording is not stored on our servers. The recording can only be played back by means of the “expr” attribute of the <value> tag or submitted to your server to be stored by means of the <submit> tag.

Tuning Application Behavior

To tune the behavior of the application, you can use the <property> tag, which sets a property value. Properties affect platform behavior, such as the recognition process, timeouts, caching policy, etc.
Properties may be defined for the whole application, for the whole document at the <vxml> level, for a particular dialog at the <form> or <menu> level, or for a particular form item. Properties apply to their parent element and all the descendants of the parent. A property at a lower level overrides a property at a higher level. When different values for a property are specified at the same level, the last one in document order applies.
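The scoping rules above can be sketched as follows. This is a minimal illustration using the standard “timeout” property; the field names and the timeout values are hypothetical.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <!-- Document level: applies to every form item unless overridden. -->
  <property name="timeout" value="5s"/>

  <form>
    <field name="zipcode" type="digits">
      <!-- No override here, so the document-level 5s timeout applies. -->
      <prompt>Please enter your zip code.</prompt>
    </field>

    <field name="accountnumber" type="digits">
      <!-- Form-item level: overrides the document-level value for this field only. -->
      <property name="timeout" value="10s"/>
      <prompt>Please enter your account number.</prompt>
    </field>
  </form>
</vxml>
```

Setting the common value once at the <vxml> level and overriding it only where a field needs something different keeps the exceptions easy to find.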
One property we can use to tune our application is the “inputmodes” property. This property can be set to accept only speech input or only DTMF input from the user. To accept only speech input, set the “value” attribute of the property to “voice”. To accept only DTMF input, set the “value” attribute of the property to “dtmf”. Please note that if you set “inputmodes” to different values in different parts of your code, you need to be mindful of where each setting takes effect, since the platform queues all properties at once up to the first non-bargeable prompt and/or first input element.
The following example demonstrates how we would use “inputmodes” to accept only DTMF input from the user:
<?xml version="1.0"?>
<vxml version="2.0">

  <property name="inputmodes" value="dtmf"/>
  <property name="interdigittimeout" value="3s"/>

  <form>
    <field name="myfield" type="digits">
      <prompt>
        You will only be able to enter digits.
        Enter a number on your keypad.
      </prompt>
      <filled>
        You entered <value expr="myfield"/>.
      </filled>
      <nomatch>
        You did not enter a number properly.
        <reprompt/>
      </nomatch>
      <noinput>
        You did not enter anything.
        <reprompt/>
      </noinput>
    </field>
  </form>

</vxml>
A possible user interaction might be:
C: You will only be able to enter digits.
H: Twelve.
C: (ignores spoken input) Enter a number on your keypad.
H: (enters DTMF-1 DTMF-2)
C: You entered twelve.
To tune how incoming speech is recognized, we can use the “confidencelevel” property. This property sets the minimum confidence, on a scale from 0.0 to 1.0, that a recognition result must reach to be accepted. For example:
<?xml version="1.0"?>
<vxml version="2.0">

  <property name="confidencelevel" value="0.75"/>

  <form>
    <field type="boolean">
      <prompt>
        Please say yes or no.
      </prompt>
    </field>
  </form>

</vxml>
In this example, the confidence threshold is raised to 0.75, requiring a clear “yes” or “no” response. A high confidence level is useful when you expect a precise match to your grammar.
However, for grammars with many possible matches, such as a database of first and last names, you would want to lower the confidence level to allow the system to match what the user is saying more broadly.
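A lower threshold can be scoped to just the field that needs it. The following is a minimal sketch, assuming the platform supports the standard VoiceXML 2.0 shadow variable name$.confidence; the field name, the names in the grammar, and the 0.4 and 0.6 thresholds are hypothetical values chosen for illustration.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field name="fullname">
      <!-- Field-level setting: accept lower-confidence matches for this grammar only. -->
      <property name="confidencelevel" value="0.4"/>
      <grammar type="application/x-jsgf" mode="voice">
        ( james bond | jane doe | john smith )
      </grammar>
      <prompt>
        Please say the full name of the person you are calling.
      </prompt>
      <filled>
        <!-- fullname$.confidence is the recognizer's confidence, from 0.0 to 1.0. -->
        <if cond="fullname$.confidence &lt; 0.6">
          <prompt>I think you said <value expr="fullname"/>.</prompt>
        <else/>
          <prompt>Calling <value expr="fullname"/>.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Note that the less than sign in the “cond” attribute is escaped as &lt; because the document must remain valid XML.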
To tune properties for prompting and collecting, the “bargein” property can be used to prevent users from interrupting speech. Here is an example that does not allow the user to interrupt for the first prompt, but does allow the user to interrupt for the second prompt:
<?xml version="1.0"?>
<vxml version="2.0">

  <form>
    <property name="bargein" value="false"/>
    <field name="myfield">
      <grammar type="application/x-jsgf" mode="voice">
        ( one | two )+
      </grammar>
      <prompt>
        You must listen to this message.
      </prompt>
      <prompt bargein="true">
        Say any number of the digits one or two.
      </prompt>
      <filled>
        You said <value expr="myfield"/>.
      </filled>
      <nomatch>
        You did not say any ones or twos.
        <reprompt/>
      </nomatch>
      <noinput>
        You did not say anything.
        <reprompt/>
      </noinput>
    </field>
  </form>

</vxml>
So, a possible user interaction might be:
C: You must…
H: One two.
C: …listen to this message.
C: Say any number of…
H: One two.
C: You said one two.
Since the “bargein” property is set to false for the form, the user is not allowed to interrupt the first message. The second prompt sets its own “bargein” attribute to true, so the user is allowed to interrupt that message by saying “one” or “two”.
Another property that can be used to adjust prompting and collecting is the “timeout” property. This value can be increased to give the user more time to respond with speech or DTMF input. For example:
<?xml version="1.0"?>
<vxml version="2.0">

  <form>
    <property name="timeout" value="7s"/>
    <field name="myfield">
      <grammar type="application/x-jsgf" mode="voice">
        ( one | two )+
      </grammar>
      <grammar type="application/x-jsgf" mode="dtmf">
        ( 1 | 2 )+
      </grammar>
      <prompt>
        Say or enter any number of the digits one or two.
      </prompt>
      <filled>
        You entered <value expr="myfield"/>.
      </filled>
      <nomatch>
        You did not enter any ones or twos.
        <reprompt/>
      </nomatch>
      <noinput>
        You did not enter anything.
        <reprompt/>