Tutorial

VoiceXML 2.0 is the World Wide Web Consortium (W3C) standard for scripting voice applications. In this tutorial, we construct a VoiceXML interactive voice response (IVR) application for a customer service center. Some parts of this tutorial assume you have your own web server, which is the recommended configuration for a full production-level application. Starting from a simple “Hello World” application, we build a telephony application that includes:

  • dynamic response driven by touch tone or speech input

  • advanced text-to-speech (TTS) speech synthesis and automatic speech recognition (ASR)

  • system integration with enterprise databases

Introduction to VoiceXML

We begin with nearly the simplest complete VoiceXML application. The application here is analogous to an answering machine set to play an announcement only.

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <prompt>
         Welcome to Plum Voice.
      </prompt>
    </block>
  </form>
</vxml>

In this example, the user would hear a synthesized voice say, “Welcome to Plum Voice.” Then the system would simply hang up. The <form> defines the basic unit of interaction in VoiceXML. This form includes only a single <block> of executable content which in turn includes a single <prompt> to the user. By default, any plain text within a prompt is passed to the system's text-to-speech (TTS) synthesis engine to be generated as audio.

Also, as the <?xml?> declaration indicates, every VoiceXML document is an XML document. The basic structure of VoiceXML should be familiar to anyone who has looked at HTML web documents. Tags are set off by angle brackets, as in <form>, and are closed with a forward slash, as in </form>. VoiceXML documents must adhere strictly to the XML standard. The document must begin with the <?xml?> declaration, and the rest of the document is enclosed within the <vxml></vxml> tags. Unlike HTML, all tags must be closed, and certain special characters must be escaped. For example, the less-than sign <, when it is not used to open a tag, must be written as &lt;, and a literal ampersand must be written as &amp;.
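The escaping rule matters most in two places: prompt text and ECMAScript conditions. The following is a minimal sketch of both; the variable name “attempts” is an assumption made only for this illustration.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- "attempts" is a hypothetical variable used only in this sketch -->
    <var name="attempts" expr="1"/>
    <block>
      <!-- "&lt;" stands in for "<" inside the ECMAScript condition -->
      <if cond="attempts &lt; 3">
        <prompt>
          <!-- "&amp;" stands in for "&" inside prompt text -->
          Welcome to Plum Voice sales &amp; support.
        </prompt>
      </if>
    </block>
  </form>
</vxml>
```

Writing the condition as attempts < 3 without the escape would make the document invalid XML, and the platform would be unable to parse it.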

For static prompts such as this welcome message, we'll probably want to use a human announcer instead of TTS. TTS has come a long way, but there's still no substitute for the real thing. For recorded prompts, we use the <audio> tag.

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        <audio src="wav/welcome.wav">
          Welcome to Plum Voice.
        </audio>
      </prompt>
    </block>
  </form>
</vxml>

In this case, the source (“src”) reference is relative to the VXML document URL in which it appears. WAV files are a generic container type. WAV files include a header which indicates the actual audio sample size, encoding, and rate used. Supported formats vary by VoiceXML implementation and not all possible WAV file formats are supported. Plum DEV supports 8 kHz audio files in 16 bit linear, 8 bit µ-law (u-law), or 8 bit A-law encoding in WAV files or headerless files.

The text within the audio tag is not required. We could have included no content: <audio src="wav/welcome.wav"/>

which is equivalent to <audio src="wav/welcome.wav"></audio>

The text included within the audio tag in the example above is something like the ALT text for images in HTML. If the platform is unable to open or play the source (“src”) file in the audio tag, it falls back on generating TTS from the included text.

It is good practice to store your audio files on the same local server as your application script. For example, in the files folder of our local server, test.php is the script that contains the reference to welcome.wav, and welcome.wav itself is stored in the wav subfolder. Thus, when referencing the source (“src”) file in our audio tag, we write:

<audio src="wav/welcome.wav">
  Welcome to Plum Voice.
</audio>

The benefit of storing audio files on your local server as opposed to the audio repository is that it allows for easier file management. Suppose you wanted to change the name of one of your audio files. If this file is stored locally on your server, you could just go in and rename the file yourself. However, with the audio repository, you are not able to manage these files. For example, if you deleted a recording in the audio repository (in this case, let's call it 12.wav) and uploaded a replacement file, the replacement file would not take the deleted recording's old name. It would take the next highest number available out of your recordings (in this case, let's say it got named 21.wav).

If you are concerned about loading times for audio files from your local server, please note that once these audio files have been cached, they load as quickly as files stored in our audio repository. See the documentation on caching for more information.

User Interaction with DTMF

Grammars are used by speech recognizers to determine what the recognizer should listen for, and so describe the utterances a user may say. Starting with VoiceXML Version 2.0, the W3C requires that all VoiceXML platforms must support at least one common format, the XML Form of the W3C Speech Recognition Grammar Specification (SRGS). Plum implements the SRGS+XML grammar format for both Voice and DTMF grammars as well as JSpeech Grammar Format (JSGF). Refer to the W3C Speech Recognition Grammar Specification or the JSGF Specification for further detail.

To control user input, we can explicitly create input fields and specify allowable grammars for user input. We do this by explicitly using the <grammar> tag for each <field> inside a <form>. Please note that the id attribute of the <form> does not allow for any white space. The <grammar> element is used to provide a speech (or DTMF) grammar that:

  • Specifies a set of utterances or DTMF key presses that a user may speak or type to perform an action or supply information.

  • Returns a corresponding semantic interpretation for a matching input.

The following example shows how to set up a grammar for DTMF input from the user:

<?xml version="1.0"?>
<vxml version="2.0">
<form id="mainmenu">
  <field name="menuchoice">
    <grammar type="application/x-jsgf" mode="dtmf">
      1|2|3
    </grammar>
      <prompt>
        For sales, press 1.
        For tech support, press 2.
        For company directory, press 3.
      </prompt>
      <filled>
        <if cond="menuchoice==1">
          Welcome to sales.
        <elseif cond="menuchoice==2"/>
          Welcome to tech support.
        <elseif cond="menuchoice==3"/>
          Welcome to the company directory.
        </if>
      </filled>
  </field>
</form>
</vxml>

Here we specify a grammar for the field using JSGF (Java Speech Grammar Format) syntax, which is the default grammar syntax for Plum DEV. In SRGS+XML format, the same example looks like this:

<?xml version="1.0"?>
<vxml version="2.0">
<form id="mainmenu">
  <field name="menuchoice">
    <grammar type="application/srgs+xml" root="ROOT" mode="dtmf">
      <rule id="ROOT">
        <one-of>
          <item>1</item>
          <item>2</item>
          <item>3</item>
        </one-of>
      </rule>
    </grammar>
      <prompt>
        For sales, press 1.
        For tech support, press 2.
        For company directory, press 3.
      </prompt>
      <filled>
        <if cond="menuchoice==1">
          Welcome to sales.
        <elseif cond="menuchoice==2"/>
          Welcome to tech support.
        <elseif cond="menuchoice==3"/>
          Welcome to the company directory.
        </if>
      </filled>
  </field>
</form>
</vxml>

Notice that the SRGS+XML grammar is longer than the equivalent JSGF grammar in the example before it. For numeric input, JSGF is often the shorter alternative.

User Interaction with Speech

Up to this point, we've restricted our discussion to the use of touch tone (DTMF) input. One of the most compelling reasons to use VoiceXML is the ability to integrate advanced speech recognition technologies simply and portably. Let's use speech instead of DTMF for the JSGF example from the previous section, User Interaction with DTMF.

<?xml version="1.0"?>
<vxml version="2.0">
<form id="mainmenu">
  <field name="menuchoice">
    <grammar type="application/x-jsgf" mode="voice">
      one|two|three
    </grammar>
      <prompt>
        For sales, say 1.
        For tech support, say 2.
        For company directory, say 3.
      </prompt>
      <filled>
        <if cond="menuchoice=='one'">
          Welcome to sales.
        <elseif cond="menuchoice=='two'"/>
          Welcome to tech support.
        <elseif cond="menuchoice=='three'"/>
          Welcome to the company directory.
        </if>
      </filled>
  </field>
</form>
</vxml>

From this example, notice that we set the “mode” attribute in the <grammar> tag to “voice” instead of “dtmf”. Also, note that we have to spell out the numbers “one”, “two”, and “three” for the speech grammar instead of using the numerals 1, 2, and 3 as we did for the DTMF example. We also have to spell them out inside the <if> and <elseif> tags and place single quotes around them, since they are strings.
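The two modes can also be offered together. The sketch below assumes the platform accepts one DTMF grammar and one voice grammar in the same field, which standard VoiceXML 2.0 permits. Each condition checks both the key press and the spoken word, on the assumption that each grammar returns its own matched text.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
<form id="mainmenu">
  <field name="menuchoice">
    <!-- DTMF grammar: matches the keys 1, 2, and 3 -->
    <grammar type="application/x-jsgf" mode="dtmf">
      1|2|3
    </grammar>
    <!-- voice grammar: matches the spoken words -->
    <grammar type="application/x-jsgf" mode="voice">
      one|two|three
    </grammar>
    <prompt>
      For sales, press or say 1.
      For tech support, press or say 2.
      For company directory, press or say 3.
    </prompt>
    <filled>
      <if cond="menuchoice==1 || menuchoice=='one'">
        Welcome to sales.
      <elseif cond="menuchoice==2 || menuchoice=='two'"/>
        Welcome to tech support.
      <elseif cond="menuchoice==3 || menuchoice=='three'"/>
        Welcome to the company directory.
      </if>
    </filled>
  </field>
</form>
</vxml>
```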

Built-in Grammars

To simplify development, several base grammars are built into the system. They can be referenced by name in the “type” attribute of the <field> tag.

You can use the boolean built-in grammar when expecting an affirmative phrase (such as “yes”) or a negative phrase (such as “no”) in your application. You can also use DTMF for this grammar, where DTMF-1 is affirmative and DTMF-2 is negative. The six other built-in grammars are date, digits, currency, number, phone, and time. Note that phone and time work only for Nuance OSR engines.
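A minimal sketch of a boolean field might look like the following. The field name “hearbalance” and the prompt wording are made up for illustration, and the condition assumes the field is filled with ECMAScript true for an affirmative answer and false for a negative one, as the VoiceXML 2.0 specification describes; it is worth confirming the returned value on your platform.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field name="hearbalance" type="boolean">
      <prompt>
        Would you like to hear your account balance?
        Say yes or no, or press 1 for yes or 2 for no.
      </prompt>
      <filled>
        <!-- assumes true for "yes" or DTMF-1, false for "no" or DTMF-2 -->
        <if cond="hearbalance">
          One moment while we look up your balance.
        <else/>
          Returning to the main menu.
        </if>
      </filled>
    </field>
  </form>
</vxml>
```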

Below is an example of how we can use a built-in grammar inside of a <field> tag.

  
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field name="id" type="digits">
      <prompt>
        Please say or enter your customer identification number.
      </prompt>
      <filled>
        You entered <value expr="id"/>.
        <!-- transfer to premium support -->
      </filled>
    </field>
  </form>
</vxml>

In this example, the field “id” uses the built-in grammar “digits” to allow the user to say or enter any number of digits for the customer identification number. To require a specific number of digits, we can use the digits?length=n parameter.

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field name="id" type="digits?length=7">
      <prompt>
        Enter your seven digit customer
        identification number.
      </prompt>
      <filled>
        You entered <value expr="id"/>.
        <!-- transfer to premium support -->
      </filled>
    </field>
  </form>
</vxml>

Here, the user has to enter 7 digits for their customer identification number due to the parameter. If the user does not enter exactly 7 digits, the system will respond with, “Sorry, I didn't understand you” and re-prompt “Enter your seven digit customer identification number” back to the user. We will find out more about error handling in the next section of the tutorial.
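When the input may vary in length, the digits grammar also accepts minlength and maxlength parameters separated by a semicolon. The following sketch assumes a hypothetical PIN of four to six digits; the field name and prompt are illustrative only.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- accepts any input that is between 4 and 6 digits long -->
    <field name="pin" type="digits?minlength=4;maxlength=6">
      <prompt>
        Enter your PIN. It is between four and six digits long.
      </prompt>
      <filled>
        You entered <value expr="pin"/>.
      </filled>
    </field>
  </form>
</vxml>
```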

Standard Events

Plum DEV already takes care of trapping and handling some exception conditions such as when the user enters no input or the user enters input not defined in the grammar for an input field. In the next example, we want to collect a seven digit identification number in a field.

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- declare customerid so that <assign> can set it -->
    <var name="customerid"/>
    <field name="id" type="digits?length=7">
      <prompt>
        Enter your customer identification number.
      </prompt>
      <filled>
        <assign name="customerid" expr="id"/>
        <prompt>
          You entered <value expr="id"/>.
        </prompt>
        <!-- transfer to premium support -->
      </filled>
    </field>
  </form>
</vxml>

If the user enters nothing for the timeout interval (default is 3 seconds), Plum DEV's built-in exception handling will play the default no input message: “Sorry, I didn't hear you” and then re-prompt the user to enter their customer identification number.

If the user enters input that does not match the defined grammar (in this case 7 digits), Plum DEV's built-in exception handling will play the default no match message: “Sorry, I didn't understand you” and re-prompt the user to enter their customer identification number.

The following example mimics the default behavior of the system. This example behaves identically to the previous example.

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- declare customerid so that <assign> can set it -->
    <var name="customerid"/>
    <field name="id" type="digits?length=7">
      <prompt>
        Enter your customer identification number.
      </prompt>
      <filled>
        <assign name="customerid" expr="id"/>
        <prompt>
          You entered <value expr="id"/>.
        </prompt>
        <!-- transfer to premium support -->
      </filled>
      <noinput>
        <prompt>
          Sorry, I didn't hear you.
        </prompt>
        <reprompt/>
      </noinput>
      <nomatch>
        <prompt>
          Sorry, I didn't understand you.
        </prompt>
        <reprompt/>
      </nomatch>
    </field>
  </form>
</vxml>
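The timeout interval that triggers the no input event is itself configurable. The sketch below assumes the standard VoiceXML “timeout” property and an arbitrary value of 5 seconds; placing the property inside the field limits the change to that field.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field name="id" type="digits?length=7">
      <!-- wait up to 5 seconds for input before throwing noinput -->
      <property name="timeout" value="5s"/>
      <prompt>
        Enter your customer identification number.
      </prompt>
      <filled>
        <prompt>
          You entered <value expr="id"/>.
        </prompt>
      </filled>
    </field>
  </form>
</vxml>
```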

By defining your own actions for no match and no input events you can greatly increase the control you have over your code. For instance, you could choose to offer a more helpful error message, to not play the original prompt again by omitting the <reprompt/> tag, to play custom messages for the specific occurrence of an exception event, to execute script code, or to abandon the effort altogether by moving on to a new form using <goto>.

<?xml version="1.0"?>
<vxml version="2.0">
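  <form>
    <!-- declare customerid and badid so that <assign> can set them -->
    <var name="customerid"/>
    <var name="badid"/>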
    <field name="id" type="digits?length=7">
      <prompt>
        Enter your customer identification number.
      </prompt>
      <filled>
        <assign name="customerid" expr="id"/>
        <prompt>
          You entered <value expr="id"/>.
        </prompt>
        <!-- transfer to premium support -->
      </filled>
      <noinput count="1">
       <!-- this code executes for count 1 and 2, the FIA looks at all of the matching <catch> elements and finds the highest count value that is <= the current count-->
        <prompt>
          Your identification number is the seven digit number on the front of your membership card.
        </prompt>
        <reprompt/>
      </noinput>
      <noinput count="3">
        <prompt>
          It seems you are having difficulty with your identification number, we will transfer you to customer service.
        </prompt>
        <!-- transfer caller to customer service -->
      </noinput>
      <nomatch count="1">
        <assign name="badid" expr="id"/>
        <prompt>
          Your identification number must be seven digits. Please try again.
        </prompt>
      </nomatch>
      <nomatch count="3">
        <prompt>
          It seems you are having difficulty with your identification number, we will transfer you to customer service.
        </prompt>
        <!-- transfer caller to customer service -->
      </nomatch>
    </field>
  </form>
</vxml>

In the above example, we define custom handlers for the first two occurrences of the nomatch and noinput events, as well as separate handlers for the third occurrence of each event.

It is good practice to include the “count” attribute when defining event handlers, both to avoid infinite loops and to improve the customer experience, for instance by transferring the user to customer service when they are having difficulty entering input. Remember that a handler with count="1" also executes for the second occurrence when no count="2" handler is defined, because the Form Interpretation Algorithm (FIA) looks at all of the matching <catch> elements and selects the one with the highest count value that is less than or equal to the current count.
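To actually abandon the field on the third failure, the handler needs to leave the form, for example with <goto>. The following is a sketch only: the form ID “customerservice” and its contents are placeholders, and the real transfer logic is left as a comment.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form id="collectid">
    <field name="id" type="digits?length=7">
      <prompt>
        Enter your customer identification number.
      </prompt>
      <filled>
        <prompt>
          You entered <value expr="id"/>.
        </prompt>
      </filled>
      <noinput count="1">
        <prompt>
          Sorry, I didn't hear you.
        </prompt>
        <reprompt/>
      </noinput>
      <nomatch count="1">
        <prompt>
          Your identification number must be seven digits.
        </prompt>
        <reprompt/>
      </nomatch>
      <noinput count="3">
        <!-- leave this form so the field is not visited again -->
        <goto next="#customerservice"/>
      </noinput>
      <nomatch count="3">
        <goto next="#customerservice"/>
      </nomatch>
    </field>
  </form>

  <!-- "customerservice" is a hypothetical form ID used for this sketch -->
  <form id="customerservice">
    <block>
      <prompt>
        It seems you are having difficulty with your identification number,
        we will transfer you to customer service.
      </prompt>
      <!-- transfer caller to customer service -->
    </block>
  </form>
</vxml>
```

Without the <goto>, the handler would finish and the form would return to the same field to wait for input again.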

If you want a different message for each invalid attempt, set the count of each handler to consecutive numbers, as shown in the code below:

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- declare customerid so that <assign> can set it -->
    <var name="customerid"/>
    <field name="id" type="digits?length=7">
      <prompt count="1">
        Enter your customer identification number.
      </prompt>
      <prompt count="2">
        Enter your seven digit customer identification
        number.
      </prompt>
      <prompt count="3">
        Your customer identification number can be
        found on the front of your membership card.
        Enter your seven digit customer identification
        number.
      </prompt>
      <filled>
        <assign name="customerid" expr="id"/>
        <prompt>
          You entered <value expr="customerid"/>.
        </prompt>
        <!-- transfer to premium support -->
      </filled>
      <catch event="nomatch noinput" count="1">
         <prompt>
          Your input was not valid.
        </prompt>
        <reprompt/>
      </catch>
      <catch event="nomatch noinput" count="2">
         <prompt>
          Your input was not valid please try one last time.
        </prompt>
       <reprompt/>
      </catch>
      <catch event="nomatch noinput" count="3">
        <prompt>
          It seems you are having difficulty with your identification number, we will transfer you to customer service.
        </prompt>
        <!-- transfer caller to customer service -->
      </catch>
    </field>
  </form>
</vxml>

Note that if the event counter exceeds the highest count among your defined handlers, the handler with the highest count is used. In the previous example, we would continue to hit the count="3" handler once three or more nomatch or noinput events have occurred.

The <nomatch> and <noinput> tags are shorthand for the generic <catch> event handler. You may have noticed that the earlier example with separate <noinput> and <nomatch> handlers defined the same action for the third occurrence of both events. We could consolidate that example to use the same actions for both nomatch and noinput events as follows:

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- declare customerid so that <assign> can set it -->
    <var name="customerid"/>
    <field name="id" type="digits?length=7">
      <prompt>
        Enter your customer identification number.
      </prompt>
      <filled>
        <assign name="customerid" expr="id"/>
        <prompt>
          You entered <value expr="id"/>.
        </prompt>
        <!-- transfer to premium support -->
      </filled>
      <catch event="nomatch noinput" count="1">
       <!-- this code executes for count 1 and 2, the FIA looks at all of the matching <catch> elements and finds the highest count value that is <= the current count-->
        <prompt>
          Your input was not valid. Your identification number is the seven digit number on the front of your membership card.
        </prompt>
        <reprompt/>
      </catch>
      <catch event="nomatch noinput" count="3">
        <prompt>
          It seems you are having difficulty with your identification number, we will transfer you to customer service.
        </prompt>
        <!-- transfer caller to customer service -->
      </catch>
    </field>
  </form>
</vxml>

Also, as we have just done with the <catch> tag, rather than simply repeating the same prompts to the user, we can offer increasingly detailed prompt messages by using the prompt “count” attribute.

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- declare customerid so that <assign> can set it -->
    <var name="customerid"/>
    <field name="id" type="digits?length=7">
      <prompt count="1">
        Enter your customer identification number.
      </prompt>
      <prompt count="2">
        Enter your seven digit customer identification
        number.
      </prompt>
      <prompt count="3">
        Your customer identification number can be
        found on the front of your membership card.
        Enter your seven digit customer identification
        number.
      </prompt>
      <filled>
        <assign name="customerid" expr="id"/>
        <prompt>
          You entered <value expr="customerid"/>.
        </prompt>
        <!-- transfer to premium support -->
      </filled>
      <catch event="nomatch noinput" count="1">
       <!-- this code executes for count 1 and 2, the FIA looks at all of the matching <catch> elements and finds the highest count value that is <= the current count-->
         <prompt>
          Your input was not valid.
        </prompt>
        <reprompt/>
      </catch>
      <catch event="nomatch noinput" count="3">
        <prompt>
          It seems you are having difficulty with your identification number, we will transfer you to customer service.
        </prompt>
        <!-- transfer caller to customer service -->
      </catch>
    </field>
  </form>
</vxml>

The user interaction might sound like this:

C: Enter your customer identification number. <prompt counter = 1>
H: <enters 1 2 3>
C: Your input was not valid. Enter your seven digit customer identification number. <prompt counter = 2>
H: <enters 1 4 5>
C: Your input was not valid. Your customer identification number can be found on the front of your membership card. Enter your seven digit customer identification number. <prompt counter = 3>
H: <enters 1 2 3 4 5 6>
C: It seems you are having difficulty with your identification number, we will transfer you to customer service. <transfer user to customer service>

See the Plum DEV Reference Manual for other standard VoiceXML events.

ECMAScript Input Validation

Plum DEV has a fully functioning JavaScript engine, similar to a standard web browser. This allows us to define functions and make use of all of the features that the JavaScript language has to offer.

Now, suppose we have a field that checks the length of a customer identification number.

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- declare customerid so that <assign> can set it -->
    <var name="customerid"/>
  <field name="id" type="digits">
    <prompt>
      Please enter your customer identification number.
    </prompt>
      <filled>
        <if cond="id.length == 7">
           <assign name="customerid" expr="id"/>
           You entered <value expr="id"/>.
           <!-- transfer to premium support -->
         <else/>
           Invalid ID number. Please check the number
           and try again.
           <clear namelist="id"/>
           <reprompt/>
         </if>
      </filled>
  </field>
  </form>
</vxml>

In this example, “id” is an ECMAScript variable that is set when the caller enters a customer identification number. In the if conditional of the filled block, the contents of the “cond” attribute, “id.length == 7”, are evaluated as an ECMAScript expression. If the user entered a 7-digit number, the if conditional would be true and a new variable, “customerid”, would be assigned the value of “id”. Finally, the <value expr/> expression converts the contents of “id” into a string for playback and the application states what the user entered for their “id”. If the user did not enter 7 digits, the if conditional would be false and go to the else conditional. The application would say to the user, “Invalid ID number. Please check the number and try again.” The “id” variable would be cleared with the <clear> tag and the application would re-prompt the user for their customer identification number.
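Because the platform runs a JavaScript engine, the same check can be moved into a reusable function. The sketch below assumes a hypothetical helper named isValidId, defined in a document-level <script> element and called from the “cond” attribute.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <script>
    <![CDATA[
      // Hypothetical helper: true when the input is exactly seven digits
      function isValidId(value) {
        return /^[0-9]{7}$/.test(String(value));
      }
    ]]>
  </script>
  <form>
    <var name="customerid"/>
    <field name="id" type="digits">
      <prompt>
        Please enter your customer identification number.
      </prompt>
      <filled>
        <if cond="isValidId(id)">
          <assign name="customerid" expr="id"/>
          You entered <value expr="customerid"/>.
        <else/>
          Invalid ID number. Please check the number
          and try again.
          <clear namelist="id"/>
          <reprompt/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Wrapping the script in a CDATA section avoids having to escape characters such as < and & inside the function body.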

To navigate your way through forms and fields in your application, you can use the <goto> tag. The next example demonstrates how we can use the <goto> tag to navigate our way through an application:

<?xml version="1.0"?>
<vxml version="2.0">
    <form id="firstform">
        <block>
            <prompt>
                Going to the next form.
            </prompt>
            <!-- A "#" symbol followed by an identifier specifies a -->
            <!-- form or menu ID to jump to. -->
            <goto next="#nextform"/>
        </block>
    </form>
    <form id="nextform">
        <block>
            <prompt>
                Welcome to the next form. Goodbye.
            </prompt>
        </block>
    </form>
</vxml>

Here, the “next” attribute in the <goto> tag brings you to the form “nextform”. If your application had multiple forms, you could use the “next” attribute in the <goto> tag to bring you to any one of these forms, as long as you specify the “id” for that form.

The <menu> tag is a convenient way to create a single anonymous field in your application that prompts the user to make a choice and then transitions to a different place in your application based on that choice. Let's look at an example that navigates through the application using the <menu> and <choice> tags.

<?xml version="1.0"?>
<vxml version="2.0">

  <form>
    <block>
      <prompt>
         Welcome to Plum Voice.
      </prompt>
      <goto next="#mainmenu"/>
    </block>
  </form>

  <menu id="mainmenu">
    <prompt>
      For sales, press 1 or say sales.
      For tech support, press 2 or say support.
      For company directory, press 3 or say directory.
    </prompt>
    <choice dtmf="1" next="#sales">
       Sales</choice>
    <choice dtmf="2" next="#support">
       Tech Support</choice>
    <choice dtmf="3" next="#directory">
       Company Directory</choice>
  </menu>

  <form id="sales">
    <block>
      Please hold for the next available sales
      representative.
      <!-- transfer to sales -->
    </block>
  </form>

  <form id="support">
    <block>
      <!-- transfer to tech support -->
    </block>
  </form>

  <form id="directory">
    <block>
      <!-- transfer to company directory -->
    </block>
  </form>

</vxml>

From this example, note that in the first <form> block, the “next” attribute of the <goto> tag points to “mainmenu”, which is specified as the “id” of the <menu> tag. Inside this <menu> tag, there are 3 <choice> tags that bring the user to their specified choice based on their input of 1, 2, or 3. Notice that these <choice> tags also use the “next” attribute to point to an “id” of a <form>. So, if the user enters DTMF-1, the application will go to the <form> block with an “id” of “sales”. If the user enters DTMF-2, the application will go to the <form> block with an “id” of “support”. If the user enters DTMF-3, the application will go to the <form> block with an “id” of “directory”.
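A menu can carry its own event handlers in the same way a field can. The following sketch assumes that <noinput> and <nomatch> elements placed inside <menu> take the place of the platform's default messages; the handler wording is illustrative.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <menu id="mainmenu">
    <prompt>
      For sales, press 1 or say sales.
      For tech support, press 2 or say tech support.
    </prompt>
    <choice dtmf="1" next="#sales">
       Sales</choice>
    <choice dtmf="2" next="#support">
       Tech Support</choice>
    <!-- custom handlers for this menu only -->
    <noinput>
      <prompt>
        Sorry, I didn't hear you.
      </prompt>
      <reprompt/>
    </noinput>
    <nomatch>
      <prompt>
        That is not one of the available choices.
      </prompt>
      <reprompt/>
    </nomatch>
  </menu>

  <form id="sales">
    <block>
      Please hold for the next available sales representative.
      <!-- transfer to sales -->
    </block>
  </form>

  <form id="support">
    <block>
      Please hold for the next available support representative.
      <!-- transfer to tech support -->
    </block>
  </form>
</vxml>
```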

Not only can you use the <goto> tag to navigate within your document, you can also use it to navigate through multiple VoiceXML documents.

firstdocument.vxml:

<?xml version="1.0"?>
<vxml version="2.0">
    <form>
        <block>
            <prompt>
              Hello World!
            </prompt>
            <goto next="nextdocument.vxml"/>
        </block>
    </form>
</vxml>

nextdocument.vxml:

<?xml version="1.0"?>
<vxml version="2.0">
    <form>
        <block>
            <prompt>
              Goodbye World!
            </prompt>
        </block>
    </form>
</vxml>

From this example, the application first greets the user with “Hello World!” and then uses the <goto> tag to transition to the VoiceXML document, “nextdocument.vxml”. In “nextdocument.vxml”, the application says “Goodbye World!” and then the application ends. Keep in mind that when you use the <goto> tag to transition to another VoiceXML document, data that was collected in your original document is lost when the new document is executed.

Sending Data

To exchange data between the VoiceXML platform and an application server, you can use the <submit>, <subdialog>, or <data> tag. The interaction between the platform and application server is a series of HTTP GETs or HTTP POSTs where the application server processes these requests and returns valid VoiceXML.

The <submit> tag performs a GET or POST that triggers a page transition. Once the GET or POST is complete, the new document is parsed and executed, and the previous VoiceXML document is discarded. For example, you can collect information in variables and send those variables to your application server script through the <submit> tag.

collectinfo.vxml:

<?xml version="1.0"?>
<vxml version="2.0">

<form>
  <field name="customerid" type="digits">
      <prompt>
        Please enter your customer identification number using your keypad.
      </prompt>
  </field>

  <field name="age" type="digits?minlength=1;maxlength=2">
      <prompt>
        Please enter your age using your keypad.
      </prompt>
  </field>

  <block>
    <prompt>
      Please wait while we process your information.
    </prompt>
    <submit namelist="customerid age" next="http://mightyserver.com/submit.php"/>
  </block>
</form>

</vxml>

submit.php:

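<?php
header("Content-type: text/xml");
echo("<?xml version=\"1.0\"?>\n");

// Quote the array keys; bare customerid and age would be read as constants.
// htmlspecialchars() keeps the generated VoiceXML well-formed.
$customerid = htmlspecialchars($_GET['customerid'] ?? '');
$age = htmlspecialchars($_GET['age'] ?? '');
?>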

<vxml version="2.0">
  <form>
    <block>
      <prompt>
        Your customer identification number is <?php echo($customerid)?>.
      </prompt>
      <prompt>
        Your age is <?php echo($age)?>.
      </prompt>
    </block>
  </form>
</vxml>

From this example, we gather information from the user through the variables “customerid” and “age”. We then send these variables to “submit.php” by using the <submit> tag. This then causes a page transition from “collectinfo.vxml” to “submit.php”. In “submit.php”, we use $_GET to assign the variables, $customerid and $age, the values of “customerid” and “age” from “collectinfo.vxml”. From here, the application states the customer identification number and age back to the user.
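The <submit> tag uses GET unless told otherwise. As a minimal sketch, the same request can be sent as a POST by adding the “method” attribute; the receiving script would then read the values from $_POST rather than $_GET.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field name="customerid" type="digits">
      <prompt>
        Please enter your customer identification number using your keypad.
      </prompt>
    </field>
    <block>
      <!-- method="post" sends the namelist in the request body -->
      <submit namelist="customerid" next="http://mightyserver.com/submit.php"
      method="post"/>
    </block>
  </form>
</vxml>
```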

Similarly, you can also use the <subdialog> tag to exchange data between Plum DEV and an application server. The <subdialog> tag is similar to a function call executed by a GET or POST. It allows you to execute a new VoiceXML document in a new context, but once the <subdialog> is complete, control is returned to the parent document at the location where the subdialog was called.

subdialog.vxml:

<?xml version="1.0"?>
<vxml version="2.1">
    <form>
        <subdialog name="info" src="http://mightyserver.com/collectinfo.vxml"/>
        <block>
             <prompt>
               Your customer identification number is <value expr="info.customerid"/>.
             </prompt>
             <prompt>
               Your age is <value expr="info.age"/>.
             </prompt>
        </block>
    </form>
</vxml>

collectinfo.vxml:

<?xml version="1.0"?>
<vxml version="2.0">

<form>
  <field name="customerid" type="digits">
      <prompt>
        Please enter your customer identification number using your keypad.
      </prompt>
  </field>

  <field name="age" type="digits?minlength=1;maxlength=2">
      <prompt>
        Please enter your age using your keypad.
      </prompt>
  </field>

  <block>
    <prompt>
      Please wait while we process your information.
    </prompt>
    <return namelist="customerid age"/>
  </block>
</form>

</vxml>

From this example, notice how we use the <subdialog> tag to transition to “collectinfo.vxml”. Once we're in “collectinfo.vxml”, we collect the “customerid” and “age” from the user and return these field variables to “subdialog.vxml” by using the <return> tag. Once we're back in “subdialog.vxml”, we can refer to “customerid” and “age” by adding a “.” and the name of the variable after “info”. So, the customer identification number is referred to as “info.customerid” and the age as “info.age” in the original document.
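To round out the function call analogy, a subdialog can also receive arguments through <param> and handle its results in a <filled> element. This is a sketch under assumptions: the parameter name “language” is hypothetical, and the form in “collectinfo.vxml” would need a matching <var> declaration to receive it.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <subdialog name="info" src="http://mightyserver.com/collectinfo.vxml">
      <!-- "language" is a hypothetical parameter; the form in
           collectinfo.vxml would need <var name="language"/> -->
      <param name="language" expr="'english'"/>
      <filled>
        <!-- runs once collectinfo.vxml executes <return> -->
        <prompt>
          Your customer identification number is <value expr="info.customerid"/>.
        </prompt>
        <prompt>
          Your age is <value expr="info.age"/>.
        </prompt>
      </filled>
    </subdialog>
  </form>
</vxml>
```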

The <data> tag differs from the <subdialog> tag in that it does not execute a remote VoiceXML document; instead, it expects the remote application to return an XML result. This XML is then mapped directly into an ECMAScript DOM object that Plum DEV can reference as a variable.

collectinfo.vxml:

<?xml version="1.0"?>
<vxml version="2.0">

<form>
  <field name="agentid" type="digits">
      <prompt>
        Please enter your agent number using your keypad.
      </prompt>
  </field>

  <field name="age" type="digits?minlength=1;maxlength=2">
      <prompt>
        Please enter your age using your keypad.
      </prompt>
  </field>

  <block>
    <data name="verification" namelist="agentid age"
    src="http://mightyserver.com/verification.xml"/>
      <prompt>
        Welcome, <value expr="verification.documentElement.firstChild.toString()"/>.
      </prompt>
      <prompt>
        The agent number you entered is <value expr="agentid"/>.
      </prompt>
      <prompt>
        The age you entered is <value expr="age"/>.
      </prompt>
  </block>
</form>

</vxml>

verification.xml:

<?xml version="1.0"?>
<name>Mister Bond</name>

In this example, the <data> tag references “verification.xml”, which returns valid XML to Plum DEV. (A small note on XML: you can name a tag almost anything you want, as long as the tag is closed properly within the XML document.) To reference this XML in “collectinfo.vxml”, add “.documentElement.firstChild.toString()” to the end of the name that you specified in the <data> tag. So, in this case, “Mister Bond” from “verification.xml” is referred to as “verification.documentElement.firstChild.toString()” in “collectinfo.vxml”.

Recording User Input

To record audio from the user, use the <record> tag, an input item that collects a recording from the user. A reference to the recorded audio is stored in the input item variable, which can be played back (using the “expr” attribute of the <value> tag) or submitted to a server (using the <submit> tag).

The following is a short example that demonstrates how to use the <record> tag:

record.vxml:

<?xml version="1.0"?>
<vxml version="2.0">
    <form>
        <record name="myrecording" type="audio/x-wav" beep="true">
            <prompt>
                Please record a message after the beep.
            </prompt>

            <filled>
                You just recorded the following message:
                <value expr="myrecording"/>
                <submit next="submitrecording.php" namelist="myrecording"
                method="post" enctype="multipart/form-data"/>
            </filled>
        </record>
    </form>
</vxml>

submitrecording.php:

<?php
header("Content-type: text/xml");
echo("<?xml version=\"1.0\"?>\n");
?>

<vxml version="2.0">
  <form>
    <block>

<?php
if (isset($_FILES['myrecording']) && is_uploaded_file($_FILES['myrecording']['tmp_name'])) {
        move_uploaded_file($_FILES['myrecording']['tmp_name'],"message.wav");
        echo "<prompt bargein=\"false\">Audio saved.</prompt>";
}  else {
        echo "<prompt bargein=\"false\">Audio not saved.</prompt>";
}
?>

    </block>
  </form>
</vxml>

From this example, the user is prompted to record a message after the beep. Notice that the attribute, “beep”, is set to true so that the user hears the beep. Once the user finishes recording, the application repeats the recording back to the user and then submits the recording, “myrecording”, to submitrecording.php through the “namelist” attribute of the <submit> tag. In submitrecording.php, the file gets uploaded and stored with the name “message.wav”. If this is successful, the user hears, “Audio saved.” If it was not successful, the user hears, “Audio not saved.” Note that the directory in which the audio recording file is being saved must have the appropriate permissions set to allow the creation of this new audio file.

There are also numerous attributes for the <record> tag that can help you adjust recording settings. For example, to set the maximum duration of the recording, use the “maxtime” attribute. This maximum duration has a limit of 1 hour. To set how long the user may remain silent before the recording stops, use the “finalsilence” attribute. The maximum “finalsilence” time is 5 minutes. If you want the user to be able to terminate the recording by pressing any DTMF key, use the “dtmfterm” attribute.

The following example demonstrates how to use these attributes in your application:

recordattributes.vxml:

<?xml version="1.0"?>
<vxml version="2.0">
    <form>
        <record name="myrecording" maxtime="300s"
        finalsilence="30s" dtmfterm="true">
            <prompt>
                Please say any comments you might have about the class.
                Press any DTMF key when you are finished recording.
            </prompt>

            <filled>
                You just recorded the following:
                <value expr="myrecording"/>
            </filled>
        </record>
    </form>
</vxml>

From this example, the user has a maximum time of 5 minutes to record a message. If the user stops speaking for a moment during the recording, “finalsilence” gives the user 30 seconds to say something to continue the recording; otherwise, the recording is terminated. Finally, by setting “dtmfterm” to true in this example, the user can press any DTMF key to end the recording.

You can also set the audio file type of the recording by using the “type” attribute. The audio format can be set to one of three options: audio/basic (8 kHz 8-bit u-law encoded, headerless, *.ul), audio/x-alaw-basic (8 kHz 8-bit a-law encoded, headerless, *.al), or audio/x-wav (8 kHz 8-bit u-law WAV, *.wav). If you want an audio format other than these three, you can convert your files afterwards with an audio conversion tool such as SoX or Adobe Audition.
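
As a minimal sketch of the “type” attribute, the following variation on record.vxml asks for a headerless u-law recording instead of a WAV file. The file name recordtype.vxml and the prompt wording are assumptions made for illustration.

recordtype.vxml:

```xml
<?xml version="1.0"?>
<vxml version="2.0">
    <form>
        <!-- type="audio/basic" requests 8 kHz 8-bit u-law, headerless audio -->
        <record name="myrecording" type="audio/basic" beep="true">
            <prompt>
                Please record a message after the beep.
            </prompt>

            <filled>
                You just recorded the following message:
                <value expr="myrecording"/>
            </filled>
        </record>
    </form>
</vxml>
```

Only the “type” value changes here; the rest of the form behaves as in the earlier recording example.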

Finally, keep in mind that when you use the <record> tag, the recording is not stored on our servers. The recording can only be played back by means of the “expr” attribute of the <value> tag or submitted to your server to be stored by means of the <submit> tag.

Tuning Application Behavior

To tune the behavior of the application, you can use the <property> tag. The <property> element sets a property value. Properties are used to set values that affect platform behavior, such as the recognition process, timeouts, caching policy, etc.

Properties may be defined for the whole application, for the whole document at the <vxml> level, for a particular dialog at the <form> or <menu> level, or for a particular form item. Properties apply to their parent element and all the descendants of the parent. A property at a lower level overrides a property at a higher level. When different values for a property are specified at the same level, the last one in document order applies.
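
The scoping rules above can be sketched as follows. This is an illustrative example only: the field names and timeout values are assumptions, and the “timeout” property itself is covered later in this section.

```xml
<?xml version="1.0"?>
<vxml version="2.0">

<!-- document level: applies to every form item below unless overridden -->
<property name="timeout" value="5s"/>

    <form>
        <field name="zipcode" type="digits">
            <!-- field level: overrides the document-level value for this field only -->
            <property name="timeout" value="10s"/>
            <prompt>
                Please enter your zip code.
            </prompt>
        </field>

        <field name="confirm" type="boolean">
            <!-- no property here, so the document-level 5 second timeout applies -->
            <prompt>
                Is that correct?
            </prompt>
        </field>
    </form>

</vxml>
```

With this layout, the first field waits up to 10 seconds for input while the second field falls back to the 5 second value inherited from the <vxml> level.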

One property we can use to tune our application is the “inputmodes” property. This property can be set to accept only speech input or only DTMF input from the user. To accept only speech input, set the “value” attribute of the property to “voice”. To accept only DTMF input, set the “value” attribute of the property to “dtmf”. Be careful when setting different “inputmodes” values at several points in your code, since the platform queues all properties at once up to the first non-bargeable prompt and/or first input element.

The following example demonstrates how we would use “inputmodes” to accept only DTMF input from the user:

<?xml version="1.0"?>
<vxml version="2.0">

<property name="inputmodes" value="dtmf"/>
<property name="interdigittimeout" value="3s"/>

    <form>
        <field name="myfield" type="digits">
            <prompt>
                You will only be able to enter digits.
                Enter a number on your keypad.
            </prompt>
            <filled>
                You entered <value expr="myfield"/>.
            </filled>
            <nomatch>
                You did not enter a number properly.
                <reprompt/>
            </nomatch>
            <noinput>
                You did not enter anything.
                <reprompt/>
            </noinput>
        </field>
    </form>

</vxml>

A possible user interaction might be:

C: You will only be able to enter digits.
H: Twelve.
C: (ignores spoken input) Enter a number on your keypad.
H: (enters DTMF-1 DTMF-2)
C: You entered twelve.

To tune properties for recognizing incoming speech, we can use the “confidencelevel” property. This property adjusts the confidence needed for a recognition. For example:

<?xml version="1.0"?>
<vxml version="2.0">

<property name="confidencelevel" value="0.75"/>

<form>
  <field type="boolean">
    <prompt>
      Please say yes or no.
    </prompt>
  </field>
</form>

</vxml>

In this example, the confidence threshold is raised to 0.75, requiring a clear “yes” or “no” answer. A high confidence level is useful when you expect a precise match to your grammar.

However, for grammars with many possible matches, such as a database of first and last names, you would want to lower the confidence level so that the system matches what the user is saying more broadly.

To tune prompting and collecting, the “bargein” property can be used to prevent users from interrupting prompts. Here is an example that does not allow the user to interrupt the first prompt, but does allow the user to interrupt the second prompt:
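
A lower threshold for a name grammar might look like the following sketch. The value 0.4, the field name, and the five names are assumptions chosen for illustration; a suitable value depends on your grammar and callers.

```xml
<?xml version="1.0"?>
<vxml version="2.0">

<form>
  <field name="employee">
    <!-- hypothetical lower threshold, scoped to this field only -->
    <property name="confidencelevel" value="0.4"/>
    <grammar type="application/x-jsgf" mode="voice">
      ( James | Bob | Peter | Mike | Donatello )
    </grammar>
    <prompt>
      Please say the name of the person you are calling.
    </prompt>
    <filled>
      You said <value expr="employee"/>.
    </filled>
  </field>
</form>

</vxml>
```

Placing the property inside the <field> keeps the relaxed threshold from affecting other fields, such as a yes/no confirmation, elsewhere in the document.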

<?xml version="1.0"?>
<vxml version="2.0">

    <form>
        <property name="bargein" value="false"/>
        <field name="myfield">
            <grammar type="application/x-jsgf" mode="voice">
                ( one | two )+
            </grammar>
            <prompt>
                You must listen to this message.
            </prompt>
            <prompt bargein="true">
                Say any number of the digits one or two.
            </prompt>
            <filled>
                You said <value expr="myfield"/>.
            </filled>
            <nomatch>
                You did not say any ones or twos.
                <reprompt/>
            </nomatch>
            <noinput>
                You did not say anything.
                <reprompt/>
            </noinput>
        </field>
    </form>

</vxml>

So, a possible user interaction might be:

C: You must…
H: One two.
C: …listen to this message.
C: Say any number of…
H: One two.
C: You said one two.

Since the “bargein” property is set to false for the form, the user is not allowed to interrupt the first message. Since the second prompt sets its own “bargein” attribute to true, the user is allowed to interrupt that message by saying one or two.

Another property that can be used to adjust prompting and collecting is the “timeout” property. This value can be adjusted to allow more time for speech or DTMF input from the user. For example:

<?xml version="1.0"?>
<vxml version="2.0">

    <form>
        <property name="timeout" value="7s"/>
        <field name="myfield">
            <grammar type="application/x-jsgf" mode="voice">
                ( one | two )+
            </grammar>
            <grammar type="application/x-jsgf" mode="dtmf">
                ( 1 | 2 )+
            </grammar>
            <prompt>
                Say or enter any number of the digits one or two.
            </prompt>
            <filled>
                You entered <value expr="myfield"/>.
            </filled>
            <nomatch>
                You did not enter any ones or twos.
                <reprompt/>
            </nomatch>
            <noinput>
                You did not enter anything.
                <reprompt/>
            </noinput>
        </field>
    </form>

</vxml>

In this example, the user has 7 seconds to say or enter a string of ones or twos. If the user does not say or enter anything within 7 seconds, a noinput event is generated. Setting “timeout” as a document-level property adjusts the timeout for all prompts in the document. A “timeout” attribute can also be set per prompt. This lets you give a longer timeout when collecting information the user might need to look up, such as a customer ID on a bill, or a shorter timeout when asking a yes/no question.
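
A per-prompt “timeout” could be sketched like this. The field names and the 15 second and 3 second values are assumptions for illustration.

```xml
<?xml version="1.0"?>
<vxml version="2.0">

    <form>
        <field name="customerid" type="digits">
            <!-- longer timeout: the caller may need to look for the number -->
            <prompt timeout="15s">
                Please enter the customer identification number printed on your bill.
            </prompt>
        </field>

        <field name="confirm" type="boolean">
            <!-- shorter timeout: a yes/no answer should come quickly -->
            <prompt timeout="3s">
                Is that correct?
            </prompt>
        </field>
    </form>

</vxml>
```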

To tune properties for recognizing DTMF, we can use the “interdigittimeout” property to adjust the in-between time for the user to input numbers on a telephone keypad:

<?xml version="1.0"?>
<vxml version="2.0">

<property name="interdigittimeout" value="3s"/>

    <form>
        <field name="myfield">
            <grammar type="application/x-jsgf" mode="dtmf">
                ( 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0)+
            </grammar>
            <prompt>
                Please enter your credit card number.
            </prompt>
            <filled>
                You entered <value expr="myfield"/>.
            </filled>
            <noinput>
                You did not enter anything.
                <reprompt/>
            </noinput>
        </field>
    </form>

</vxml>

In this example, once the first digit is entered, the user has 3 seconds to enter each subsequent digit. If nothing is entered at all, a “timeout” occurs, resulting in a <noinput> being generated. This makes the “interdigittimeout” property handy for applications that collect a long string of digits, since the user is likely to pause to check the number they are entering.

The <property> tag also allows us to tune caching properties for audio, documents, grammars, and scripts. To adjust these properties for audio, you would use “audiofetchhint”, “audiomaxage”, and “audiomaxstale”. For document reference tags such as <subdialog>, <goto>, <submit>, <link>, and <choice>, you would use “documentfetchhint”, “documentmaxage”, and “documentmaxstale”. For grammars, you would use “grammarfetchhint”, “grammarmaxage”, and “grammarmaxstale”. For scripts, you would use “scriptfetchhint”, “scriptmaxage”, and “scriptmaxstale”. For example:

<?xml version="1.0"?>
<vxml version="2.0">

<property name="documentmaxage" value="150s"/>
<property name="documentmaxstale" value="25s"/>
    <form>
         <block>
              <goto next="myfile.vxml"/>
         </block>
    </form>

</vxml>

In this example, the “documentmaxage” value is set to 150 seconds and the “documentmaxstale” value is set to 25 seconds. These global properties give all document tags (<goto>, <submit>, etc.) a maxage value of 150 seconds and a maxstale value of 25 seconds. Since the file “myfile.vxml” is fetched by a <goto> tag, it picks up both values. The maxage value allows you to override the expiration time for the local copy of a file. The maxstale value allows you to extend the “life” of a cached file by a certain number of seconds, meaning the locally cached file is still sent back to Plum DEV for that long after it has expired.
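
The audio caching properties follow the same pattern. The sketch below assumes a hypothetical prompt file, wavfiles/greeting.wav, and illustrative values of one hour for “audiomaxage” and zero for “audiomaxstale”.

```xml
<?xml version="1.0"?>
<vxml version="2.0">

<!-- cached audio may be reused for up to one hour -->
<property name="audiomaxage" value="3600s"/>
<!-- do not use a cached audio file once it has expired -->
<property name="audiomaxstale" value="0s"/>

    <form>
        <block>
            <prompt>
                <!-- the text inside <audio> is played only if the file cannot be played -->
                <audio src="wavfiles/greeting.wav">
                    Welcome.
                </audio>
            </prompt>
        </block>
    </form>

</vxml>
```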

Also, the fetchtimeout property can be used to set the timeout for fetching a file from a web server. For example:

<?xml version="1.0"?>
<vxml version="2.0">

<property name="documentmaxage" value="150s"/>
<property name="documentmaxstale" value="25s"/>
<property name="fetchtimeout" value="20s"/>
    <form>
         <block>
              <goto next="myfile.vxml"/>
         </block>
    </form>

</vxml>

In this example, if the file “myfile.vxml” cannot be fetched from the web server within 20 seconds, a timeout occurs and an error.badfetch event is thrown.
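
One way to handle a failed fetch gracefully is a document-level <catch>. The sketch below assumes the platform reports the fetch timeout as the standard error.badfetch event, as the VoiceXML 2.0 specification describes; the apology wording is illustrative.

```xml
<?xml version="1.0"?>
<vxml version="2.0">

<property name="fetchtimeout" value="20s"/>

<!-- document-level handler: runs if a fetch made from this document fails -->
<catch event="error.badfetch">
    <prompt>
        We are having trouble reaching our servers. Please try again later.
    </prompt>
    <exit/>
</catch>

    <form>
        <block>
            <goto next="myfile.vxml"/>
        </block>
    </form>

</vxml>
```

Without a handler of your own, the caller hears whatever the platform's default error handling provides, so a short apology followed by <exit/> is often a friendlier ending.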

Another way to tune fetching properties is to use “fetchaudio”, “fetchaudiodelay”, and “fetchaudiominimum”. These properties can be used to control the audio that is played for a user when the user is put on hold while a document is being fetched. For example:

fetchaudioscript.php:

<?xml version="1.0"?>
<vxml version="2.0">

<property name="fetchaudio" value="holdmusic.wav"/>
<property name="fetchaudiodelay" value="2s"/>
<property name="fetchaudiominimum" value="5s"/>
    <form>
         <block>
              <goto next="delayexamplescript.php"/>
         </block>
    </form>

</vxml>

delayexamplescript.php:

<?php
header("Content-type: text/xml");
echo "<?xml version=\"1.0\"?>\n";

sleep(10);
?>

<vxml version="2.0">
  <form>
    <block>
      <prompt> delayed script </prompt>
    </block>
  </form>
</vxml>

In this example, the “fetchaudio” property sets holdmusic.wav to play whenever there is a delay in fetching a file. The “fetchaudiodelay” property inserts a 2-second pause before the “fetchaudio” source is played. If the document is fetched before “fetchaudiodelay” expires, no “fetchaudio” source is played. The “fetchaudiominimum” property plays the “fetchaudio” source for a minimum of 5 seconds, even after the fetched document has arrived. For the purposes of this example only, the delay PHP script sleeps for 10 seconds so that the fetch audio has time to play.

For more information on using properties, you can go to the Properties section of the Plum DEV Reference Manual.

Auto Attendant Example

After working through this tutorial, you should now be able to build your own application. Let's try to build an automated attendant application. During this example, we'll build up the auto attendant application piece by piece. Also, please note that in the code snippets, all phone numbers and extensions mentioned are not real and all .wav files are nonexistent. First, begin with Plum's standard template.

<?xml version="1.0"?>
<vxml version="2.0">

</vxml>

Next, set up a <form> block to be your introduction to the user. Let's use the <audio> tag to play pre-recorded audio for the announcement instead of TTS. However, let's make the prompt such that the user cannot interrupt the introduction by setting the “bargein” attribute of the <prompt> to false. Also, set up a <goto> tag to go to a <menu> block for the next section of the application.

<form id="intro">
     <block>
          <prompt bargein="false">
            <audio src="wavfiles/humanvoice.wav">
               Hello! Welcome to The Electronic Store, the leader of all electronic stores!
            </audio>
          </prompt>
          <goto next="#mainmenu"/>
     </block>
</form>

Next, set up a <menu> block using the <choice> tag and allow the user to make a choice by either DTMF or speech input.

<menu id="mainmenu">
     <prompt>
          Please choose a department:
          <enumerate/>
     </prompt>
     <choice dtmf="1" next="#sales">
          Sales</choice>
     <choice dtmf="2" next="#support">
          Support</choice>
     <choice dtmf="3" next="#directory">
          Company Directory</choice>
</menu>

Next, set up a <form> block for the first choice made by the user from your menu.

<form id="sales">
     <block>
          Please hold for the next available sales representative.
     </block>
     <!-- transfer to sales -->
     <transfer dest="+11234567890" connecttimeout="20s" bridge="true"/>
</form>

Next, set up a <form> block for the second choice in your menu for the user. Here, use an if conditional to give the user premium support if they know their customer identification number. If they don't know their customer identification number, set up another form block that will collect the user's telephone number and submit it to another application script to be stored in a database. You should use <noinput> and <nomatch> tags to help the user if they are having problems entering their telephone number.

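<form id="support">
  <!-- declare customerid so that the <assign> below does not throw error.semantic -->
  <var name="customerid"/>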
  <field name="hasid" type="boolean">
    <prompt>
      Welcome to customer support. If you know your customer
      identification number, press 1 or say yes.  Otherwise, press 2 or say no.
    </prompt>
    <filled>
      <if cond="hasid==false">
        <goto next="#unknowncustomer"/>
      </if>
    </filled>
  </field>
  <field name="id" type="digits">
    <prompt>
      Enter your customer identification number.
    </prompt>
    <filled>
      <assign name="customerid" expr="id"/>
      You entered <value expr="id"/>.
      Transferring to premium support.
    </filled>
  </field>
  <!-- transfer to premium support -->
  <transfer dest="+15554443333" connecttimeout="20s" bridge="true"/>
</form>

<form id="unknowncustomer">
  <field name="telnumber" type="digits?length=10">
    <prompt>
      Enter your telephone number so we can get back to you in case of a dropped call.
    </prompt>
    <filled>
      <!-- submit phone number to database for telemarketing -->
      <submit namelist="telnumber" next="http://mightyserver.com/save_number.php"/>
    </filled>
    <noinput>
      Sorry, I didn't hear you. Please use your keypad to enter your telephone number.
      <reprompt/>
    </noinput>
    <nomatch>
      Sorry, I didn't understand. Please enter the area code first, then the number.
      <reprompt/>
    </nomatch>
  </field>
</form>

Next, set up a <form> block for the third choice in your menu for the user. Here, use the <grammar> tag to set up 5 names in the company directory that the user can say to be transferred to that person's extension number. To do this, add curly braces (“{}”) containing the person's extension number after each name. This does a direct replacement: if the user says a name, the grammar returns the extension number instead. You should also add <noinput> and <nomatch> error handlers to help the user if they are having problems saying a name.

<form id="directory">
     <field name="personextension">
          <grammar type="application/x-jsgf" mode="voice">
               (James{7890} | Bob{5092} | Peter{6550} | Mike{8338} | Donatello{5138})
          </grammar>
          <prompt>
               Please say one of the following names: James, Bob, Peter, Mike, Donatello.
          </prompt>
          <filled>
               Transferring to <value expr="personextension"/>.
          </filled>
          <noinput>
               You did not say anything. Try speaking louder or hold the phone closer to
               your mouth.
               <reprompt/>
          </noinput>
          <nomatch>
               Sorry, I can't understand you. Try speaking a little slower and more clearly.
               <reprompt/>
          </nomatch>
     </field>
     <!-- transfer to person's extension number -->
     <transfer destexpr="personextension" connecttimeout="20s" bridge="true"/>
</form>

In the end, your application should look something like this:

<?xml version="1.0"?>
<vxml version="2.0">

<form id="intro">
     <block>
          <prompt bargein="false">
            <audio src="wavfiles/humanvoice.wav">
               Hello! Welcome to The Electronic Store, the leader of all electronic stores!
            </audio>
          </prompt>
          <goto next="#mainmenu"/>
     </block>
</form>

<menu id="mainmenu">
     <prompt>
          Please choose a department:
          <enumerate/>
     </prompt>
     <choice dtmf="1" next="#sales">
          Sales</choice>
     <choice dtmf="2" next="#support">
          Support</choice>
     <choice dtmf="3" next="#directory">
          Company Directory</choice>
</menu>

<form id="sales">
     <block>
          Please hold for the next available sales representative.
     </block>
     <!-- transfer to sales -->
     <transfer dest="+11234567890" connecttimeout="20s" bridge="true"/>
</form>

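<form id="support">
  <!-- declare customerid so that the <assign> below does not throw error.semantic -->
  <var name="customerid"/>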
  <field name="hasid" type="boolean">
    <prompt>
      Welcome to customer support. If you know your customer
      identification number, press 1 or say yes.  Otherwise, press 2 or say no.
    </prompt>
    <filled>
      <if cond="hasid==false">
        <goto next="#unknowncustomer"/>
      </if>
    </filled>
  </field>
  <field name="id" type="digits">
    <prompt>
      Enter your customer identification number.
    </prompt>
    <filled>
      <assign name="customerid" expr="id"/>
      You entered <value expr="id"/>.
      Transferring to premium support.
    </filled>
  </field>
  <!-- transfer to premium support -->
  <transfer dest="+15554443333" connecttimeout="20s" bridge="true"/>
</form>

<form id="unknowncustomer">
  <field name="telnumber" type="digits?length=10">
    <prompt>
      Enter your telephone number so we can get back to you in case of a dropped call.
    </prompt>
    <filled>
      <!-- submit phone number to database for telemarketing -->
      <submit namelist="telnumber" next="http://mightyserver.com/save_number.php"/>
    </filled>
    <noinput>
      Sorry, I didn't hear you. Please use your keypad to enter your telephone number.
      <reprompt/>
    </noinput>
    <nomatch>
      Sorry, I didn't understand. Please enter the area code first, then the number.
      <reprompt/>
    </nomatch>
  </field>
</form>

<form id="directory">
     <field name="personextension">
          <grammar type="application/x-jsgf" mode="voice">
               (James{7890} | Bob{5092} | Peter{6550} | Mike{8338} | Donatello{5138})
          </grammar>
          <prompt>
               Please say one of the following names: James, Bob, Peter, Mike, Donatello.
          </prompt>
          <filled>
               Transferring to <value expr="personextension"/>.
          </filled>
          <noinput>
               You did not say anything. Try speaking louder or hold the phone closer to
               your mouth.
               <reprompt/>
          </noinput>
          <nomatch>
               Sorry, I can't understand you. Try speaking a little slower and more clearly.
               <reprompt/>
          </nomatch>
     </field>
     <!-- transfer to person's extension number -->
     <transfer destexpr="personextension" connecttimeout="20s" bridge="true"/>
</form>
</vxml>

save_number.php:

<?php
header("Content-type: text/xml");
// "telnumber" arrives as a GET parameter from the <submit> tag
$telnumber = isset($_GET['telnumber']) ? $_GET['telnumber'] : '';
// it is left up to the reader to implement this function, "db_store_callback_number"
db_store_callback_number($telnumber);
echo("<?xml version=\"1.0\"?>\n");
?>
<vxml version="2.0">
<form>
     <block>
          <prompt>
               You entered <?php echo htmlspecialchars($telnumber); ?>.
               We will now transfer you to someone to get a customer identification number.
          </prompt>
     </block>
     <!-- transfer to customer service representative -->
     <transfer dest="+19801234567" connecttimeout="20s" bridge="true"/>
</form>
</vxml>

Once you complete this application, you have mastered many of the tags and techniques that are used within Plum DEV.
