Using Plum’s Transcription API

Detailed examples explaining how to use the transcription API across Plum’s product suite.

From Digital Audio to Digital Text

Digital audio is easy to work with, and companies use digital audio files in many ways. Whether it’s in promotional materials, ads, customer service applications, or anything else, the same file can have multiple uses.

Digital audio is also useful for collecting information because it lets people convey large amounts of information quickly, without having to think about the structural conventions of the written word. More succinctly: it takes a lot longer to write something out than to say it.

However, extracting information from recorded audio can be challenging. After all, no one wants to manually transcribe thousands of customer feedback comments.

Enter natural language processing and transcription. Using machine learning and natural language processing engines, it’s possible to transform digital audio files into digital text. This digital text can then be manipulated, sorted, stored, organized, or analyzed in whatever way a company requires.

Plum’s voice platform contains APIs that enable customers to quickly generate transcriptions of digital audio files, like those captured and recorded during phone calls.

Below we detail how to access transcription capabilities across our suite of voice products.

How to Call Plum’s Transcription API


The following image shows how to construct a simple transcription application in Plum Fuse.

After the welcome prompt is a ‘record’ module. As you can see in the module text, the module prompts the user to leave a voice message. The module records the caller audio and saves it as a variable with the same name as the module, in this case record_message.

Next, the call-flow hits a REST module called transcription_api. The address for Plum’s transcription API goes in the first text field.

Enter your access credentials in the Header section. These are the same credentials you use to access Plum DEV (contact your account manager if you don’t already have a DEV login).

The format in the Header field must be: Authorization: Basic XXXXX

Replace the Xs with your login credentials, formatted as ‘username:password’ and encoded in base 64. For example, FuseUser:12345 becomes RnVzZVVzZXI6MTIzNDU=.

The text in the Header section would then be: Authorization: Basic RnVzZVVzZXI6MTIzNDU=
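
If it helps to see the encoding step spelled out, the following PHP sketch builds the same header text from a username and password. The function name build_basic_auth_header is hypothetical and used only for illustration; base64_encode is PHP's built-in function.

```php
<?php
// Hypothetical helper (not part of any Plum product): builds the Header text
// from a username and password using HTTP Basic authentication.
function build_basic_auth_header(string $username, string $password): string
{
    // Credentials are joined as 'username:password' and then base 64 encoded.
    return 'Authorization: Basic ' . base64_encode($username . ':' . $password);
}

echo build_basic_auth_header('FuseUser', '12345'), "\n";
// Prints: Authorization: Basic RnVzZVVzZXI6MTIzNDU=
```

The output can be pasted directly into the Header section of the REST module.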

Next, ensure that the Request type is POST and that the return type matches the resource type specified in the URL. In this case, the resource type is JSON (the API also works with XML).

Finally, specify the variables. The API requires two variables: 1.) language and 2.) audio.

Language simply tells the transcription engine what language to use on the audio. So, if the audio is in English, choose the appropriate English setting. A list of compatible languages and dialects is available in the Transcription API documentation.

The audio variable consists of the recording made in the record_message module. Click the plus button to add a variable and begin typing “record_message”; the variable should auto-populate.

The final prompt reads back the transcribed audio as text-to-speech. Again, the result is saved under the name of the module that performed the action, so the transcribed text is available as transcription_api.result.message. Use that as the name of the variable in your playback prompt.

You can also choose to send the transcribed text (i.e., the variable transcription_api.result.message) to a database or any other necessary repository for further use/analysis.


Creating the same application in Plum DEV looks a little bit different. The following code sample produces the same results as the Fuse call-flow above.

There are two components in DEV: the VoiceXML script and the PHP script.

The VXML script contains the text that the caller will hear. The PHP script handles all the backend processes and API calls. The PHP script runs when the VXML script reaches the <subdialog> tag.

The first prompt mirrors the prompt module in the Fuse app. The form that follows (the one with the id record) is the counterpart to the record module. Here the name of the audio file is set to myrecording.

<?xml version="1.0"?>
<vxml version="2.1">
  <form>
    <block>
      <prompt>
        The following is an example of prompting a caller to leave a recorded message, then sending that message to our real-time-transcription API, and finally having the system read back the transcription as TTS.
      </prompt>
      <goto next="#record"/>
    </block>
  </form>
  <form id="record">
    <var name="callrecording"/>
    <record name="myrecording" beep="true">
      <prompt>Please leave a recorded message after the beep. When you are done recording, press the pound key.</prompt>
      <filled><assign name="callrecording" expr="myrecording"/></filled>
    </record>
    <subdialog name="transcription" src="subdialog.php" namelist="callrecording" method="post" enctype="multipart/form-data"/>
    <block>
      <prompt>We transcribed your message as, <value expr="transcription.message"/>.</prompt>
    </block>
  </form>
</vxml>

The API call starts at the <subdialog> tag. Looking at the subdialog.php script, you can see that both the language and audio form data parameters get set in the $post array.

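<?php
header("Content-type: text/xml");
echo("<?xml version=\"1.0\"?>\n");

$ch = curl_init();

// The transcription API address goes here (left blank in this example).
curl_setopt($ch, CURLOPT_URL, '');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
// Form data: the language of the audio and the recording uploaded by the <subdialog>.
$post = array(
        'language' => 'en-US',
        'audio' => new CURLFile($_FILES['callrecording']['tmp_name'])
);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: Basic '.base64_encode('username:password')));

$result = curl_exec($ch);
if (curl_errno($ch)) {
        // Log rather than echo, so the XML returned to the caller stays well-formed.
        error_log('Error:' . curl_error($ch));
}
curl_close($ch);

// Fall back to an empty message if the call failed or the response was not the expected JSON.
$data = json_decode((string) $result);
$message = isset($data->result->message) ? $data->result->message : '';
?>
<vxml version="2.0">
  <form>
    <block>
      <!-- addslashes and htmlspecialchars keep quotes in the transcription from breaking the expression -->
      <var name="message" expr="'<?php echo htmlspecialchars(addslashes($message), ENT_QUOTES) ?>'"/>
      <return namelist="message"/>
    </block>
  </form>
</vxml>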

Using this method, you can encode your DEV credentials in base 64, just like the Fuse example above, and use them in the API call.

To do so, replace the base64_encode('username:password') call in the CURLOPT_HTTPHEADER line with your base 64 string, so that the string is not encoded a second time.

Using the same info from the Fuse example, the updated CURLOPT_HTTPHEADER line would look like this:

curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: Basic RnVzZVVzZXI6MTIzNDU='));

The $result section (starting at the curl_exec call) checks to make sure everything occurred as intended. If everything checks out, it extracts the transcription from the JSON response and passes it to the VXML script as the message variable; otherwise, it logs an error.
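
As a rough sketch of what that check could look like in isolation, the function below pulls the transcription out of a response body and returns null when the body is not usable. The function name extract_transcription and the sample response are assumptions made for illustration; only the result.message path comes from the script above.

```php
<?php
// Hypothetical helper: returns the transcription text from a JSON response
// body, or null if the body is missing, malformed, or lacks result.message.
function extract_transcription($body): ?string
{
    if (!is_string($body) || $body === '') {
        return null; // curl_exec returns false when the request fails
    }
    $data = json_decode($body);
    if (!is_object($data) || !isset($data->result->message)) {
        return null;
    }
    return (string) $data->result->message;
}

// Sample body invented for this example; a real response may carry more fields.
$sample = '{"result": {"message": "please call me back"}}';
var_dump(extract_transcription($sample));    // string(19) "please call me back"
var_dump(extract_transcription(false));      // NULL
var_dump(extract_transcription('not json')); // NULL
```

Returning null for every failure case lets the calling script decide what the caller hears when no transcription is available.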

Looking back at the VXML script, you can see that the final prompt reads back the text of the transcription, indicated here by the variable transcription.message.


In Plum Insight, transcription functionality is built directly into the Comment question type.

Simply add a Comment question (the microphone icon on the left). Then, click on the gear icon for the question itself and select the ‘Yes’ value for Transcribe Recordings.

To view the digital text for the call transcriptions, go to the Reports section of Insight. Locate the desired survey in the list and click the ‘View Results’ button. On the ‘view results’ page, click the ‘view survey visits’ button.

You should see something similar to the image below for any questions with transcription enabled.

Each survey generates a Reporting API URL, which is available on the Survey Visits page. Use this API to send survey data to an external repository. (Note: Users will need to parse the survey data on their end.)

For further information about how to configure voice applications, or for detailed information about Plum’s transcription API, see the documentation for each respective product (DEV, Fuse, Insight).
