
Dialog as a Service gRPC API

Dialog as a Service is Nuance's omni-channel conversation engine. The Dialog as a Service API allows conversational AI applications to interact with dialogs created with the Mix.dialog and Mix.nlu web tools.

The gRPC protocol provided by Dialog as a Service allows a client application to interact with a dialog in all the programming languages supported by gRPC.

gRPC is an open source RPC (remote procedure call) framework used to create services. It uses HTTP/2 for transport and protocol buffers to define message and service structures. Dialog as a Service supports the proto3 version of protocol buffers.

Version: v1

This release supports version v1 of the Dialog as a Service protocol. See gRPC setup to download the proto files and get started.

What's new in v1?

Version v1 includes the following changes:

For the changes between v1beta1 and v1beta2, see What's new in v1beta2?.

Upgrading to v1

To upgrade your client app from the v1beta2 protocol to v1, regenerate your programming-language stub files from the new proto files and then adjust your client application.

In particular:

  1. Download the v1 proto files here.
  2. Use gRPC tools to generate the client stubs from the proto files.
    See gRPC setup for details.
  3. Adjust your client application for the changes made to the protocol in v1, as described in What's new in v1?
  4. Re-build your dialog application, set up a new app config with the new build version, and deploy the new application configuration.

Dialog essentials

From an end-user's perspective, a dialog-enabled app is one that understands natural language, can respond in kind, and, where appropriate, can extend the conversation by following up the user's turn with appropriate questions and suggestions.

Dialogs are created using Mix.dialog; see Creating Mix.dialog Applications for more information. This document describes how to access a dialog at runtime from a client application using the DLGaaS gRPC API.

This section introduces concepts that you will need to understand to write your client application.

Session

A session represents a conversation between a user and the dialog service. For example, consider the following scenario for a coffee app:

The interactions between the client application and the dialog service for this scenario occur in the same session. A session is identified by a session ID. Each request and response exchanged between the client app and the dialog service for that specific conversation must include that session ID.

For more information on session IDs, see Step 3. Start conversation.

Playing messages and providing user input

The client application is responsible for playing messages to the user (for example, "What can I do for you today?") and for collecting and returning the user input to the dialog service (for example, "I want a cappuccino").

Messages can be provided to the user in the form of:

The client app can then send the user input to the dialog service in a few ways:

Stream audio to the Dialog service

You can use the DLGaaS API to stream audio and perform recognition on user input. This allows you to interact with the Nuance ASR (Automatic Speech Recognition) service without having to use the ASRaaS API directly.

When audio is sent, DLGaaS streams it to ASRaaS, which performs recognition. The recognized content is then sent to NLUaaS for interpretation, which is then used by the dialog application.

Nodes and actions

(Image: Mix.dialog nodes that trigger a call to the DLGaaS API)

You create applications in Mix.dialog using nodes. Each node performs a specific task, such as asking a question, playing a message, and performing recognition. As you add nodes and connect them to one another, the dialog flow takes shape in the form of a graph.

At specific points in the dialog, when the dialog service requires input from the client application, it sends an action to the client app. In the context of DLGaaS, the following Mix.dialog nodes trigger a call to the DLGaaS API and send a corresponding action:

Question and answer

The objective of the question and answer node is to collect user input. It sends a message to the client application and expects user input, which can be audio, a text utterance, or an interpretation. For example, in the coffee app, the dialog may tell the client app to ask the user "What type of coffee would you like today?" and then to return the user's answer.

The message specified in a question and answer node is sent to the client application as a question and answer action. To continue the flow, the client application must then return the user input to the question and answer node.

See Question and answer actions for details.

Data access

The data access node tells the client app that the dialog expects data to continue the flow. It can also be used to exchange information between the client app and the dialog. For example, in a coffee app, the dialog may ask the client application to query the price of the order or to retrieve the name of the user.

Data is sent to the client application in a data access action. To continue the flow, the client application must return the requested data.

See Data access actions for details.

External actions: Transfer and End

There are two types of external actions nodes:

Message node

The message node plays a message. The message specified in a message node is sent to the client application as a message action.

See Message actions for details.

Session data

In some situations, you may want to send data from the client application to the dialog service to be used during the session. For example, at the beginning of a dialog you might want to send the geographical location of the user, the user name and phone number, and so on.

For more information, see Exchanging session data.

Selectors

Most dialog applications support multiple channels and languages, so you need to specify which channel and language to use for each interaction in your API calls. This is done through a selector.

A selector is the combination of:

You do not need to send the selector at each interaction. If the selector is not included, the values of the previous interaction will be used.
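
For example, with the Python stubs generated later in this document, a selector could be built as follows. This is a minimal sketch; the import path follows the sample app's imports, and the channel, language, and library values must match those defined in your Mix project:

from nuance.dlg.v1.common.dlg_common_messages_pb2 import Selector

# Channel, language, and library as defined in the Mix project
selector = Selector(channel="default", language="en-US", library="default")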

Prerequisites from Mix

Before developing your gRPC application, you need a Mix project that provides a dialog application as well as authorization credentials.

  1. Create a Mix project:
  2. Generate a "secret" and client ID of your Mix project: see Obtaining authentication for services. Later you will use these credentials to request an authorization token to run your application.
  3. Learn the URL to call the Dialog service: see Accessing a runtime service.
    • For DLGaaS, this is: dlg.api.nuance.com:443

gRPC setup

Get proto files

# For DLGaaS
nuance/dlg/v1/dlg_interface.proto
nuance/dlg/v1/dlg_messages.proto
nuance/dlg/v1/common/dlg_common_messages.proto

# For ASRaaS audio streaming
nuance/asr/v1/recognizer.proto
nuance/asr/v1/resource.proto
nuance/asr/v1/result.proto

# For TTSaaS streaming
nuance/tts/v1/nuance_tts.proto

Install gRPC for your programming language, e.g., Python

$ pip install --upgrade pip
$ pip install grpcio
$ pip install grpcio-tools

For Python, use protoc to generate stubs

$ echo "Pulling support files"
$ mkdir -p google/api
$ curl https://raw.githubusercontent.com/googleapis/googleapis/master/google/api/annotations.proto > google/api/annotations.proto
$ curl https://raw.githubusercontent.com/googleapis/googleapis/master/google/api/http.proto > google/api/http.proto
$ echo "generate the stubs for support files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ google/api/http.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ google/api/annotations.proto
$ echo "generate the stubs for the DLGaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/dlg/v1/dlg_interface.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/dlg/v1/dlg_messages.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/dlg/v1/common/dlg_common_messages.proto
$ echo "generate the stubs for the ASRaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/asr/v1/recognizer.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/asr/v1/resource.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/asr/v1/result.proto
$ echo "generate the stubs for the TTSaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/tts/v1/nuance_tts.proto

The basic steps in using the Dialog as a Service gRPC protocol are:

  1. Download the gRPC .proto files here. These files contain a generic version of the functions or classes that can interact with the dialog service.
    See Note about packaged proto files below.

  2. Install gRPC for the programming language of your choice, including C++, Java, Python, Go, Ruby, C#, Node.js, and others. See gRPC Documentation for a complete list and instructions on using gRPC with each language.

  3. Generate client stub files in your programming language from the proto files. Depending on your programming language, the stubs may consist of one file or multiple files per proto file.

    These stub files contain the methods and fields from the proto files as implemented in your programming language. You will consult the stubs in conjunction with the proto files. See gRPC API.

  4. Write your client app, referencing the functions or classes in the client stub files. See Client app development for details and a scenario.

Note about packaged proto files

This release of DLGaaS includes a feature that allows you to perform ASR and TTS using the DLGaaS API. This feature requires that you have installed the ASR and TTS proto files. For your convenience, these files are packaged with the DLGaaS proto files available here, and this documentation provides instructions for generating the stub files.

As such, the following files are packaged with this documentation:

Client app development

This section describes the main steps in a typical client application that interacts with a Mix.dialog application. In particular, it provides an overview of the different methods and messages used in a sample order coffee application.

Sample dialog exchange

To illustrate how to use the API, this document uses the following simple dialog exchange between an end user and a dialog application:

Overview

The DialogService is the main entry point to the Nuance Dialog service.

A typical workflow for accessing a dialog application at runtime is as follows:

  1. The client application requests the access token from the Mix authentication service.
  2. The client application opens a secure channel using the access token.
  3. The client application initiates a new conversation using the StartRequest method of the DialogService. The service returns a session ID, which is used at each interaction to keep the same conversation.
  4. As the user interacts with the dialog, the client application invokes one of the following methods, as often as necessary:
    • The ExecuteRequest method for text input and data exchange.
      An ExecuteResponse is returned to the client application when a question and answer node, a data access node, or an external actions node is encountered in the dialog flow.
    • The StreamInput method for audio input (ASR) and/or audio output (TTS).
      A StreamOutput is returned to the client application.
  5. The client application closes the conversation using the StopRequest method.

This workflow is shown in the following high-level sequence flow:

(Diagram: high-level sequence flow)

For a detailed sequence flow diagram, see Detailed sequence flow.

Step 1. Generate token

Save token to file: gen-token.sh

#!/bin/bash

# Remember to change the colon (:) in your CLIENT_ID to code %3A
CLIENT_ID="appID%3ANMDPTRIAL_your_name_nuance_com_20190919T190532565840"
SECRET="5JEAu0YSAjV97oV3BWy2PRofy6V8FGmywiUbc0UfkGE"
curl -s -u "$CLIENT_ID:$SECRET" "https://auth.crt.nuance.com/oauth2/token" \
-d 'grant_type=client_credentials' -d 'scope=dlg' \
| python -c 'import sys, json; print(json.load(sys.stdin)["access_token"])' \
> my-token.txt

Save token to variable: gen-token-var.sh

#!/bin/bash

CLIENT_ID="appID%3ANMDPTRIAL_your_name_nuance_com_20190919T190532565840"
SECRET="5JEAu0YSAjV97oV3BWy2PRofy6V8FGmywiUbc0UfkGE"
MY_TOKEN="`curl -s -u "$CLIENT_ID:$SECRET" "https://auth.crt.nuance.com/oauth2/token" \
-d 'grant_type=client_credentials' -d 'scope=dlg' \
| python -c 'import sys, json; print(json.load(sys.stdin)["access_token"])'`"

Nuance Mix uses the OAuth 2.0 protocol for authentication. To call the Dialog runtime service, your client application must request and then provide an access token. The token expires after a short period of time, so it must be regenerated frequently.

Your client application uses the client ID and secret from the Mix.dashboard to generate an authentication token from the Mix Authentication Service, available at the following URL:

https://auth.crt.nuance.com/oauth2/token

Depending on how your application expects the token, you can include the generation code within your application or create a script that saves the token in a variable or file:

The curl command in these scripts requests a JSON object whose access_token field contains the token, then uses Python to extract the token from the JSON. The resulting file or environment variable contains only the token.

In this scenario, the colon (:) in the client ID must be changed to the code %3A so curl can parse the value correctly:

appID:NMDPTRIAL_your_name_nuance_com_20190919T190532565840  
-->  
appID%3ANMDPTRIAL_your_name_nuance_com_20190919T190532565840

Once you have created the script file, you can run it before running your application. See Sample Python app for an example.
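
If you prefer to generate the token from within your Python application instead, the same request can be made with an HTTP client. This is a minimal sketch that assumes the requests library and placeholder credentials; the colon in the client ID is URL-encoded to %3A just as in the shell scripts:

import requests
from urllib.parse import quote

CLIENT_ID = "appID:NMDPTRIAL_your_name_nuance_com_20190919T190532565840"
SECRET = "your_client_secret"
TOKEN_URL = "https://auth.crt.nuance.com/oauth2/token"

# Encode the colon (:) in the client ID as %3A, as in the shell scripts
resp = requests.post(
    TOKEN_URL,
    auth=(quote(CLIENT_ID, safe=""), SECRET),
    data={"grant_type": "client_credentials", "scope": "dlg"},
)
token = resp.json()["access_token"]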

Step 2. Authenticate the service

def create_channel(args):    
    log.debug("Adding CallCredentials with token %s" % args.token)
    call_credentials = grpc.access_token_call_credentials(args.token)

    log.debug("Creating secure gRPC channel")
    channel_credentials = grpc.ssl_channel_credentials()
    channel_credentials = grpc.composite_channel_credentials(channel_credentials, call_credentials)
    channel = grpc.secure_channel(args.serverUrl, credentials=channel_credentials)

    return channel

You authenticate the service by creating a secure gRPC channel, providing:

Step 3. Start conversation

def start_request(stub, model_ref_dict, session_id, selector_dict={}, timeout=None):
    selector = Selector(channel=selector_dict.get('channel'), 
                        library=selector_dict.get('library'),
                        language=selector_dict.get('language'))
    start_payload = StartRequestPayload(model_ref=model_ref_dict)
    start_req = StartRequest(session_id=session_id, 
                        selector=selector, 
                        payload=start_payload,
                        session_timeout_sec=timeout)
    log.debug(f'Start Request: {start_req}')
    start_response, call = stub.Start.with_call(start_req)
    response = MessageToDict(start_response)
    log.debug(f'Start Request Response: {response}')
    return response, call

To start a new conversation, the client app sends a StartRequest message with the following information:

A new unique session ID is generated and returned as a response; for example:


'payload': {'sessionId': 'b8cba63a-f681-11e9-ace9-d481d7843dbd'}

The client app must then use the same session ID in all subsequent requests that apply to this conversation.
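
For example, using the helper functions from the Sample Python app later in this document, the client captures the generated session ID and reuses it on every call (a minimal sketch):

# Start the conversation; session_id=None lets the service generate one
response, call = start_request(stub, model_ref_dict, session_id=None, selector_dict=selector_dict)
session_id = response['payload']['sessionId']

# Reuse the same session ID for every request in this conversation
response, call = execute_request(stub, session_id=session_id, selector_dict=selector_dict, payload_dict=payload_dict)
response, call = stop_request(stub, session_id=session_id)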

Additional notes on session IDs

Step 4a. Interact with the user (text input)

def execute_request(stub, session_id, selector_dict={}, payload_dict={}):
    selector = Selector(channel=selector_dict.get('channel'), 
                        library=selector_dict.get('library'),
                        language=selector_dict.get('language'))
    input = UserInput(user_text=payload_dict.get('user_input').get('userText'))
    execute_payload = ExecuteRequestPayload(
                        user_input=input)
    execute_request = ExecuteRequest(session_id=session_id, 
                        selector=selector, 
                        payload=execute_payload)
    log.debug(f'Execute Request: {execute_payload}')
    execute_response, call = stub.Execute.with_call(execute_request)
    response = MessageToDict(execute_response)
    log.debug(f'Execute Response: {response}')
    return response, call

Interactions that use text input and do not require streaming are done through multiple ExecuteRequest calls, providing the following information:

The dialog runtime app returns the Execute response payload when a question and answer node, a data access node, or an external actions node is encountered in the dialog flow. This payload provides the actions to be performed by the client application.

There are many types of actions that can be requested by the dialog application:

For example, the following question and answer action indicates that the message "Hello! How can I help you today?" must be displayed to the user:

Note: Examples in this section are shown in JSON format for readability. However, in an actual client application, content is sent and received as protobuf objects.



"payload": {
    "messages": [],
    "qaAction": {
        "message": {
            "nlg": [],
            "visual": [{
                    "text": "Hello! How can I help you today?"
                }
            ],
            "audio": []
        }
    }
}

A question and answer node expects input from the user to continue the flow. This can be provided as text (either to be interpreted by Nuance or as already interpreted input) in the next ExecuteRequest call. To provide the user input as audio, use the StreamInput request, as described in Step 4b.

Step 4b. Interact with the user (using audio)

def execute_stream_request(args, stub, session_id, selector_dict={}):
    execute_responses = stub.ExecuteStream(build_stream_input(args, session_id, selector_dict))
    log.debug(f'execute_responses: {execute_responses}')
    responses = []
    audio = bytearray(b'')

    for execute_response in execute_responses:
        if execute_response:
            response = MessageToDict(execute_response.response)
            if response: responses.append(response)
        audio += execute_response.audio.audio
    return responses, audio

def build_stream_input(args, session_id, selector_dict):
    selector = Selector(channel=selector_dict.get('channel'), 
                        library=selector_dict.get('library'),
                        language=selector_dict.get('language'))

    try:
        with open(args.audioFile, mode='rb') as file:
            audio_buffer = file.read()

        # Hard code packet_size_byte for simplicity sake (approximately 100ms of 16KHz mono audio)
        packet_size_byte = 3217
        audio_size = len(audio_buffer)
        audio_packets = [ audio_buffer[x:x + packet_size_byte] for x in range(0, audio_size, packet_size_byte) ]

        # For simplicity sake, let's assume the audio file is PCM 16KHz
        user_input = None
        asr_control_v1 = {'audio_format': {'pcm': {'sample_rate_hz': 16000}}}

    except:
        # Text interpretation as normal
        asr_control_v1 = None
        audio_packets = [b'']
        user_input = UserInput(user_text=args.textInput)

    # Build execute request object
    execute_payload = ExecuteRequestPayload(user_input=user_input)
    execute_request = ExecuteRequest(session_id=session_id, 
                                     selector=selector, 
                                     payload=execute_payload)

    # For simplicity sake, let's assume the audio file is PCM 16KHz
    tts_control_v1 = {'audio_params': {'audio_format': {'pcm': {'sample_rate_hz': 16000}}}}
    first_packet = True
    for audio_packet in audio_packets:
        if first_packet:
            first_packet = False

            # Only first packet should include the request header
            stream_input = StreamInput(
                request=execute_request,
                asr_control_v1=asr_control_v1,
                tts_control_v1=tts_control_v1,
                audio=audio_packet
                )
            log.debug(f'Stream input initial: {stream_input}')
        else:
            stream_input = StreamInput(audio=audio_packet)

        yield stream_input


Interactions with the user that require audio streaming are done through multiple StreamInput calls.

The StreamInput method can be used to:

The StreamInput message has the following fields:

The service returns a stream of StreamOutput messages, which have the following fields:

This can be implemented as follows in your application:

To perform speech recognition on audio input

The workflow to perform speech recognition on audio input is as follows:

  1. The dialog service sends an ExecuteResponse with a question and answer action, indicating that it requires user input.
  2. The client application sends a first StreamInput method with the asr_control_v1 and request parameters to DLGaaS; this lets DLGaaS know to expect audio.
  3. The client application sends additional StreamInputs to stream the audio.
  4. The client application sends an empty StreamInput to indicate end of audio.
    The audio is recognized, interpreted, and returned to the dialog application, which continues its flow.
  5. The dialog service returns the corresponding ExecuteResponse in a single StreamOutput.

This can be seen in the detailed sequence flow. For example, assuming that the user says "I want an espresso", the client application will send a series of StreamInput messages with the following content:


# First StreamInput
{
    "request": {
        "session_id": "1c2c9822-45d5-460d-8696-d3fa9d8af8c2",
        "selector": {
            "channel": "default"
            "language": "en-US"
            "library": "default"
        },
        "payload": {}
    },
    "asr_control_v1": {
        "audio_format": {
            "pcm": {
                "sample_rate_hz": 16000
            }
        }
    },
    "audio": "RIFF4\373\000\00..."
}

# Additional StreamInputs with audio bytes
{
    "audio": "...audio_bytes..."
}

# Final empty StreamInput to indicate end of audio
{

}

Once audio has been recognized, interpreted, and handled by DLGaaS, the following StreamOutput is returned:


# StreamOutput

{
    "response": {
        "payload": {
            "messages": [],
            "qaAction": {
                "message": {
                    "nlg": [{
                            "text": "What size coffee would you like? "
                        }
                    ],
                    "visual": [{
                            "text": "What size coffee would you like?"
                        }
                    ],
                    "audio": [] // This is a reference to an audio file.
                }
            }
        }
    }
}

To synthesize an output message into audio using TTS

  1. The client application sends the StreamInput method with the tts_control_v1 and request parameters to DLGaaS.
    The dialog application continues the dialog according to the ExecuteRequest provided in the request parameter.
  2. If the corresponding ExecuteResponse includes a TTS message (that is, a message is provided in the nlg field of the message action), this message is synthesized and the audio is streamed back to the application in a series of StreamOutput messages.

For example, assuming that the user typed "I want an espresso", the client application will send a single StreamInput message with the following content:


# StreamInput
{
    "request": {
        "session_id": "1c2c9822-45d5-460d-8696-d3fa9d8af8c2",
        "selector": {
            "channel": "default"
            "language": "en-US"
            "library": "default"
        },
        "payload": {
            "user_input": {
                "userText": "I want an espresso"
            }
        }
    },
    "tts_control_v1": {
        "audio_params": {
            "audio_format": {
                "pcm": {
                    "sample_rate_hz": 16000
                }
            }
        }
    }
}

Once user text has been interpreted and handled by DLGaaS, the following series of StreamOutput messages is returned:

Note: The StreamOutput includes the audio field because a TTS message was defined (as shown in the nlg field). If no TTS message was specified, no audio would have been returned.


# First StreamOutput

{
    "response": {
        "payload": {
            "messages": [],
            "qaAction": {
                "message": {
                    "nlg": [{
                            "text": "What size coffee would you like? "
                        }
                    ],
                    "visual": [{
                            "text": "What size coffee would you like?"
                        }
                    ],
                    "audio": []
                }
            }
        }
    },
    "audio": "RIFF4\373\000\00.."
}

# Additional StreamOutputs with audio bytes
{
    "audio": "...audio_bytes..."
}


To perform both speech recognition and TTS in a single call

  1. The client application sends the StreamInput method with the asr_control_v1, tts_control_v1, and request parameters to DLGaaS; this lets DLGaaS know to expect audio.
  2. The client application streams the audio with the StreamInput method.
    The audio is recognized, interpreted, and returned to the dialog application, which continues its flow. If the corresponding ExecuteResponse includes a TTS message, this message is synthesized and the audio is streamed back to the application in a series of StreamOutput messages. A short sketch of this combined flow is shown after this list.
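
For example, a client could drive this audio round trip with the execute_stream_request helper shown at the beginning of this step. This is a minimal sketch; it assumes a session has already been started and that args.audioFile points to a PCM 16kHz audio file:

# Stream the user's audio; DLGaaS performs recognition and interpretation,
# continues the dialog, and streams back any synthesized audio
responses, audio_bytes = execute_stream_request(args, stub, session_id, selector_dict=selector_dict)
for response in responses:
    log.debug(f'Stream response: {response}')
# audio_bytes now holds the TTS audio, if a TTS message was defined in the dialog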

Note about performing speech recognition and TTS in a dialog application

The speech recognition and TTS features provided as part of the DLGaaS API should be used in relation to your Mix.dialog, that is:

To perform speech recognition or TTS outside of a Mix.dialog, please use the following services:

Step 5. Stop conversation

def stop_request(stub, session_id=None):
    stop_req = StopRequest(session_id=session_id)
    log.debug(f'Stop Request: {stop_req}')
    stop_response, call = stub.Stop.with_call(stop_req)
    response = MessageToDict(stop_response)
    log.debug(f'Stop Response: {response}')
    return response, call

To stop the conversation, the client app sends the StopRequest message; this message has the following fields:

The StopRequest message removes the session state, so the session ID for this conversation should not be used in the short term for any new interactions, to prevent any confusion when analyzing logs.

Detailed sequence flow

(Diagram: detailed sequence flow)

Sample Python app

import argparse
import logging

import uuid

from google.protobuf.json_format import MessageToJson, MessageToDict

import grpc
from grpc import StatusCode

from nuance.dlg.v1.common.dlg_common_messages_pb2 import *
from nuance.dlg.v1.dlg_messages_pb2 import *
from nuance.dlg.v1.dlg_interface_pb2 import *
from nuance.dlg.v1.dlg_interface_pb2_grpc import *

log = logging.getLogger(__name__)

def parse_args():
    parser = argparse.ArgumentParser(
        prog="dlg_client.py",
        usage="%(prog)s [-options]",
        add_help=False,
        formatter_class=lambda prog: argparse.HelpFormatter(
            prog, max_help_position=45, width=100)
    )

    options = parser.add_argument_group("options")
    options.add_argument("-h", "--help", action="help",
                         help="Show this help message and exit")
    options.add_argument("--token", nargs="?", help=argparse.SUPPRESS)
    options.add_argument("-s", "--serverUrl", metavar="url", nargs="?",
                         help="Dialog server URL, default=localhost:8080", default='localhost:8080')
    options.add_argument('--modelUrn', nargs="?",
                         help="Dialog App URN, e.g. urn:nuance-mix:tag:model/A2_C16/mix.dialog")
    options.add_argument("--textInput", metavar="file", nargs="?",
                         help="Text to preform interpretation on")

    return parser.parse_args()

def create_channel(args):    
    log.debug("Adding CallCredentials with token %s" % args.token)
    call_credentials = grpc.access_token_call_credentials(args.token)

    log.debug("Creating secure gRPC channel")
    channel_credentials = grpc.ssl_channel_credentials()
    channel_credentials = grpc.composite_channel_credentials(channel_credentials, call_credentials)
    channel = grpc.secure_channel(args.serverUrl, credentials=channel_credentials)

    return channel

def read_session_id_from_response(response_obj):
    try:
        session_id = response_obj.get('payload').get('sessionId', None)
    except Exception as e:
        raise Exception("Invalid JSON Object or response object")
    if session_id:
        return session_id
    else:
        raise Exception("Session ID is not present or some error occurred")


def start_request(stub, model_ref_dict, session_id, selector_dict={}):
    selector = Selector(channel=selector_dict.get('channel'), 
                        library=selector_dict.get('library'),
                        language=selector_dict.get('language'))
    start_payload = StartRequestPayload(model_ref=model_ref_dict)
    start_req = StartRequest(session_id=session_id, 
                        selector=selector, 
                        payload=start_payload)
    log.debug(f'Start Request: {start_req}')
    start_response, call = stub.Start.with_call(start_req)
    response = MessageToDict(start_response)
    log.debug(f'Start Request Response: {response}')
    return response, call

def execute_request(stub, session_id, selector_dict={}, payload_dict={}):
    selector = Selector(channel=selector_dict.get('channel'), 
                        library=selector_dict.get('library'),
                        language=selector_dict.get('language'))
    input = UserInput(user_text=payload_dict.get('user_input').get('userText'))
    execute_payload = ExecuteRequestPayload(
                        user_input=input)
    execute_request = ExecuteRequest(session_id=session_id, 
                        selector=selector, 
                        payload=execute_payload)
    log.debug(f'Execute Request: {execute_payload}')
    execute_response, call = stub.Execute.with_call(execute_request)
    response = MessageToDict(execute_response)
    log.debug(f'Execute Response: {response}')
    return response, call

def stop_request(stub, session_id=None):
    stop_req = StopRequest(session_id=session_id)
    log.debug(f'Stop Request: {stop_req}')
    stop_response, call = stub.Stop.with_call(stop_req)
    response = MessageToDict(stop_response)
    log.debug(f'Stop Response: {response}')
    return response, call

def main():
    args = parse_args()
    log_level = logging.DEBUG
    logging.basicConfig(
        format='%(asctime)s %(levelname)-5s: %(message)s', level=log_level)
    with create_channel(args) as channel:
        stub = DialogServiceStub(channel)
        model_ref_dict = {
            "uri": args.modelUrn,
            "type": 0
        }
        selector_dict = {
            "channel": "default",
            "language": "en-US",
            "library": "default"
        }
        response, call = start_request(stub, 
                            model_ref_dict=model_ref_dict, 
                            session_id=None,
                            selector_dict=selector_dict
                        )
        session_id = read_session_id_from_response(response)
        log.debug(f'Session: {session_id}')
        assert call.code() == StatusCode.OK
        log.debug(f'Initial request, no input from the user to get initial prompt')
        payload_dict = {
            "user_input": {
                "userText": None
            }
        }
        response, call = execute_request(stub, 
                            session_id=session_id, 
                            selector_dict=selector_dict,
                            payload_dict=payload_dict
                        )
        assert call.code() == StatusCode.OK
        log.debug(f'Second request, passing in user input')
        payload_dict = {
            "user_input": {
                "userText": args.textInput
            }
        }
        response, call = execute_request(stub, 
                            session_id=session_id, 
                            selector_dict=selector_dict,
                            payload_dict=payload_dict
                        )
        assert call.code() == StatusCode.OK
        response, call = stop_request(stub, 
                            session_id=session_id
                        )
        assert call.code() == StatusCode.OK

if __name__ == '__main__':
    main()

The sample code shown in this section is for a Python client application that takes the following command-line options:


usage: dlg_client.py [-options]

options:
  -h, --help                   Show this help message and exit
  -s [url], --serverUrl [url]  Dialog server URL, e.g., dlg.api.nuance.com:443
  --token $TOKEN               Access token for authentication
  --modelUrn [MODELURN]        Dialog App URN, e.g. urn:nuance-mix:tag:model/coffee_app/mix.dialog
  --textInput [text]           Text to interpret

To run this sample application:

Step 1. Download the gRPC .proto files here and unzip the files.

Step 2. Install the required dependencies:


$ python3 -m venv env
$ source env/bin/activate    
$ pip install --upgrade pip
$ pip install grpcio
$ pip install grpcio-tools
$ pip install uuid

Step 3. Generate the stubs:


$ echo "Pulling support files"
$ mkdir -p google/api
$ curl https://raw.githubusercontent.com/googleapis/googleapis/master/google/api/annotations.proto > google/api/annotations.proto
$ curl https://raw.githubusercontent.com/googleapis/googleapis/master/google/api/http.proto > google/api/http.proto
$ echo "generate the stubs for support files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ google/api/http.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ google/api/annotations.proto
$ echo "generate the stubs for the DLGaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/dlg/v1/dlg_interface.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/dlg/v1/dlg_messages.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/dlg/v1/common/dlg_common_messages.proto
$ echo "generate the stubs for the ASRaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/asr/v1/recognizer.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/asr/v1/resource.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/asr/v1/result.proto
$ echo "generate the stubs for the TTSaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/tts/v1/nuance_tts.proto

Step 4. Generate a token (see Step 1: Generate token).

Step 5. Download the dlg_client.py sample app here.

Step 6. Run the application, passing it the location of the Dialog as a Service runtime, your access token, the URN of your dialog application, and the text to interpret. For example:

$ python dlg_client.py \
--serverUrl dlg.api.nuance.com:443 \
--token $TOKEN \
--modelUrn "urn:nuance-mix:tag:model/coffee_app/mix.dialog" \
--textInput "I want a double espresso"

You may instead incorporate the token-generation code within the application, reading the credentials from a configuration file.

Reference topics

This section provides more detailed information about objects used in the gRPC API.

Note: Examples in this section are shown in JSON format for readability. However, in an actual client application, content is sent and received as protobuf objects.

Status messages and codes

gRPC error codes

In addition to the standard gRPC error codes, DLGaaS uses the following codes:

gRPC code Message Indicates
0 OK Normal operation
5 NOT_FOUND The resource specified could not be found; for example:
  • No session corresponding to the sessionID specified
  • No model found for the URN specified
  • Incorrect language code specified
  • Incorrect channel name specified

Troubleshooting: Make sure that the resource provided exists or that you have specified it correctly. See URN for details on the URN syntax.
11 OUT_OF_RANGE The provided session timeout is not in the expected range.

Troubleshooting: Specify a value between 0 and 14400 (default is 900) and try again.
12 UNIMPLEMENTED The API version was not found or is not available on the URL specified. For example, a client using DLGaaS v1 is trying to access the dlgaas.beta.nuance.com URL.

Troubleshooting: See URLs to runtime services for the supported URLs.
13 INTERNAL There was an issue on the server side, or interactions between subsystems failed.

Troubleshooting: Contact Nuance.
16 UNAUTHENTICATED The credentials specified are incorrect or expired.

Troubleshooting: Make sure that you have generated the access token and that you are providing the credentials as described in Obtaining authentication for services. Note that the token needs to be regenerated regularly. See Access token lifetime for details.

HTTP return codes

In addition to the standard HTTP error codes, DLGaaS uses the following codes:

HTTP code Message Indicates
200 OK Normal operation
401 UNAUTHORIZED The credentials specified are incorrect or expired.

Troubleshooting: Make sure that you have generated the access token and that you are providing the credentials as described in Obtaining authentication for services. Note that the token needs to be regenerated regularly. See Access token lifetime for details.
404 NOT_FOUND The resource specified could not be found; for example:
  • No session corresponding to the sessionID specified
  • No model found for the URN specified
  • The path of the HTTP endpoint includes a typo (for example, incorrect version)

Troubleshooting: Make sure that the resource provided exists or that you have specified it correctly. See URN for details on the URN syntax.
500 INTERNAL_SERVER_ERROR There was an issue on the server side.
Troubleshooting: Contact Nuance.

Examples

Incorrect URN

"grpc_message":"model [urn:nuance:mix/eng-USA/coffee_app_typo/mix.dialog] could not be found","grpc_status":5

Incorrect channel

"grpc_message":"channel is invalid, supported values are [Omni Channel VA, default] (error code: 5)","grpc_status":5}"

Session not found

"grpc_message":"Could not find session for [12345]","grpc_status":5}"

Incorrect credentials

"{"error":{"code":401,"status":"Unauthorized","reason":"Token is expired","message":"Access credentials are invalid"}\n","grpc_status":16}"

Message actions

Example message action as part of QA Action

{
  "payload": {
    "messages": [],
    "qaAction": {
      "message": {
        "nlg": [{
            "text": "What type of coffee would you like?"
          }
        ],
        "visual": [{
            "text": "What <b>type</b> of coffee would you like? For the list of options, see the <a href=\"www.myserver.com/menu.html\">menu</a>."
          }
        ],
        "audio": [{
            "text": "What type of coffee would you like? "
          }
        ]
      }
    }
  }
}

A message action indicates that a message should be played to the user. A message can be provided as:

Message actions can be configured in the following Mix.dialog nodes:

Message nodes

A message node is used to play or display a message. The message specified in a message node is sent to the client application as a message action. A message node can also perform non-recognition actions, such as assigning a variable or defining the next node in the dialog flow.

Messages configured in a message node are cumulative and sent only when a question and answer node, a data access node, or an external actions node occurs in the dialog flow. For example, consider the following dialog flow:

(Image: dialog flow with multiple messages)

This would be handled as follows:

  1. The Dialog service sends an ExecuteResponse when encountering the question and answer node, with the following messages:
     
    # First ExecuteResponse
    {
      "payload": {
        "messages": [{
            "nlg": [],
            "visual": [{
                "text": "Hey there!"
              }
            ],
            "audio": []
          }, {
            "nlg": [],
            "visual": [{
                "text": "Welcome to the coffee app."
              }
            ],
            "audio": []
          }
        ],
        "qaAction": {
          "message": {
            "nlg": [],
            "visual": [{
                "text": "What can I do for you today?"
              }
            ],
            "audio": []
          }
        }
      }
    }
    
  2. The client application sends an ExecuteRequest with the user input.
  3. The Dialog service sends an ExecuteResponse when encountering the end node, with the following message action:
    
    # Second ExecuteResponse
    {
      "payload": {
        "messages": [{
            "nlg": [],
            "visual": [{
                "text": "Goodbye."
              }
            ],
            "audio": []
          }
        ],
        "endAction": {}
      }
    }
    

Using variables in messages

Messages can include variables. For example, in a coffee application, you might want to personalize the greeting message:

"Hello Miranda ! What can I do for you today?"

Variables are configured in Mix.dialog. They are resolved by the dialog engine and then returned to the client application. For example:

 
{
    "payload": {
        "messages": [],
        "qaAction": {
            "message": {
                "nlg": [],
                "visual": [
                    {
                        "text": "Hello Miranda ! What can I do for you today?"
                    }
                ],
                "audio": []
            }
        }
    }
}
   

Question and answer actions

A question and answer action is returned by a question and answer node. A question and answer node is the basic node type in dialog applications. It first plays a message and then recognizes user input.

The message specified in a question and answer node is sent to the client application as a message action.

The client application must then return the user input to the question and answer node. This can be provided in three ways:

In a question and answer node, the dialog flow is stopped until the client application has returned the user input.

Sending data

A question and answer node can specify data to send to the client application. This data is configured in Mix.dialog, in the Send Data tab of the question and answer node. For the procedure, see Send data to the client application in the Mix.dialog documentation.

For example, in the coffee application, you might want to send entities that you have collected in a previous node (COFFEE_TYPE and COFFEE_SIZE) as well as data that you have retrieved from an external system (the user's rewards card number):

(Image: Send Data tab in Mix.dialog)

This data is sent to the client application in the data field of the QAAction; for example:


{
    "payload": {
        "messages": [],
        "qaAction": {
            "message": {
                "nlg": [],
                "visual": [
                    {
                        "text": "Your order was processed. Would you like anything else today?"
                    }
                ],
                "audio": [],
                "view": {
                    "id": "",
                    "name": ""
                }
            },
            "data": {
                "rewardsCard": "5367871902680912",
                "COFFEE_TYPE": "espresso",
                "COFFEE_SIZE": "lg"
            }
        }
    }
}
   

Interactive elements

Question and answer actions can include interactive elements to be displayed by the client app, such as clickable buttons or links.

For example, in a web version of the coffee application, you may want to display Yes/No buttons so that users can confirm their selections:

(Image: Yes/No confirmation buttons)

Interactive elements are configured in Mix.dialog in question and answer nodes. For the procedure, see Define interactive elements in the Mix.dialog documentation.

For example, for the Yes/No buttons scenario above, you could configure two elements, one for each button, as follows:

(Image: interactive elements configured in Mix.dialog)

This information is sent to the client app in the selectable field of the QAAction. For example:

 
{
    "payload": {
        "messages": [],
        "qaAction": {
            "message": {
                "nlg": [],
                "visual": [{
                        "text": "So you want a double espresso , is that it?"
                    }
                ],
                "audio": []
            },
            "selectable": {
                "selectableItems": [{
                        "value": {
                            "id": "answer",
                            "value": "yes"
                        },
                        "description": "Image of green checkmark",
                        "displayText": "Yes",
                        "displayImageUri": "/resources/images/green_checkmark.png"
                    }, {
                        "value": {
                            "id": "answer",
                            "value": "no"
                        },
                        "description": "Image of Red X",
                        "displayText": "No",
                        "displayImageUri": "/resources/images/red_x.png"
                    }
                ]
            }
        }
    }
}

The application is then responsible for displaying the elements (in this case, the two buttons) and for returning the choice made by the user in the selected_item field of the Execute Request payload. For example:

 
"payload": {
    "user_input": {
        "selected_item": {
            "id": "answer",
            "value": "no"
        }
    }
}
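
In Python, the selection could be returned as follows. This is a minimal sketch; it relies on protobuf's dict initialization for the nested selected_item message, so check the message type generated for selected_item in your stubs if you prefer to construct it explicitly:

# Return the user's choice in the selected_item field of the user input
user_input = UserInput(selected_item={"id": "answer", "value": "no"})
execute_payload = ExecuteRequestPayload(user_input=user_input)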

Data access actions

A data access action tells the client app that the dialog expects data to continue the flow. For example, consider these use cases:

Data access actions are configured in data access nodes. These nodes specify:

Using the data access API in the client app

Data access information is sent and received as follows:

For example, in the coffee app use case, if a user says "I want a double espresso," the dialog will send this data access action information to the client application in the ExecuteResponsePayload:

 
{
  "payload": {
    "messages": [],
    "daAction": {
      "id": "get_coffee_price",
      "data": {
        "COFFEE_TYPE": "espresso",
        "COFFEE_SIZE": "lg"
      }
    }
  }
}


Where:

The client application uses that information to perform the action required by the dialog, in this case fetching the price of the coffee based on the user's choice. It then returns the value in the coffee_price variable as part of the ExecuteRequestPayload, as well as a returnCode:

 
{
  "selector": {
    "channel": "ivr",
    "language": "en-US",
    "library": "default"
  },
  "payload": {
    "requested_data": {
      "id": "get_coffee_price",
      "data": {
        "coffee_price": "4.25",
        "returnCode": "0"
      }
    }
  }
}

The returnCode is required; otherwise, the Execute request fails. A returnCode of "0" indicates a successful interaction.
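
In Python, this reply could be built as follows. This is a minimal sketch; it assumes the data field of requested_data is a google.protobuf Struct, as the JSON examples suggest, and relies on protobuf's dict initialization for the requested_data message itself (check your generated stubs for its exact message type):

from google.protobuf.struct_pb2 import Struct

# Data computed by the client app, plus the mandatory returnCode
reply_data = Struct()
reply_data.update({"coffee_price": "4.25", "returnCode": "0"})

execute_payload = ExecuteRequestPayload(
    requested_data={"id": "get_coffee_price", "data": reply_data})
execute_req = ExecuteRequest(session_id=session_id, selector=selector, payload=execute_payload)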

Data access action sequence flow

This sequence diagram shows a data access action exchange. For simplicity, only the payload of the requests and responses related to the data access feature are shown.

(Diagram: data access action sequence flow)

Transfer actions

An external actions node of type "Transfer" in Mix.dialog sends an Escalation action in the DGLaaS API. This action can be used, for example, to escalate to an IVR agent. Any data set in the Transfer node is sent as part of the Escalation action data field.

To continue the flow, the client application must return data in the requested_data field of the ExecuteRequestPayload. At a minimum, this data must include a returnCode. It can also include data requested by the dialog, if any. The returnCode is required; otherwise, the Execute request fails. A returnCode of "0" indicates a successful interaction.

For example, consider a scenario where the Transfer action is used to escalate to an agent to confirm a customer's data, as shown in the following Mix.dialog node:

(Image: Transfer node in Mix.dialog)

This transfer action sends the userName and userID variables to the client application in an escalationAction, as follows:


{
    "payload": {
        "messages": [],
        "escalationAction": {
            "data": {
                "userName": "Miranda Smith",
                "userID": "MIRS82734"
            },
            "id": "TransferToAgent"
        }
    }
}

The client application transfers the call and then returns a returnCode to the dialog to provide the status of the transaction. If the transfer was successful, a returnCode of "0" is returned. For example:


{
    "selector": {
        "channel": "default",
        "language": "en-US",
        "library": "default"
    },
    "payload": {
        "requested_data": {
            "id": "TransferToAgent",
            "data": {
                "returnCode": "0"
            }
        }
    }
}


End actions

An external actions node of type "End" returns an End action, which indicates the end of the dialog. It includes the ID that identifies the node in the Mix.dialog application as well as any data that you set for this node. For example:


{
  "payload": {
    "messages": [{
        "nlg": [],
        "visual": [{
            "text": "Perfect, a double espresso coming right up!"
          }
        ],
        "audio": []
      }
    ],
    "endAction": {
      "data": {
        "returnCode": "0"
      },
      "id": "CoffeeApp End node"
    }
  }
}


Interpreting text user input

Example: Interpretation is performed by Nuance

"payload": {
  "user_input": {
    "userText": "I want a large coffee"
  }
}

Example: Interpretation is performed by an external system

"payload": {
  "user_input": {
    "interpretation": {
      "confidence": 1.0,
      "utterance": "I want a large americano",
      "data": {
        "INTENT": "ORDER_COFFEE",
        "COFFEE_SIZE": "LG",
        "COFFEE_TYPE": "americano"
      },
      "slot_literals": {
        "COFFEE_SIZE": "large",
        "COFFEE_TYPE": "americano"
      }
    }
  }
}

Interpretation of user input provided as text can be performed either by:

Exchanging session data

You can use the StartRequest to send data from the client application to the dialog service to be used during the session.

For example, let's say that the user name and preferred coffee are stored on the user's phone, and you'd like to use them in your dialog application to customize your messages:

To implement this scenario:

  1. Create variables in Mix.dialog (for example, user_name and preferred_coffee). See Manage variables in the Mix.dialog documentation for details.
  2. Use the variables in the dialog; for example, the following message node includes the user_name value in the initial prompt:
    (Image: message node using the user_name variable)
  3. Send the values of user_name and preferred_coffee in the StartRequestPayload.

For example, consider the following StartRequest, which provides the values for the user_name and preferred_coffee variables:

 
{
    "selector": {
        "channel": "default",
        "language": "en-US",
        "library": "default"
    },
    "payload": {
        "data": {
            "returnCode": "0",
            "preferred_coffee": "espresso",
            "user_name": "Miranda"
        }
    }
}

The dialog app can then include the user name in the first prompt:

 
{
    "payload": {
        "messages": [],
        "qaAction": {
            "message": {
                "nlg": [],
                "visual": [
                    {
                        "text": "Hello Miranda ! What can I do for you today?"
                    }
                ],
                "audio": []
            }
        }
    }
}
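
In Python, this session data could be attached to the StartRequestPayload as follows. This is a minimal sketch; it assumes the payload's data field is a google.protobuf Struct, as the JSON example above suggests:

from google.protobuf.struct_pb2 import Struct

# Values stored on the user's phone in this scenario
session_data = Struct()
session_data.update({"returnCode": "0", "preferred_coffee": "espresso", "user_name": "Miranda"})

start_payload = StartRequestPayload(model_ref=model_ref_dict, data=session_data)
start_req = StartRequest(selector=selector, payload=start_payload)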

Disabling logging

You can set the suppress_log_user_data field in the StartRequestPayload to true to disable logging for ASR, NLU, TTS, and Dialog. This has the following impact:

User ID

You can specify a user ID in the StartRequest, ExecuteRequest, and StopRequest. This user ID is converted into an unreadable format and stored in call logs and user-specific files. It can be used for:

Note: The user_id value can accept any UTF-8 characters.
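
For example, the same user ID could be supplied on each request. This is a minimal sketch; the user ID value is hypothetical, and the other variables are as in the sample app:

USER_ID = "miranda.smith@example.com"

start_req = StartRequest(selector=selector, payload=start_payload, user_id=USER_ID)
execute_req = ExecuteRequest(session_id=session_id, selector=selector, payload=execute_payload, user_id=USER_ID)
stop_req = StopRequest(session_id=session_id, user_id=USER_ID)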

gRPC API

Dialog as a Service provides three protocol buffer (.proto) files to define the Dialog service for gRPC. These files contain the building blocks of your dialog applications:

Once you have transformed the proto files into functions and classes in your programming language using gRPC tools, you can call these functions from your client application to start a conversation with a user, collect the user's input, obtain the action to perform, and so on.

See Client app development for a scenario using Python that provides an overview of the different methods and messages used in a sample order coffee application. For other languages, consult the gRPC and Protocol Buffer documentation:

Field names in proto and stub files

In this section, the names of the fields are shown as they appear in the proto files. To see how they are generated in your programming language, consult your generated files. For example:

Proto file Python Go Java
session_id session_id SessionId sessionId or getSessionId
selector selector Selector selector or setSelector

For details, see the Protocol Buffers documentation for:

Proto files structure

Structure of DLGaaS proto files

DialogService
        Start
                StartRequest
                StartResponse
        Execute
                ExecuteRequest
                ExecuteResponse
        ExecuteStream
                StreamInput
                StreamOutput
        Stop
                StopRequest
                StopResponse

StartRequest
    session_id
    selector
        channel
        language
        library
    payload
        model_ref
            uri
            type
        data
    session_timeout_sec
    user_id

StartResponse
    payload
        session_id

ExecuteRequest
    session_id
    selector
        channel
        language
        library
    payload
        user_input
            user_text
            interpretation
                confidence
                input_mode
                utterance
                data
                    key
                    value
                slot_literals
                    key
                    value
                slot_confidences
                    key
                    value
                alternative_interpretations
            selected_item
                id
                value
        dialog_event
            type
            message
            event_name
        requested_data
            id
            data
    user_id

ExecuteResponse
    payload
        messages
            nlg
            visual
            audio
                text
                uri
        qa_action
            message
                nlg
                visual
                audio
                    text
                    uri
            data
            view
                id
                name
            selectable
                selectable_items
                    value
                        id
                        value
                    description
                    display_text
                    display_image_uri
        da_action
            id
            message
                nlg
                visual
                audio
                    text
                    uri
            view
                id
                name
            data
        escalation_action
            message
                nlg
                visual
                audio
                    text
                    uri
            view
                id
                name
            data
            id
        end_action
            data
            id
        continue_action
            message
                nlg
                visual
                audio
                    text
                    uri
            view
                id
                name
            data
            id

StreamInput
    request Standard DLGaaS ExecuteRequest
    asr_control_v1
        audio_format
            pcm | alaw | ulaw | opus | ogg_opus
        utterance_detection_mode
            SINGLE | MULTIPLE | DISABLED
        recognition_flags
            auto_punctuate
            filter_profanity
            mask_load_failures
            etc.
        result_type
    audio
    tts_control_v1
        audio_params
            audio_format
            volume_percentage
            speaking_rate_percentage
            etc.
        voice
            name
            model
            etc.

StreamOutput
    response Standard DLGaaS ExecuteResponse
    audio
    asr_result
    asr_status
    asr_start_of_speech

StopRequest
    session_id
    user_id

StopResponse

Proto files


DialogService

Name Request Type Response Type Description
Start StartRequest StartResponse Starts a conversation. Returns a StartResponse object.
Execute ExecuteRequest ExecuteResponse Used to continuously interact with the conversation based on end-user input or events. Returns an ExecuteResponse object containing data related to the dialog interactions, which the client can use to interact with the end user.
ExecuteStream StreamInput stream StreamOutput stream Performs recognition on streamed audio using ASRaaS and provides speech synthesis using TTSaaS.
Stop StopRequest StopResponse Ends a conversation and performs cleanup. Returns a StopResponse object.
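The Python sketch below shows one way to create a DialogService stub over a secure, token-authenticated channel, along with the typical call sequence for these four methods. The module name, server address, and access token are placeholders and assumptions, not values defined by the proto files.

import grpc

# Module and class names are assumptions based on typical protoc output for the
# DLGaaS proto files; adjust the import paths to match your generated stubs.
from dlg_interface_pb2_grpc import DialogServiceStub

def build_dialog_stub(server_address: str, access_token: str) -> DialogServiceStub:
    """Open a secure channel authenticated with an OAuth access token."""
    call_creds = grpc.access_token_call_credentials(access_token)
    channel_creds = grpc.composite_channel_credentials(
        grpc.ssl_channel_credentials(), call_creds)
    channel = grpc.secure_channel(server_address, channel_creds)
    return DialogServiceStub(channel)

stub = build_dialog_stub("<dlgaas-server>:443", "<access-token>")

# Typical call sequence:
#   stub.Start(StartRequest(...))          -> StartResponse (session ID)
#   stub.Execute(ExecuteRequest(...))      -> ExecuteResponse (messages and actions)
#   stub.ExecuteStream(stream_input_iter)  -> iterator of StreamOutput
#   stub.Stop(StopRequest(...))            -> StopResponse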

This service includes:

DialogService
        Start
                StartRequest
                StartResponse
        Execute
                ExecuteRequest
                ExecuteResponse
        ExecuteStream
                StreamInput
                StreamOutput
        Stop
                StopRequest
                StopResponse

StartRequest

Request object used by the Start method.

Field Type Description
session_id string Optional session ID. If not provided, one will be generated.
selector common.Selector Selector providing the channel and language used for the conversation.
payload common.StartRequestPayload Payload of the Start request.
session_timeout_sec uint32 Session timeout value (in seconds), after which the session is terminated.
user_id string Identifies a specific user within the application. See User ID.

This method includes:

StartRequest
    session_id
    selector
        channel
        language
        library
    payload
        model_ref
            uri
            type
        data
        suppress_log_user_data
    session_timeout_sec
    user_id
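As an illustration, a Start request might be built as follows in Python. The module names are assumptions, the URN is a placeholder for your own dialog model, and the message and field names follow the structure shown above.

# Assumed module names; adjust to match your generated stubs.
from dlg_interface_pb2 import StartRequest
from dlg_common_messages_pb2 import StartRequestPayload, Selector, ResourceReference

request = StartRequest(
    selector=Selector(channel="default", language="en-US"),
    payload=StartRequestPayload(
        model_ref=ResourceReference(uri="urn:nuance:mix/..."),  # placeholder URN for your model
    ),
    session_timeout_sec=300,
    user_id="end-user-1234",
)
# Optional session data is a google.protobuf.Struct and can be filled with update():
request.payload.data.update({"returning_customer": True})

response = stub.Start(request)             # stub from the DialogService sketch above
session_id = response.payload.session_id   # reuse this ID on every later request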

StartResponse

Response object used by the Start method.

Field Type Description
payload common.StartResponsePayload Payload of the Start response.

This method includes:

StartResponse
    payload
        session_id

ExecuteRequest

Request object used by the Execute method.

Field Type Description
session_id string ID for the session.
selector common.Selector Selector providing the channel and language used for the conversation.
payload common.ExecuteRequestPayload Payload of the Execute request.
user_id string Identifies a specific user within the application. See User ID.

This method includes:

ExecuteRequest
    session_id
    selector
        channel
        language
        library
    payload
        user_input
            user_text
            interpretation
                confidence
                input_mode
                utterance
                data
                    key
                    value
                slot_literals
                    key
                    value
                slot_confidences
                    key
                    value
                alternative_interpretations
            selected_item
                id
                value
        dialog_event
            type
            message
            event_name
        requested_data
            id
            data
    user_id
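For example, a simple text turn could be sent as follows. This is a sketch; the module names are assumptions, and session_id comes from the earlier Start call.

from dlg_interface_pb2 import ExecuteRequest
from dlg_common_messages_pb2 import ExecuteRequestPayload, UserInput, Selector

request = ExecuteRequest(
    session_id=session_id,                      # returned by Start
    selector=Selector(channel="default", language="en-US"),
    payload=ExecuteRequestPayload(
        user_input=UserInput(user_text="I want a double espresso"),
    ),
)
response = stub.Execute(request)
for message in response.payload.messages:
    for nlg in message.nlg:
        print(nlg.text)      # text to play with text-to-speech
    for visual in message.visual:
        print(visual.text)   # text to display, for example in a chat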

ExecuteResponse

Response object used by the Execute method.

Field Type Description
payload common.ExecuteResponsePayload Payload of the Execute response.

This method includes:

ExecuteResponse
    payload
        messages
            nlg
            visual
            audio
                text
                uri
        qa_action
            message
                nlg
                visual
                audio
                    text
                    uri
            data
            view
                id
                name
            selectable
                selectable_items
                    value
                        id
                        value
                    description
                    display_text
                    display_image_uri
        da_action
            id
            message
                nlg
                visual
                audio
                    text
                    uri
            view
                id
                name
            data
        escalation_action
            message
                nlg
                visual
                audio
                    text
                    uri
            view
                id
                name
            data
            id
        end_action
            data
            id
        continue_action
            message
                nlg
                visual
                audio
                    text
                    uri
            view
                id
                name
            data
            id

StreamInput

Performs recognition on streamed audio using ASRaaS and requests speech synthesis using TTSaaS.

Field Type Description
request ExecuteRequest Standard DLGaaS ExecuteRequest; used to continue the dialog interactions.
asr_control_v1 AsrParamsV1 Parameters to be forwarded to the ASR service.
audio bytes Audio samples in the selected encoding for recognition.
tts_control_v1 TtsParamsv1 Parameters to be forwarded to the TTS service.

This method includes:

StreamInput
    request Standard DLGaaS ExecuteRequest
    asr_control_v1
        audio_format
            pcm | alaw | ulaw | opus | ogg_opus
        utterance_detection_mode
            SINGLE | MULTIPLE | DISABLED
        recognition_flags
            auto_punctuate
            filter_profanity
            mask_load_failures
            etc.
        result_type
    audio
    tts_control_v1
        audio_params
            audio_format
            volume_percentage
            speaking_rate_percentage
            etc.
        voice
            name
            model
            etc.
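The sketch below shows one way to build the input stream in Python, assuming the first StreamInput carries the ExecuteRequest and ASR parameters and the following ones carry only audio chunks. The module names, the chunk size, and the asr_params value (an AsrParamsV1; see the fields reference below) are assumptions.

from dlg_interface_pb2 import StreamInput, ExecuteRequest

CHUNK_SIZE = 4096  # bytes per audio chunk; an arbitrary choice for this sketch

def stream_inputs(session_id, selector, asr_params, audio_path):
    # First message: the dialog request plus the ASR parameters.
    yield StreamInput(
        request=ExecuteRequest(session_id=session_id, selector=selector),
        asr_control_v1=asr_params,
    )
    # Remaining messages: raw audio chunks in the selected encoding.
    with open(audio_path, "rb") as audio:
        while True:
            chunk = audio.read(CHUNK_SIZE)
            if not chunk:
                break
            yield StreamInput(audio=chunk)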

StreamOutput

Streams the requested TTS output and returns ASR results.

Field Type Description
response ExecuteResponse Standard DLGaaS ExecuteResponse; used to continue the dialog interactions.
audio nuance.tts.v1.SynthesisResponse TTS output. See the TTSaaS SynthesisResponse documentation for details.
asr_result nuance.asr.v1.Result Output message containing the transcription result, including the result type, the start and end times, metadata about the transcription, and one or more transcription hypotheses. See the ASRaaS Result documentation for details.
asr_status nuance.asr.v1.Status Output message indicating the status of the transcription. See the ASRaaS Status documentation for details.
asr_start_of_speech nuance.asr.v1.StartOfSpeech Output message containing the start-of-speech message. See the ASRaaS StartOfSpeech documentation for details.

This method includes:

StreamOutput
    response Standard DLGaaS ExecuteResponse
    audio
    asr_result
    asr_status
    asr_start_of_speech
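Reading the returned stream might look like this in Python; stub and stream_inputs() come from the earlier sketches, and the field names follow the StreamOutput structure above. The audio.audio access assumes the bytes field of the TTSaaS SynthesisResponse; see the TTSaaS documentation to confirm.

responses = stub.ExecuteStream(
    stream_inputs(session_id, selector, asr_params, "user_turn.pcm"))

tts_audio = bytearray()
for output in responses:
    if output.HasField("asr_result"):
        print("Transcription:", output.asr_result)   # partial or final ASR results
    if output.HasField("response"):
        print("Dialog response:", output.response)   # standard ExecuteResponse
    if output.audio.audio:
        tts_audio.extend(output.audio.audio)          # TTS audio to play back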

StopRequest

Request object used by the Stop method.

Field Type Description
session_id string ID for the session.
user_id string Identifies a specific user within the application. See User ID.

This method includes:

StopRequest
    session_id
    user_id

StopResponse

Response object used by the Stop method. Currently empty; reserved for future use.

This method includes:

StopResponse

Fields reference

AsrParamsV1

Parameters to be forwarded to the ASR service. See Step 4b. Interact with the user (using audio) for details.

Field Type Description
audio_format nuance.asr.v1.AudioFormat Audio codec type and sample rate. See the ASRaaS AudioFormat documentation for details.
utterance_detection_mode nuance.asr.v1.EnumUtteranceDetectionMode How end of utterance is determined. Defaults to SINGLE. See the ASRaaS EnumUtteranceDetectionMode documentation for details.
recognition_flags nuance.asr.v1.RecognitionFlags Flags to fine-tune recognition. See the ASRaaS RecognitionFlags documentation for details.
result_type nuance.asr.v1.EnumResultType Whether final, partial, or immutable results are returned. See the ASRaaS EnumResultType documentation for details.

ContinueAction

Continue action to be performed by the client application.

Field Type Description
message Message Message to be played as part of the continue action.
view View View details for this action.
data google.protobuf.Struct Map of data exchanged in this node.
id string ID identifying the Continue Action node in the dialog application.

DAAction

Data Access action to be performed by the client application.

Field Type Description
id string ID identifying the Data Access node in the dialog application.
message Message Message to be played as part of the Data Access action.
view View View details for this action.
data google.protobuf.Struct Map of data exchanged in this node.

DialogEvent

Message used to indicate an event that occurred during the dialog interactions.

Field Type Description
type DialogEvent.EventType Type of event being triggered.
message string Optional message providing additional information about the event.
event_name string Name of custom event. Must be set to the name of the custom event defined in Mix.dialog. See Manage events for details. Applies only when DialogEvent.EventType is set to CUSTOM.

DialogEvent.EventType

The possible event types that can occur on the client side of interactions.

Note: Only NO_INPUT and NO_MATCH are currently supported.

Name Number Description
SUCCESS 0 Everything went as expected.
ERROR 1 An unexpected problem occurred.
NO_INPUT 2 End user has not provided any input.
NO_MATCH 3 End user provided unrecognizable input.
HANGUP 4 End user has hung up. Currently used for IVR interactions.
CUSTOM 5 Custom event. You must set field event_name in DialogEvent to the name of the custom event defined in Mix.dialog.
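For example, a no-input turn could be reported to the dialog as follows (the module name is an assumption):

from dlg_common_messages_pb2 import ExecuteRequestPayload, DialogEvent

payload = ExecuteRequestPayload(
    dialog_event=DialogEvent(
        type=DialogEvent.NO_INPUT,
        message="No input received before the timeout",
    )
)
# For a custom event, also set event_name to the event defined in Mix.dialog:
# DialogEvent(type=DialogEvent.CUSTOM, event_name="MyCustomEvent")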

EndAction

End node, indicates that the dialog has ended.

Field Type Description
data google.protobuf.Struct Map of data exchanged in this node.
id string ID identifying the End Action node in the dialog application.

EscalationAction

Escalation action to be performed by the client application.

Field Type Description
message Message Message to be played as part of the escalation action.
view View View details for this action.
data google.protobuf.Struct Map of data exchanged in this node.
id string ID identifying the External Action node in the dialog application.

ExecuteRequestPayload

Payload sent with the Execute request. If both an event and a user input are provided, the event has precedence. For example, if an error event is provided, the input will be ignored.

Field Type Description
user_input UserInput Input provided to the Dialog engine.
dialog_event DialogEvent Used to pass in events that can drive the flow. Optional; if an event is not passed, the operation is assumed to be successful.
requested_data RequestData Data that was previously requested by the Dialog engine.

ExecuteResponsePayload

Payload returned after the Execute method is called. Specifies the action to be performed by the client application.

Field Type Description
messages Message Repeated. Message action to be performed by the client application.
qa_action QAAction Question and answer action to be performed by the client application.
da_action DAAction Data access action to be performed by the client application.
escalation_action EscalationAction Escalation action to be performed by the client application.
end_action EndAction End action to be performed by the client application.
continue_action ContinueAction Continue action to be performed by the client application. Currently not implemented.
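A client typically checks which of these action fields is set and reacts accordingly. The Python sketch below is a hypothetical dispatcher, not part of the API; the field names match the table above.

def handle_response(payload):
    """Dispatch on the action returned in an ExecuteResponsePayload (hypothetical helper)."""
    for message in payload.messages:              # play or display message actions first
        for nlg in message.nlg:
            print(nlg.text)
    if payload.HasField("qa_action"):             # prompt the user and collect input
        print("Prompt the user:", payload.qa_action)
    elif payload.HasField("da_action"):           # fetch the requested data client-side
        print("Data access node:", payload.da_action.id)
    elif payload.HasField("escalation_action"):   # escalate, for example to an agent
        print("Escalation node:", payload.escalation_action.id)
    elif payload.HasField("end_action"):          # dialog is over; call Stop next
        print("Dialog ended")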

Message

Specifies the message to be played to the user. See Message actions for details.

Field Type Description
nlg Message.Nlg Repeated. Text to be played using Text-to-speech.
visual Message.Visual Repeated. Text to be displayed to the user (for example, in a chat).
audio Message.Audio Repeated. Prompt to be played from an audio file.
view View View details for this message.

Message.Audio

Field Type Description
text string Text of the prompt to be played.
uri string URI of the audio file.

Message.Nlg

Field Type Description
text string Text to be played using Text-to-speech.

Message.Visual

Field Type Description
text string Text to be displayed to the user (for example, in a chat).

QAAction

Question and answer action to be performed by the client application.

Field Type Description
message Message Message to be played as part of the question and answer action.
data google.protobuf.Struct Map of data exchanged in this node.
view View View details for this action.
selectable Selectable Interactive elements to be displayed by the client app, such as clickable buttons or links. See Interactive elements for details.

RequestData

Data that was requested by the dialog application.

Field Type Description
id string ID used by the dialog application to identify which node requested the data.
data google.protobuf.Struct Map of keys to JSON objects containing the requested data.

ResourceReference

Reference object of the resource to use for the request (for example, the URN or URL of the model).

Field Type Description
uri string Reference (for example, the URL or URN).
type ResourceReference.EnumResourceType Type of resource.

ResourceReference.EnumResourceType

Name Number Description
APPLICATION_MODEL 0 Dialog application model.

Selectable

Interactive elements to be displayed by the client app, such as clickable buttons or links. See Interactive elements for details.

Field Type Description
selectable_items Selectable.SelectableItem Repeated. List of interactive elements.

Selectable.SelectableItem

Field Type Description
value Selectable.SelectableItem.SelectedValue Key-value pairs of available options for interactive element.
description string Description of the interactive element.
display_text string Text to display for this interactive element.
display_image_uri string URI of image to display for this interactive element.

Selectable.SelectableItem.SelectedValue

Field Type Description
id string ID of option.
value string Value of option.
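The following Python sketch shows how a client might display the selectable items from a QAAction and return the user's choice on the next Execute call. The qa_action variable comes from a previous ExecuteResponse, and the module name is an assumption.

from dlg_common_messages_pb2 import ExecuteRequestPayload, UserInput

for item in qa_action.selectable.selectable_items:
    print(item.display_text or item.value.value, "->", item.value.id)

chosen = qa_action.selectable.selectable_items[0]   # whichever option the user picked
payload = ExecuteRequestPayload(
    user_input=UserInput(selected_item=chosen.value))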

Selector

Provides channel and language used for the conversation.

Field Type Description
channel string Optional: Channel that this conversation is going to use (for example, WebVA).
language string Optional: Language to use for this conversation.
library string Optional: Library to use for this conversation. Advanced customization reserved for future use.

StartRequestPayload

Payload sent with the Start request.

Field Type Description
model_ref ResourceReference Reference object of the resource to use for the request.
data google.protobuf.Struct Map of data sent in the request.
suppress_log_user_data bool Set to true to disable logging for ASR, NLU, TTS, and Dialog.

StartResponsePayload

Payload returned after the Start method is called. If a session ID is not provided in the request, a new one is generated and should be used for subsequent calls.

Field Type Description
session_id string Returns session ID to use for subsequent calls.

TtsParamsv1

Parameters to be forwarded to the TTS service. See Step 4b. Interact with the user (using audio) for details.

Field Type Description
audio_params nuance.tts.v1.AudioParameters Output audio parameters, such as encoding and volume. See the TTSaaS AudioParameters documentation for details.
voice nuance.tts.v1.Voice The voice to use for audio synthesis. See the TTSaaS Voice documentation for details.

UserInput

Provides input to the Dialog engine. The client application sends either the text collected from the user, to be interpreted by Mix, or an interpretation that was performed externally.

Field Type Description
user_text string Text collected from end user.
interpretation UserInput.Interpretation Interpretation that was done externally (e.g., NR for vxml).
selected_item Selectable.SelectableItem.SelectedValue Value of element selected by end user.

UserInput.Interpretation

Sends interpretation data.

Field Type Description
confidence float Required: Value from 0..1 that indicates the confidence of the interpretation.
input_mode string Optional: Input mode. Current values are dtmf and voice, but the input mode is not limited to these.
utterance string Raw collected text.
data UserInput.Interpretation.DataEntry Repeated. Data from the interpretation of intents and entities. For example, INTENT:BILL_PAY or AMOUNT:100.
slot_literals UserInput.Interpretation.SlotLiteralsEntry Repeated. Slot literals from the interpretation of the entities. The slot literal provides the exact words used by the user. For example, AMOUNT: One hundred dollars.
slot_confidences UserInput.Interpretation.SlotConfidencesEntry Repeated. Slot confidences from the interpretation of the entities.
alternative_interpretations UserInput.Interpretation Repeated. Alternative interpretations possible from the interaction, that is, the n-best list.

UserInput.Interpretation.DataEntry

Field Type Description
key string Key of the data.
value string Value of the data.

UserInput.Interpretation.SlotConfidencesEntry

Field Type Description
key string Name of the entity.
value float Value from 0..1 that indicates the confidence of the interpretation for this entity.

UserInput.Interpretation.SlotLiteralsEntry

Field Type Description
key string Name of the entity.
value string Literal value of the entity.
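When interpretation is performed outside of Mix, the client can fill in these fields itself. The Python sketch below uses illustrative intent and entity values; the module name is an assumption, and the map fields (data, slot_literals, slot_confidences) are set like Python dictionaries.

from dlg_common_messages_pb2 import UserInput

interpretation = UserInput.Interpretation(
    confidence=0.92,
    input_mode="voice",
    utterance="pay one hundred dollars on my bill",
)
interpretation.data["INTENT"] = "BILL_PAY"
interpretation.data["AMOUNT"] = "100"
interpretation.slot_literals["AMOUNT"] = "one hundred dollars"
interpretation.slot_confidences["AMOUNT"] = 0.85
user_input = UserInput(interpretation=interpretation)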

View

Specifies view details for this action.

Field Type Description
id string ID of the view.
name string Name of the view.

Scalar Value Types

.proto Type Notes C++ Type Java Type Python Type
double double double float
float float float float
int32 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. int32 int int
int64 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. int64 long int/long
uint32 Uses variable-length encoding. uint32 int int/long
uint64 Uses variable-length encoding. uint64 long int/long
sint32 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. int32 int int
sint64 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. int64 long int/long
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 2^28. uint32 int int
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 2^56. uint64 long int/long
sfixed32 Always four bytes. int32 int int
sfixed64 Always eight bytes. int64 long int/long
bool bool boolean boolean
string A string must always contain UTF-8 encoded or 7-bit ASCII text. string String str/unicode
bytes May contain any arbitrary sequence of bytes. string ByteString str

Change log

2020-07-22

2020-07-09

2020-06-24

To use this feature:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.

2020-05-28

To use these features:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.

2020-05-14

2020-05-13

To use these features:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.

2020-05-04

2020-04-30

To use these features:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.

2020-04-15

2020-03-31

First release of this new version. See What's new in v1 for details.