
Welcome to Mix

Nuance Mix enables you to tackle complex conversational challenges using intuitive DIY tools, backed by Nuance’s industry-leading speech and AI technologies.

Using Mix's tool set, you define the use cases, the concepts (entities) and parameters (values), and the variety of ways consumers will interact with your app or service. This process is called authoring because you are creating (authoring) the speech (ASR), natural language understanding (NLU), and dialog resources needed to power your client applications.

Each of your projects is maintained in one place, the Mix Dashboard, where you can quickly deploy your application in a runtime environment. Your application can then interact with the ASR, NLU, and dialog resources you created using any one of the programming languages supported by the open source gRPC framework. Mix offers four runtime service APIs: ASR, NLU, and Dialog, as well as TTS (text-to-speech).

No matter your role in your organization—whether as a business stakeholder, voice designer, data/speech scientist, developer, or quality assurance tester—Mix helps you craft meaningful, interactive conversations with your users.

Mix fundamentals

Mix is an enterprise-grade software as a service (SaaS) platform that helps you create advanced conversational experiences for your customers.

Whether you engage with customers via a web application, mobile app, interactive voice response (IVR) system, smart speakers, and/or chatbots, the conversation starts with natural language understanding.

Natural language understanding enables customers to make inquiries without being constrained by a fixed set of responses. This conversational experience allows individuals to self-serve and successfully resolve issues while interacting with your system naturally, in their own words.

Before you can design a multichannel conversational interaction (dialog), you need to understand what your customers mean, not just what they say, tap, or type.

NLU

Natural Language Understanding (NLU) is the ability to process what a user says (or taps or types) to figure out how it maps to actions the user intends. The application uses the result from NLU to take the appropriate action.

Ontology

For example, suppose you have developed an application for ordering coffee. You want users of the application to be able to make requests, such as:

Your users have countless ways of expressing their requests. To respond effectively, your application first needs to recognize the users’ actual words. When users type their own words or tap a selection, this step is easy. But when they speak, their voice audio needs to be turned into text by a process called Automatic Speech Recognition, or ASR.

Once your application has collected the words spoken by the user, it then needs to map these words to their underlying meaning, or intention, in a form the application can understand. This process is called Natural Language Understanding, or NLU.

Mix.nlu provides the complete flexibility to build your own natural language processing domain, and the power to continuously refine and evolve your NLU based on real-world usage data.

Developing an NLU model

Within the Mix.nlu tool, your main activity is preparing data consisting of sample sentences that are representative of what your users say or do, and that are annotated to show how the sentences map to intended actions.

When annotating a sample sentence, you indicate:

In "NLU-speak," functions are referred to as intents, and parameters are referred to as entities.

The example below shows a sample sentence, and then the same sentence annotated to show its entities:

In the example, the intent of the whole sentence is orderCoffee and the entities within the sentence are [Temperature], [Flavor], and [CoffeeType]. By convention, the names of entities are enclosed in brackets and each entity in the sentence is delimited by a slash [/].
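To make the notation concrete, here is a minimal Python sketch that pulls entities out of an annotated sentence. It assumes, hypothetically, that each annotation is written as `[Entity]literal[/]`, following the bracket-and-slash convention described above; the sample sentence and the `parse_annotations` helper are illustrative and not part of Mix.

```python
import re

# Hypothetical notation: [EntityName]literal text[/]
ANNOTATION = re.compile(r"\[(\w+)\](.*?)\[/\]")


def parse_annotations(annotated: str) -> tuple[str, list[tuple[str, str]]]:
    """Return the plain sentence and a list of (entity, literal) pairs."""
    entities = [(m.group(1), m.group(2)) for m in ANNOTATION.finditer(annotated)]
    # Replace each annotation with its literal text to recover the sentence.
    plain = ANNOTATION.sub(lambda m: m.group(2), annotated)
    return plain, entities


if __name__ == "__main__":
    sample = "I want a [Temperature]hot[/] [Flavor]vanilla[/] [CoffeeType]latte[/] please"
    sentence, found = parse_annotations(sample)
    print(sentence)  # I want a hot vanilla latte please
    print(found)     # [('Temperature', 'hot'), ('Flavor', 'vanilla'), ('CoffeeType', 'latte')]
```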

Together, intents and entities define the application's (or project's) ontology.

The power of NLU models

In practice, it is impossible to list all of the ways that people order coffee. But given enough annotated representative sample sentences, your model can understand new sentences that it has never encountered before.

In Mix.nlu, this process is called training the NLU model. Once trained, the model is able to interpret the intended meaning of input such as utterances and selections, and provide that information back to the application as structured data in the form of a JSON object. Your application can then parse the data and take the appropriate action.
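The exact JSON returned depends on the runtime service, so the structure below is a simplified, hypothetical result: the field names `intent`, `confidence`, and `entities` are assumptions, not the NLU as a Service schema. The Python sketch only shows the general pattern of parsing the result, reading the intent and entity values, and choosing an action.

```python
import json

# Hypothetical, simplified interpretation result; real NLU responses differ.
RESULT = """
{
  "intent": {"name": "orderCoffee", "confidence": 0.93},
  "entities": {"CoffeeType": "latte", "Flavor": "vanilla", "Temperature": "hot"}
}
"""


def handle_result(raw: str) -> str:
    """Parse an interpretation and decide what the application should do."""
    data = json.loads(raw)
    intent = data["intent"]["name"]
    entities = data.get("entities", {})
    if intent == "orderCoffee":
        # Use whichever entities were detected; fall back to generic wording.
        drink = " ".join(
            entities[key] for key in ("Temperature", "Flavor", "CoffeeType") if key in entities
        )
        return f"Placing an order for a {drink or 'coffee'}."
    return "Sorry, I can't help with that yet."


if __name__ == "__main__":
    print(handle_result(RESULT))  # Placing an order for a hot vanilla latte.
```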

Iterate! Iterate! Iterate!

Training a model is an iterative process: you start with a small number of samples, train and test a model, add more samples, train and test another version of the model, and continue adding data and retraining the model through many iterations.

Here is a summary of the general process:

Dialog

From an end-user's perspective, a dialog-enabled app or device is one that understands natural language, can respond in kind, and, where appropriate, can extend the conversation by following up the user's input with appropriate questions and suggestions.

In other words, it's a system designed for your customers that integrates relevant, task-specific, and user-specific knowledge and intelligence to enhance the conversational experience.

For example, suppose you've developed an application for ordering coffee. You want your users to be able to make conversational requests such as:

Your dialog model depends on understanding the user's input; that understanding is passed to the dialog by the ASR and NLU models. Mix.nlu is where you define the formal representation of the known entities (or concepts) and the relationships among them (the ontology) that the dialog uses to continue the conversation. For example, by saying:

Once you have a set of known entities from NLU, you're ready to start specifying your dialog. Your dialog will be defined as a set of conditional responses based on what the system has understood from NLU, plus what it knows from other sources that you integrate (for example, from the client device's temperature sensor or from a backend data source).

For example, if the user's known intent is orderCoffee but the "size" concept is unknown, you might specify that the system:

Mix.dialog offers several response types that can be specified at each turn, including questions for clarifying input, gathering information, confirming information, or guiding the user through a task.
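As a rough illustration of conditional responses, here is a minimal Python sketch of a decision table: each row pairs a condition on what the system currently knows with a response, and the first matching row wins. The rows follow the coffee example in this section, but the `preferred_size` field (standing in for data returned by the client) and the `next_response` function are hypothetical; in Mix.dialog you define these rules in the tool, not in code.

```python
from typing import Callable, Optional

State = dict  # what the system currently knows: intent, entities, client data

# Each row: (condition on the current state, response template). Order matters.
DECISION_TABLE: list[tuple[Callable[[State], bool], str]] = [
    (lambda s: s.get("intent") != "orderCoffee", "What would you like to do?"),
    (lambda s: s.get("size") is None and s.get("preferred_size") is not None,
     "Sure, your usual {preferred_size}?"),
    (lambda s: s.get("size") is None, "Sure, what size?"),
    (lambda s: True, "One {size} coffee coming up."),
]


def next_response(state: State) -> Optional[str]:
    """Return the response from the first row whose condition matches."""
    for condition, template in DECISION_TABLE:
        if condition(state):
            return template.format(**state)
    return None
```

For example, `next_response({"intent": "orderCoffee"})` asks "Sure, what size?", while adding `"preferred_size": "venti"` to the state produces "Sure, your usual venti?".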

Developing a dialog model

The Mix.dialog tool enables you to design and develop advanced multichannel conversational experiences.

In particular, it allows you to:

Like training an NLU model, building a natural language application is an iterative process. The ability to try out the dialogs you're defining before deploying them is a key part of dialog modeling.

To design a complete dialog, you need to specify:

  • The dialog flow
  • Expected data
  • The system's responses

The dialog flow (conditions)

Mix.dialog uses a decision table approach to determine the flow of the dialog. Each line in the decision table defines a response that the system might take based on a set of conditions around what the system currently knows. For example, if the user's known intent is orderCoffee but the "size" entity is unknown, you might specify that the system should ask the user "Sure, what size?" Alternatively, you could ask the client app to query its database and return the user's preferred coffee size, if available, in which case you might specify that the system respond "Sure, your usual venti?"

Expected data

Dialog responses may be conditioned on data other than that collected by the NLU system (that is, other than what the user has explicitly said). You may also want to take into account information that the client knows about: for example, the user's location from the client's GPS data, the user's stored preferences or contacts, or business-specific information such as the user's bank account balance or flight reservations. It all depends on the type of application you're building.

These data sources must be specified as part of the dialog, so that the dialog system can request the information and use it during the interaction. You determine when, and how frequently, client data is passed to the dialog system by assigning it to one of two data types: device context data or query data.

  • Device context data describes the current status of the device and its environment, such as GPS location or device brand/version. In general, this is globally available information (as opposed to intent-specific information from the dialog). Device context data typically consists of strings or numbers and may be dynamic (for example, GPS location, car speed, battery level) or static (for example, user name, user gender, device brand/version). It is passed from the client to the dialog at each interaction (that is, each conversational turn), and therefore should not be used for data with a large footprint (such as local songs or a user's address book).
  • Query data is passed to the dialog after the dialog makes a specific request to the client to retrieve it. During the query, the dialog state is frozen. Queries are often hierarchically structured; from the dialog's perspective, they are an array of values containing zero, one, or more results. Queries also give the dialog a selection capability: when several results come back, the dialog can ask the user to choose among them. For example:

User: “Text Bob.”
(A contact query is made to resolve Bob, and the client passes back two results.)
System: “Which Bob do you mean? Bob Smith from Metropolis or Robert Brown from Gotham City?”
User: “The one from Metropolis.”
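The following Python sketch contrasts the two data types. Device context is attached to every turn, while query results are looked up only when the dialog asks for them and may contain zero, one, or several matches; several matches lead to a selection question like the one above. The `DeviceContext` fields, the contact list, the nickname table, and the `build_turn`, `query_contacts`, and `resolve_contact` functions are hypothetical and do not reflect the DLGaaS API.

```python
from dataclasses import asdict, dataclass


@dataclass
class DeviceContext:
    """Small, globally relevant data the client sends with every turn."""
    gps: tuple[float, float]
    battery_level: int
    device_brand: str


def build_turn(user_text: str, context: DeviceContext) -> dict:
    """Attach the device context to the user's input for this turn."""
    return {"input": user_text, "device_context": asdict(context)}


# Large client-side data (an address book) is never sent every turn;
# the client looks it up only when the dialog queries for it.
CONTACTS = [
    {"name": "Bob Smith", "city": "Metropolis"},
    {"name": "Robert Brown", "city": "Gotham City"},
    {"name": "Alice Jones", "city": "Metropolis"},
]
NICKNAMES = {"bob": ["bob", "robert"]}  # hypothetical alias table


def query_contacts(spoken_name: str) -> list[dict]:
    """Answer a dialog query: the result is an array of 0, 1, or more matches."""
    wanted = NICKNAMES.get(spoken_name.lower(), [spoken_name.lower()])
    return [c for c in CONTACTS if c["name"].split()[0].lower() in wanted]


def resolve_contact(spoken_name: str) -> str:
    """Decide the dialog's next step from the query results."""
    matches = query_contacts(spoken_name)
    if not matches:
        return f"I couldn't find {spoken_name} in your contacts."
    if len(matches) == 1:
        return f"Texting {matches[0]['name']}."
    options = " or ".join(f"{c['name']} from {c['city']}" for c in matches)
    return f"Which {spoken_name} do you mean? {options}?"
```

Here `resolve_contact("Bob")` returns two matches and therefore produces the disambiguation question from the example, whereas a name with a single match is resolved directly.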

Queries are specific to a dialog task and are cleared once the task is finished. (They may be kept in the dialog history for a few turns to resolve anaphora, but eventually they are wiped and are no longer available to the dialog.)

The system's responses (prompts and other behaviors)

In addition to choosing the type of response the system should execute (asking a question, seeking confirmation, and so on), you need to define the properties (also called facets) of the response. Common examples are:

  • Audio (text to speech): What should the system say aloud?
  • Text: What should the system type back?
  • Graphic (for example, emoticon, color theme, animation): What should the system "persona" display, if anything?
  • Video: Should the system play a video in response?
  • URL: Should the system offer a URL in response?

You can define any response properties that are appropriate, but the most common are Audio and Text.
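One way to picture facets is as a single response object with one optional field per modality, where each channel renders only the facets it supports. This Python sketch is hypothetical: the `Response` class, its fields, the channel names, and the `render` function are illustrative assumptions, not the Mix.dialog data model.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Response:
    """One system response with a facet per modality (all optional)."""
    audio: Optional[str] = None    # text to be spoken via TTS
    text: Optional[str] = None     # text to display or type back
    graphic: Optional[str] = None  # e.g. an animation or emoticon identifier
    video: Optional[str] = None    # video to play
    url: Optional[str] = None      # link to offer


# Hypothetical channels and the facets each one can render.
CHANNEL_FACETS = {
    "ivr": {"audio"},
    "web_chat": {"text", "graphic", "url"},
    "smart_display": {"audio", "text", "graphic", "video"},
}


def render(response: Response, channel: str) -> dict:
    """Keep only the facets that are set and that the channel supports."""
    supported = CHANNEL_FACETS[channel]
    return {
        f.name: getattr(response, f.name)
        for f in fields(response)
        if f.name in supported and getattr(response, f.name) is not None
    }
```

With a response that sets `audio`, `text`, and `url`, an IVR channel would receive only the audio facet, while a web chat would receive the text and URL.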

Quick start

This section takes you through the process of creating a simple coffee application. Let's go through some fundamentals first.

Creating applications in Mix involves the following steps:

  1. Create a project. A project contains all the data necessary for building Mix.asr, Mix.nlu, and Mix.dialog resources.
  2. Author ASR, NLU, and dialog resources using the Mix tools. You can also import existing resources from other projects. This is what you'll do in this section.
  3. Create an application configuration, which lets you define the resource version(s) you want to use in your application.
  4. Deploy the application configuration in a runtime environment, so that you can use the resources in your application, with one of the available runtime services:
    • ASR as a Service
    • NLU as a Service
    • Dialog as a Service
    • TTS as a Service
  5. Run your client application.

Ready? Let's try it!

This section takes you through the process of creating a simple coffee application, deploying it in the runtime environment, and running a client application that understands the following dialog:

Before you begin

To run this scenario, right-click the links to download the following files:

Try it out!

  1. Create a project using Quick Start on the Mix.dashboard.
  2. In the Dashboard, select the Quick Start project and click the Import/Export tab.
  3. Click the .trsx icon in the Import area and select the order_coffee.trsx file.
  4. Click the .json icon in the Import area and select the order_coffee.json file.
  5. Click the .nlu icon to open the project in Mix.nlu.
  6. Train your Mix.nlu model.
  7. Build the ASR, NLU, and Dialog resources.
  8. Set up a new application configuration.
  9. Deploy your application configuration.
  10. Install and run the Sample Python dialog application.

You can try the app with a few different utterances, specifying a coffee size and coffee type, for example:

This example uses the Dialog as a Service runtime service, but you can also try this project with the following services:

Have questions?

Mix provides a community forum where you can ask questions, find solutions, get information on the latest Mix releases, and share knowledge with other Mix users. You can find the forum here:

https://community.mix.nuance.com/

You can also get to the forum by clicking the forum icon, available from the Mix window.


Other resources

See the following additional topics:

Planning your app

Like any application, a successful multichannel conversational application starts with good user interface design: extensive upfront planning, from identification of business objectives and all the activities users may wish to accomplish, to consideration of design principles for creating dialog strategies, dialog flows, and prompts (called "messages" in Mix).

Five steps: An overview

The main objective of an application is to obtain information from the user to accomplish a goal or series of goals or activities. Here are five steps that should help you plan your client application.

Step 1: Define conversations

Start by making a list of all the activities you imagine a user should be able to perform using your application. Begin to think of these activities as dialogs—conversations between your user and your app with the intent to accomplish a specific goal.

Step 2: Identify items to collect for each dialog

Once you have a list of the activities you plan to enable, define the items to collect from the user to complete each activity. These are the specific items that users want and that your application can offer: for the order-coffee function (or intent), for example, a large coffee or a double espresso. Begin to think of these items to collect as entities.

Step 3: Define intents

After you define the items to collect that complete the various activities or dialogs, make a list of the phrases a user may use to start a particular intent. You may wish to use a formal process to capture sample sentences, either via data collection (historical phrases stored in a database of an existing application or through brainstorming sessions with colleagues) or early usability testing. To import existing data, see Import data.

Step 4: Define entities

Once you’ve listed all the request phrases, you’ll want to start defining the words or phrases that users typically use to express the items to be collected. For example, for the “size” entity, the permitted values might be “small”, “medium”, “large”, and “extra-large”.

Include synonyms and alternative words/phrases as well; for example, “smallest”, “normal”, “big”, and “biggest you have”. Place yourself in your users’ shoes and think of how they will express what they want. For instance, users who frequent rival coffee shops may use different terms, such as “short”, “tall”, “grande”, and “venti”.

Don’t forget the optional words and phrases, such as “decaf” or “hold the caffeine” for the decaffeinated coffee product; “chocolate”, “caramel”, and “vanilla” for flavoring and/or topping; and so on. Depending on your product list, a request simply for “chocolate” might require disambiguation to determine whether the user wants hot chocolate, caffè/café mocha, or, based on the dialog context, chocolate topping.

Once you have defined the phrases for each item to be collected, you’re ready to attach a meaning to each word/phrase. This process consists of assigning the word or phrase an entity and value.

You may want to create a table or chart to capture the relationship between each word/phrase (literal) and its meaning. Another option is to create individual files, one per entity, each containing a newline-separated list of literals and their values (meanings). These files can then be imported directly into Mix. See Importing entity literals for the options available.

Sample size.nmlist, detailing literals and associated values:

  small|small
  short|small
  medium|medium
  regular|medium
  tall|medium
  primo|medium
  large|large
  grande|large
  extra-large|extra-large
  venti|extra-large
  massimo|extra-large
  supremo|extra-large

Sample coffeeType.nmlist:

  coffee|drip
  joe|drip
  espresso|espresso
  mocha|mocha
  mocaccino|mocha
  Value to collect (entity)   Literal (user request)   Meaning (value)
  Size                        small                    small
                              short                    small
                              medium                   medium
                              regular                  medium
                              tall                     medium
                              primo                    medium
                              large                    large
                              grande                   large
                              extra large              extra-large
                              venti                    extra-large
                              massimo                  extra-large
                              supremo                  extra-large
  CoffeeType                  coffee                   drip
                              joe                      drip
                              espresso                 espresso
                              mocha                    mocha
                              mocaccino                mocha
                              ...                      ...
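Because each line of the .nmlist samples above has the form `literal|value`, it is straightforward to load such a file and look up the meaning of a user's words. The minimal Python sketch below assumes exactly that one-pair-per-line layout and skips blank lines; it is not the Mix importer, and `load_nmlist` and `SIZE_NMLIST` are illustrative names.

```python
def load_nmlist(text: str) -> dict[str, str]:
    """Parse 'literal|value' lines into a literal -> value mapping."""
    mapping: dict[str, str] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        literal, sep, value = line.partition("|")
        if not sep or not literal or not value:
            raise ValueError(f"line {line_no}: expected 'literal|value', got {line!r}")
        # Normalize literals so lookups are case-insensitive.
        mapping[literal.strip().lower()] = value.strip()
    return mapping


SIZE_NMLIST = """\
small|small
regular|medium
tall|medium
grande|large
venti|extra-large
"""

if __name__ == "__main__":
    sizes = load_nmlist(SIZE_NMLIST)
    print(sizes["venti"])  # extra-large
    print(sizes["tall"])   # medium
```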

Step 5: Create sample conversation flows

Once you’ve defined your entry points (intents) and the associated items to collect (entities), your next step is to create sample conversations for each dialog. The flow of each conversation is called a dialog flow.

Dialog flows are essentially a mapping of interactions between user requests and application prompting. Conversational interactions are complex, since users can say similar things in different ways and may respond in ways that you’ve not anticipated. Dialog flows serve as an early test of your dialog model, to make sure that it flows naturally and to identify plausible responses you might have to consider in the design phase. Mix.nlu and Mix.dialog both provide the means to test (try) the models that you create and to improve upon them before integrating with a client application.

Armed with a list of requirements, goals, and primary use cases, you're ready to start defining the dialog in Mix so that all stakeholders—business owners, speech scientists, VUI designers, developers, and so on—have access to the requirements in Mix.

Defining the dialog

When you define the dialog, you must consider a number of factors before coming up with a strategy or set of strategies, such as:

Dialog design principles

Comprehensive coverage of the entire design process is beyond the scope of this topic. Below is a list of major items to think about, to set you on the right track.

Analyze your application

You did this earlier in the five steps to planning your app. At this point you should have identified:

Map the user interface

Once you’ve decided on general requirements and your approach (tasks to perform, information to collect, information to return to the user, dialog flows), it’s time to sketch or map out how users will interact with the application. In these early stages you might choose to use a diagramming application like Microsoft® Visio to represent in a flow chart each dialog state in the application and use arrows to point to other dialog states based on what the user says. In Mix.dialog you will recreate and streamline this graphical representation and make it available to all stakeholders.

In the future, Mix will provide all the tools necessary to create dialog design documents, from early design flow mockups to detailed specifications, which can be reviewed with stakeholders, revised, and ultimately approved.

In the early stages remember to consider any constraints on information collection. For example, the business requirements of a drink-ordering app may demand that you first ascertain the user’s location before permitting an order to be placed (to ensure that a specific location carries the products requested). Similarly, information dependencies may exist that you need to build into the application; for example, the option to add “shots” may exist for coffees that fit within the “espresso family” but not for regular or “drip coffee”.
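The two dependencies from the drink-ordering example (know the user's location before taking an order, and offer extra shots only for espresso-family drinks) can be expressed as simple guard checks. This Python sketch is hypothetical; the contents of the espresso family, the step names, and the `next_step` function are illustrative assumptions.

```python
ESPRESSO_FAMILY = {"espresso", "latte", "cappuccino", "mocha"}  # assumed list


def can_add_shots(coffee_type: str) -> bool:
    """Extra shots apply to espresso-based drinks, not to drip coffee."""
    return coffee_type in ESPRESSO_FAMILY


def next_step(order: dict) -> str:
    """Return the next piece of information the dialog needs to collect."""
    if "location" not in order:
        return "ask_location"  # business rule: location comes first
    if "coffee_type" not in order:
        return "ask_coffee_type"
    if can_add_shots(order["coffee_type"]) and "shots" not in order:
        return "ask_shots"
    return "confirm_order"
```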

A flow chart indicating the information dependencies in your application will give you a visual map of the different branches your application will have to cover, and provide some guidance as to the best strategy to use in designing the dialog.

This sketch or map will help you build the conversational flow in Mix.dialog, with each specific task—such as asking a question, playing a message, and performing recognition—specified via a node. As you add nodes and connect them to one another, the dialog flow takes shape in the form of a graph, allowing you to visualize every piece of the conversational logic.

At this point, you might want to think about the purpose of each node in your application. For example, the most important nodes you will use in Mix.dialog include:

  • Start: To start the conversation. Can also be used to set variables and to override global settings such as error and command handling. (For components other than Main, these values can be overridden using the Enter node.)
  • Message: To perform non-recognition actions, such as playing a prompt, assigning a variable, or defining the next node in the dialog flow.
  • Question & Answer: To listen for and recognize user responses.
  • Intent Mapper: To connect the dialog flow to other components to get the user's intent and collect the information to fulfill that intent.
  • Data access: To exchange information with a backend system.

These nodes, in turn, send actions back to your client application at specific points in the dialog:

  • Message actions indicate that the client app should play a message to the user.
  • Q&A actions indicate that the app should play a message and return user input to the dialog (for example, "What type of coffee would you like today?" and the user's answer "double espresso").
  • End actions indicate the end of the dialog.
  • Data actions indicate that the dialog expects data to continue the flow (for example, to retrieve the price of a double espresso).

For more information on Mix.dialog node types, see Dialog design elements. For more information on how to return user input to the dialog service, see Actions in the DLGaaS documentation.
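To see how nodes and client actions relate, the Python sketch below walks a tiny hypothetical coffee flow: each node emits one of the action kinds described above (message, Q&A, data, or end) and names the next node, and a client loop handles each action. The node names, the action dictionaries, the price table, and the `run_dialog` loop are illustrative assumptions; they are not the Mix.dialog or DLGaaS formats.

```python
# Hypothetical flow: node name -> (action sent to the client, next node).
FLOW = {
    "start": ({"type": "message", "text": "Welcome to the coffee shop."}, "ask_type"),
    "ask_type": ({"type": "qa", "text": "What type of coffee would you like today?",
                  "store_as": "coffee_type"}, "get_price"),
    "get_price": ({"type": "data", "key": "coffee_type", "store_as": "price"}, "done"),
    "done": ({"type": "end", "text": "That will be ${price:.2f}."}, None),
}

PRICES = {"double espresso": 3.5, "latte": 4.25}  # stands in for a backend system


def run_dialog(user_answers: list[str]) -> list[str]:
    """Walk the flow, handling each action the way a client app might."""
    answers = iter(user_answers)
    variables: dict = {}
    transcript: list[str] = []
    node = "start"
    while node is not None:
        action, next_node = FLOW[node]
        kind = action["type"]
        if kind == "message":
            transcript.append(action["text"])  # play/display; no input expected
        elif kind == "qa":
            transcript.append(action["text"])  # prompt, then return user input
            variables[action["store_as"]] = next(answers)
        elif kind == "data":
            # The dialog needs data to continue; the client fetches it.
            variables[action["store_as"]] = PRICES[variables[action["key"]]]
        elif kind == "end":
            transcript.append(action["text"].format(**variables))
        node = next_node
    return transcript
```

Calling `run_dialog(["double espresso"])` yields the welcome message, the coffee-type question, and "That will be $3.50."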

Be clear, consistent, and efficient

Design your messages to encourage the simplest and most direct responses possible. Users want speed and efficiency. The fewer the number of steps to complete a task, the greater the perceived efficiency of the system. For more information on directing user responses, see Prompting the user.

Support universal commands such as “main menu”, “escalate”, “goodbye”

Providing the ability to invoke these commands at any point in the conversation gives users control over the dialog. Design the application to allow users to say “main menu” should they miss or forget instructions, “escalate” if they get stuck, and “goodbye” if they wish to leave.
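Because universal commands must work at any point in the conversation, one common pattern is to check for them before normal, node-specific handling. The minimal Python sketch below assumes the recognized text is already available and matches it exactly, which is a simplification; the command phrases come from the text above, while the `handle_turn` function and the returned action names are hypothetical.

```python
GLOBAL_COMMANDS = {
    "main menu": "go_to_main_menu",
    "escalate": "transfer_to_agent",
    "goodbye": "end_session",
}


def handle_turn(utterance: str, current_node: str) -> str:
    """Global commands take priority over whatever the current node expects."""
    normalized = utterance.strip().lower().rstrip(".!?")
    if normalized in GLOBAL_COMMANDS:
        return GLOBAL_COMMANDS[normalized]
    return f"continue:{current_node}"
```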

Gracefully handle errors

Errors and misunderstandings are inevitable, just as they are in regular, everyday conversation. Try to anticipate problems and give users effective instructions and feedback to get them back on track smoothly. Common errors include recognition/find-meaning failures such as no-match and no-input conditions. Provide the appropriate level of instruction given the failure condition/error to move the user along in as natural a way as possible. Suggestions are provided in Handling errors.

Confirm but don’t overdo it

Confirmation has its place: for example, for disambiguation, error handling, and when obtaining confirmation before committing a transaction. However, it’s not efficient to confirm one item at a time; unnecessary confirmation can double the length of the interaction and frustrate users. For this reason, it’s best to confirm when a block of information has been completed rather than after each individual piece of information. See Requesting confirmation.

Avoid cognitive overload

Reduce the short-term memory load on users by providing visual as well as auditory (multimodal) feedback, by limiting options whenever possible, and by splitting up complex tasks into a sequence of smaller interactions.

Maintain context

A well-designed application tracks what the user has said (or typed/tapped) and responds in context. Your strategy for maintaining conversational context should take into account factors such as the number of turns or user interactions to retain and when to release the context (for example, if the user is in the middle of a transaction and clicks the Back button or says “cancel”).

Another consideration is intent switching: do you want to give your users the ability to switch between intents, for example, to move from the place-order dialog to the location dialog (to view a list of nearby coffee shops)? Are users able to switch back with no loss of contextual awareness, or would you prefer that they finish one task at a time? You’ll need to balance the benefits of usability against application complexity.
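A bounded conversation history is one simple way to implement the retention and release decisions described above. In this hypothetical Python sketch, the context keeps only the last few turns, is cleared when the user cancels, and remembers one unfinished task when the user switches intents so that it can be resumed; the class name, the default turn limit, and the resume behavior are assumptions for illustration.

```python
from collections import deque
from typing import Optional


class ConversationContext:
    """Keeps recent turns, the active intent, and one suspended intent."""

    def __init__(self, max_turns: int = 5):
        self.history: deque = deque(maxlen=max_turns)  # oldest turns drop off
        self.active_intent: Optional[str] = None
        self.suspended_intent: Optional[str] = None

    def add_turn(self, utterance: str, intent: str) -> None:
        if intent == "cancel":
            self.clear()  # release the context entirely
            return
        if self.active_intent and intent != self.active_intent:
            # Intent switch: remember the unfinished task so it can resume later.
            self.suspended_intent = self.active_intent
        self.active_intent = intent
        self.history.append((utterance, intent))

    def finish_active(self) -> None:
        """Complete the current task and resume any suspended one."""
        self.active_intent, self.suspended_intent = self.suspended_intent, None

    def clear(self) -> None:
        self.history.clear()
        self.active_intent = None
        self.suspended_intent = None
```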

Remember, you are guiding the user

The best applications focus on the users’ goals and on achieving them in the most efficient and intuitive way possible. The structure of your application will depend on the natural logic of the application (the dialogs or actions to perform and the corresponding information to collect/return), and also on your users’ responses to your questions and on your messages—how you respond not only to successful results but also to errors, ambiguities, and incomplete information.

Prompting the user

Prompts (called "messages" in Mix) permit an application to interact with your users. Not only do they set the tone of the application, they also direct users toward the fulfillment of their goals.

For example, messages enable an application to:

With the help of the sample dialog flows you defined in Planning your app, you should have a good understanding of what types of messages you will need. For example:

Consider how you want to interact with the user. For example, in the form of:

Your dialog design may, after all, support multiple channels (such as IVR/Voice or Digital) and use channel-specific messages as needed. Each channel, in turn, will support multiple modalities (such as rich text, audio, TTS, interactivity, DTMF).

Handling errors

No matter how clear your messages are, a user may still misunderstand, become confused, or encounter an application/system error. When designing your application, you need to communicate clearly, keep the user engaged, avoid ambiguities, and detect and recover from errors.

Your role is to guide your users: give them just enough information to keep them moving toward their goal, in as natural a way as possible. Anticipate errors and gently lead users by reinforcing information and by escalating instructions and feedback as required.

Common failure conditions include recognition/find-meaning failures:

Create special handling for these types of events: consider how many times your app should prompt the user when the failure occurs and what types of messages to use.

For example, for a second no-input result, you might use “Sorry, what was that?” and on the third no-input (the user continues to remain silent), provide more detailed instructions or examples in case the user is uncertain what to do or say next, such as “Here are some things you can say...” Another option is to suggest that the user try a different action, such as “Sorry, you can try one of these...” As a final fallback strategy, use a yes/no question to move the dialog along, such as “Do you still want to place an order? (Just say ‘Yes’ or ‘No’.)”
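The escalation described above can be captured as an ordered list of messages indexed by how many consecutive no-input events have occurred. The second, third, and fallback messages below come from the text; repeating the original question on the first no-input, and the `no_input_message` function itself, are assumptions.

```python
NO_INPUT_MESSAGES = [
    "{question}",                                    # 1st: repeat the question (assumed)
    "Sorry, what was that?",                         # 2nd
    "Here are some things you can say: {examples}",  # 3rd: more detailed help
    "Do you still want to place an order? (Just say 'Yes' or 'No'.)",  # fallback
]


def no_input_message(count: int, question: str, examples: list[str]) -> str:
    """Pick the message for the Nth consecutive no-input event (1-based)."""
    if count < 1:
        raise ValueError("count starts at 1")
    # Any count beyond the list length keeps using the final fallback.
    template = NO_INPUT_MESSAGES[min(count, len(NO_INPUT_MESSAGES)) - 1]
    return template.format(question=question, examples=", ".join(examples))
```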

If many no-input timeouts are observed for a dialog state, consider refining the language. A message may be misleading, incorrect, or insufficient for the user to act on.

Failure scenarios don’t have to be negative experiences—instead, reinforce information about what is expected of the user at that point in the dialog flow and the available options. When appropriate, provide examples. Users pattern their responses on examples.

For system errors, provide a description of the error and suggestions for how to recover. A recovery suggestion is preferable to a cryptic error message and “Please try again”.

Handling ambiguity

You have a certain degree of control over what your users will say or do. Users will react to the views to which they have navigated and to the buttons they see displayed in your application’s interface. Their reactions are also influenced by their past experiences and expectations when dealing with your business.

Sometimes, however, a user’s request is too vague or general and a single interpretation cannot be made (meaning cannot be determined). In these cases, the application must prompt the user to fill in the missing information. For example, you might:

When multiple inconclusive interpretations are identified and/or the user rejects the application’s response, you might use the confidence scores in the returned interpretation results to narrow down the choices. Typically, applications are most interested in the results with the highest score. However, you can use confidence scores to determine whether:

Requesting confirmation

Confirmation is the act of requesting that the user accept or reject the application’s understanding of one or more utterances. Confirmation is useful and necessary when:

Role of confirmation in the dialog

In general, confirmation plays three key roles in dialog design. Confirmation ensures that:

All of these factors influence the user’s confidence in the application and the business.

Confirmation strategies

In general, it is not efficient to confirm one item at a time. Unnecessary confirmation can double the length of the interaction and frustrate users. Instead, consider confirming when a block of information (ideally, a group of related items) has been collected rather than after each individual piece of information.

You will notice in the second, more natural-sounding example that the application uses both implicit and explicit confirmation:

Implicit confirmation acknowledges the user’s choice and moves the conversation along faster. Although the user has an opportunity at any time to correct the application, the user may not notice the error or feel unsure how (or reluctant) to correct it. For this reason, use implicit confirmation when the confidence level is high and when the potential consequence of an error in understanding is low.

Conversely, use explicit confirmation when:

Explicit confirmation instills confidence: it gives users the chance to say “Yes” or “No” (or to make a forced-choice decision) and helps prevent serious problems (such as an unintended purchase) caused by false accepts.
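The guidance above can be summarized as a small decision rule based on the recognition confidence and the cost of an error. In this Python sketch, the thresholds (0.5 and 0.85), the "high stakes" flag, and the choice to re-prompt rather than confirm at very low confidence are hypothetical placeholders to be tuned for your application.

```python
def confirmation_strategy(confidence: float, high_stakes: bool) -> str:
    """Choose how to confirm an understood item.

    Thresholds are illustrative assumptions, not recommended values.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence < 0.5:
        return "reprompt"   # too uncertain to confirm: ask again
    if high_stakes or confidence < 0.85:
        return "explicit"   # e.g. "You want to buy X. Is that right?"
    return "implicit"       # e.g. "OK, a large latte. What flavor?"
```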

Interacting with data

You may need to exchange data with an external system in your dialog application. For example, you may want to take into account information about the user's location from the client's GPS data, a user's stored preferences or contacts, or business-specific information such as the user's bank account balance or flight reservations—it all depends on the type of application you're building.

For example, consider these use cases:

Exchanging data is done through a data access node in Mix. This node tells the client application that the dialog expects data to continue the flow. It can also be used to exchange information between the client app and the dialog.

Data access nodes allow you to exchange information by:

You can also send data (variables or entities) to the client application from a Mix.dialog question and answer node. See Set up a question and answer node.

Mix.dialog also gives you the ability to handle interactions with external systems using the external actions node. For example, to:

Using dynamic values

Dynamic values provide vocabularies specific to a user during the application session. For example, a banking application might load a list of payees for a bill-payment feature and make that list specific to each user. After identifying a specific user, the application retrieves that user’s account information and builds a user-specific vocabulary from it; a bill-payment app could thus include dynamic entity values such as the names of the bills (payees) that the user might wish to pay, as well as the user's account balances.

For more information, see Dynamic list entity and Mark a custom entity as dynamic.
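As an illustration of building a user-specific vocabulary, the Python sketch below turns a user's payees (retrieved after the user is identified) into literal/value pairs for a dynamic entity. The `Payee` class, the sample account data, and the output shape are hypothetical; see the Mix.nlu and Mix.dialog documentation for how dynamic list entities are actually supplied at runtime.

```python
from dataclasses import dataclass


@dataclass
class Payee:
    payee_id: str
    name: str
    aliases: list[str]  # other ways this user might refer to the payee


def build_payee_vocabulary(payees: list[Payee]) -> list[dict]:
    """Map every way a user might name a payee to that payee's ID."""
    vocabulary = []
    for payee in payees:
        for literal in [payee.name, *payee.aliases]:
            vocabulary.append({"literal": literal.lower(), "value": payee.payee_id})
    return vocabulary


if __name__ == "__main__":
    user_payees = [
        Payee("p1", "City Power", ["electric bill", "power company"]),
        Payee("p2", "Acme Mobile", ["phone bill"]),
    ]
    for entry in build_payee_vocabulary(user_payees):
        print(entry)
```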