There are many cloud integration companies that tackle API orchestration. They range from complex enterprise solutions to less-complicated, consumer-focused ones. At Intwixt, we focus on integrating people and APIs interchangeably. Conversing with people and calling APIs share overlapping qualities. From an engineering perspective, inputs, outputs, and errors are common patterns to each. But people are particularly challenging when you’re required to converse on their terms (when they take the reins, so to speak).
The system must retain state and resume the conversation when the person responds. And it must be able to react to unexpected input and redirect the conversation as appropriate. This is especially true of voice-activated assistants like Alexa and Google Home and messaging channels like SMS. It’s difficult to constrain input on these platforms and protocols, so flexibility is key.
As humans, we’re intuitively good at conversation, so it’s easy to overlook the challenges of modeling it. I’ll use the remainder of this document to introduce how we model conversational use cases using the Intwixt platform. I’ll start simply with one-way notifications (such as getting an SMS to confirm an appointment). I’ll follow with relays and guided conversations (modern phone trees) and conclude with bi-directional conversations (where the system reacts to unexpected input from the person).
Importantly, I’d like to show how traditional integration principles can be used to model the unexpected ways in which people converse. For these examples, I’ll use Twilio, a provider of telephony services like voice and SMS. Twilio provides RESTful APIs that let you programmatically exchange text messages.
The most common use of Twilio is the simplest, namely to send a one-way notification. If you need to alert a customer of some event (like a change of venue or confirmation of an appointment), call the Twilio Send Text Message API.
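For reference, that notification boils down to one authenticated POST against Twilio’s Messages resource. Here’s a minimal Python sketch that builds (but does not send) the request; the credentials and phone numbers are placeholders:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder Account SID -- substitute your real Twilio credentials.
ACCOUNT_SID = "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

def build_sms_request(to: str, from_: str, body: str) -> Request:
    """Build (but don't send) the POST that asks Twilio to deliver an SMS.
    A real call would also attach an HTTP Basic auth header built from
    the Account SID and Auth Token."""
    url = (
        "https://api.twilio.com/2010-04-01/"
        f"Accounts/{ACCOUNT_SID}/Messages.json"
    )
    payload = urlencode({"To": to, "From": from_, "Body": body})
    return Request(url, data=payload.encode(), method="POST")

req = build_sms_request(
    to="+15551230000",    # customer's number (placeholder)
    from_="+15559870000", # your Twilio number (placeholder)
    body="Reminder: your appointment is tomorrow at 3pm.",
)
```

Nothing here is conversational yet: the flow fires one request and is done.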
Twilio can also be used to process incoming text messages to programmatically trigger an event or simply record the message. Here is a one-step flow that saves incoming messages to a Google Sheets worksheet.
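Twilio delivers each inbound text to your webhook as a form-encoded POST with fields such as `From` and `Body`. A minimal Python sketch of that receiving step, with an in-memory list standing in for the Google Sheets worksheet:

```python
from urllib.parse import parse_qs

# Stand-in for the Google Sheets worksheet: each row is (sender, message).
rows: list[tuple[str, str]] = []

def handle_incoming_sms(form_body: str) -> None:
    """Parse the form-encoded body Twilio POSTs for each inbound text
    and append one row per message (here, to an in-memory list)."""
    params = parse_qs(form_body)
    sender = params["From"][0]
    message = params["Body"][0]
    rows.append((sender, message))

# Simulated webhook payload (placeholder phone number).
handle_incoming_sms("From=%2B15551230000&Body=Confirmed%21")
```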
Traditional integration platforms easily model notification use cases, because even though they involve telephony and people, they’re not conversational. They represent traditional RESTful API calls and Webhooks.
A relay brokers messages between two parties. It can be modeled using a Twilio receiver and a Twilio sender. In the following flow, the Twilio receiver receives a text, the system looks up the intended recipient, and then the message is forwarded to them using a Twilio sender.
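The relay’s middle step is just a lookup. A Python sketch with a hard-coded routing table and an in-memory outbox standing in for the Twilio sender (all numbers are placeholders):

```python
# Hypothetical routing table: maps an inbound sender to the
# intended recipient's phone number.
RECIPIENTS = {
    "+15551230000": "+15559870000",  # placeholder numbers
}

outbox: list[tuple[str, str]] = []  # stand-in for the Twilio sender activity

def relay(sender: str, body: str) -> None:
    """Receive a text, look up the intended recipient, forward the message."""
    recipient = RECIPIENTS.get(sender)
    if recipient is None:
        return  # unknown sender; a real flow might reply with an error
    outbox.append((recipient, body))

relay("+15551230000", "Running 10 minutes late")
```

Note that the flow holds nothing between messages: each text is relayed and forgotten.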
A conversation is actually taking place, but because both parties are human, there is no need to manage state or parse the message content for meaning. It’s a simple and direct use case, but it serves as the starting point for understanding conversation, because once we replace one of the people in the interaction, we’re forced to model things in their stead.
The fundamental unit of design we use when modeling conversations is a subtype we call a Guided Conversation. Instead of waiting for the person to say something random and then reacting, the flow is modeled to control the interaction and limit user choice. It has many of the qualities of traditional visual UIs in that it helps the user make a decision by limiting their options and communicating prompts in advance.
With guided conversations, control is maintained in one of two ways. The first is by initiating contact with the person (such as sending an SMS). This is an effective way to let the user understand our intent while communicating their limited response options.
"Can I get your feedback on something? y/n"
The second is to wrest control in a moment of ambiguity, such as when the user says something unexpected or when the user is the one who initiated the conversation. In such a moment, the flow responds with instructions for what to do next.
"I support the following options. Please choose one."
With guided conversations, the flow is designed to guide the person each step of the way. It’s not artificially intelligent by any stretch, but it’s effective at gathering information and communicating options to users. It’s worth considering the next time you need to survey a customer, register or onboard, or implement tree-based decision-making and navigation over a communication channel such as SMS.
It’s possible to design guided conversations using traditional integration platforms and tools, but they represent the first real hurdle traditional tools face as they introduce the need to manage state and unpredictability: How long should the system retain state information? How should the system react to unexpected input? And when should the system release state to free up resources?
Guided Conversation | The Sender
In a traditional integration platform, modeling a long-running process like a guided conversation requires that the flow track state. Upstream context relevant to the transaction must be saved so it’s available when the flow resumes, and sufficient information must be stored to collate the response with the question. The flow must also spawn a timeout thread to invoke cleanup operations and free up unneeded state if the user doesn’t respond. (Things are already getting complicated, and we haven’t yet modeled the receiver flow that will handle the user’s response.)
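As a sketch of what the sender flow must now track, here’s a minimal Python version. The field names and the 15-minute timeout are illustrative assumptions, not Intwixt’s actual mechanism:

```python
import time

# Conversation state, keyed by the user's phone number. Each entry stores
# the question asked, any upstream context, and when the question was sent.
pending: dict[str, dict] = {}

TIMEOUT_SECONDS = 15 * 60  # hypothetical: give the user 15 minutes to reply

def ask(phone: str, question: str, context: dict) -> None:
    """Send a question and cache the state needed to resume later."""
    pending[phone] = {
        "question": question,
        "context": context,           # upstream context to restore on resume
        "sent_at": time.monotonic(),  # consulted by the timeout sweep below
    }
    # ...here the flow would call the Twilio sender with `question`...

def sweep_timeouts() -> None:
    """Free state for conversations whose users never responded."""
    now = time.monotonic()
    for phone in [p for p, s in pending.items()
                  if now - s["sent_at"] > TIMEOUT_SECONDS]:
        del pending[phone]

ask("+15551230000", "Can I get your feedback on something? y/n",
    context={"survey_id": 42})
```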
Guided Conversation | The Receiver
The flow that handles the user’s response includes similar additions (and complications) to handle human variability. There’s an activity to cancel the timeout, one to free unused memory (state), and another to validate the user’s response. And there’s an additional path to re-ask the question in cases where the user’s response is incorrect.
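A corresponding receiver-side sketch in Python, assuming the sender flow cached a pending entry per phone number (all names are hypothetical):

```python
# State cached when the question was asked (hypothetical shape).
pending = {
    "+15551230000": {"question": "Can I get your feedback on something? y/n",
                     "context": {"survey_id": 42}},
}

replies: list[tuple[str, str]] = []  # stand-in for the Twilio sender

def on_response(phone: str, body: str) -> None:
    """Validate the user's answer; free state on success, re-ask otherwise."""
    state = pending.get(phone)
    if state is None:
        return  # no open question (or it already timed out)
    answer = body.strip().lower()
    if answer in ("y", "n"):
        del pending[phone]  # cancel the timeout and free the state
        # ...continue the flow using state["context"] and the answer...
    else:
        replies.append((phone, "Please reply with y or n."))  # re-ask

on_response("+15551230000", "maybe?")
```

An invalid answer triggers the re-ask path while the cached state stays put, waiting for a valid reply or the timeout.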
Taken together, the sender and receiver flows begin to reveal the additional capabilities common to communication use cases. These include:
- State: What is the state of the conversation, including the upstream/antecedent context and any information necessary to collate the response and resume operation when the user responds?
- Timeout: How much time does the user have to respond? Will the system re-ask or cancel if they don’t respond in time?
- Intent: What did the user say when they responded, and is there any historical context to further constrain its meaning?
- Redirection: If the user’s response was unexpected, should the system re-ask the question or reset the context per the response content?
Guided Conversation | Unified View
Because state, timeout, intent, and redirection are fundamental to conversational semantics, they’ve been abstracted as configurable settings in the Intwixt platform. This greatly simplifies conversational use cases as the developer is no longer responsible for modeling asynchronous complexity. If there’s upstream state to cache or free up, the system handles it automatically. Unexpected events are thrown like traditional system errors, halting execution and allowing for alternate paths of recovery.
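To make the idea concrete, here is a purely hypothetical configuration expressing the four concerns declaratively. The field names are invented for illustration and are not Intwixt’s actual configuration schema:

```python
# Hypothetical declarative settings (names invented for illustration;
# not the actual Intwixt configuration schema).
conversation_settings = {
    "state":       {"retain": ["survey_id"], "release_on": "complete"},
    "timeout":     {"seconds": 900, "on_expire": "cleanup"},
    "intent":      {"expected": ["y", "n"]},
    "redirection": {"on_unexpected": "re_ask"},
}
```

The point is that none of this logic is hand-modeled in the flow itself; it’s declared once and enforced by the platform.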
The designed flows now appear like synchronous integration flows with triggers, activities, and alternate/error paths. From the developer’s perspective, the complexity of asynchronous interaction fades away and the orchestration effort is no more complex than coordinating traditional RESTful APIs.
Importantly, factoring out the conversation semantics has now simplified the guided conversation sufficiently that it can be used as a building block to support true bi-directional conversation, where the user can redirect the flow of the conversation. User-initiated redirection isn’t just handled, it’s expected and supported as a first-class concept in the system.
A key tenet of bi-directional conversation is that either party can steer the direction of the conversation. Suppose, for example, that we are collecting information over SMS and request the user’s email address. They respond,
"Will you share my information"
The user intends to change direction, but they don’t necessarily want to cancel the transaction. The system must be able to react to the user’s statement and, if necessary, resume the prior interaction. To support this use case, I need to model two additional flow types. In total, I now have three types, all necessary to model the complexities of a bi-directional conversation.
The three components of Bi-directional Conversation
- Guided Conversation | guides a conversation by constraining user choice (already explained in the prior section)
- Intent Resolver | resolves the intent of ambiguous user statements using a Natural Language Processing (NLP) engine or a similar resolution service such as API.ai
- Router | routes the conversation to the appropriate Guided Conversation once the user’s intent is understood by the Intent Resolver
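A minimal Python sketch of the three components working together. The keyword matcher stands in for a real NLP engine like API.ai, and every handler and name here is invented for illustration:

```python
def resolve_intent(utterance: str) -> str:
    """Stub intent resolver: a keyword match standing in for an NLP engine."""
    text = utterance.lower()
    if "share" in text or "privacy" in text:
        return "privacy_question"
    return "unknown"

transcript: list[str] = []  # messages the system sends back to the user

# Guided conversations the router can dispatch to (hypothetical handlers).
def privacy_conversation(utterance: str) -> None:
    transcript.append("We never share your information. "
                      "Now, where were we? What is your email address?")

def fallback(utterance: str) -> None:
    transcript.append("I support the following options. Please choose one.")

ROUTES = {"privacy_question": privacy_conversation}

def route(utterance: str) -> None:
    """Resolve the user's intent, then hand off to a guided conversation."""
    handler = ROUTES.get(resolve_intent(utterance), fallback)
    handler(utterance)

route("Will you share my information")
```

Note how the privacy handler answers the question and then resumes the prior interaction (re-asking for the email address) rather than canceling it.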
Tip: Using a proper NLP engine is critically important to orchestrating conversations. Intent resolution is a very different construct from process modeling. It’s good to leverage tools that specialize in NLP to parse intent and then use this to augment the orchestration of the flow. Neither notational system is adequate alone to fully model the qualities of human conversation.
Seen from this perspective, bi-directional conversations can be understood as a collection of guided conversations. They’re orchestrated by a router and aided by an intent resolver. And because the system is handling the complexity of state, intent, timeout, and redirection, it’s possible to construct the conversation without cumbersome modeling requirements. It’s possible to keep the flows at the level of business logic and model the use case at hand as if modeling traditional synchronous activities.
Modeling the complexities of human conversation cannot be distilled in just a few pages. There are many ways to model conversations, even within Intwixt. Best practices (like keeping the conversation shallow) and complex use cases that include contextual language parsing and evolving ontologies are specialized enough to warrant their own write-ups.
If you’d like to see a working example of these more-complex principles, we created a Messenger chatbot that helps users create and manage shared lists like shopping lists and to-do lists. It leverages API.ai for natural language processing and is capable of evolving its ontology at a per-user level, depending upon the items each user adds to their lists. It applies all of the conversational principles outlined here and includes many more advanced topics such as Random Access Navigation.
I cloned a working copy of the bot. You can log in at https://my.intwixt.com. Use the following credentials to access my demo account to see how we model the complexities of dynamic conversation:
The demo includes a live Messenger bot to allow you to better understand the flow of information. You can interact with this bot at the following URL: https://www.messenger.com/t/1903807953181669. This is a fully public account, so anything you say to the bot will be logged and is visible to anyone viewing the flows in the bot designer.