Intonation: The Next Frontier for Voice Technology

By Teo Borschberg, CEO of OTO.ai

“Thanks a lot.” For such a simple phrase, these three little words can vary widely in meaning. Perhaps the speaker is genuinely thankful for what they have received. But at the other end of the spectrum, the speaker could be sarcastic and truly frustrated with whatever they have been given.

What is the difference between these two seemingly identical phrases? The answer is intonation.

Measuring intonation is a task that we are all familiar with, whether we realize it or not. Calling a support line, we have all heard the recording, “This call may be monitored for quality assurance.” But what does that mean, and how does this seemingly subtle practice of monitoring calls impact customer service?

For years, “monitored for quality assurance” has meant that a supervisor in a call center—where thousands of calls per day from customers might come flooding in—is choosing recorded calls at random to listen to. This supervisor is tasked with figuring out what kinds of responses sit well with customers, and what types of phrasing can turn conversations into sales or otherwise successful engagements.

But call-center supervisors—who manage phone banks on behalf of large corporations—have never really been able to study customers before. They’ve had to rely on their own ear as they try to determine whether a caller really means what he or she is saying.

The customer’s intention, which can often be determined through the tone of their voice, has been a black box. And while most speech technology companies are wrapped up in converting voice to text, a speaker’s state of mind or real intention is generally overlooked.

This fundamental issue of glossing over the humanness in voice was exactly the motivation for founding OTO. We use artificial intelligence to discern tones of voice, and we are building a universal standard of satisfaction that can be applied to any conversation, no matter the language or the types of words used. Our focus is not on the literal meaning of the conversation, but on the layer beneath: how the customer feels and behaves during the interaction.

What We’re Accomplishing 

The human ear is pretty good at detecting sarcasm, boredom, resignation and other shades of feeling. But the human ear doesn’t scale – at least not well enough to listen to thousands upon thousands of recorded calls, and then draw correlations between intonation and eventual business outcomes.

This is what we’re doing, using technology that came from the same SRI International laboratory that produced Siri and other voice technologies.

Since most other voice-tech companies are concerned with what people are saying, we saw an opportunity to explore how people say things, and delve into why that matters for a business.

Maybe it’s because I speak five languages, or maybe it’s because I’ve always been fascinated with human interaction in general. But I believe there is a wealth of material we can learn from in the intonation with which people speak. Albert Mehrabian, Professor Emeritus of Psychology at the University of California, Los Angeles, has attempted to quantify that concept. After two separate studies in the late 1960s, he concluded that, when people communicate feelings and attitudes, only 7% of the message is carried by its literal content. The use of one’s voice, such as tone, intonation and volume, accounts for 38%, and as much as 55% comes from body language.

The challenge in building a company around this conviction has been to show customers, investors, potential employees and other stakeholders that this rather soft skill can—and should—be approached like a science.

Proving Our Model

To analyze voice intonation, OTO uses Acoustic Language Processing (ALP), a high-tech derivative of Natural Language Processing. Modeling vocal intonation through our DeepTone™ engine enables customers to track 100% of voice-based customer service interactions and accurately predict outcomes such as satisfaction scores (an NPS predictor) or intent to purchase for every customer. This is where our earlier example comes in handy. ALP gives further insight into the phrase “Thanks a lot.” Was the customer genuinely thankful? Or is there still some degree of dissatisfaction that the company needs to address?

This process provides companies with the granularity they need to diagnose problems more effectively and take action to reduce churn and increase customer satisfaction. Many businesses, including telecommunications carriers, cable providers, insurance companies and energy companies, rely on long-term customer relationships, and churn is costly. Intonation gives us a deeper and clearer understanding of the customer’s satisfaction, which in turn helps reduce customer turnover.

OTO’s first customer is a prime example of how understanding the humanness in technology can create a successful business environment. This customer operates a call center that fields calls for organizations that take charitable donations. It takes millions of inbound calls each year and tries to convert each one into a contribution to a charity. It is also a company looking to clearly express gratitude and improve its overall conversion rate.

This was an opportunity for us to demonstrate product value, grow our data set and gain real-world experience. This customer was an interesting starting point for OTO because not only were we able to understand how callers were responding to solicitations, we were also able to monitor how the intonation of the agents on the calls affected the results. If an employee sounded disengaged or low on energy, we were able to track that and provide real-time feedback on how tone of voice can drive sales.

If you are an enterprise tech company looking to scale and prosper today, that’s the customer acquisition model you should follow. Tackle valuable verticals that are underserved by horizontal incumbent vendors, and demonstrate value by solving real-world problems for your customer.

We accomplished this with our first customer, and that’s why we have many customers today.

Building an Enterprise Tech Company Today

Enterprise technology is an area that can be tricky for startups, because tech giants like Salesforce, Amazon, Microsoft, Google and others have built horizontal software platforms that have sought to be all things to all people, or at least to all businesses.

This is why startups should look to disrupt vertical markets and, as our lead investor likes to say, innovate “near the customer.” It takes more than great technology to win over customers; it takes a full-stack solution. It takes boots on the ground. This approach is what’s required for success in the era of Enterprise 4.0.

What OTO has to make clear is that we are implementing newer technology and artificial intelligence not to take over the jobs that humans are currently doing, but to boost efficiency and effectiveness in those jobs. Humans will always be a critical part of our economic system, and things like tone of voice illustrate just how much of a priority understanding human behavior is, especially in the world of AI. As we work on a human level, we are creating a foundation of trust that benefits not only our clients and their customers, but also the technology that makes our jobs run as smoothly as possible.

Customers today expect purpose-built technologies that make use of embedded AI and leverage unique datasets to offer insights that will help the business. That’s what OTO and other Enterprise 4.0 companies are doing.

An important component in keeping OTO going strong is keeping our AI engine evolving constantly. As we add new customers, we add additional languages and millions more calls. For this reason, our AI program must adapt and grow over time.

We make this happen by building a global company from the outset. We began by scouting for talent outside the US and working with a distributed team. We have an exceptional AI team in Switzerland, and team members in other countries as well, working to add more languages to our platform and gain more insight into how intonation affects customer service and the economics of satisfaction.

The longer we do business, the more data our technology must contend with. We’ve built our acoustic modeling technology to be language-agnostic, so we don’t have to reinvent the wheel any time we acquire a customer in another country. We are building a global enterprise technology company and need to be able to support the needs of our customers across the world.

Our Future

We have a brilliant international team, and we have an impressive roster of customers. We’re analyzing millions of phone calls today in Spanish, French, English, German and Portuguese, and that is only the beginning.

We had to prove ourselves in one use case, with one customer, in order to unlock the viable business we have today. We were able to show that analyzing vocal tones increased conversions for this customer by up to 18%.

The era of total domination by horizontal players has ended, and the next era—where nimble startups can serve the needs of sophisticated customers—has begun.

In the future, OTO will analyze customer calls in dozens of languages, and convert larger numbers of calls into sales conversions and satisfied customers. We’re on that road now, and the first step was to prove our value to a single customer. 

And as we begin to integrate this technology for more companies around the world and change their economics of satisfaction, we hope to hear them say–with the intonation that suggests they really mean it–“Thanks a lot.”