Automated calling system

US11495233B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11495233-B2
Application number	US-202117505913-A
Country	US
Kind code	B2
Filing date	Oct 20, 2021
Priority date	Sep 24, 2019
Publication date	Nov 8, 2022
Grant date	Nov 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
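The data flow in the abstract can be sketched roughly as follows. This is only an illustration, not the patented implementation: the names `Turn`, `ReplyInputs`, `gather_inputs`, `generate_reply`, and `toy_model`, and the intent labels, are all hypothetical, and the "model" is a stand-in that returns encoded text rather than real synthesized speech.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Turn:
    """One earlier portion of the call, already reduced to an intent label."""
    speaker: str  # "user" or "bot"
    intent: str   # hypothetical label, e.g. "ask_party_size"


@dataclass
class ReplyInputs:
    """The four inputs the abstract lists for generating a reply."""
    audio: bytes             # audio data of the latest utterance
    context: Dict[str, str]  # context of the telephone conversation
    user_intents: List[str]  # intents of earlier user portions
    bot_intents: List[str]   # intents of earlier bot portions


def gather_inputs(audio: bytes, context: Dict[str, str],
                  history: List[Turn]) -> ReplyInputs:
    """Split the call history by speaker and bundle it with audio and context."""
    user_intents = [t.intent for t in history if t.speaker == "user"]
    bot_intents = [t.intent for t in history if t.speaker == "bot"]
    return ReplyInputs(audio, context, user_intents, bot_intents)


def generate_reply(inputs: ReplyInputs,
                   model: Callable[[ReplyInputs], bytes]) -> bytes:
    """Pass the bundled inputs to a caller-supplied model that returns speech."""
    return model(inputs)


def toy_model(inputs: ReplyInputs) -> bytes:
    # Stand-in only (assumption): encodes a text reply instead of real audio.
    last_user = inputs.user_intents[-1] if inputs.user_intents else "none"
    return f"reply-to:{last_user}".encode()


history = [Turn("bot", "request_reservation"), Turn("user", "ask_party_size")]
inputs = gather_inputs(b"\x00\x01", {"entity": "restaurant"}, history)
speech = generate_reply(inputs, toy_model)
```

Note that the raw audio bytes go to the model directly, with no transcription step in between; the sketch keeps that shape on purpose, but how a real system would encode the audio is not something this page specifies.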

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method comprising: receiving, by a server, and from a client device of a user, a request to perform a task, wherein the task includes causing a bot hosted at the server to initiate a telephone call with an entity; initiating, by the server, the telephone call with the entity to perform the task; and based on a telephone conversation conducted during the telephone call: receiving, by the server, audio data that captures a spoken utterance provided by a human representative associated with the entity; determining, by the server, a user intent of a first previous portion of the telephone conversation associated with the human representative and a bot intent of a second previous portion of the telephone conversation associated with the bot, wherein the first previous portion of the telephone conversation occurred prior to receiving the audio data of the utterance, and wherein the second previous portion of the telephone conversation also occurred prior to receiving the audio data of the utterance; generating, by the server, and based on at least the audio data that captures the spoken utterance, the user intent, and the bot intent, synthesized speech capturing a reply to the spoken utterance; and causing, by the server, the synthesized speech to be provided for audible presentation to the human representative.
2. The method of claim 1, further comprising: based on the telephone conversation conducted during the telephone call: determining, by the server, a context of the telephone conversation, wherein generating the synthesized speech capturing the reply to the spoken utterance is further based on the context of the telephone conversation.
3. The method of claim 2, wherein the context of the telephone conversation comprises one or more of: an identity of the entity, a time the telephone call is initiated, an entity location associated with the entity, or a user location associated with the user.
4. The method of claim 1, further comprising: based on the telephone conversation conducted during the telephone call: determining, by the server, whether the task has been completed; and in response to determining that the task has been completed: causing, by the server, the telephone call with the entity to be terminated.
5. The method of claim 4, further comprising: in response to determining that the task has not been completed: continuing, by the server, the telephone call with the entity.
6. The method of claim 1, wherein the server bypasses performance of speech recognition on the spoken utterance.
7. The method of claim 1, wherein generating the synthesized speech capturing the reply to the spoken utterance based on at least the audio data that captures the spoken utterance, the user intent, and the bot intent comprises: processing, using a machine learning model, at least the audio data that captures the spoken utterance, the user intent, and the bot intent to generate output; determining, based on the output, an additional bot intent associated with the reply to the spoken utterance; and generating, based on the additional bot intent associated with the reply associated with the spoken utterance, the synthesized speech capturing the reply to the spoken utterance.
8. The method of claim 7, further comprising: processing, using the machine learning model, and along with the audio data that captures the spoken utterance, the user intent, and the bot intent, a context of the telephone conversation to generate the output.
9. The method of claim 7, wherein the machine learning model is trained based on historical data for previous telephone conversations, and wherein the historical data for the previous telephone conversation comprises, for each previous telephone conversation, at least (i) corresponding previous first speaker intents associated with a first speaker determined based on corresponding first portions of each previous telephone conversation, (ii) corresponding previous second speaker intents associated with a second speaker determined based on corresponding second portions of each previous telephone conversation, (iii) corresponding previous audio data that captures most recent spoken utterance of the first speaker or the second speaker during each previous telephone conversation, and (iv) a corresponding previous intent of a corresponding previous reply to corresponding the most recent spoken utterance.
10. The method of claim 9, wherein the historical data for the previous telephone conversation further comprises, for each previous telephone conversation, (v) a corresponding previous context for each previous telephone conversation.
11. A system comprising: at least one processor; and memory storing instructions that, when executed, cause the at least one processor to perform operations, the operations comprising: receiving, from a client device of a user, a request to perform a task, wherein the task includes causing a bot hosted at the server to initiate a telephone call with an entity; initiating the telephone call with the entity to perform the task; and based on a telephone conversation conducted during the telephone call: receiving audio data that captures a spoken utterance provided by a human representative associated with the entity; determining a context of the telephone conversation; determining a user intent of a first previous portion of the telephone conversation associated with the human representative and a bot intent of a second previous portion of the telephone conversation associated with the bot, wherein the first previous portion of the telephone conversation occurred prior to receiving the audio data of the utterance, and wherein the second previous portion of the telephone conversation also occurred prior to receiving the audio data of the utterance; generating, based on at least the audio data that captures the spoken utterance, the user intent, and the bot intent, synthesized speech capturing a reply to the spoken utterance; and causing the synthesized speech to be provided for audible presentation to the human representative.
12. The system of claim 11, the operations further comprising: based on the telephone conversation conducted during the telephone call: determining a context of the telephone conversation, wherein generating the synthesized speech capturing the reply to the spoken utterance is further based on the context of the telephone conversation.
13. The system of claim 12, wherein the context of the telephone conversation comprises one or more of: an identity of the entity, a time the telephone call is initiated, an entity location associated with the entity, or a user location associated with the user.
14. The system of claim 11, the operations further comprising: based on the telephone conversation conducted during the telephone call: determining whether the task has been completed; and in response to determining that the task has been completed: causing the telephone call with the entity to be terminated.
15. The system of claim 14, the operations further comprising: in response to determining that the task has not been completed: continuing the telephone call with the entity.
16. The system of claim 11, wherein the system bypasses performance of speech recognition on the spoken utterance.
17. The system of claim 11, wherein generating the synthesized speech capturing the reply to the spoken utterance based on at least the audio data that captures the spoken utterance, the user intent, and the bot inte
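Claims 9 and 10 list what the historical training data contains for each previous conversation: items (i) through (v). One way to picture such a record is the sketch below. It is a hypothetical layout, not the format used in the patent: the names `LoggedTurn`, `TrainingExample`, and `build_examples` are invented for illustration, and producing one example per reply turn is an assumption the claims do not state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LoggedTurn:
    """One turn of a previous call (hypothetical log format)."""
    speaker: str  # "first" or "second"
    intent: str   # intent label determined for this portion
    audio: bytes  # recorded audio of the turn


@dataclass
class TrainingExample:
    first_speaker_intents: List[str]          # item (i)
    second_speaker_intents: List[str]         # item (ii)
    latest_audio: bytes                       # item (iii)
    reply_intent: str                         # item (iv)
    context: Optional[Dict[str, str]] = None  # item (v), claim 10


def build_examples(turns: List[LoggedTurn],
                   context: Optional[Dict[str, str]] = None
                   ) -> List[TrainingExample]:
    """Assumption: emit one example per reply, pairing it with what came before."""
    examples = []
    for i in range(1, len(turns)):
        earlier, latest, reply = turns[: i - 1], turns[i - 1], turns[i]
        examples.append(TrainingExample(
            first_speaker_intents=[t.intent for t in earlier if t.speaker == "first"],
            second_speaker_intents=[t.intent for t in earlier if t.speaker == "second"],
            latest_audio=latest.audio,   # most recent utterance before the reply
            reply_intent=reply.intent,   # training target: intent of the reply
            context=context,
        ))
    return examples


log = [
    LoggedTurn("first", "greet", b"a"),
    LoggedTurn("second", "ask_time", b"b"),
    LoggedTurn("first", "give_time", b"c"),
]
examples = build_examples(log, context={"entity": "salon"})
```

The target in each record is the intent of the reply rather than its audio, which mirrors claim 7: the model output is used to determine a bot intent, and speech is synthesized from that intent afterwards.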

Assignees

Google LLC

Inventors

Classifications

CPC code	CPC title
H04M3/4936	Speech interaction details (speech recognition per se G10L15/00)
G06F40/30	Semantic analysis
H04M3/4286	Notifying a held subscriber when his held call is removed from hold
H04M1/663	Preventing unauthorised calls to a telephone set
G10L15/32	Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems

Patent family

Related publications grouped by family.

View patent family 74882172

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11495233B2 cover?: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions furth…
Who is the assignee on this patent?: Google LLC
What technology area does this patent fall under?: Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?: Publication date Nov 8, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).