Automated calling system

US11495233B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11495233-B2
Application number: US-202117505913-A
Country: US
Kind code: B2
Filing date: Oct 20, 2021
Priority date: Sep 24, 2019
Publication date: Nov 8, 2022
Grant date: Nov 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
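The single conversational turn described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `CallContext`, `TurnInputs`, `generate_reply`, `toy_model`, and `toy_tts` are hypothetical, the two context fields are borrowed from the examples listed in the dependent claims, and the model and synthesizer are stand-ins that return text rather than real audio.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CallContext:
    # Hypothetical fields; the abstract only refers to "a context".
    entity_identity: str
    call_start_time: str


@dataclass(frozen=True)
class TurnInputs:
    audio: bytes          # audio data of the latest utterance
    context: CallContext  # context of the telephone conversation
    user_intent: str      # intent of an earlier portion spoken by the user
    bot_intent: str       # intent of an earlier portion output by the bot


# Assumed signatures: a model that maps the four inputs to reply text,
# and a synthesizer that maps reply text to audio bytes.
ReplyModel = Callable[[TurnInputs], str]
Synthesizer = Callable[[str], bytes]


def generate_reply(inputs: TurnInputs, model: ReplyModel, tts: Synthesizer) -> bytes:
    """Combine the four inputs named in the abstract and return reply speech."""
    reply_text = model(inputs)
    return tts(reply_text)


def toy_model(inputs: TurnInputs) -> str:
    # Stand-in rule for illustration only; not a trained model.
    if (inputs.bot_intent == "request_reservation"
            and inputs.user_intent == "ask_party_size"):
        return "A table for two, please."
    return "Could you repeat that?"


def toy_tts(text: str) -> bytes:
    # Stand-in synthesizer: encodes the text instead of producing audio.
    return text.encode("utf-8")
```

Bundling the inputs into one record keeps the reply function's signature stable if further signals are added later; that is a design choice of this sketch, not something the abstract prescribes.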

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, by a server, and from a client device of a user, a request to perform a task, wherein the task includes causing a bot hosted at the server to initiate a telephone call with an entity; initiating, by the server, the telephone call with the entity to perform the task; and based on a telephone conversation conducted during the telephone call: receiving, by the server, audio data that captures a spoken utterance provided by a human representative associated with the entity; determining, by the server, a user intent of a first previous portion of the telephone conversation associated with the human representative and a bot intent of a second previous portion of the telephone conversation associated with the bot, wherein the first previous portion of the telephone conversation occurred prior to receiving the audio data of the utterance, and wherein the second previous portion of the telephone conversation also occurred prior to receiving the audio data of the utterance; generating, by the server, and based on at least the audio data that captures the spoken utterance, the user intent, and the bot intent, synthesized speech capturing a reply to the spoken utterance; and causing, by the server, the synthesized speech to be provided for audible presentation to the human representative.

2. The method of claim 1, further comprising: based on the telephone conversation conducted during the telephone call: determining, by the server, a context of the telephone conversation, wherein generating the synthesized speech capturing the reply to the spoken utterance is further based on the context of the telephone conversation.

3. The method of claim 2, wherein the context of the telephone conversation comprises one or more of: an identity of the entity, a time the telephone call is initiated, an entity location associated with the entity, or a user location associated with the user.

4. The method of claim 1, further comprising: based on the telephone conversation conducted during the telephone call: determining, by the server, whether the task has been completed; and in response to determining that the task has been completed: causing, by the server, the telephone call with the entity to be terminated.

5. The method of claim 4, further comprising: in response to determining that the task has not been completed: continuing, by the server, the telephone call with the entity.

6. The method of claim 1, wherein the server bypasses performance of speech recognition on the spoken utterance.

7. The method of claim 1, wherein generating the synthesized speech capturing the reply to the spoken utterance based on at least the audio data that captures the spoken utterance, the user intent, and the bot intent comprises: processing, using a machine learning model, at least the audio data that captures the spoken utterance, the user intent, and the bot intent to generate output; determining, based on the output, an additional bot intent associated with the reply to the spoken utterance; and generating, based on the additional bot intent associated with the reply associated with the spoken utterance, the synthesized speech capturing the reply to the spoken utterance.

8. The method of claim 7, further comprising: processing, using the machine learning model, and along with the audio data that captures the spoken utterance, the user intent, and the bot intent, a context of the telephone conversation to generate the output.

9. The method of claim 7, wherein the machine learning model is trained based on historical data for previous telephone conversations, and wherein the historical data for the previous telephone conversation comprises, for each previous telephone conversation, at least (i) corresponding previous first speaker intents associated with a first speaker determined based on corresponding first portions of each previous telephone conversation, (ii) corresponding previous second speaker intents associated with a second speaker determined based on corresponding second portions of each previous telephone conversation, (iii) corresponding previous audio data that captures most recent spoken utterance of the first speaker or the second speaker during each previous telephone conversation, and (iv) a corresponding previous intent of a corresponding previous reply to corresponding the most recent spoken utterance.

10. The method of claim 9, wherein the historical data for the previous telephone conversation further comprises, for each previous telephone conversation, (v) a corresponding previous context for each previous telephone conversation.

11. A system comprising: at least one processor; and memory storing instructions that, when executed, cause the at least one processor to perform operations, the operations comprising: receiving, from a client device of a user, a request to perform a task, wherein the task includes causing a bot hosted at the server to initiate a telephone call with an entity; initiating the telephone call with the entity to perform the task; and based on a telephone conversation conducted during the telephone call: receiving audio data that captures a spoken utterance provided by a human representative associated with the entity; determining a context of the telephone conversation; determining a user intent of a first previous portion of the telephone conversation associated with the human representative and a bot intent of a second previous portion of the telephone conversation associated with the bot, wherein the first previous portion of the telephone conversation occurred prior to receiving the audio data of the utterance, and wherein the second previous portion of the telephone conversation also occurred prior to receiving the audio data of the utterance; generating, based on at least the audio data that captures the spoken utterance, the user intent, and the bot intent, synthesized speech capturing a reply to the spoken utterance; and causing the synthesized speech to be provided for audible presentation to the human representative.

12. The system of claim 11, the operations further comprising: based on the telephone conversation conducted during the telephone call: determining a context of the telephone conversation, wherein generating the synthesized speech capturing the reply to the spoken utterance is further based on the context of the telephone conversation.

13. The system of claim 12, wherein the context of the telephone conversation comprises one or more of: an identity of the entity, a time the telephone call is initiated, an entity location associated with the entity, or a user location associated with the user.

14. The system of claim 11, the operations further comprising: based on the telephone conversation conducted during the telephone call: determining whether the task has been completed; and in response to determining that the task has been completed: causing the telephone call with the entity to be terminated.

15. The system of claim 14, the operations further comprising: in response to determining that the task has not been completed: continuing the telephone call with the entity.

16. The system of claim 11, wherein the system bypasses performance of speech recognition on the spoken utterance.

17. The system of claim 11, wherein generating the synthesized speech capturing the reply to the spoken utterance based on at least the audio data that captures the spoken utterance, the user intent, and the bot inte…
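The control flow implied by the method claims (predict the next bot intent from raw audio plus the prior intents, synthesize the reply, then end or continue the call depending on whether the task is done) might be sketched as below. Everything here is an assumption made for illustration: `CallState`, `run_call`, the model signature, and the toy helpers are hypothetical names, and having the model also return an intent for the new utterance is a convenience of this sketch, not something the claims state. Byte strings stand in for audio.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class CallState:
    user_intents: List[str] = field(default_factory=list)  # representative's earlier intents
    bot_intents: List[str] = field(default_factory=list)   # bot's earlier intents
    terminated: bool = False


# Assumed model signature: (audio, prior user intent, prior bot intent)
# -> (intent inferred for the new utterance, next bot intent).
# The audio goes in directly, with no separate speech-recognition step.
IntentModel = Callable[[bytes, str, str], Tuple[str, str]]


def run_call(utterances: Iterable[bytes],
             model: IntentModel,
             synthesize: Callable[[str], bytes],
             task_done: Callable[[CallState], bool],
             opening_intent: str = "greet") -> Tuple[CallState, List[bytes]]:
    """Reply to each utterance until the task is judged complete."""
    state = CallState(bot_intents=[opening_intent])
    spoken: List[bytes] = []
    for audio in utterances:
        prev_user = state.user_intents[-1] if state.user_intents else "none"
        prev_bot = state.bot_intents[-1]
        heard_intent, next_bot_intent = model(audio, prev_user, prev_bot)
        spoken.append(synthesize(next_bot_intent))  # speech generated from the new bot intent
        state.user_intents.append(heard_intent)
        state.bot_intents.append(next_bot_intent)
        if task_done(state):
            state.terminated = True  # task complete: end the call
            break
        # Otherwise the call continues with the next utterance.
    return state, spoken


def toy_model(audio: bytes, prev_user: str, prev_bot: str) -> Tuple[str, str]:
    # Stand-in lookup for illustration only; not a trained model.
    if audio == b"how many people?":
        return "ask_party_size", "state_party_size"
    if audio == b"ok, booked":
        return "confirm_booking", "say_goodbye"
    return "unknown", "ask_repeat"


def toy_synthesize(intent: str) -> bytes:
    # Stand-in synthesizer: encodes the intent label instead of producing audio.
    return intent.encode("utf-8")


def booking_confirmed(state: CallState) -> bool:
    return state.user_intents[-1] == "confirm_booking"
```

Calling `run_call([b"how many people?", b"ok, booked"], toy_model, toy_synthesize, booking_confirmed)` walks through two turns and stops once the booking is confirmed; any later utterances are never processed.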

Assignees

Google LLC

Inventors

Classifications

  • Speech interaction details (speech recognition per se G10L15/00)

  • Semantic analysis

  • Notifying a held subscriber when his held call is removed from hold

  • Preventing unauthorised calls to a telephone set

  • Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11495233B2 cover?
It covers methods, systems, and apparatus for an automated calling system in which a bot holds a telephone conversation. The bot receives audio data of a spoken utterance, determines the context of the conversation along with the user intent and bot intent of earlier portions, and generates synthesized speech as its reply. See the Abstract section for the full official text.
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 8, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).