What technology area does this patent fall under?

Primary CPC classification G10L15/22. Mapped technology areas include Physics.

When was this patent published?

Publication date: Mar 23, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

End-of-turn detection in spoken dialogues

US10957320B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10957320-B2
Application number	US-201916257566-A
Country	US
Kind code	B2
Filing date	Jan 25, 2019
Priority date	Jan 25, 2019
Publication date	Mar 23, 2021
Grant date	Mar 23, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
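The abstract describes one network that concurrently processes a transition type and a dialogue act. A minimal sketch of that idea is a shared representation feeding two classification heads. Everything concrete here is an assumption for illustration: the feed-forward layers (the patent does not specify this architecture on this page), the layer sizes, the hand-picked weights in `PARAMS`, and the helper names `softmax`, `linear`, and `shared_forward`.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, bias, x):
    """Apply one dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def shared_forward(features, params):
    """Run both heads on one shared hidden representation."""
    hidden = [math.tanh(h)
              for h in linear(params["shared_w"], params["shared_b"], features)]
    act_probs = softmax(linear(params["act_w"], params["act_b"], hidden))
    transition_probs = softmax(
        linear(params["trans_w"], params["trans_b"], hidden))
    return act_probs, transition_probs

# Hypothetical, untrained weights: 3 input features, 2 hidden units,
# 3 dialogue-act classes, 2 transition-type classes.
PARAMS = {
    "shared_w": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]],
    "shared_b": [0.0, 0.1],
    "act_w": [[1.0, -1.0], [0.5, 0.5], [-0.3, 0.2]],
    "act_b": [0.0, 0.0, 0.0],
    "trans_w": [[0.7, -0.4], [-0.6, 0.9]],
    "trans_b": [0.0, 0.0],
}

if __name__ == "__main__":
    acts, transitions = shared_forward([0.2, 0.5, 0.1], PARAMS)
    print("dialogue-act probabilities:", acts)
    print("transition-type probabilities:", transitions)
```

Both heads read the same `hidden` vector, which is what lets the two tasks inform each other during training; a real system would learn the weights from labeled speech rather than fix them by hand.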

First claim

Opening claim text (preview).

What is claimed is:
1. A system, comprising: a memory that stores computer executable components; a processor that executes the computer executable components stored in the memory, wherein the computer executable components comprise: a speech receiving component that receives a spoken dialogue from a first entity; and a speech processing component that employs a neural network that concurrently processes a first classifier and a second classifier using acoustic cues from the spoken dialogue to predict a source of a subsequent spoken dialogue, wherein; the first classifier generates a first prediction of an intention of the spoken dialogue, the second classifier generates a second prediction of a type of turn of the spoken dialogue, and the neural network combines the first prediction and the second prediction using a minimizing joint loss function to predict whether the source of the subsequent spoken dialogue will be the first entity or another entity.
2. The system of claim 1, wherein the neural network is a multi-task neural network, and wherein the system further comprises a network optimizing component that optimizes the multi-task neural network by employing a plurality of speech labels to predict the source of the subsequent spoken dialogue.
3. The system of claim 2, wherein the plurality of speech labels comprises an optimizing data set.
4. The system of claim 1, wherein the minimizing joint loss function comprises a first loss function for the first prediction and a second loss function for the second prediction.
5. The system of claim 1, wherein the speech processing component predicts the source of the subsequent spoken dialogue in real time during a communication session comprising the spoken dialogue.
6. The system of claim 1, wherein the type of turn is selected from a group consisting of a turn hold, a turn switch, a smooth switch, and an overlapping switch.
7. The system of claim 1, wherein the acoustic cues comprise timing of the spoken dialogue.
8. The system of claim 1, wherein the acoustic cues comprise a cue selected from the group consisting of intonation, pitch change, speaking rate, and pause.
9. The system of claim 1, wherein the other entity is a computerized spoken dialog system.
10. A computer-implemented method, comprising: receiving, by a system operatively coupled to a processor, a spoken dialogue from a first entity; and predicting, by the system, a source of a subsequent spoken dialogue by employing a neural network that concurrently processes a first classifier and a second classifier using acoustic cues from the spoken dialogue, wherein: the first classifier generates a first prediction of an intention of the spoken dialogue, the second classifier generates a second prediction of a type of turn of the spoken dialogue, and the neural network combines the first prediction and the second prediction using a minimizing joint loss function to predict whether the source of the subsequent spoken dialogue will be the first entity or another entity.
11. The computer-implemented method of claim 10, wherein the neural network is a multi-task neural network, and wherein the computer-implemented method further comprises optimizing, by the system, the multi-task neural network by employing a plurality of speech labels to predict the source of the subsequent spoken dialogue.
12. The computer-implemented method of claim 11, wherein the plurality of speech labels comprises an optimizing data set.
13. The computer-implemented method of claim 10, wherein the minimizing joint loss function comprises a first loss function for the first prediction and a second loss function for the second prediction.
14. The computer-implemented method of claim 10, wherein the predicting the source of the subsequent spoken dialogue occurs in real time during a communication session comprising the spoken dialogue.
15. The computer-implemented method of claim 10, wherein the type of turn is selected from a group consisting of a turn hold, a turn switch, a smooth switch, and an overlapping switch.
16. The computer-implemented method of claim 10, wherein the acoustic cues comprise timing of the spoken dialogue.
17. The computer-implemented method of claim 10, wherein the acoustic cues comprise a cue selected from the group consisting of intonation, pitch change, speaking rate, and pause.
18. The computer-implemented method of claim 10, wherein the other entity is a computerized spoken dialog system.
19. A computer program product facilitating predicting a source of a subsequent spoken dialogue, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: receive, by the processor, a spoken dialogue from a first entity; and predict, by the processor, the source of the subsequent spoken dialogue by employing a neural network that concurrently processes a first classifier and a second classifier using acoustic cues from the spoken dialogue, wherein: the first classifier generates a first prediction of an intention of the spoken dialogue, the second classifier generates a second prediction of a type of turn of the spoken dialogue, and the neural network combines the first prediction and the second prediction using a minimizing joint loss function to predict whether the source of the subsequent spoken dialogue will be the first entity or another entity.
20. The computer program product of claim 19, wherein the neural network is a multi-task neural network, and wherein the program instructions are further executable by the processor to cause the processor to optimize, by the processor, the multi-task neural network by employing a plurality of speech labels to predict the source of the subsequent spoken dialogue.
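Claims 4 and 6 can be illustrated with a short sketch: a joint loss built from one loss per prediction, and a decision rule that turns the predicted type of turn into a predicted source. The claims do not say how the two losses are combined or how a turn type maps to a speaker, so the weighted sum, the cross-entropy choice, the rule "turn hold means the first entity keeps speaking, any switch means another entity speaks next", and all function names below are assumptions made for illustration only.

```python
import math

# Turn types listed in claim 6.
TURN_TYPES = ["turn hold", "turn switch", "smooth switch", "overlapping switch"]

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(max(probs[target_index], 1e-12))  # clamp avoids log(0)

def joint_loss(intent_probs, intent_target, turn_probs, turn_target,
               intent_weight=1.0, turn_weight=1.0):
    """Assumed combination: weighted sum of the two per-task losses."""
    first_loss = cross_entropy(intent_probs, intent_target)
    second_loss = cross_entropy(turn_probs, turn_target)
    return intent_weight * first_loss + turn_weight * second_loss

def predict_next_source(turn_probs):
    """Assumed rule: a hold keeps the floor, any switch hands it over."""
    best = max(range(len(TURN_TYPES)), key=lambda i: turn_probs[i])
    if TURN_TYPES[best] == "turn hold":
        return "first entity"
    return "another entity"

if __name__ == "__main__":
    intent_probs = [0.5, 0.5]            # hypothetical two-intent output
    turn_probs = [0.1, 0.2, 0.6, 0.1]    # hypothetical turn-type output
    print("joint loss:", joint_loss(intent_probs, 0, turn_probs, 2))
    print("predicted source:", predict_next_source(turn_probs))
```

Minimizing the summed loss during training pushes the shared network to do well on both tasks at once; the weights `intent_weight` and `turn_weight` are a common knob in multi-task setups, not something stated in the claims.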

Assignees

IBM, Univ Michigan Regents

Inventors

Classifications

G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G10L15/32
Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title
G10L2015/225
Feedback of the input speech · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 71732608

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10957320B2 cover?: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a spe…
Who is the assignee on this patent?: IBM, Univ Michigan Regents