Speech recognition device, speech recognition method, and recording medium

US11315572B2 · US · B2

Patent metadata
Publication number: US-11315572-B2
Application number: US-202016826899-A
Country: US
Kind code: B2
Filing date: Mar 23, 2020
Priority date: Mar 27, 2019
Publication date: Apr 26, 2022
Grant date: Apr 26, 2022

Abstract

A speech recognition device includes: an obtaining unit which obtains a speech uttered in a conversation between a first speaker and a second speaker; a storage which stores the speech obtained; an input unit which receives operation input; an utterance start detector which, when the input unit receives the operation input, detects a start position of the speech; and a speaker identification unit which identifies a speaker of the speech as the first speaker who has performed the operation input or the second speaker who has not performed the operation input, based on (i) first timing at which the input unit has received the operation input and (ii) second timing indicating the detected start position of the speech. The first and second timing are set for each speech of the first and second speakers. A speech recognizer performs speech recognition on the speech whose speaker has been identified.
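
Because the storage keeps the obtained speech, the utterance start detector can look back into the stored buffer and find a start position that may lie before the operation input. The abstract does not say how the start position is detected. The sketch below assumes a simple frame-energy threshold over buffered samples. The function name `detect_utterance_start` and its parameters (`frame_ms`, `energy_threshold`) are hypothetical and illustrative only.

```python
from typing import Optional, Sequence


def detect_utterance_start(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,
    energy_threshold: float = 0.01,
) -> Optional[float]:
    """Return the time in seconds of the first frame whose mean energy exceeds
    energy_threshold, or None if no frame does.

    Hypothetical energy-based detector: the patent abstract does not specify
    how the start position of the speech is found.
    """
    # Number of samples per analysis frame (at least one).
    frame_len = max(1, sample_rate * frame_ms // 1000)
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Mean squared amplitude of this frame.
        energy = sum(s * s for s in frame) / len(frame)
        if energy > energy_threshold:
            # Convert the sample index to seconds from the buffer start.
            return start / sample_rate
    return None  # silence throughout the stored buffer
```

A real device would more likely use a trained voice activity detector. The fixed threshold is used here only to keep the idea visible: the detector scans stored audio rather than live input.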

Claims

The invention claimed is:

1. A speech recognition device for a conversation between a first speaker and at least one second speaker who is a conversation partner of the first speaker, the speech recognition device comprising: an obtaining unit which obtains a speech uttered in the conversation between the first speaker and the at least one second speaker; a storage which stores the speech uttered in the conversation between the first speaker and the at least one second speaker and obtained by the obtaining unit; an input unit which receives operation input from at least the first speaker, the operation input serving as a trigger to perform speech recognition on each speech uttered in the conversation between the first speaker and the at least one second speaker; an utterance start detector which, in response to the operation input received by the input unit, detects a start position of the speech stored in the storage, the start position being a position at which utterance of the speech has started; and a speaker identification unit which identifies a speaker of the speech as one of the first speaker who has performed the operation input on the input unit and the at least one second speaker who has not performed the operation input on the input unit, based on first timing and second timing which are set for each of speeches uttered in the conversation between the first speaker and the at least one second speaker, the first timing being timing at which the input unit has received the operation input, the second timing being timing which indicates the start position of the speech detected by the utterance start detector, wherein speech recognition is performed on the speech uttered by the one of the first speaker and the at least one second speaker identified by the speaker identification unit, the speech recognition being performed by a speech recognizer from the start position of the speech, and the speaker identification unit: compares the first timing and the second timing which are set for each speech uttered in the conversation between the first speaker and the at least one second speaker; identifies the speaker of the speech as the first speaker from the first speaker and the at least one second speaker when the first timing is earlier than the second timing; and identifies the speaker of the speech as the at least one second speaker from the first speaker and the at least one second speaker when the second timing is earlier than the first timing.

2. The speech recognition device according to claim 1, wherein when the speaker of the speech is identified as the first speaker from the first speaker and the at least one second speaker, the speech recognizer performs the speech recognition on the speech of the first speaker, and when the speaker of the speech is identified as a second speaker from the first speaker and the at least one second speaker, the speech recognizer performs the speech recognition on the speech of the second speaker.

3. The speech recognition device according to claim 1, wherein the speaker identification unit identifies the speaker as one of the first speaker and the at least one second speaker, for each speech uttered in the conversation between the first speaker and the at least one second speaker in a specified period before or after the first timing at which the input unit has received the operation input.

4. The speech recognition device according to claim 1, wherein upon finish of the speech recognition on a speech of the first speaker who has performed the operation input on the input unit, the storage starts to store a speech obtained by the obtaining unit, to store a speech of the at least one second speaker.

5. The speech recognition device according to claim 1, comprising: a communication unit configured to communicate with a cloud server which includes the speech recognizer, wherein the communication unit transmits, to the cloud server, the speech of the one of the first speaker and the at least one second speaker identified by the speaker identification unit, and receives a result of the speech recognition that the speech recognizer included in the cloud server has performed on the speech from the start position of the speech.

6. The speech recognition device according to claim 1, comprising: the speech recognizer which performs the speech recognition on the speech of the one of the first speaker and the at least one second speaker identified by the speaker identification unit, the speech recognition being performed from the start position of the speech.

7. The speech recognition device according to claim 1, wherein the input unit is one operation button provided to the speech recognition device.

8. The speech recognition device according to claim 1, wherein the input unit receives the operation input from the first speaker for every speech of the first speaker and for every speech of the at least one second speaker.

9. A speech recognition method for a conversation between a first speaker and at least one second speaker who is a conversation partner of the first speaker, the speech recognition method comprising: obtaining, using an obtaining unit, a speech uttered in the conversation between the first speaker and the at least one second speaker; storing, in a storage, the speech uttered in the conversation between the first speaker and the at least one second speaker and obtained; receiving, by an input unit, operation input from at least the first speaker, the operation input serving as a trigger to perform speech recognition on each speech uttered in the conversation between the first speaker and the at least one second speaker; detecting, using an utterance start detector and in response to the operation input received by the input unit, a start position of the speech stored in the storage, the start position being a position at which utterance of the speech has started; identifying, using a speaker identification unit, a speaker of the speech as one of the first speaker who has performed the operation input on the input unit and the at least one second speaker who has not performed the operation input on the input unit, based on first timing and second timing which are set for each of speeches uttered in the conversation between the first speaker and the at least one second speaker, the first timing being timing at which the input unit has received the operation input, the second timing being timing which indicates the start position of the speech detected; and performing, using a speech recognizer, speech recognition on the speech of the one of the first speaker and the at least one second speaker identified, the speech recognition being performed from the start position of the speech, and in the speaker identification unit: comparing the first timing and the second timing which are set for each speech uttered in the conversation between the first speaker and the at least one second speaker; identifying the speaker of the speech as the first speaker from the first speaker and the at least one second speaker when the first timing is earlier than the second timing; and identifying the speaker of the speech as the at least one second speaker from the first speaker and the at least one second speaker when the second timing is earlier than the first timing.

10. A non-transitory computer-readable recording medium for use in a computer, the recording medium having a computer program recorded thereon for causing the computer to execute the speech recognition method according to claim 9.
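
The comparison rule of claim 1, with the optional time window of claim 3, can be sketched as follows. This is a minimal illustration and does not represent the assignee's implementation. It makes several assumptions. Both timings are seconds on a shared clock. `identify_speaker`, `Speaker`, and `window_s` are hypothetical names. The handling of equal timings, which the claims do not address, is also an assumption.

```python
from enum import Enum
from typing import Optional


class Speaker(Enum):
    FIRST = "first speaker (performed the operation input)"
    SECOND = "second speaker (did not perform the operation input)"


def identify_speaker(
    first_timing: float,
    second_timing: float,
    window_s: Optional[float] = None,
) -> Optional[Speaker]:
    """Compare the operation-input time (first_timing) with the detected
    utterance start time (second_timing), as described in claim 1.

    window_s is a hypothetical stand-in for the "specified period before or
    after the first timing" of claim 3; speech starting outside it is ignored.
    """
    if window_s is not None and abs(second_timing - first_timing) > window_s:
        return None  # assumption: outside the specified period, no decision
    if first_timing < second_timing:
        return Speaker.FIRST  # button pressed before the speech started
    if second_timing < first_timing:
        return Speaker.SECOND  # speech already under way when button pressed
    return None  # equal timings: not covered by the claims (assumption)
```

As an example, claim 8 has the first speaker perform the operation input for every utterance. If the first speaker presses and then speaks, the first timing comes before the second timing, which yields `Speaker.FIRST`. If the first speaker presses only after the partner has started talking, the order reverses and the result is `Speaker.SECOND`.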

Assignees

Panasonic Corp

Classifications

  • Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

  • Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements (G01S5/28 takes precedence)

  • Announcement of recognition results

  • Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

  • G10L15/22 (primary): Procedures used during a speech recognition process, e.g. man-machine dialogue

Frequently asked questions

Who is the assignee on this patent?
Panasonic Corp
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Published April 26, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications: cited publications found in our corpus, or other publications that share the same primary CPC classification.