US2020312332A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020312332-A1 |
| Application number | US-202016826899-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 23, 2020 |
| Priority date | Mar 27, 2019 |
| Publication date | Oct 1, 2020 |
| Grant date | — |
Official abstract text for this publication.
A speech recognition device includes: an obtaining unit which obtains a speech uttered in a conversation between a first speaker and a second speaker; a storage which stores the speech obtained; an input unit which receives operation input; an utterance start detector which, when the input unit receives the operation input, detects a start position of the speech; and a speaker identification unit which identifies a speaker of the speech as the first speaker who has performed the operation input or the second speaker who has not performed the operation input, based on (i) first timing at which the input unit has received the operation input and (ii) second timing indicating the detected start position of the speech. The first and second timing are set for each speech of the first and second speakers. A speech recognizer performs speech recognition on the speech whose speaker has been identified.
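The abstract says the utterance start detector finds the start position of the stored speech once the operation input arrives, but it does not say how. As a rough illustration only, the sketch below scans a buffered signal frame by frame and reports the first frame whose mean energy exceeds a threshold. The function name `detect_start`, the 20 ms frame size, and the energy threshold are all assumptions made for this example, not details from the publication.

```python
from typing import List, Optional


def detect_start(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 20,        # assumed frame size, not from the patent
    threshold: float = 0.01,   # assumed mean-energy threshold, not from the patent
) -> Optional[float]:
    """Return the time (in seconds) of the first frame that looks like speech.

    Hypothetical stand-in for the "utterance start detector": a plain
    energy threshold over fixed-size frames of the stored audio.
    Returns None when no frame exceeds the threshold.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold:
            return start / sample_rate
    return None


if __name__ == "__main__":
    # 0.1 s of silence followed by 0.1 s of a constant-amplitude signal at 8 kHz.
    buffered = [0.0] * 800 + [0.5] * 800
    print(detect_start(buffered, sample_rate=8000))
```

The returned time plays the role of the "second timing" in the abstract; a real detector would likely use a more robust voice activity method than a fixed threshold.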
Opening claim text (preview).
1. A speech recognition device for a conversation between a first speaker and at least one second speaker who is a conversation partner of the first speaker, the speech recognition device comprising: an obtaining unit which obtains a speech uttered in the conversation between the first speaker and the at least one second speaker; a storage which stores the speech uttered in the conversation between the first speaker and the at least one second speaker and obtained by the obtaining unit; an input unit which receives operation input from at least the first speaker; an utterance start detector which, in response to the operation input received by the input unit, detects a start position of the speech stored in the storage, the start position being a position at which utterance of the speech has started; and a speaker identification unit which identifies a speaker of the speech as one of the first speaker who has performed the operation input on the input unit and the at least one second speaker who has not performed the operation input on the input unit, based on first timing and second timing which are set for each of speeches uttered in the conversation between the first speaker and the at least one second speaker, the first timing being timing at which the input unit has received the operation input, the second timing being timing which indicates the start position of the speech detected by the utterance start detector, wherein speech recognition is performed on the speech uttered by the one of the first speaker and the at least one second speaker identified by the speaker identification unit, the speech recognition being performed by a speech recognizer from the start position of the speech.
2. The speech recognition device according to claim 1, wherein the speaker identification unit: compares the first timing and the second timing which are set for each speech uttered in the conversation between the first speaker and the at least one second speaker; identifies the speaker of the speech as the first speaker from the first speaker and the at least one second speaker when the first timing is earlier than the second timing; and identifies the speaker of the speech as the at least one second speaker from the first speaker and the at least one second speaker when the second timing is earlier than the first timing.
3. The speech recognition device according to claim 1, wherein when the speaker of the speech is identified as the first speaker from the first speaker and the at least one second speaker, the speech recognizer performs the speech recognition on the speech of the first speaker, and when the speaker of the speech is identified as a second speaker from the first speaker and the at least one second speaker, the speech recognizer performs the speech recognition on the speech of the second speaker.
4. The speech recognition device according to claim 1, wherein the speaker identification unit identifies the speaker as one of the first speaker and the at least one second speaker, for each speech uttered in the conversation between the first speaker and the at least one second speaker in a specified period before or after the first timing at which the input unit has received the operation input.
5. The speech recognition device according to claim 1, wherein upon finish of the speech recognition on a speech of the first speaker who has performed the operation input on the input unit, the storage starts to store a speech obtained by the obtaining unit, to store a speech of the at least one second speaker.
6. The speech recognition device according to claim 1, comprising: a communication unit configured to communicate with a cloud server which includes the speech recognizer, wherein the communication unit transmits, to the cloud server, the speech of the one of the first speaker and the at least one second speaker identified by the speaker identification unit, and receives a result of the speech recognition that the speech recognizer included in the cloud server has performed on the speech from the start position of the speech.
7. The speech recognition device according to claim 1, comprising: the speech recognizer which performs the speech recognition on the speech of the one of the first speaker and the at least one second speaker identified by the speaker identification unit, the speech recognition being performed from the start position of the speech.
8. The speech recognition device according to claim 1, wherein the input unit is one operation button provided to the speech recognition device.
9. The speech recognition device according to claim 1, wherein the input unit receives the operation input from the first speaker for every speech of the first speaker and for every speech of the at least one second speaker.
10. A speech recognition method for a conversation between a first speaker and at least one second speaker who is a conversation partner of the first speaker, the speech recognition method comprising: obtaining a speech uttered in the conversation between the first speaker and the at least one second speaker; storing, in a storage, the speech uttered in the conversation between the first speaker and the at least one second speaker and obtained; receiving, by an input unit, operation input from at least the first speaker; detecting, in response to the operation input received by the input unit, a start position of the speech stored in the storage, the start position being a position at which utterance of the speech has started; identifying a speaker of the speech as one of the first speaker who has performed the operation input on the input unit and the at least one second speaker who has not performed the operation input on the input unit, based on first timing and second timing which are set for each of speeches uttered in the conversation between the first speaker and the at least one second speaker, the first timing being timing at which the input unit has received the operation input, the second timing being timing which indicates the start position of the speech detected; and performing speech recognition on the speech of the one of the first speaker and the at least one second speaker identified, the speech recognition being performed from the start position of the speech.
11. A non-transitory computer-readable recording medium for use in a computer, the recording medium having a computer program recorded thereon for causing the computer to execute the speech recognition method according to claim 10.
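The timing comparison in claim 2, together with the "specified period" of claim 4, can be illustrated with a short sketch. This is a hedged reading of the claim language, not the applicant's implementation: the function name `identify_speaker`, the use of seconds as the time unit, the `window_s` parameter, and the choice to return `None` for equal or out-of-window timings are all assumptions made for this example.

```python
from typing import Optional

FIRST_SPEAKER = "first speaker"    # the person who operates the input unit
SECOND_SPEAKER = "second speaker"  # the conversation partner


def identify_speaker(
    first_timing: float,
    second_timing: float,
    window_s: Optional[float] = None,
) -> Optional[str]:
    """Pick a speaker from the two timings described in claims 1, 2 and 4.

    first_timing:  when the operation input (e.g. a button press) was received.
    second_timing: the detected start position of the utterance.
    window_s:      hypothetical "specified period" around the press (claim 4);
                   utterances starting outside it are left unassigned.
    """
    if window_s is not None and abs(second_timing - first_timing) > window_s:
        return None
    if first_timing < second_timing:
        # Press came before the speech started: the operator is speaking.
        return FIRST_SPEAKER
    if second_timing < first_timing:
        # Speech started before the press: the partner is speaking.
        return SECOND_SPEAKER
    # Equal timings are not addressed by the claims; this sketch leaves them open.
    return None


if __name__ == "__main__":
    print(identify_speaker(first_timing=1.0, second_timing=1.3))
    print(identify_speaker(first_timing=2.0, second_timing=1.6))
```

Because the rule relies on a single operation input and the order of two timestamps, it fits the one-button arrangement of claim 8, where the same button is pressed for every utterance (claim 9).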
Speaker identification or verification techniques · CPC title
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title
Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements (G01S5/28 takes precedence) · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title