US11507759B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11507759-B2 |
| Application number | US-202016824110-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 19, 2020 |
| Priority date | Mar 25, 2019 |
| Publication date | Nov 22, 2022 |
| Grant date | Nov 22, 2022 |
Official abstract text for this publication.
A speech translation device, for conversation between a first speaker making an utterance in a first language and a second speaker making an utterance in a second language different from the first language, includes: a speech detector that detects, from sounds that are input, a speech segment in which the first speaker or the second speaker made an utterance; a display that, after speech recognition is performed on the utterance, displays a translation result obtained by translating the utterance from the first language to the second language or from the second language to the first language; and an utterance instructor that outputs, in the second language via the display, a message prompting the second speaker to make an utterance after a first speaker's utterance or outputs, in the first language via the display, a message prompting the first speaker to make an utterance after a second speaker's utterance.
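The turn-taking behavior in the abstract can be illustrated with a minimal sketch: after one speaker's utterance, the device prompts the other speaker in that speaker's own language. The function name `next_prompt`, the speaker labels, the language codes, and the prompt strings are all hypothetical; the publication does not specify message wording or a software interface.

```python
from typing import Dict, Tuple

# Hypothetical prompt messages keyed by language code (not from the patent).
PROMPTS: Dict[str, str] = {
    "en": "Please speak now.",
    "es": "Por favor, hable ahora.",
}


def next_prompt(last_speaker: str, first_lang: str, second_lang: str) -> Tuple[str, str]:
    """Return (language, message) prompting the speaker who did NOT just talk.

    After the first speaker's utterance the prompt is in the second language;
    after the second speaker's utterance it is in the first language.
    """
    if last_speaker == "first":
        lang = second_lang
    elif last_speaker == "second":
        lang = first_lang
    else:
        raise ValueError("last_speaker must be 'first' or 'second'")
    return lang, PROMPTS[lang]


if __name__ == "__main__":
    # First speaker (English) has just spoken, so prompt the second speaker.
    print(next_prompt("first", "en", "es"))
```

In a real device the returned message would be shown on the display; here it is simply returned so the selection logic can be checked in isolation.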
Opening claim text (preview).
The invention claimed is:

1. A speech translation device for conversation between a first speaker and a second speaker, the first speaker making an utterance in a first language, the second speaker making an utterance in a second language different from the first language, the speech translation device comprising: a speech detector that detects, from sounds that are input to an audio input unit, a speech segment in which the first speaker or the second speaker has made an utterance; a display that, after speech recognition is performed on the utterance in the speech segment detected by the speech detector, displays a translation result obtained by translating the utterance from the first language to the second language or a translation result obtained by translating the utterance from the second language to the first language; an utterance circuit that outputs, in the second language via the display, a message prompting the second speaker to make an utterance after the first speaker has made an utterance or outputs, in the first language via the display, a message prompting the first speaker to make an utterance after the second speaker has made an utterance; the audio input unit to which a voice of the utterance made by the first speaker or the second speaker in the conversation is input; a speech recognizer that performs speech recognition on the utterance in the speech segment detected by the speech detector, to convert the utterance into text; a translator that translates the text into which the utterance has been converted by the speech recognizer, from the first language to the second language or from the second language to the first language; and an audio output unit that outputs by voice a result of the translation made by the translator, wherein the audio input unit comprises a plurality of audio input units, the speech translation device further comprises: a first beam former that performs signal processing on a voice that is input to at least one of the plurality of audio input units, to cause directivity of sound collection to coincide with a sound source direction of the utterance made by the first speaker; a second beam former that performs signal processing on the voice that is input to at least one of the plurality of audio input units, to cause directivity of sound collection to coincide with a sound source direction of the utterance made by the second speaker; and an input switch that switches between obtaining an output signal from the first beam former and obtaining an output signal from the second beam former.

2. The speech translation device according to claim 1, further comprising: a priority utterance input unit that, when speech recognition is performed on the utterance made by the first speaker or the second speaker, causes speech recognition to be performed again on the utterance on which the speech recognition has been performed.

3. The speech translation device according to claim 1, wherein the utterance circuit: outputs, in the first language via the display, the message prompting the first speaker to make an utterance when the speech translation device is activated; and outputs, in the second language via the display, the message prompting the second speaker to make an utterance after the utterance made by the first speaker is translated from the first language to the second language and a result of the translation is displayed on the display.

4. The speech translation device according to claim 1, wherein after a start of the translation, the utterance circuit causes the audio output unit to output, a specified number of times, a voice message for prompting utterance, and after the audio output unit has output the voice message the specified number of times, the utterance circuit causes the display to display a message for prompting utterance.

5. The speech translation device according to claim 1, wherein the speech recognizer outputs a result of the speech recognition performed on the utterance and a reliability score of the result, and when the reliability score obtained from the speech recognizer is lower than or equal to a threshold, the utterance circuit outputs a message prompting utterance via at least one of the display or the audio output unit, without translating the utterance whose reliability score is lower than or equal to the threshold.

6. The speech translation device according to claim 1, wherein the speech translation device further comprises a sound source direction estimator that estimates a sound source direction by performing signal processing on the voice that is input to the plurality of audio input units, and the utterance circuit causes the input switch to switch between the obtaining of an output signal from the first beam former and the obtaining of an output signal from the second beam former.

7. A speech translation device for conversation between a first speaker and a second speaker, the first speaker making an utterance in a first language, the second speaker making an utterance in a second language different from the first language, the speech translation device comprising: a speech detector that detects, from sounds that are input to an audio input unit, a speech segment in which the first speaker or the second speaker has made an utterance; a display that, after speech recognition is performed on the utterance in the speech segment detected by the speech detector, displays a translation result obtained by translating the utterance from the first language to the second language or a translation result obtained by translating the utterance from the second language to the first language; an utterance circuit that outputs, in the second language via the display, a message prompting the second speaker to make an utterance after the first speaker has made an utterance or outputs, in the first language via the display, a message prompting the first speaker to make an utterance after the second speaker has made an utterance; the audio input unit to which a voice of the utterance made by the first speaker or the second speaker in the conversation is input; a speech recognizer that performs speech recognition on the utterance in the speech segment detected by the speech detector, to convert the utterance into text; a translator that translates the text into which the utterance has been converted by the speech recognizer, from the first language to the second language or from the second language to the first language; and an audio output unit that outputs by voice a result of the translation made by the translator, wherein the audio input unit comprises a plurality of audio input units, the speech translation device further comprises: a sound source direction estimator that estimates a sound source direction by performing signal processing on a voice that is input to the plurality of audio input units; and a controller that causes the display to display the first language in a display area corresponding to a location of the first speaker with respect to the speech translation device, and display the second language in a display area corresponding to a location of the second speaker with respect to the speech translation device, and the controller: compares a display direction and the sound source direction estimated by the sound source direction estimator, the display direction being a direction from the display of the speech translation device to the first speaker or the second speaker and being a direction for either one of the display areas of the display; causes the speech recognizer and the translator to operate when the display direction substantially coincides with the sound source direction estimated; and causes the speech recognizer and the translator to stop when t
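Two of the decision rules in the claims lend themselves to a short sketch: the reliability-score check of claim 5 and the direction comparison of claim 7. The code below is illustrative only. The function names, the 0 to 1 score range, the 0.5 threshold, and the 30 degree tolerance used for "substantially coincides" are assumptions, not values from the patent. The preview of claim 7 is cut off before it states when recognition and translation stop, so returning `False` for non-matching directions is also an assumption.

```python
from typing import Optional, Tuple


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def directions_coincide(display_dir_deg: float,
                        source_dir_deg: float,
                        tolerance_deg: float = 30.0) -> bool:
    """Claim 7 sketch: True when the display direction and the estimated sound
    source direction agree within a tolerance (the tolerance is an assumed value)."""
    return angular_difference(display_dir_deg, source_dir_deg) <= tolerance_deg


def route_recognition(text: str,
                      reliability: float,
                      threshold: float = 0.5) -> Tuple[str, Optional[str]]:
    """Claim 5 sketch: a score lower than or equal to the threshold yields a
    prompt to speak again, and the utterance is not translated."""
    if reliability <= threshold:
        return ("prompt", None)
    return ("translate", text)


if __name__ == "__main__":
    # Bearings of 350 and 10 degrees are 20 degrees apart across the wrap-around.
    print(angular_difference(350.0, 10.0))
    print(directions_coincide(0.0, 20.0), directions_coincide(0.0, 90.0))
    print(route_recognition("hello", 0.9), route_recognition("hello", 0.5))
```

Treating a score exactly at the threshold as a prompt case follows the "lower than or equal to" wording of claim 5.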
Speech synthesis; Text to speech systems · CPC title
Event management; Broadcasting; Multicasting; Notifications · CPC title
Segmentation; Word boundary detection · CPC title
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
Split screen, i.e. subdividing the display area or the window area into separate subareas · CPC title