Speech translation device, speech translation method, and recording medium

US11507759B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11507759-B2
Application number: US-202016824110-A
Country: US
Kind code: B2
Filing date: Mar 19, 2020
Priority date: Mar 25, 2019
Publication date: Nov 22, 2022
Grant date: Nov 22, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A speech translation device, for conversation between a first speaker making an utterance in a first language and a second speaker making an utterance in a second language different from the first language, includes: a speech detector that detects, from sounds that are input, a speech segment in which the first speaker or the second speaker made an utterance; a display that, after speech recognition is performed on the utterance, displays a translation result obtained by translating the utterance from the first language to the second language or from the second language to the first language; and an utterance instructor that outputs, in the second language via the display, a message prompting the second speaker to make an utterance after a first speaker's utterance or outputs, in the first language via the display, a message prompting the first speaker to make an utterance after a second speaker's utterance.
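The turn-taking behavior of the utterance instructor described in the abstract can be illustrated with a minimal sketch. It assumes two language codes and a small table of prompt strings; the class name `Conversation`, the method `prompt_after`, and the prompt wording are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass

# Hypothetical prompt strings; the patent does not specify any wording.
PROMPTS = {
    "en": "Please speak now.",
    "ja": "お話しください。",
}


@dataclass
class Conversation:
    """Holds the two languages used in the conversation (assumed codes)."""

    first_language: str
    second_language: str

    def prompt_after(self, speaker_language: str) -> tuple[str, str]:
        """Return (language, message) prompting the *other* speaker.

        After an utterance in the first language, the prompt is issued in
        the second language, and vice versa.
        """
        if speaker_language == self.first_language:
            target = self.second_language
        elif speaker_language == self.second_language:
            target = self.first_language
        else:
            raise ValueError(f"unknown language: {speaker_language!r}")
        return target, PROMPTS[target]


conversation = Conversation(first_language="ja", second_language="en")
language, message = conversation.prompt_after("ja")
print(language, message)
```

In a real device the returned message would be sent to the display (and possibly the audio output) rather than printed.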

First claim

Opening claim text (preview).

The invention claimed is:

1. A speech translation device for conversation between a first speaker and a second speaker, the first speaker making an utterance in a first language, the second speaker making an utterance in a second language different from the first language, the speech translation device comprising:
a speech detector that detects, from sounds that are input to an audio input unit, a speech segment in which the first speaker or the second speaker has made an utterance;
a display that, after speech recognition is performed on the utterance in the speech segment detected by the speech detector, displays a translation result obtained by translating the utterance from the first language to the second language or a translation result obtained by translating the utterance from the second language to the first language;
an utterance circuit that outputs, in the second language via the display, a message prompting the second speaker to make an utterance after the first speaker has made an utterance or outputs, in the first language via the display, a message prompting the first speaker to make an utterance after the second speaker has made an utterance;
the audio input unit to which a voice of the utterance made by the first speaker or the second speaker in the conversation is input;
a speech recognizer that performs speech recognition on the utterance in the speech segment detected by the speech detector, to convert the utterance into text;
a translator that translates the text into which the utterance has been converted by the speech recognizer, from the first language to the second language or from the second language to the first language; and
an audio output unit that outputs by voice a result of the translation made by the translator,
wherein the audio input unit comprises a plurality of audio input units,
the speech translation device further comprises:
a first beam former that performs signal processing on a voice that is input to at least one of the plurality of audio input units, to cause directivity of sound collection to coincide with a sound source direction of the utterance made by the first speaker;
a second beam former that performs signal processing on the voice that is input to at least one of the plurality of audio input units, to cause directivity of sound collection to coincide with a sound source direction of the utterance made by the second speaker; and
an input switch that switches between obtaining an output signal from the first beam former and obtaining an output signal from the second beam former.

2. The speech translation device according to claim 1, further comprising: a priority utterance input unit that, when speech recognition is performed on the utterance made by the first speaker or the second speaker, causes speech recognition to be performed again on the utterance on which the speech recognition has been performed.

3. The speech translation device according to claim 1, wherein the utterance circuit: outputs, in the first language via the display, the message prompting the first speaker to make an utterance when the speech translation device is activated; and outputs, in the second language via the display, the message prompting the second speaker to make an utterance after the utterance made by the first speaker is translated from the first language to the second language and a result of the translation is displayed on the display.

4. The speech translation device according to claim 1, wherein after a start of the translation, the utterance circuit causes the audio output unit to output, a specified number of times, a voice message for prompting utterance, and after the audio output unit has output the voice message the specified number of times, the utterance circuit causes the display to display a message for prompting utterance.

5. The speech translation device according to claim 1, wherein the speech recognizer outputs a result of the speech recognition performed on the utterance and a reliability score of the result, and when the reliability score obtained from the speech recognizer is lower than or equal to a threshold, the utterance circuit outputs a message prompting utterance via at least one of the display or the audio output unit, without translating the utterance whose reliability score is lower than or equal to the threshold.

6. The speech translation device according to claim 1, wherein the speech translation device further comprises a sound source direction estimator that estimates a sound source direction by performing signal processing on the voice that is input to the plurality of audio input units, and the utterance circuit causes the input switch to switch between the obtaining of an output signal from the first beam former and the obtaining of an output signal from the second beam former.

7. A speech translation device for conversation between a first speaker and a second speaker, the first speaker making an utterance in a first language, the second speaker making an utterance in a second language different from the first language, the speech translation device comprising:
a speech detector that detects, from sounds that are input to an audio input unit, a speech segment in which the first speaker or the second speaker has made an utterance;
a display that, after speech recognition is performed on the utterance in the speech segment detected by the speech detector, displays a translation result obtained by translating the utterance from the first language to the second language or a translation result obtained by translating the utterance from the second language to the first language;
an utterance circuit that outputs, in the second language via the display, a message prompting the second speaker to make an utterance after the first speaker has made an utterance or outputs, in the first language via the display, a message prompting the first speaker to make an utterance after the second speaker has made an utterance;
the audio input unit to which a voice of the utterance made by the first speaker or the second speaker in the conversation is input;
a speech recognizer that performs speech recognition on the utterance in the speech segment detected by the speech detector, to convert the utterance into text;
a translator that translates the text into which the utterance has been converted by the speech recognizer, from the first language to the second language or from the second language to the first language; and
an audio output unit that outputs by voice a result of the translation made by the translator,
wherein the audio input unit comprises a plurality of audio input units,
the speech translation device further comprises:
a sound source direction estimator that estimates a sound source direction by performing signal processing on a voice that is input to the plurality of audio input units; and
a controller that causes the display to display the first language in a display area corresponding to a location of the first speaker with respect to the speech translation device, and display the second language in a display area corresponding to a location of the second speaker with respect to the speech translation device, and
the controller:
compares a display direction and the sound source direction estimated by the sound source direction estimator, the display direction being a direction from the display of the speech translation device to the first speaker or the second speaker and being a direction for either one of the display areas of the display;
causes the speech recognizer and the translator to operate when the display direction substantially coincides with the sound source direction estimated; and
causes the speech recognizer and the translator to stop when t
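Two of the decision rules in the claims lend themselves to a short illustration: the reliability gate of claim 5 and the direction comparison of claim 7. The sketch below is only one possible reading. The function names, the use of degrees, and the 30-degree tolerance standing in for "substantially coincides" are assumptions, not values from the patent. The preview of claim 7 is cut off before its stop condition, so only the comparison itself is shown.

```python
def should_translate(reliability: float, threshold: float) -> bool:
    """Claim 5 reading: translate only when the score exceeds the threshold.

    A score lower than or equal to the threshold means the device prompts
    the speaker to talk again instead of translating.
    """
    return reliability > threshold


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two directions, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def directions_coincide(
    display_deg: float,
    source_deg: float,
    tolerance_deg: float = 30.0,  # hypothetical tolerance, not from the patent
) -> bool:
    """Claim 7 reading: does the estimated sound source direction
    substantially coincide with the display direction?"""
    return angular_difference(display_deg, source_deg) <= tolerance_deg


# A score equal to the threshold is rejected, matching "lower than or equal to".
print(should_translate(0.5, 0.5))
# Directions on either side of 0 degrees are compared across the wrap-around.
print(directions_coincide(350.0, 10.0))
```

Taking the difference modulo 360 keeps the comparison correct when the two directions straddle the 0/360 boundary, which a plain subtraction would misjudge.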

Assignees

Panasonic Corp, Panasonic Holdings Corp

Inventors

Classifications

  • Speech synthesis; Text to speech systems · CPC title

  • Event management; Broadcasting; Multicasting; Notifications · CPC title

  • Segmentation; Word boundary detection · CPC title

  • G06F40/58 · Primary

    Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title

  • Split screen, i.e. subdividing the display area or the window area into separate subareas · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11507759B2 cover?
A speech translation device, for conversation between a first speaker making an utterance in a first language and a second speaker making an utterance in a second language different from the first language, includes: a speech detector that detects, from sounds that are input, a speech segment in which the first speaker or the second speaker made an utterance; a display that, after speech recogn…
Who is the assignee on this patent?
Panasonic Corp, Panasonic Holdings Corp
What technology area does this patent fall under?
Primary CPC classification G06F40/58. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 22, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).