Speech translation device, speech translation method, and recording medium therefor
US-2019304442-A1 · Oct 3, 2019 · US
US11182567B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11182567-B2 |
| Application number | US-201916364811-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 26, 2019 |
| Priority date | Mar 29, 2018 |
| Publication date | Nov 23, 2021 |
| Grant date | Nov 23, 2021 |
Official abstract text for this publication.
A speech translation apparatus includes: an estimator which estimates a sound source direction, based on an acoustic signal obtained by a microphone array unit; a controller which identifies that an utterer is a user or a conversation partner, based on the sound source direction estimated after the start of translation is instructed by a button, using a positional relationship indicated by a layout information item stored in storage and selected in advance, and determines a translation direction indicating input and output languages in and into which content of the acoustic signal is recognized and translated, respectively; and a translator which obtains, according to the translation direction, original text indicating the content in the input language and translated text indicating the content in the output language. The controller displays the original and translated texts on first and second display areas corresponding to the positions of the user and conversation partner, respectively.
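The controller logic summarized in the abstract (identify the utterer from the estimated sound source direction and the selected layout, then fix the translation direction) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the patent: the `Layout` class, the angular sectors in degrees, the `FACE_TO_FACE` values, and the language codes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Sector = Tuple[float, float]  # (start, end) in degrees, counterclockwise


@dataclass(frozen=True)
class Layout:
    """Hypothetical layout information item: one angular sector per party."""
    name: str
    user_sector: Sector
    partner_sector: Sector


def in_sector(angle: float, sector: Sector) -> bool:
    """Return True if angle lies in the sector, allowing wrap past 360."""
    start, end = sector
    angle, start, end = angle % 360, start % 360, end % 360
    if start <= end:
        return start <= angle <= end
    return angle >= start or angle <= end


def translation_direction(
    angle: float, layout: Layout, user_lang: str, partner_lang: str
) -> Optional[Tuple[str, str, str]]:
    """Map an estimated direction to (utterer, input_lang, output_lang).

    Returns None when the direction matches neither party, in which case
    no translation direction is determined.
    """
    if in_sector(angle, layout.user_sector):
        return ("user", user_lang, partner_lang)
    if in_sector(angle, layout.partner_sector):
        return ("partner", partner_lang, user_lang)
    return None


# Assumed example layout: user and partner face each other across the display.
FACE_TO_FACE = Layout("face_to_face", user_sector=(225.0, 315.0),
                      partner_sector=(45.0, 135.0))

result = translation_direction(270.0, FACE_TO_FACE, "ja", "en")
```

In this sketch a direction that falls outside both sectors yields `None`, which loosely mirrors the condition in claim 9 that the translation direction is determined only when the direction indicates the position of the user or the conversation partner.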
Opening claim text (preview).
What is claimed is:

1. A speech translation apparatus, comprising: a translation start button which instructs start of translation when operated by one of a user of the speech translation apparatus and a conversation partner of the user; and one or more hardware processors configured to execute at least one program and cause the speech translation apparatus to perform the functions of: a sound source direction estimator which estimates a sound source direction by processing an acoustic signal obtained by a microphone array; a controller which (i) identifies that an utterer who utters speech is one of the user and the conversation partner, from (a) the sound source direction estimated by the sound source direction estimator after the start of the translation is instructed by the translation start button and (b) a positional relationship indicated by a layout information item selected in advance from a plurality of layout information items that are stored in storage and respectively indicate different positional relationships between the user, the conversation partner, and a display with respect to the speech translation apparatus, and (ii) determines a translation direction indicating an input language in which content of the acoustic signal is recognized and an output language into which the content of the acoustic signal is translated, the input language being one of a first language used by the user and predetermined and a second language used by the conversation partner and predetermined and the output language being the other one of the first language and the second language, the second language being different from the first language; a translator which obtains, according to the translation direction determined, (i) original text indicating the content of the acoustic signal obtained by causing a recognition processor to recognize the acoustic signal in the input language and (ii) translated text indicating the content of the acoustic signal obtained by causing a translation processor to translate the original text into the output language; and a display unit which displays the original text on a first area of the display, and simultaneously displays the translated text on a second area of the display, the first area corresponding to a position of the identified one of the user and the conversation partner, the second area corresponding to a position of the other one of the user and the conversation partner.

2. The speech translation apparatus according to claim 1, wherein the translator includes the recognition processor and the translation processor.

3. The speech translation apparatus according to claim 1, wherein the translator is connectable to a server via a network, and the server includes at least one of the recognition processor and the translation processor.

4. The speech translation apparatus according to claim 1, wherein the one or more hardware processors are further configured to execute the at least one program and cause the speech translation apparatus to perform the functions of a delay unit which delays the acoustic signal obtained by the microphone array unit for a certain period of time; and a beam former which forms a beam which is an acoustic signal having a controlled sound receiving directivity in a predetermined direction by processing the acoustic signal delayed by the delay unit, wherein the beam former forms the beam in the sound source direction estimated by the sound source direction estimator to be the predetermined direction.

5. The speech translation apparatus according to claim 1, further comprising: a speaker, wherein the translator obtains translated speech data obtained by causing a text synthesis processor to convert the translated text into speech data of the output language, and transfers the translated speech data to the speaker, and the speaker outputs speech of the translated text according to the translated speech data transferred.

6. The speech translation apparatus according to claim 1, wherein the display has an elongated shape, and when the layout information item indicates a positional relationship in which the user and the conversation partner face each other across the display, the display unit displays the original text and the translated text in the first area and the second area, respectively, in such a manner that characters of the original text are oriented toward the identified one of the user and the conversation partner and characters of the translated text are oriented toward the other one of the user and the conversation partner.

7. The speech translation apparatus according to claim 1, wherein the display has an elongated shape, and when the layout information item indicates a display-centered positional relationship in which the user is present at a first side of the display and the conversation partner is present at a second side of the display which is different from and perpendicular to the first side, the display unit displays the original text and the translated text in the first area and the second area, respectively, in such a manner that characters of the translated text are oriented toward the other one of the user and the conversation partner in a direction rotated by 90 degrees from a direction of the characters of the original text oriented toward the identified one of the user and the conversation partner.

8. The speech translation apparatus according to claim 1, wherein the display has an elongated shape, and the plurality of layout information items include: (i) a positional relationship in which the user and the conversation partner face each other across the display; (ii) a positional relationship in which the user and the conversation partner are present side by side at one of sides of the display either in this order or an inverse order; and (iii) a display-centered positional relationship in which the user is present at the first side of the display and the conversation partner is present at the second side of the display which is different from and perpendicular to the first side.

9. The speech translation apparatus according to claim 1, wherein the one or more hardware processors are further configured to execute the at least one program and cause the speech translation apparatus to perform the functions of a speech determiner which determines whether the acoustic signal obtained by the microphone array unit includes speech, wherein the controller determines the translation direction only when (i) the acoustic signal is determined to include speech by the speech determiner and (ii) the sound source direction estimated by the sound source direction estimator indicates the position of the user or the position of the conversation partner in the positional relationship indicated by the layout information item.

10. The speech translation apparatus according to claim 9, wherein the one or more hardware processors are further configured to execute the at least one program and cause the speech translation apparatus to perform the functions of: a layout selection controller which (i) initializes the layout information item selected in advance when the start of the translation is instructed by the translation start button operated by the user, and (ii) selects one of the plurality of layout information items stored in the storage as the layout information item, based on a result of the determination made by the speech determiner and a result of the estimation performed by the sound source direction estimator.

11. The speech translation apparatus according to claim 10, wherein the layout selection controller: after initializing the layout information item selected in advance,
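The text-orientation rules of claims 6 to 8 can be illustrated with a small lookup. This is a sketch under stated assumptions, not the patented implementation: the `LayoutKind` names, the convention that text facing the user is at 0 degrees, the sign of the 90-degree rotation, and the side-by-side value of 0 degrees (the preview above does not state an orientation for that layout) are all hypothetical.

```python
from enum import Enum
from typing import Tuple


class LayoutKind(Enum):
    """Hypothetical labels for the three layouts listed in claim 8."""
    FACE_TO_FACE = "face_to_face"    # claim 8 (i)
    SIDE_BY_SIDE = "side_by_side"    # claim 8 (ii)
    PERPENDICULAR = "perpendicular"  # claim 8 (iii)


# Assumed convention: text facing the user is drawn at 0 degrees.
USER_ROTATION_DEG = 0

# Rotation of text facing the conversation partner, relative to the user's text.
PARTNER_ROTATION_DEG = {
    LayoutKind.FACE_TO_FACE: 180,   # opposite sides of the display (claim 6)
    LayoutKind.SIDE_BY_SIDE: 0,     # assumption: both read from the same side
    LayoutKind.PERPENDICULAR: 90,   # rotated by 90 degrees (claim 7), sign assumed
}


def text_rotations(layout: LayoutKind, utterer: str) -> Tuple[int, int]:
    """Return (original_text_rotation, translated_text_rotation) in degrees.

    The original text faces the utterer and the translated text faces
    the other party.
    """
    partner_rotation = PARTNER_ROTATION_DEG[layout]
    if utterer == "user":
        return USER_ROTATION_DEG, partner_rotation
    if utterer == "partner":
        return partner_rotation, USER_ROTATION_DEG
    raise ValueError(f"unknown utterer: {utterer!r}")


rotations = text_rotations(LayoutKind.FACE_TO_FACE, "user")
```

Keeping the rotations in a table keyed by layout means that supporting another stored layout information item only requires a new table entry, not a new branch.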
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
Systems for determining direction or deviation from predetermined direction · CPC title
Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191) · CPC title