System and method for content-based media analysis
US-2017228382-A1 · Aug 10, 2017 · US
US10867136B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10867136-B2 |
| Application number | US-201715404941-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 12, 2017 |
| Priority date | Jul 7, 2016 |
| Publication date | Dec 15, 2020 |
| Grant date | Dec 15, 2020 |
Official abstract text for this publication.
Provided is an automated interpretation method, apparatus, and system. The automated interpretation method includes encoding a voice signal in a first language to generate a first feature vector, decoding the first feature vector to generate a first language sentence in the first language, encoding the first language sentence to generate a second feature vector with respect to a second language, decoding the second feature vector to generate a second language sentence in the second language, controlling a generating of a candidate sentence list based on any one or any combination of the first feature vector, the first language sentence, the second feature vector, and the second language sentence, and selecting, from the candidate sentence list, a final second language sentence as a translation of the voice signal.
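The flow in the abstract can be sketched as a chain of four stages followed by candidate selection. This is a minimal illustration, not the patented implementation: the class name `InterpretationPipeline`, its field names, and the toy lambdas are all hypothetical, and the real encoders and decoders are trained neural networks rather than the stand-in callables used here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class InterpretationPipeline:
    """Hypothetical wiring of the stages named in the abstract."""
    encode_voice: Callable[[Sequence[float]], Vector]   # voice signal -> first feature vector
    decode_voice: Callable[[Vector], str]               # first feature vector -> first language sentence
    encode_sentence: Callable[[str], Vector]            # first language sentence -> second feature vector
    decode_sentence: Callable[[Vector], str]            # second feature vector -> second language sentence
    lookup: Callable[[Vector, str, Vector, str], List[str]]  # earlier translations judged similar
    score: Callable[[Vector, str], float]               # candidate score given the second feature vector

    def interpret(self, voice_signal: Sequence[float]) -> str:
        first_vec = self.encode_voice(voice_signal)
        first_sentence = self.decode_voice(first_vec)
        second_vec = self.encode_sentence(first_sentence)
        second_sentence = self.decode_sentence(second_vec)

        # The candidate list starts with the fresh translation and is extended
        # with earlier results returned by the (assumed) lookup function.
        candidates = [second_sentence]
        for previous in self.lookup(first_vec, first_sentence, second_vec, second_sentence):
            if previous not in candidates:
                candidates.append(previous)

        # The final second language sentence is the best-scoring candidate.
        return max(candidates, key=lambda c: self.score(second_vec, c))


# Toy stand-ins so the sketch runs; none of these values come from the patent.
toy = InterpretationPipeline(
    encode_voice=lambda signal: [float(len(signal))],
    decode_voice=lambda vec: "source sentence",
    encode_sentence=lambda sentence: [float(len(sentence))],
    decode_sentence=lambda vec: "fresh translation",
    lookup=lambda fv, fs, sv, ss: ["stored translation"],
    score=lambda vec, cand: 1.0 if cand == "stored translation" else 0.5,
)
final = toy.interpret([0.0, 0.1, 0.2])
```

Passing the stages in as callables keeps the sketch independent of any particular model library; in this toy setup the stored translation wins only because the made-up scoring function prefers it.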
Opening claim text (preview).
What is claimed is:

1. A processor-implemented interpretation method for translating a sentence from a first language to a second language, the first language and the second language being natural languages, the method comprising:
receiving a voice signal as an input, the voice signal being an input source sentence uttered in the first language;
encoding, using a voice recognition encoder, the voice signal by extracting abstracted voice information from the voice signal to generate a first feature vector using the abstracted voice information, the abstracted voice information being stored in a database;
providing the first feature vector to an input layer of a voice recognition decoder;
decoding, using the voice recognition decoder, the first feature vector to generate a first language sentence in the first language, the first language sentence being stored in the database and corresponding to the abstracted voice information;
encoding, using a machine translation encoder, the first language sentence by extracting abstracted sentence information from the first language sentence to generate a second feature vector with respect to the second language using the abstracted sentence information, the abstracted sentence information being stored in the database and corresponding to the abstracted voice information;
providing the second feature vector to an input layer of a machine translation decoder;
decoding, using the machine translation decoder, the second feature vector to generate a second language sentence in the second language;
generating a candidate sentence list to include the second language sentence and one or more previous translation final second language sentences, being retrieved from the database, based on any one or any combination of the first feature vector, the first language sentence, and the second feature vector, wherein the one or more previous translation final second language sentences are previously generated, based on respective previous second feature vectors, through previous recognition-translation processes performed using the machine translation decoder, and are previously stored in the database, wherein the one or more previous translation final second language sentences are determined to be similar to the input source sentence based on a comparison of the abstracted sentence information of the input source sentence and abstracted sentence information of plural previous translation final second language sentences previously stored in the database, and wherein the voice recognition encoder, the voice recognition decoder, the machine translation encoder, and the machine translation decoder are implemented in neural networks being trained; and
selecting, from the candidate sentence list, a final second language sentence as a translation of the input source sentence corresponding to the input voice signal,
wherein the generating of the candidate sentence list includes: acquiring a first interpretation result matching a first language feature vector, from the database, determined similar to the first feature vector; acquiring a second interpretation result matching a previous recognized sentence, from the database, determined similar to the first language sentence; and acquiring a third interpretation result matching a second language feature vector, from the database, determined similar to the second feature vector, and
wherein the generating of the candidate sentence list further includes adding any of previous translation sentences corresponding to any of the first interpretation result, the second interpretation result, and the third interpretation result to the candidate sentence list.

2. The method of claim 1, wherein the generating of the candidate sentence list includes acquiring a candidate sentence, from the database, determined to correspond to any one or any combination of the first feature vector, the first language sentence, and the second feature vector from the database.

3. The method of claim 2, wherein the acquiring of the candidate sentence includes retrieving respective elements determined similar to any of the first feature vector, the first language sentence, and the second feature vector from a plurality of elements stored in the database based on one or more approximate nearest neighbor (NN) algorithms.

4. The method of claim 1, wherein the acquiring of the second interpretation result includes: converting the first language sentence into a vector; and determining which of plural previous recognized sentences, from the database, are similar to the first language sentence based on the vector.

5. The method of claim 1, wherein the selecting of the final second language sentence includes: calculating scores of candidate sentences included in the candidate sentence list based on the second feature vector; and selecting a candidate sentence, from the candidate sentence list, having a highest of the calculated scores to be the final second language sentence.

6. The method of claim 1, wherein the generating of the first feature vector includes: sampling the voice signal in the first language based on a predetermined frame length; generating respective input vectors corresponding to frames; sequentially inputting the respective input vectors to the voice recognition encoder used for voice recognition; and determining the first feature vector to be an output from the voice recognition encoder for the sequentially input respective input vectors.

7. The method of claim 1, wherein the generating of the first language sentence includes: inputting the first feature vector to another voice recognition decoder used for voice recognition; generating a predetermined number of sentence sequences based on probabilities of sub-words sequentially output from the other decoder; and selecting a sentence sequence having a highest score among the predetermined number of sentence sequences to be the first language sentence.

8. The method of claim 1, wherein the generating of the second feature vector includes: dividing the first language sentence into a plurality of sub-words; sequentially inputting input vectors respectively indicating the plurality of sub-words to the machine translation encoder used for machine translation; and determining the second feature vector to be an output from the machine translation encoder for the sequentially input input vectors.

9. The method of claim 1, further comprising: storing the first feature vector, the first language sentence, and the second feature vector in the database; and storing any one or any combination of the second language sentence and the final second language sentence corresponding to the first feature vector, the first language sentence, and the second feature vector in the database.

10. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.

11. The method of claim 1, wherein the controlling of the generating of the candidate sentence list to include the second language sentence and the one or more previous translation final second language sentences is further based on the second language sentence.

12. The method of claim 1, wherein the generating of the candidate sentence list includes any one or any combination of: acquiring a first interpretation result matching a first language feature vector, from the database, determined similar to the first feature vector; acquiring a second interpretation result matching a second language feature vector, from the database, determined similar to the second feature vector, and adding any of previous fin
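The candidate retrieval and selection steps recited in claims 1, 3, 4, and 5 can be illustrated with a small sketch. Several things here are assumptions rather than details from the claims: the `Record` layout, the cosine similarity measure, the `0.9` threshold, the bag-of-words sentence vector, and the caller-supplied `score` function are all hypothetical. The exhaustive scan is a plain exact search standing in for the approximate nearest neighbor algorithms named in claim 3.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Mapping

Vector = List[float]


def cosine(a: Mapping[Hashable, float], b: Mapping[Hashable, float]) -> float:
    """Cosine similarity between two sparse vectors stored as mappings."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense(vec: Vector) -> Dict[int, float]:
    """View a dense feature vector as an index -> value mapping."""
    return dict(enumerate(vec))


def sentence_vector(sentence: str) -> Dict[str, float]:
    """Assumed sentence-to-vector conversion: lowercase bag of words."""
    return {w: float(n) for w, n in Counter(sentence.lower().split()).items()}


@dataclass
class Record:
    """One stored result of an earlier recognition-translation pass."""
    first_vec: Vector
    first_sentence: str
    second_vec: Vector
    final_translation: str


def build_candidates(db: List[Record], first_vec: Vector, first_sentence: str,
                     second_vec: Vector, second_sentence: str,
                     threshold: float = 0.9) -> List[str]:
    """Start from the fresh translation, then add stored translations whose
    record matches on any of the three keys used in claim 1."""
    candidates = [second_sentence]
    query_words = sentence_vector(first_sentence)
    for rec in db:
        similar = (
            cosine(dense(rec.first_vec), dense(first_vec)) >= threshold
            or cosine(sentence_vector(rec.first_sentence), query_words) >= threshold
            or cosine(dense(rec.second_vec), dense(second_vec)) >= threshold
        )
        if similar and rec.final_translation not in candidates:
            candidates.append(rec.final_translation)
    return candidates


def select_final(candidates: List[str], second_vec: Vector,
                 score: Callable[[Vector, str], float]) -> str:
    """Pick the candidate with the highest score for the second feature vector."""
    return max(candidates, key=lambda c: score(second_vec, c))
```

A production system would replace the linear scan with an index-based approximate search so that lookup cost does not grow with every stored translation.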
Vocoder architecture · CPC title
Neural networks · CPC title
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
Speech synthesis; Text to speech systems · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title