Speech-to-text transcription with multiple languages

US11049501B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11049501-B2
Application number: US-201816141792-A
Country: US
Kind code: B2
Filing date: Sep 25, 2018
Priority date: Sep 25, 2018
Publication date: Jun 29, 2021
Grant date: Jun 29, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides a method that includes obtaining a default language corpus. A second language corpus is obtained based on a second language preference. A first transcription of an utterance is received using the default language corpus and natural language processing (NLP). At least one problem word in the first transcription is determined based on an associated grammatical relevance to neighboring words in the first transcription. Upon determining that a first probability score is below a first threshold, an acoustic lookup is performed for an audible match for the problem word in the first transcription based on an associated acoustical relevance. Upon determining that a second probability score is below a second threshold, it is determined whether a match for the problem word exists in the secondary language corpus. Upon determining that the match exists in the secondary language corpus, a second transcription for the utterance is provided.
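The two-threshold fallback described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the vocabularies, the bigram table, the threshold values, and every function name are hypothetical, and spelling similarity from `difflib` stands in for a real acoustic lookup.

```python
from difflib import SequenceMatcher

# All data below is hypothetical toy data chosen for illustration only.
DEFAULT_VOCAB = {"my", "mother", "umbrella", "cooked", "dinner"}  # default language
SECOND_VOCAB = {"abuela", "gracias", "casa"}                      # second language
BIGRAM_SCORE = {  # toy stand-in for a grammar learning model
    ("my", "mother"): 0.5, ("mother", "cooked"): 0.5,
    ("my", "umbrella"): 0.4, ("cooked", "dinner"): 0.6,
}
FIRST_THRESHOLD = 0.3   # assumed value
SECOND_THRESHOLD = 0.3  # assumed value
SOUND_CUTOFF = 0.5      # assumed minimum similarity for an "audible match"


def grammar_score(tokens, i, word):
    """Toy grammatical relevance of `word` at position i to its known neighbors."""
    if word not in DEFAULT_VOCAB:
        return 0.0  # a word unknown to the default language never fits
    scores = []
    if i > 0 and tokens[i - 1] in DEFAULT_VOCAB:
        scores.append(BIGRAM_SCORE.get((tokens[i - 1], word), 0.0))
    if i + 1 < len(tokens) and tokens[i + 1] in DEFAULT_VOCAB:
        scores.append(BIGRAM_SCORE.get((word, tokens[i + 1]), 0.0))
    # With no usable neighbor there is no evidence against the word.
    return sum(scores) / len(scores) if scores else 1.0


def closest_sounding(word, vocab):
    """Stand-in acoustic lookup: best spelling match at or above the cutoff."""
    best = max(sorted(vocab), key=lambda v: SequenceMatcher(None, v, word).ratio())
    if SequenceMatcher(None, best, word).ratio() >= SOUND_CUTOFF:
        return best
    return None


def bilingual_transcribe(first_transcription):
    tokens = list(first_transcription)
    result = list(tokens)
    for i, word in enumerate(tokens):
        # Step 1: flag problem words whose first score is below the first threshold.
        if grammar_score(tokens, i, word) >= FIRST_THRESHOLD:
            continue
        # Step 2: look for an audible match in the default language corpus.
        audible = closest_sounding(word, DEFAULT_VOCAB)
        if audible is not None and grammar_score(tokens, i, audible) >= SECOND_THRESHOLD:
            result[i] = audible
            continue
        # Step 3: the audible match does not fit either, so try the second corpus.
        match = closest_sounding(word, SECOND_VOCAB)
        if match is not None:
            result[i] = match
    return result


print(bilingual_transcribe(["my", "abwela", "cooked", "dinner"]))
```

With these toy tables, the acoustic rendering "abwela" first maps to "umbrella" in the default vocabulary, which scores below the second threshold next to "cooked", so the sketch falls back to the second-language match "abuela". A misrecognition such as "mothr" is instead resolved within the default language.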

First claim

Opening claim text (preview).

What is claimed is:

1. A method for bilingual speech-to-text (STT) transcription comprising: obtaining a default language corpus; determining a default language and a second language preference; obtaining a second language corpus based on the second language preference; receiving a first transcription of an utterance using the default language corpus and natural language processing (NLP); determining at least one problem word in the first transcription that does not fit within context of neighboring words in the first transcription based on a first probability score representing grammatical relevance of the at least one problem word to the neighboring words, wherein the first probability score is less than a first threshold, and the neighboring words are in the default language; and performing STT processing using machine learning based on a combination of an acoustic learning model and a grammar learning model comprising: determining an audible match in the default language corpus that is phonetically similar to the at least one problem word, wherein the at least one problem word is transcribed using an acoustic transcription based on a pre-existing corpus of transcription data from the default language; determining the audible match does not fit within the context of the neighboring words based on a second probability score representing grammatical relevance of the audible match to the neighboring words, wherein the second probability score is less than a second threshold; determining a match in the second language corpus that is phonetically similar to the at least one problem word; and providing a second transcription of the utterance, wherein the second transcription is a bilingual STT transcription comprising the match as a replacement for the at least one problem word.

2. The method of claim 1, wherein the default language is set by an STT system.

3. The method of claim 1, wherein determining the second language preference comprises obtaining the second language preference from a user profile, and the match in the second language corpus is phonetically similar to but semantically different from the audible match in the default language corpus.

4. The method of claim 1, wherein the STT processing transcribes in the default language and the second language preference, and the first transcription is an acoustic transcription of the utterance and is based on the pre-existing corpus of transcription data from the default language.

5. The method of claim 4, wherein the at least one problem word is grammatically incorrect based on the context of the neighboring words.

6. The method of claim 1, wherein: the first threshold is a first probability threshold; the second threshold is a second probability threshold; and the first probability threshold and the second probability threshold are each one of user-defined and algorithmically learned.

7. The method of claim 1, wherein: each probability score is based on the grammar learning model; the audible match is determined based on the acoustic learning model; the STT processing is refined through use of the machine learning to refine the STT processing and add more words to the pre-existing corpus of transcription data; determining the default language and the second language preference are based on probabilities of one language used in conjunction with another language based on a set of users and their associated spoken languages; and the second language corpus is in a different language than the default language corpus.

8. A computer program product for bilingual speech-to-text (STT) transcription, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: obtain, by the processor, a default language corpus; determine, by the processor, a default language and a second language preference; obtain, by the processor, a second language corpus based on the second language preference; receive, by the processor, a first transcription of an utterance using the default language corpus and natural language processing (NLP); determine, by the processor, at least one problem word in the first transcription that does not fit within context of neighboring words in the first transcription based on a first probability score representing grammatical relevance of the at least one problem word to the neighboring words, wherein the first probability score is less than a first threshold, and the neighboring words are in the default language; and perform STT processing, by the processor, using machine learning based on a combination of an acoustic learning model and a grammar learning model, comprising: determine, by the processor, an audible match in the default language corpus that is phonetically similar to the at least one problem word, wherein the at least one problem word is transcribed using an acoustic transcription based on a pre-existing corpus of transcription data from the default language; determine, by the processor, the audible match does not fit within the context of the neighboring words based on a second probability score representing grammatical relevance of the audible match to the neighboring words, wherein the second probability score is less than a second threshold; determine, by the processor, a match in the second language corpus that is phonetically similar to the at least one problem word; and provide, by the processor, a second transcription of the utterance wherein the second transcription is a bilingual STT transcription comprising the match as a replacement for the at least one problem word.

9. The computer program product of claim 8, wherein the default language is set by an STT system.

10. The computer program product of claim 8, wherein determining the second language preference comprises obtaining the second language preference from a user profile, and the match in the second language corpus is phonetically similar to but semantically different from the audible match in the default language corpus.

11. The computer program product of claim 8, wherein the STT processing transcribes in the default language and the second language preference, and the first transcription is an acoustic transcription of the utterance and is based on the pre-existing corpus of transcription data from the default language.

12. The computer program product of claim 11, wherein the at least one problem word is grammatically incorrect based on the context of the neighboring words.

13. The computer program product of claim 8, wherein: the first threshold is a first probability threshold; the second threshold is a second probability threshold; and the first probability threshold and the second probability threshold are each one of user-defined and algorithmically learned.

14. The computer program product of claim 8, wherein: each probability score is based on the grammar learning model; the audible match is determined based on the acoustic learning model; the STT processing is refined through use of the machine learning to refine the STT processing and add more words to the pre-existing corpus of transcription data; determining the default language and the second language preference are based on probabilities of one language used in conjunction with another language based on a set of users and their associated spoken languages; and the second language corpus is in a different language than the default language corpus.

15. An apparatus comprising: a memory configured to sto
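Claims 7 and 14 recite that the default language and the second language preference are determined from probabilities of one language being used in conjunction with another across a set of users. One possible reading of that step is sketched below, assuming a simple co-occurrence estimate; the function name, the input shape, and the sample users are all hypothetical and are not taken from the patent.

```python
from collections import Counter


def second_language_preference(default_language, user_languages):
    """Pick the language most often spoken alongside the default language.

    `user_languages` is an assumed input shape: one set of language codes per
    user. Returns (language, estimated probability), or (None, 0.0) when no
    user of the default language speaks anything else.
    """
    co_spoken = Counter()
    speakers = 0
    for languages in user_languages:
        if default_language in languages:
            speakers += 1
            co_spoken.update(l for l in languages if l != default_language)
    if speakers == 0 or not co_spoken:
        return None, 0.0
    # Sorting first makes ties resolve alphabetically, so the result is stable.
    language, count = max(sorted(co_spoken.items()), key=lambda item: item[1])
    return language, count / speakers


# Hypothetical user set for illustration.
users = [{"en", "es"}, {"en", "es"}, {"en", "fr"}, {"en"}, {"de", "fr"}]
print(second_language_preference("en", users))
```

Here two of the four English speakers also speak Spanish, so the sketch proposes Spanish with an estimated probability of 0.5. Claims 3 and 10 describe an alternative in which the preference is read directly from a user profile.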

Assignees

IBM

Inventors

Classifications

  • using natural language modelling · CPC title

  • G10L15/32 (Primary)

    Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title

  • of the speaker; Human-factor methodology · CPC title

  • Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules · CPC title

  • Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11049501B2 cover?
One embodiment provides a method that includes obtaining a default language corpus. A second language corpus is obtained based on a second language preference. A first transcription of an utterance is received using the default language corpus and natural language processing (NLP). At least one problem word in the first transcription is determined based on an associated grammatical relevance to…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G10L15/32. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 29, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).