Speech-to-text transcription with multiple languages

US11049501B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11049501-B2
Application number	US-201816141792-A
Country	US
Kind code	B2
Filing date	Sep 25, 2018
Priority date	Sep 25, 2018
Publication date	Jun 29, 2021
Grant date	Jun 29, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides a method that includes obtaining a default language corpus. A second language corpus is obtained based on a second language preference. A first transcription of an utterance is received using the default language corpus and natural language processing (NLP). At least one problem word in the first transcription is determined based on an associated grammatical relevance to neighboring words in the first transcription. Upon determining that a first probability score is below a first threshold, an acoustic lookup is performed for an audible match for the problem word in the first transcription based on an associated acoustical relevance. Upon determining that a second probability score is below a second threshold, it is determined whether a match for the problem word exists in the secondary language corpus. Upon determining that the match exists in the secondary language corpus, a second transcription for the utterance is provided.
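
The two-threshold fallback described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: every name here (`retranscribe`, `best_audible_match`, `phonetic_similarity`, `min_similarity`, the toy corpora and scores) is hypothetical, and plain string similarity from Python's `difflib` stands in for the acoustic model, which the abstract does not specify.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Sequence


def phonetic_similarity(a: str, b: str) -> float:
    """Assumption: string similarity in [0, 1] as a crude stand-in for acoustic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_audible_match(word: str, corpus: Sequence[str]) -> Optional[str]:
    """Return the corpus entry that sounds closest to `word` (excluding `word` itself)."""
    candidates = [w for w in corpus if w != word]
    if not candidates:
        return None
    return max(candidates, key=lambda w: phonetic_similarity(word, w))


def retranscribe(
    words: Sequence[str],
    default_corpus: Sequence[str],
    second_corpus: Sequence[str],
    grammar_score: Callable[[str, List[str]], float],
    first_threshold: float,
    second_threshold: float,
    min_similarity: float = 0.6,  # hypothetical cutoff for accepting a second-language match
) -> List[str]:
    """Produce a second transcription, swapping in second-language matches for problem words."""
    words = list(words)
    result = list(words)
    for i, word in enumerate(words):
        context = words[:i] + words[i + 1:]
        # Step 1: a word whose grammatical score clears the first threshold is kept.
        if grammar_score(word, context) >= first_threshold:
            continue
        # Step 2: acoustic lookup for an audible match in the default language corpus.
        candidate = best_audible_match(word, default_corpus)
        if candidate is not None and grammar_score(candidate, context) >= second_threshold:
            result[i] = candidate
            continue
        # Step 3: the audible match also scores too low, so try the second language corpus.
        match = best_audible_match(word, second_corpus)
        if match is not None and phonetic_similarity(word, match) >= min_similarity:
            result[i] = match
    return result


# Toy data: an English default corpus and a (de-accented) Spanish second corpus.
default_corpus = ["see", "you", "banana", "tomorrow"]
second_corpus = ["manana", "casa"]
first_pass = ["see", "you", "manyana"]


def toy_score(word: str, context: List[str]) -> float:
    """Hypothetical grammar model: fixed scores for two words, 0.9 for everything else."""
    return {"manyana": 0.1, "banana": 0.2}.get(word, 0.9)


result = retranscribe(first_pass, default_corpus, second_corpus, toy_score, 0.5, 0.5)
print(result)  # ['see', 'you', 'manana'] with these toy inputs
```

In this toy run, "manyana" falls below the first threshold, its closest default-language match "banana" falls below the second threshold, and the second-language corpus then supplies "manana" as the replacement. A real system would presumably use trained acoustic and grammar models in place of the two scoring functions.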

First claim

Opening claim text (preview).

What is claimed is:

1. A method for bilingual speech-to-text (STT) transcription comprising: obtaining a default language corpus; determining a default language and a second language preference; obtaining a second language corpus based on the second language preference; receiving a first transcription of an utterance using the default language corpus and natural language processing (NLP); determining at least one problem word in the first transcription that does not fit within context of neighboring words in the first transcription based on a first probability score representing grammatical relevance of the at least one problem word to the neighboring words, wherein the first probability score is less than a first threshold, and the neighboring words are in the default language; and performing STT processing using machine learning based on a combination of an acoustic learning model and a grammar learning model comprising: determining an audible match in the default language corpus that is phonetically similar to the at least one problem word, wherein the at least one problem word is transcribed using an acoustic transcription based on a pre-existing corpus of transcription data from the default language; determining the audible match does not fit within the context of the neighboring words based on a second probability score representing grammatical relevance of the audible match to the neighboring words, wherein the second probability score is less than a second threshold; determining a match in the second language corpus that is phonetically similar to the at least one problem word; and providing a second transcription of the utterance, wherein the second transcription is a bilingual STT transcription comprising the match as a replacement for the at least one problem word.

2. The method of claim 1, wherein the default language is set by an STT system.

3. The method of claim 1, wherein determining the second language preference comprises obtaining the second language preference from a user profile, and the match in the second language corpus is phonetically similar to but semantically different from the audible match in the default language corpus.

4. The method of claim 1, wherein the STT processing transcribes in the default language and the second language preference, and the first transcription is an acoustic transcription of the utterance and is based on the pre-existing corpus of transcription data from the default language.

5. The method of claim 4, wherein the at least one problem word is grammatically incorrect based on the context of the neighboring words.

6. The method of claim 1, wherein: the first threshold is a first probability threshold; the second threshold is a second probability threshold; and the first probability threshold and the second probability threshold are each one of user-defined and algorithmically learned.

7. The method of claim 1, wherein: each probability score is based on the grammar learning model; the audible match is determined based on the acoustic learning model; the STT processing is refined through use of the machine learning to refine the STT processing and add more words to the pre-existing corpus of transcription data; determining the default language and the second language preference are based on probabilities of one language used in conjunction with another language based on a set of users and their associated spoken languages; and the second language corpus is in a different language than the default language corpus.

8. A computer program product for bilingual speech-to-text (STT) transcription, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: obtain, by the processor, a default language corpus; determine, by the processor, a default language and a second language preference; obtain, by the processor, a second language corpus based on the second language preference; receive, by the processor, a first transcription of an utterance using the default language corpus and natural language processing (NLP); determine, by the processor, at least one problem word in the first transcription that does not fit within context of neighboring words in the first transcription based on a first probability score representing grammatical relevance of the at least one problem word to the neighboring words, wherein the first probability score is less than a first threshold, and the neighboring words are in the default language; and perform STT processing, by the processor, using machine learning based on a combination of an acoustic learning model and a grammar learning model, comprising: determine, by the processor, an audible match in the default language corpus that is phonetically similar to the at least one problem word, wherein the at least one problem word is transcribed using an acoustic transcription based on a pre-existing corpus of transcription data from the default language; determine, by the processor, the audible match does not fit within the context of the neighboring words based on a second probability score representing grammatical relevance of the audible match to the neighboring words, wherein the second probability score is less than a second threshold; determine, by the processor, a match in the second language corpus that is phonetically similar to the at least one problem word; and provide, by the processor, a second transcription of the utterance wherein the second transcription is a bilingual STT transcription comprising the match as a replacement for the at least one problem word.

9. The computer program product of claim 8, wherein the default language is set by an STT system.

10. The computer program product of claim 8, wherein determining the second language preference comprises obtaining the second language preference from a user profile, and the match in the second language corpus is phonetically similar to but semantically different from the audible match in the default language corpus.

11. The computer program product of claim 8, wherein the STT processing transcribes in the default language and the second language preference, and the first transcription is an acoustic transcription of the utterance and is based on the pre-existing corpus of transcription data from the default language.

12. The computer program product of claim 11, wherein the at least one problem word is grammatically incorrect based on the context of the neighboring words.

13. The computer program product of claim 8, wherein: the first threshold is a first probability threshold; the second threshold is a second probability threshold; and the first probability threshold and the second probability threshold are each one of user-defined and algorithmically learned.

14. The computer program product of claim 8, wherein: each probability score is based on the grammar learning model; the audible match is determined based on the acoustic learning model; the STT processing is refined through use of the machine learning to refine the STT processing and add more words to the pre-existing corpus of transcription data; determining the default language and the second language preference are based on probabilities of one language used in conjunction with another language based on a set of users and their associated spoken languages; and the second language corpus is in a different language than the default language corpus.

15. An apparatus comprising: a memory configured to sto

Assignees

IBM

Inventors

Classifications

G10L15/18: using natural language modelling
G10L15/32 (Primary): Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
G10L2015/227: of the speaker; Human-factor methodology
G10L15/19: Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
G06F40/40: Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30)

Patent family

Related publications grouped by family.

View patent family 69883270

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11049501B2 cover?: One embodiment provides a method that includes obtaining a default language corpus. A second language corpus is obtained based on a second language preference. A first transcription of an utterance is received using the default language corpus and natural language processing (NLP). At least one problem word in the first transcription is determined based on an associated grammatical relevance to…
Who is the assignee on this patent?: IBM
What technology area does this patent fall under?: Primary CPC classification G10L15/32. Mapped technology areas include Physics.
When was this patent published?: Publication date Jun 29, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).