Transliteration based data augmentation for training multilingual ASR acoustic models in low resource settings

US11568858B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11568858-B2
Application number	US-202017073337-A
Country	US
Kind code	B2
Filing date	Oct 17, 2020
Priority date	Oct 17, 2020
Publication date	Jan 31, 2023
Grant date	Jan 31, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low resource setting includes training a multilingual network on a set of training languages with an original transcribed training data to create a baseline multilingual acoustic model. Transliteration of transcribed training data is performed by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data. A filtering metric is applied to the pool of transliterated data output to select one or more portions of the transliterated data for retraining of the acoustic model. Data augmentation is performed by adding one or more selected portions of the output transliterated data back to the original transcribed training data to update training data. The training of a new multilingual acoustic model through the multilingual network is performed using the updated training data.
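The filtering and augmentation steps in the abstract can be illustrated with a minimal sketch. It assumes the "relatively higher count of symbols" criterion recited in the first claim. The `Utterance` type, the function names, and the `keep_fraction` parameter are hypothetical and do not come from the patent; the baseline training and transliteration steps are not modeled here.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    audio_id: str        # hypothetical identifier for an audio segment
    language: str        # language whose symbol set labels the audio
    symbols: List[str]   # label sequence produced for that audio


def select_by_symbol_count(pool: List[Utterance],
                           keep_fraction: float = 0.5) -> List[Utterance]:
    """Keep the part of the pool with the relatively higher symbol counts.

    keep_fraction is an illustrative assumption; the patent text shown on
    this page does not give a specific cut-off.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not pool:
        return []
    ranked = sorted(pool, key=lambda u: len(u.symbols), reverse=True)
    n_keep = math.ceil(len(ranked) * keep_fraction)
    return ranked[:n_keep]


def augment(original: List[Utterance],
            transliterated_pool: List[Utterance],
            keep_fraction: float = 0.5) -> List[Utterance]:
    """Add the selected transliterated data back to the original data."""
    selected = select_by_symbol_count(transliterated_pool, keep_fraction)
    return list(original) + selected


original = [
    Utterance("utt1", "lang_a", ["k", "a", "t"]),
    Utterance("utt2", "lang_a", ["d", "o", "g"]),
]
pool = [
    Utterance("utt1", "lang_b", ["k"]),
    Utterance("utt2", "lang_b", ["d", "o", "o", "g", "g"]),
    Utterance("utt3", "lang_b", ["s", "u", "n"]),
    Utterance("utt4", "lang_b", []),
]
updated = augment(original, pool, keep_fraction=0.5)
```

In this sketch `updated` holds the two original utterances plus the two transliterated utterances with the most symbols. A new model would then be trained on `updated`.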

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low resource setting, the method comprising: training a multilingual network on a set of training languages with an original transcribed training data to create a baseline multilingual acoustic model; performing transliteration by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data; applying a filtering metric to the pool of transliterated data output from the multilingual network to select one or more portions of the transliterated data for a retraining of the acoustic model by selecting the one or more portions of the output transliterated data having a relatively higher count of symbols as compared to a remainder of the transliterated data; performing data augmentation by adding the one or more selected portions of the pool of transliterated data back to the original transcribed training data to obtain updated training data; and training a new multilingual acoustic model through the multilingual network using the updated training data.

2. The computer-implemented method according to claim 1, further comprising: retraining the baseline multilingual acoustic model with the updated training data.

3. The computer-implemented method according to claim 1, wherein: the original training data is from a low resource language; the multilingual network comprises a neural network including a plurality of language-specific output layers configured to model sets of symbols of each language separately; and the neural network outputs a language-specific portion of the transliterated data to at least one respective language-specific output layer.

4. The computer-implemented method according to claim 3, wherein the adding of the one or more selected portions of the pool of transliterated data back to the original transcribed training includes relabeled data comprising new copies of data using symbols of other languages.

5. The computer-implemented method according to claim 3, wherein the training of the multilingual network on a set of training languages is performed with the low resource language of the original transcribed training data comprising tens of hours of the original transcribed data.

6. The computer-implemented method according to claim 3, further comprising generating semi-supervised labels in response to processing untranscribed data by the multilingual neural network.

7. The computer-implemented method according to claim 1, wherein the processing of the plurality of multilingual data types includes processing transcribed training data, untranscribed data from the same set of training languages, and untranscribed data from different languages.

8. The computer-implemented method according to claim 1, further comprising: adding a new language to the multilingual network; and outputting a transliterated data in the new language.

9. An automatic speech recognition system configured for a transliteration-based data augmentation of a multilingual acoustic model in a low resource setting, the system comprising: a processor; a memory coupled to the processor, the memory storing instructions to cause the processor to perform acts comprising: training a multilingual network on a set of training languages with an original transcribed training data to create a baseline multilingual acoustic model; performing transliteration by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data; applying a filtering metric to the pool of transliterated data output from the multilingual network to select one or more portions of the transliterated data for retraining of the acoustic model by selecting the one or more portions of the output transliterated data having a relatively higher count of symbols as compared to a remainder of the transliterated data; performing data augmentation by adding the one or more selected portions of the output transliterated data back to the original transcribed training data to obtain updated training data; and training a new multilingual acoustic model using the updated training data.

10. The system according to claim 9, wherein the instructions cause the processor to perform an additional act comprising: retraining the baseline multilingual acoustic model with the updated training data.

11. The system according to claim 9, wherein: the multilingual network comprises a neural network including a plurality of language-specific output layers configured to model sets of symbols of each language separately; and the neural network is configured to output a language-specific portion of the transliterated data to at least one respective language-specific output layer.

12. The system according to claim 9, wherein the processing of the plurality of multilingual data types includes processing transcribed training data, untranscribed data from the same set of training languages, and untranscribed data from different languages.

13. The system according to claim 12, wherein the instructions cause the processor to perform additional acts comprising: adding a new language to the multilingual network; and outputting transliterated data in the new language.

14. A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low resource setting, the method comprising: training a multilingual network on a set of training languages with an original transcribed training data to create a baseline multilingual acoustic model; performing transliteration by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data; applying a filtering metric to the pool of transliterated data output from the multilingual network to select one or more portions of the transliterated data for a retraining of the acoustic model by comparing a ratio of symbols in the transliterated data to symbols in an utterance comprising the original transcribed training data, and selecting one or more portions of the output transliterated data having a higher ratio of symbols; performing data augmentation by adding the one or more selected portions of the pool of transliterated data back to the original transcribed training data to obtain updated training data; and training a new multilingual acoustic model through the multilingual network using the updated training data.

15. An automatic speech recognition system configured for a transliteration-based data augmentation of a multilingual acoustic model in a low resource setting, the system comprising: a processor; a memory coupled to the processor, the memory storing instructions to cause the processor to perform acts comprising: training a multilingual network on a set of training languages with an original transcribed training data to create a baseline multilingual acoustic model; performing transliteration by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data; applying a filtering metric to the pool of transliterated data output from the multilingual network to select one or more portions of the transliterated data for retraining of the acoustic model by: comparing a ratio of symbols in the transliterated data to symbols in an utterance comprising the original transcribed training data; an
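Claim 14 recites a different filtering metric from claim 1: a ratio of symbols in the transliterated data to symbols in the original utterance, with the higher-ratio portions selected. The following is a rough sketch of that comparison only. The function names and the `min_ratio` threshold are assumptions made for illustration; the claim text does not state a threshold value.

```python
from typing import List, Sequence, Tuple

SymbolSeq = Sequence[str]


def symbol_ratio(transliterated: SymbolSeq, original: SymbolSeq) -> float:
    """Ratio of transliterated symbol count to original symbol count."""
    if not original:
        # No reference symbols to compare against; treat as lowest ratio.
        return 0.0
    return len(transliterated) / len(original)


def select_by_ratio(pairs: List[Tuple[SymbolSeq, SymbolSeq]],
                    min_ratio: float = 0.8) -> List[SymbolSeq]:
    """Return transliterations whose ratio meets the assumed threshold.

    Each pair is (transliterated symbols, original transcript symbols)
    for the same utterance. min_ratio is a hypothetical cut-off.
    """
    return [t for t, o in pairs if symbol_ratio(t, o) >= min_ratio]


pairs = [
    (["a", "b", "c"], ["x", "y", "z"]),   # ratio 1.0
    (["a"], ["x", "y", "z", "w"]),        # ratio 0.25
    ([], []),                             # no reference symbols
]
kept = select_by_ratio(pairs, min_ratio=0.8)
```

Only the first transliteration passes here. A sparse transliteration relative to its source utterance is dropped before augmentation.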

Assignees

Inventors

Classifications

G10L2015/0635
updating or merging of old and new templates; Mean values; Weighting · CPC title
G10L15/063 · Primary
Training · CPC title
G10L15/16
using artificial neural networks · CPC title
G10L15/005 · Primary
Language recognition · CPC title

Patent family

Related publications grouped by family.

View patent family 81184867

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11568858B2 cover?: A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low resource setting includes training a multilingual network on a set of training languages with an original transcribed training data to create a baseline multilingual acoustic model. Transliteration of transcribed training data is performed by processing through the multilingual netw…
Who is the assignee on this patent?: IBM
What technology area does this patent fall under?: Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?: Publication date Jan 31, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).