Transliteration for speech recognition training and scoring

US11417322B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11417322-B2
Application number  US-201916712492-A
Country             US
Kind code           B2
Filing date         Dec 12, 2019
Priority date       Dec 12, 2018
Publication date    Aug 16, 2022
Grant date          Aug 16, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs stored on a computer-readable storage medium, for transliteration for speech recognition training and scoring. In some implementations, language examples are accessed, some of which include words in a first script and words in one or more other scripts. At least portions of some of the language examples are transliterated to the first script to generate a training data set. A language model is generated based on occurrences of the different sequences of words in the training data set in the first script. The language model is used to perform speech recognition for an utterance.
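
The pipeline the abstract describes (transliterate mixed-script examples into one script, then count word sequences) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the Hindi/Devanagari example, the toy lookup table `TOY_LATIN_TO_DEVANAGARI`, and the helper names are hypothetical, and a plain dictionary stands in for whatever trained transliteration model a real system would use.

```python
from collections import Counter
import unicodedata

# Hypothetical toy lexicon (Latin spelling -> Devanagari spelling).
# Assumption: a real system would use a trained transliteration model.
TOY_LATIN_TO_DEVANAGARI = {
    "cricket": "क्रिकेट",
    "score": "स्कोर",
}


def is_latin(word):
    """Return True if every alphabetic character in `word` is a Latin letter."""
    letters = [ch for ch in word if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


def transliterate_to_first_script(sentence):
    """Map Latin-script words to Devanagari; leave every other word unchanged."""
    out = []
    for word in sentence.split():
        if is_latin(word):
            # Words missing from the toy lexicon are kept as written.
            out.append(TOY_LATIN_TO_DEVANAGARI.get(word.lower(), word))
        else:
            out.append(word)
    return out


def bigram_counts(examples):
    """Count word bigrams after normalizing every example to one script."""
    counts = Counter()
    for sentence in examples:
        words = ["<s>"] + transliterate_to_first_script(sentence) + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts


# Two spellings of the same spoken phrase, mixed across scripts.
examples = ["cricket स्कोर बताओ", "क्रिकेट score बताओ"]
for bigram, count in bigram_counts(examples).most_common():
    print(bigram, count)
```

Because both examples normalize to the same Devanagari word sequence, their counts should pool into shared bigrams instead of being split across script variants, which is the effect the abstract attributes to transliterating before building the language model.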

First claim

Opening claim text (preview).

The invention claimed is:

1. A method performed by one or more computers, the method comprising: accessing, by the one or more computers, a set of data indicating language examples for a first script, wherein at least some of the language examples include words in the first script and out-of-script words in one or more other scripts; accessing, by the one or more computers, a blacklist of terms in a script different than the first script; selectively transliterating, by the one or more computers, at least portions of some of the language examples by transliterating a portion of the out-of-script words to the first script and bypassing transliteration of a remaining portion of the out-of-script words that includes instances of the terms from the blacklist to generate a training data set having the portion of the out-of-script words transliterated into the first script and the remaining portion of the out-of-script words kept in the one or more other scripts; and generating, by the one or more computers, a speech recognition model based on occurrences of sequences of words in the training data set having the portion of the out-of-script words transliterated into the first script and the remaining portion of the out-of-script words kept in the one or more other scripts.

2. The method of claim 1, wherein the speech recognition model is a language model, an acoustic model, a sequence-to-sequence model, or an end-to-end model.

3. The method of claim 1, wherein selectively transliterating comprises mapping different tokens that represent text from different scripts to a single normalized transliterated representation.

4. The method of claim 1, wherein selectively transliterating the language examples comprises transliterating the portion of the out-of-script words in the language examples that are not in the first script into the first script.

5. The method of claim 1, wherein selectively transliterating the language examples comprises generating altered language examples in which words written in a second script different from the first script are replaced with one or more words in the first script that approximate acoustic properties of the word in the second script.

6. The method of claim 5, wherein the words written in the second script are individually transliterated into the first script on a word-by-word basis.

7. The method of claim 1, further comprising: determining a test set of language examples with which to test the speech recognition model; generating a normalized test set by transliterating into the first script words of the language examples in the test set that are not written in the first script; obtaining output of the speech recognition model corresponding to the language examples in the test set; normalizing output of the speech recognition model by transliterating into the first script words of the speech recognition model output that are not written in the first script; and determining an error rate of the speech recognition model based on a comparison of the normalized test set with the normalized speech recognition model output.

8. The method of claim 7, wherein the error rate is a word error rate, and wherein the method includes, based on the word error rate: determining whether to continue training or terminate training of the speech recognition model; altering a training data set used to train the speech recognition model; setting a size, structure, or other characteristic of the speech recognition model; or selecting one or more speech recognition models for a speech recognition task.

9. The method of claim 1, further comprising determining a modelling error rate for the speech recognition model in which acoustically similar words written in any of multiple scripts are accepted as correct transcriptions, without penalizing output of a word in a different script than a corresponding word in a reference transcription.

10. The method of claim 9, further comprising determining a rendering error rate for the speech recognition model that is a measure of differences between a script of words in the output of the speech recognition model relative to a script of corresponding words in reference transcriptions.

11. The method of claim 1, wherein selectively transliterating is performed using a finite state transducer network trained to perform transliteration into the first script.

12. The method of claim 1, wherein selectively transliterating comprises, for at least one language example, performing multiple rounds of transliteration between scripts to reach a transliterated representation in the first script that is included in the training data set in the first script.

13. The method of claim 1, further comprising determining a score indicating a level of mixing of scripts in the language examples; and based on the score: selecting a parameter for pruning a finite state transducer network for transliteration; selecting a parameter for pruning the speech recognition model; or selecting a size or structure for the speech recognition model.

14. The method of claim 1, wherein generating the speech recognition model comprises: after selectively transliterating at least portions of some the language examples by transliterating the portion of the out-of-script words to the first script, determining, by the one or more computers, a count of occurrences of different sequences of words in the training data set in the first script; and generating, by the one or more computers, a speech recognition model based on the counts of occurrences of the different sequences of words in the training data set in the first script.

15. The method of claim 1, wherein the speech recognition model comprises a recurrent neural network, and generating the speech recognition model comprises training the recurrent neural network.

16. The method of claim 1, further comprising using, by the one or more computers, the model to perform speech recognition for an utterance.

17. The method of claim 1, further comprising: receiving, by one or more computers, audio data representing an utterance; and using, by the one or more computers, the speech generation model to map the audio data to text representing the utterance.

18. A system comprising: one or more computers; and one or more computer-readable media storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: accessing a set of data indicating language examples for a first script, wherein at least some of the language examples include words in the first script and out-of-script words in one or more other scripts; accessing a blacklist of terms in a script different than the first script; selectively transliterating at least portions of some of the language examples by transliterating a portion of the out-of-script words to the first script and bypassing transliteration of a remaining portion of the out-of-script words that includes instances of the terms from the blacklist to generate a training data set having the portion of the out-of-script words transliterated into the first script and the remaining portion of the out-of-script words kept in the one or more other scripts; and generating a speech recognition model based on occurrences of sequences of words in the training data set having the portion of the out-of-script words transliterated into the first script and remaining portion of the out-of-script words kept in the one or more other scripts.

19. One or more non-transitory
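
Two ideas from the claims can be sketched together: the blacklist bypass of claim 1 and the transliteration-normalized error rate of claim 7. This is an illustrative sketch only, not the patented implementation. The Devanagari/Latin pairing, the toy table `TOY_LATIN_TO_DEVANAGARI`, the `BLACKLIST` contents, and all function names are hypothetical, and a dictionary lookup is swapped in for the finite state transducer network that claim 11 mentions.

```python
import unicodedata

# Hypothetical toy data; a dictionary stands in for a trained transliterator.
TOY_LATIN_TO_DEVANAGARI = {
    "cricket": "क्रिकेट",
    "score": "स्कोर",
    "youtube": "यूट्यूब",
}
# Assumption: Latin-script terms that should stay in Latin script.
BLACKLIST = frozenset({"youtube"})


def is_latin(word):
    """Return True if every alphabetic character in `word` is a Latin letter."""
    letters = [ch for ch in word if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


def selectively_transliterate(words, blacklist=frozenset()):
    """Transliterate Latin words to Devanagari, bypassing blacklisted terms."""
    out = []
    for word in words:
        key = word.lower()
        if is_latin(word) and key not in blacklist:
            out.append(TOY_LATIN_TO_DEVANAGARI.get(key, word))
        else:
            out.append(word)  # first-script word or blacklisted term
    return out


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def word_error_rate(ref, hyp):
    """Edit distance divided by the reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def normalized_word_error_rate(ref, hyp):
    """Transliterate both sides into the first script before scoring."""
    return word_error_rate(selectively_transliterate(ref),
                           selectively_transliterate(hyp))


# Training-side use: the blacklisted term keeps its original script.
print(selectively_transliterate("YouTube पर cricket".split(), BLACKLIST))

# Scoring-side use: same words, different scripts.
reference = "क्रिकेट स्कोर बताओ".split()
hypothesis = "cricket score बताओ".split()
print(word_error_rate(reference, hypothesis))
print(normalized_word_error_rate(reference, hypothesis))
```

The raw score penalizes the recognizer for choosing a different script for words that sound the same, while the normalized score compares both sides in one script. The gap between the two is in the spirit of the modelling versus rendering error distinction drawn in claims 9 and 10.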

Assignees

Google LLC

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Supervised learning

  • Training

  • using artificial neural networks

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11417322B2 cover?
Methods, systems, and apparatus, including computer programs stored on a computer-readable storage medium, for transliteration for speech recognition training and scoring. In some implementations, language examples are accessed, some of which include words in a first script and words in one or more other scripts. At least portions of some of the language examples are transliterated to the first…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/183. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 16, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).