What technology area does this patent fall under?

Primary CPC classification G10L15/063. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jan 27, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Language-agnostic multilingual modeling using effective script normalization

US12536989B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12536989-B2
Application number	US-202318187330-A
Country	US
Kind code	B2
Filing date	Mar 21, 2023
Priority date	Jan 28, 2020
Publication date	Jan 27, 2026
Grant date	Jan 27, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes obtaining a plurality of training data sets each associated with a respective native language and includes a plurality of respective training data samples. For each respective training data sample of each training data set in the respective native language, the method includes transliterating the corresponding transcription in the respective native script into corresponding transliterated text representing the respective native language of the corresponding audio in a target script and associating the corresponding transliterated text in the target script with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for corresponding speech utterances spoken in any of the different native languages associated with the plurality of training data sets.
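The normalization step in the abstract can be pictured with a minimal sketch: each training sample pairs audio with a native-script transcription, and the transcription is rewritten in one shared target script before training. Everything named here is an assumption for illustration only: the `Sample` type, the `normalize_datasets` function, the choice of Latin as the target script, and the toy word-level lookup tables, which stand in for a real transliteration model.

```python
from dataclasses import dataclass
from typing import Dict, List

# Toy word-level lookup tables (hypothetical). A real system would use a
# trained transliteration model rather than a fixed dictionary.
TOY_TRANSLITERATION: Dict[str, Dict[str, str]] = {
    "hi": {"नमस्ते": "namaste"},
    "bn": {"নমস্কার": "namaskar"},
    "ta": {"வணக்கம்": "vanakkam"},
}


@dataclass(frozen=True)
class Sample:
    audio_id: str  # placeholder for the audio of one utterance
    text: str      # transcription (native script before, target script after)


def transliterate(text: str, language: str) -> str:
    """Map each whitespace-separated word to the target script.

    Words missing from the toy table are passed through unchanged.
    """
    table = TOY_TRANSLITERATION[language]
    return " ".join(table.get(word, word) for word in text.split())


def normalize_datasets(datasets: Dict[str, List[Sample]]) -> List[Sample]:
    """Pair each audio with its target-script text, pooled over all languages."""
    normalized = []
    for language, samples in datasets.items():
        for sample in samples:
            target_text = transliterate(sample.text, language)
            normalized.append(Sample(sample.audio_id, target_text))
    return normalized


datasets = {
    "hi": [Sample("utt-hi-1", "नमस्ते")],
    "bn": [Sample("utt-bn-1", "নমস্কার")],
    "ta": [Sample("utt-ta-1", "வணக்கம்")],
}
for sample in normalize_datasets(datasets):
    print(sample.audio_id, sample.text)
```

The pooled list no longer carries a language tag. That is a choice made in this sketch to show a single shared output script, not a detail stated in the abstract.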

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method that when executed on data processing hardware causes the data processing hardware to perform operations comprising: obtaining a plurality of training data sets each associated with a respective native language that is different than the respective native language of the other training data sets, each training data set comprising a plurality of respective training data samples, each training data sample comprising training audio spoken in the respective native language and a corresponding transcription of the training audio in a respective native script representing the respective native language; and for each respective training data sample of each training data set: augmenting the corresponding training audio of the respective training data sample to create one or more copies of the corresponding training audio with diverse noise styles; transliterating the corresponding transcription in the respective native script into corresponding transliterated text representing the respective native language of the corresponding training audio into a corresponding transliterated script different than the respective native script; and based on the corresponding training audio, the one or more copies of the corresponding audio with diverse noise styles, and the corresponding transliterated text, training a multilingual speech recognition model to predict speech recognition results in the corresponding transliterated script for corresponding speech utterances spoken in the respective native language of the respective training data sample.
2. The computer-implemented method of claim 1, wherein training the multilingual speech recognition model comprises training an end-to-end multilingual speech recognition without providing any language information.
3. The computer-implemented method of claim 1, wherein transliterating the corresponding transcription in the respective native script comprises using a finite state transducer (FST) network to transliterate the corresponding transcription in the respective native script into the corresponding transliterated text.
4. The computer-implemented method of claim 1, wherein transliterating the corresponding transcription in the respective native script into the corresponding transliterated text comprises using a respective transliteration transducer associated with the respective native script to transliterate the corresponding transcription in the respective native script into the corresponding transliterated text.
5. The computer-implemented method of claim 4, wherein the transliteration transducer associated with the respective native script comprises: an input transducer configured to input Unicode symbols in the respective native script to symbols in a pair language model; a bigram pair language model transducer configured to map between symbols in the respective native script and the corresponding transliterated script; and an output transducer configured to map the symbols in the pair language model to output symbols in the corresponding transliterated script.
6. The computer-implemented method of claim 4, wherein the operations further comprise, prior to transliterating the corresponding transcription in the respective native language, training, using agreement-based data pre-processing, each respective transliteration transducer to only process transliteration pairs that have at least one spelling in the corresponding transliterated script of the transliterated text for a given native word that is common across each of the respective native languages associated with the training data sets.
7. The computer-implemented method of claim 4, wherein the operations further comprise, prior to transliterating the corresponding transcription in the respective native language, training, using frequency-based data pre-processing, each respective transliteration transducer to only process transliteration pairs that have spellings in the corresponding transliterated script of the transliterated text for a given native word that satisfy a frequency threshold.
8. The computer-implemented method of claim 1, wherein transliterating the corresponding transcription in the respective native script into the corresponding transliterated text comprises using a language-independent transliteration transducer to transliterate the corresponding transcription in the respective native script into the corresponding transliterated text.
9. The computer-implemented method of claim 1, wherein the multilingual speech recognition model comprises a sequence-to-sequence neural network.
10. The computer-implemented method of claim 1, wherein training the multilingual speech recognition model comprises using a stochastic optimization algorithm to train the multilingual speech recognition model.
11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: obtaining a plurality of training data sets each associated with a respective native language that is different than the respective native language of the other training data sets, each training data set comprising a plurality of respective training data samples, each training data sample comprising training audio spoken in the respective native language and a corresponding transcription of the training audio in a respective native script representing the respective native language; and for each respective training data sample of each training data set: augmenting the corresponding training audio of the respective training data sample to create one or more copies of the corresponding training audio with diverse noise styles; transliterating the corresponding transcription in the respective native script into corresponding transliterated text representing the respective native language of the corresponding training audio into a corresponding transliterated script different than the respective native script; and based on the corresponding training audio, the one or more copies of the corresponding audio with diverse noise styles, and the corresponding transliterated text, training a multilingual speech recognition model to predict speech recognition results in the corresponding transliterated script for corresponding speech utterances spoken in the respective native language of the respective training data sample.
12. The system of claim 11, wherein training the multilingual speech recognition model comprises training an end-to-end multilingual speech recognition without providing any language information.
13. The system of claim 11, wherein transliterating the corresponding transcription in the respective native script comprises using a finite state transducer (FST) network to transliterate the corresponding transcription in the respective native script into the corresponding transliterated text.
14. The system of claim 11, wherein transliterating the corresponding transcription in the respective native script into the corresponding transliterated text comprises using a respective transliteration transducer associated with the respective native script to transliterate the corresponding transcription in the respective native script into the corresponding transliterated text.
15. The system of claim 14, wherein the transliteration transducer associated with the respective native script comprises: an input transducer configured to input Un
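Two of the steps named in the claims lend themselves to a short illustration: creating noisy copies of the training audio (claims 1 and 11) and keeping only transliteration pairs that meet a frequency threshold (claim 7). The sketch below is not the patented implementation. It assumes that "diverse noise styles" can be approximated by white Gaussian noise at several signal-to-noise ratios, that audio is a non-empty list of float samples, and that transliteration pairs arrive as repeated `(native_word, spelling)` tuples. All function names, SNR values, and thresholds are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Set, Tuple


def add_noise(samples: Sequence[float], snr_db: float,
              rng: random.Random) -> List[float]:
    """Return a copy of the signal with white Gaussian noise at the given SNR."""
    signal_power = sum(x * x for x in samples) / len(samples)
    # SNR in dB is 10 * log10(signal_power / noise_power).
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]


def augment(samples: Sequence[float],
            snrs_db: Sequence[float] = (20.0, 10.0, 5.0),
            seed: int = 0) -> List[List[float]]:
    """Create one noisy copy of the audio per requested SNR (assumed values)."""
    rng = random.Random(seed)
    return [add_noise(samples, snr_db, rng) for snr_db in snrs_db]


def filter_by_frequency(pairs: Sequence[Tuple[str, str]],
                        min_count: int) -> Set[Tuple[str, str]]:
    """Keep (native_word, spelling) pairs seen at least min_count times."""
    counts = Counter(pairs)
    return {pair for pair, count in counts.items() if count >= min_count}


# Usage: a short sine wave stands in for real training audio.
audio = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noisy_copies = augment(audio)

# Usage: the common spelling survives, the rare variant is dropped.
observed_pairs = [("नमस्ते", "namaste")] * 3 + [("नमस्ते", "namastey")]
kept_pairs = filter_by_frequency(observed_pairs, min_count=2)
```

A lower SNR means more noise relative to the signal, so the three copies range from lightly to heavily corrupted. A real pipeline would likely mix in recorded noise or reverberation rather than synthetic white noise.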

Assignees

Google LLC

Inventors

Classifications

G10L15/26
Speech to text systems (G10L15/08 takes precedence) · CPC title
G10L15/16
using artificial neural networks · CPC title
G10L15/063 (Primary)
Training · CPC title
G06N3/049
Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs · CPC title
G06F40/58
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title

Patent family

Related publications grouped by family.

View patent family 74592794

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12536989B2 cover?: A method includes obtaining a plurality of training data sets each associated with a respective native language and includes a plurality of respective training data samples. For each respective training data sample of each training data set in the respective native language, the method includes transliterating the corresponding transcription in the respective native script into corresponding tr…
Who is the assignee on this patent?: Google LLC