Who is the assignee on this patent?

Microsoft Technology Licensing Llc

What technology area does this patent fall under?

Primary CPC classification G06F40/58. Mapped technology areas include Physics.

When was this patent published?

Publication date May 20, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page: publications linked to this one by citation in our corpus, or others that share the same primary CPC classification.

Automatic speech recognition systems and processes

US12307213B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12307213-B2
Application number	US-202217836390-A
Country	US
Kind code	B2
Filing date	Jun 9, 2022
Priority date	Jun 9, 2022
Publication date	May 20, 2025
Grant date	May 20, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data processing system is implemented for receiving speech data for a plurality of languages, and determining letters from the speech data. The data processing system also implements normalizing the speech data by applying linguistic based rules for Latin-based languages on the determined letters, building a computer model using the normalized speech data, fine-tuning the computer model using additional speech data, and recognizing words in a target language using the fine-tuned computer model.
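The abstract and claims describe normalizing multilingual speech data by mapping its graphemes onto a Latin-based alphabet, but this page does not publish the actual linguistic rules. The Python sketch below is a stand-in built on assumptions, not the patented normalizing engine. It assumes that graphemes come from text transcripts of the speech data, that each alphabetic code point counts as one grapheme (a simplification of true grapheme clusters), and that the target alphabet is plain a-z. Diacritics are stripped with Unicode NFKD decomposition, and a small hypothetical rule table (`_SPECIAL_RULES`) handles letters such as ß and æ that do not decompose. All helper names are illustrative. Non-Latin scripts would need transliteration rules that are not shown here.

```python
import unicodedata

# Hypothetical rule table for Latin letters that NFKD does not split into a
# base letter plus combining marks (the patent's actual rules are not published).
_SPECIAL_RULES = {"ß": "ss", "æ": "ae", "œ": "oe", "ø": "o"}


def extract_graphemes(text: str) -> list[str]:
    """Return the lowercase letter graphemes of a transcript.

    Simplification: each alphabetic code point after NFC composition counts as
    one grapheme; spaces, digits and punctuation are dropped.
    """
    composed = unicodedata.normalize("NFC", text.lower())
    return [ch for ch in composed if ch.isalpha()]


def normalize_grapheme(grapheme: str) -> str:
    """Map one accented Latin grapheme onto its base letter (a-z)."""
    if grapheme in _SPECIAL_RULES:
        return _SPECIAL_RULES[grapheme]
    # NFKD splits e.g. "é" into "e" + U+0301 COMBINING ACUTE ACCENT;
    # dropping the combining marks leaves the base letter.
    decomposed = unicodedata.normalize("NFKD", grapheme)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def normalize_transcript(transcript: str) -> str:
    """Normalize a transcript word by word, keeping word boundaries."""
    words = []
    for word in transcript.split():
        graphemes = extract_graphemes(word)
        if graphemes:
            words.append("".join(normalize_grapheme(g) for g in graphemes))
    return " ".join(words)


print(normalize_transcript("Où est la Straße?"))  # ou est la strasse
```

Mapping every language onto one shared letter inventory in this way is what allows a single model to be trained on the pooled data before it is fine-tuned for a target language.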

First claim

Claim text: all 14 claims are shown, starting with independent claim 1.

What is claimed is:

1. A data processing system comprising: a processor; and a machine-readable storage medium storing executable instructions that, when executed, cause the processor to perform operations comprising: receiving speech data for a plurality of languages; identifying and extracting graphemes from the speech data using a grapheme extraction engine; normalizing the speech data using a normalizing engine that applies linguistic based rules for Latin-based languages to map the graphemes from the speech data to graphemes in a Latin-based language; building a computer model using the normalized speech data; fine-tuning the computer model using additional speech data; and recognizing words in a target language using the fine-tuned computer model, wherein the computer model is a Long Short-Term Memory model that has a top layer fine-tuned by the additional speech data.

2. The data processing system of claim 1, wherein the plurality of languages include English, French, Italian, German, and Spanish languages and the speech data includes over 10,000 hours of data for each language.

3. The data processing system of claim 1, wherein identifying and extracting the graphemes from the speech data using the grapheme extraction engine includes using natural language processing.

4. The data processing system of claim 1, wherein the machine-readable storage medium includes instructions configured to cause the processor to perform an operation of: receiving target speech data of the target language for the recognizing the words in the target language.

5. The data processing system of claim 1, wherein the speech data includes data from video, broadcast news, and dictation sources for English, French, Italian, German, and Spanish languages.

6. The data processing system of claim 1, wherein the machine-readable storage medium includes instructions configured to cause the processor to perform an operation of: collecting the speech data from video, broadcast news, and dictation sources.

7. A method implemented in a data processing system, the method comprising: receiving speech data for a plurality of languages; identifying and extracting graphemes from the speech data using a grapheme extraction engine; normalizing the speech data using a normalizing engine that applies linguistic based rules for Latin-based languages to map the graphemes from the speech data to graphemes in a Latin-based language; building a computer model using the normalized speech data; fine-tuning the computer model using additional speech data; receiving target speech data of a target language; and recognizing words of the target language in the target speech data using the fine-tuned computer model, wherein the computer model is a transformer model that has a top layer fine-tuned by the additional speech data.

8. The method of claim 7, further comprising: collecting the speech data from video, broadcast news, and dictation sources for English, French, Italian, German, and Spanish languages.

9. The method of claim 7, wherein identifying and extracting the graphemes from the speech data using the grapheme extraction engine includes using natural language processing.

10. The method of claim 7, wherein the computer model is a Latency-Control Bidirectional Long Short-Term Memory model that has a top layer fine-tuned by the additional speech data.

11. The method of claim 7, wherein the plurality of languages includes English, French, Italian, German, and Spanish languages and the speech data includes over 10,000 hours of data for each language.

12. A non-transitory machine-readable medium on which are stored instructions that, when executed, cause a processor of a programmable device to perform operations of: receiving speech data for a plurality of different languages; identifying and extracting graphemes from the speech data using a grapheme extraction engine; normalizing the speech data using a normalizing engine that applies linguistic based rules for Latin-based languages to map the graphemes from the speech data to graphemes in a Latin-based language; building a computer model using the normalized speech data; fine-tuning the computer model using additional speech data; receiving target speech data of a target language; and recognizing target words of the target language in the target speech data using the fine-tuned computer model; wherein the computer model is a Bidirectional Long Short-Term Memory model that has a top layer fine-tuned by the additional speech data.

13. The non-transitory machine-readable medium of claim 12, wherein the plurality of different languages includes English, French, Italian, German, and Spanish languages and the speech data includes over 10,000 hours of data for each language.

14. The non-transitory machine-readable medium of claim 12, wherein the computer model is a Latency-Control Bidirectional Long Short-Term Memory model that has a top layer fine-tuned by the additional speech data.
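Every independent claim ends the same way: the model built from the normalized multilingual data has its top layer fine-tuned with additional speech data (an LSTM in claim 1, a transformer in claim 7, a Bidirectional LSTM in claim 12). The Python sketch below illustrates only that freeze-the-lower-layers pattern. Its simplifications are assumptions, not anything the claims specify. A single fixed tanh layer stands in for the pretrained lower layers, and a binary logistic output stands in for the recognizer's output layer. The toy weights and data are made up, and the class and function names are hypothetical.

```python
import math


class FrozenEncoder:
    """Stand-in for pretrained lower layers (LSTM or transformer in the claims).

    A single tanh layer with fixed weights; fine-tuning never modifies it.
    """

    def __init__(self, weights):
        self.weights = [list(row) for row in weights]

    def __call__(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.weights]


class TopLayer:
    """Trainable top layer: logistic regression over the encoder's features."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict(self, h):
        z = sum(wi * hi for wi, hi in zip(self.w, h)) + self.b
        return 1.0 / (1.0 + math.exp(-z))


def log_loss(encoder, top, data):
    """Mean binary cross-entropy of the encoder + top-layer stack on data."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = top.predict(encoder(x))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def fine_tune_top_layer(encoder, top, data, lr=0.5, epochs=200):
    """Plain SGD that updates only the top layer's weights and bias."""
    for _ in range(epochs):
        for x, y in data:
            h = encoder(x)            # frozen features: the encoder is never updated
            err = top.predict(h) - y  # dLoss/dz for a sigmoid + cross-entropy output
            for i, hi in enumerate(h):
                top.w[i] -= lr * err * hi
            top.b -= lr * err
    return top


# Toy usage: hypothetical "pretrained" encoder weights and four labelled
# examples standing in for the additional speech data.
encoder = FrozenEncoder([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, -1]])
data = [([1, 0, 0], 1), ([0, 1, 0], 0), ([1, 1, 0], 1), ([0, 0, 1], 0)]
top = TopLayer(n_features=4)

weights_before = [list(row) for row in encoder.weights]
loss_before = log_loss(encoder, top, data)
fine_tune_top_layer(encoder, top, data)
loss_after = log_loss(encoder, top, data)
print(f"loss {loss_before:.3f} -> {loss_after:.3f}")
```

In a deep learning framework such as PyTorch, the same effect is usually obtained by setting `requires_grad = False` on the lower layers' parameters and passing only the top layer's parameters to the optimizer. Updating just the top layer keeps the representations learned from the large pooled multilingual corpus intact while adapting the output to the target language.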

Assignees

Microsoft Technology Licensing Llc

Inventors

Classifications

G10L15/16: Using artificial neural networks
G10L15/063: Training
G10L15/005: Language recognition
G06F40/58 (primary): Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
G10L15/26 (primary): Speech to text systems (G10L15/08 takes precedence)

Patent family

Related publications grouped by family.

Patent family ID: 86386830
