Training speech recognition systems using word sequences

US12499874B2 · US · B2

Patent metadata
Publication number: US-12499874-B2
Application number: US-202318538957-A
Country: US
Kind code: B2
Filing date: Dec 13, 2023
Priority date: Aug 19, 2020
Publication date: Dec 16, 2025
Grant date: Dec 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method may include obtaining a text string that is a transcription of audio data and selecting a sequence of words from the text string as a first word sequence. The method may further include encrypting the first word sequence and comparing the encrypted first word sequence to multiple encrypted word sequences. Each of the multiple encrypted word sequences may be associated with a corresponding one of multiple counters. The method may also include in response to the encrypted first word sequence corresponding to one of the multiple encrypted word sequences based on the comparison, incrementing a counter of the multiple counters associated with the one of the multiple encrypted word sequences and adapting a language model of an automatic transcription system using the multiple encrypted word sequences and the multiple counters.
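
The counting scheme in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the patented implementation: the abstract says "encrypting" without naming a scheme, so the sketch swaps in a deterministic keyed digest (HMAC-SHA256) so that equal word sequences produce equal protected values, and the names `word_sequences`, `protect`, and `update_counts`, the three-word sequence length, and the key are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def word_sequences(text, n=3):
    """Yield every run of n consecutive words from a transcription."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def protect(sequence, key):
    """Keyed digest standing in for the abstract's 'encrypting' step (assumption)."""
    return hmac.new(key, sequence.encode("utf-8"), hashlib.sha256).hexdigest()


def update_counts(text, key, counts, n=3):
    """Increment the counter tied to each protected word sequence in text.

    A sequence that has not been seen before starts at zero and is then
    incremented; the abstract only describes the case where a match exists.
    """
    for seq in word_sequences(text, n):
        counts[protect(seq, key)] += 1
    return counts


key = b"hypothetical-secret-key"  # placeholder, not a real key
counts = Counter()
update_counts("please call me back later", key, counts)
update_counts("please call me tomorrow", key, counts)
```

Only protected values are stored as keys in `counts`, so the table records how often a word sequence occurred without holding the plain text. A language model could then be adapted from those counts, a step this sketch does not attempt.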

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: obtaining real-time audio data that includes speech; obtaining, in real-time as the audio data is obtained, a first transcription of the audio data, the first transcription including a plurality of first words and being generated by a first automatic speech recognition technology using the audio data; obtaining, in real-time as the audio data is obtained, a second transcription of the audio data, the second transcription including a plurality of second words and being generated by a second automatic speech recognition technology using the audio data; aligning, in real-time, words from the first transcription that are similar to words from the second transcription by generating in real-time a plurality of word alignment paths that each include a plurality of nodes within a coordinate system based on the first transcription and the second transcription, one or more of the nodes of each of the plurality of word alignment paths corresponding to one word from the first transcription and/or one word from the second transcription that are considered to be aligned for a respective word alignment path, generating the plurality of word alignment paths including: setting a window in the coordinate system that includes one or more of the plurality of nodes of the plurality of word alignment paths; dynamically increasing in real-time a size of the window as additional words are added to the plurality of word alignment paths in response to words being added to the first transcription and/or the second transcription; dynamically decreasing in real-time the size of the window by adjusting a location of a second face of the window in the coordinate system to be further from an origin of the coordinate system in response to one or more word alignment paths satisfying a criteria; and adapting, in real-time, only portions of the plurality of word alignment paths within the window in response to words being added to the first transcription and/or the second transcription such that other portions of the plurality of word alignment paths outside the window are stable and not subject to change; and generating a real-time output text string as a transcription of the speech using words from one of the plurality of word alignment paths.

2. The method of claim 1, wherein one or more of the plurality of nodes of the plurality of word alignment paths includes one of the plurality of first words and a blank space in response to the first transcription including more words than the second transcription.

3. The method of claim 1, wherein the criteria includes: words in one or more nodes of the one or more word alignment paths matching, a time difference between words in the window where the time difference is based on when the words are output in one of the first transcription and the second transcription, a number of words in the window along an axis of the coordinate system satisfying a threshold, a number of the plurality of word alignment paths that pass through the second face satisfying a threshold, a number of the plurality of word alignment paths that intersect a point within the window satisfying a threshold, the one or more word alignment paths having a lowest cost and including a node with matching words, or the one or more word alignment paths being stable for a time period that satisfies a threshold.

4. The method of claim 1, wherein dynamically increasing the size of the window includes adjusting a location of a first face of the window in the coordinate system in response to obtaining a new word in the first transcription.

5. The method of claim 4, wherein dynamically increasing the size of the window includes adjusting a location of a third face of the window in the coordinate system in response to obtaining a new word in the second transcription.

6. The method of claim 4, wherein generating the plurality of word alignment paths further includes dynamically decreasing the size of the window by adjusting a location of a fourth face of the window in the coordinate system in response to the one or more word alignment paths satisfying a second criteria.

7. The method of claim 1, wherein generating the output text string includes: selecting one of the plurality of word alignment paths; selecting words from the selected word alignment path to include in the output text string; and generating the output text string using the selected words.

8. The method of claim 1, wherein the selected word alignment path is selected based on the selected word alignment path including one or more nodes within the window.

9. The method of claim 1, wherein generating the plurality of word alignment paths includes in response to obtaining one or more words in the first transcription and/or the second transcription, extending only word alignment paths that pass through a particular face or vertex of the window.

10. The method of claim 1, further comprising obtaining a third transcription of the audio data, the third transcription including a plurality of third words and being generated by a third automatic speech recognition technology using the audio data, wherein the coordinate system is based on the first transcription, the second transcription, and the third transcription, and one or more of the nodes of each of the plurality of word alignment paths including one of the plurality of first words, one of the plurality of second words, and one of the plurality of third words.

11. One or more non-transitory computer-readable medias configured to store instructions that when executed by a system are configured to perform the method of claim 1.

12. A system comprising: at least one computer-readable media configured to store instructions; and at least one processor coupled to the at least one computer-readable media, the processor configured to execute the instructions to cause the system to perform operations, the operations comprising: obtaining real-time audio data that includes speech; obtaining, in real-time as the audio data is obtained, a first transcription of the audio data, the first transcription including a plurality of first words and being generated by a first automatic speech recognition technology using the audio data; obtaining, in real-time as the audio data is obtained, a second transcription of the audio data, the second transcription including a plurality of second words and being generated by a second automatic speech recognition technology using the audio data; aligning, in real-time, words from the first transcription that are similar to words from the second transcription by generating in real-time a plurality of word alignment paths that each include a plurality of nodes within a coordinate system based on the first transcription and the second transcription, one or more of the nodes of each of the plurality of word alignment paths corresponding to one word from the first transcription and/or one word from the second transcription that are considered to be aligned for a respective word alignment path, generating the plurality of word alignment paths including: setting a window in the coordinate system that includes one or more of the plurality of nodes of the plurality of word alignment paths; dynamically increasing in real-time a size of the window as additional words are added to the plurality of word alignment paths in response to words being added to the first transcription and/or the second transcription; dynamically decreasing in real-time the size of the window by adjusting a location of a second face of the window in the coordinate system to be further from an origin of the coordinate
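
The idea of a word alignment path made of nodes in a two-axis coordinate system can be sketched with a standard edit-distance alignment. This is a simplified, offline illustration under stated assumptions, not the claimed method: it computes a single lowest-cost path over two complete word lists, with no window, no real-time updates, and no set of competing paths. The function names `align` and `fuse`, the unit costs, and the rule of preferring the first system's word are hypothetical choices.

```python
def align(first, second):
    """Align two word lists with edit-distance dynamic programming.

    Returns a list of nodes (word_from_first, word_from_second), where None
    plays the role of the blank space used when one side has no word.
    """
    n, m = len(first), len(second)
    # cost[i][j] = minimum cost of aligning first[:i] with second[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (first[i - 1] != second[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the far corner to the origin to recover one lowest-cost path.
    path, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (first[i - 1] != second[j - 1])):
            path.append((first[i - 1], second[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            path.append((first[i - 1], None))  # word only in the first transcription
            i -= 1
        else:
            path.append((None, second[j - 1]))  # word only in the second transcription
            j -= 1
    path.reverse()
    return path


def fuse(path):
    """Build an output string, preferring the first system's word at each node."""
    return " ".join(a if a is not None else b for a, b in path)


first = "the quick brown fox".split()
second = "the quick fox".split()
path = align(first, second)
output = fuse(path)
```

Here the node for "brown" pairs the word with a blank because the second transcription has no counterpart, which mirrors the situation described in claim 2. The claims go further by limiting recomputation to a window whose faces move as new words arrive, so that nodes outside the window stay fixed; the full table above is recomputed from scratch and is only meant to show what a path and its nodes look like.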

Assignees

Sorenson IP Holdings LLC

Inventors

Classifications

  • involving random numbers or seeds

  • for comparison or discrimination

  • Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules

  • Training

  • Providing cryptographic facilities or services

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12499874B2 cover?
A method may include obtaining a text string that is a transcription of audio data and selecting a sequence of words from the text string as a first word sequence. The method may further include encrypting the first word sequence and comparing the encrypted first word sequence to multiple encrypted word sequences. Each of the multiple encrypted word sequences may be associated with a correspond…
Who is the assignee on this patent?
Sorenson IP Holdings LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/065. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 16, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).