Methods and systems for correcting transcribed audio files
US-9245522-B2 · Jan 26, 2016 · US
US12499874B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12499874-B2 |
| Application number | US-202318538957-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 13, 2023 |
| Priority date | Aug 19, 2020 |
| Publication date | Dec 16, 2025 |
| Grant date | Dec 16, 2025 |
Official abstract text for this publication.
A method may include obtaining a text string that is a transcription of audio data and selecting a sequence of words from the text string as a first word sequence. The method may further include encrypting the first word sequence and comparing the encrypted first word sequence to multiple encrypted word sequences. Each of the multiple encrypted word sequences may be associated with a corresponding one of multiple counters. The method may also include in response to the encrypted first word sequence corresponding to one of the multiple encrypted word sequences based on the comparison, incrementing a counter of the multiple counters associated with the one of the multiple encrypted word sequences and adapting a language model of an automatic transcription system using the multiple encrypted word sequences and the multiple counters.
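The counting step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which encryption scheme is used, so this example assumes a keyed HMAC-SHA256 digest as a stand-in (a deterministic keyed hash, not reversible encryption) so that equal word sequences map to equal tokens. The key, the sequence length `N`, and the helper names `protect`, `word_sequences`, and `update_counts` are all hypothetical and do not come from the patent.

```python
import hashlib
import hmac

SECRET_KEY = b"example-key"  # hypothetical key, for illustration only
N = 3                        # hypothetical word-sequence length


def protect(words, key=SECRET_KEY):
    """Map a word sequence to a deterministic keyed digest.

    Assumption: HMAC-SHA256 stands in for the unspecified encryption.
    """
    message = " ".join(words).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def word_sequences(text, n=N):
    """Return every run of n consecutive words in the text string."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def update_counts(transcript, counts, n=N):
    """Increment the counter of each stored protected sequence that matches.

    `counts` maps protected sequences to integer counters. Sequences that
    are not already stored are ignored in this sketch.
    """
    for seq in word_sequences(transcript, n):
        token = protect(seq)
        if token in counts:
            counts[token] += 1
    return counts


if __name__ == "__main__":
    table = {protect(("how", "are", "you")): 0}
    update_counts("Hello how are you how are you", table)
    print(table)
```

Because only protected tokens and their counters are stored, a downstream step could, for example, estimate relative frequencies from the counters without reading the plain-text word sequences. How the patent actually adapts the language model from these counts is not shown in this excerpt.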
Opening claim text (preview).
The invention claimed is:
1. A method comprising: obtaining real-time audio data that includes speech; obtaining, in real-time as the audio data is obtained, a first transcription of the audio data, the first transcription including a plurality of first words and being generated by a first automatic speech recognition technology using the audio data; obtaining, in real-time as the audio data is obtained, a second transcription of the audio data, the second transcription including a plurality of second words and being generated by a second automatic speech recognition technology using the audio data; aligning, in real-time, words from the first transcription that are similar to words from the second transcription by generating in real-time a plurality of word alignment paths that each include a plurality of nodes within a coordinate system based on the first transcription and the second transcription, one or more of the nodes of each of the plurality of word alignment paths corresponding to one word from the first transcription and/or one word from the second transcription that are considered to be aligned for a respective word alignment path, generating the plurality of word alignment paths including: setting a window in the coordinate system that includes one or more of the plurality of nodes of the plurality of word alignment paths; dynamically increasing in real-time a size of the window as additional words are added to the plurality of word alignment paths in response to words being added to the first transcription and/or the second transcription; dynamically decreasing in real-time the size of the window by adjusting a location of a second face of the window in the coordinate system to be further from an origin of the coordinate system in response to one or more word alignment paths satisfying a criteria; and adapting, in real-time, only portions of the plurality of word alignment paths within the window in response to words being added to the first transcription and/or the second transcription such that other portions of the plurality of word alignment paths outside the window are stable and not subject to change; and generating a real-time output text string as a transcription of the speech using words from one of the plurality of word alignment paths.
2. The method of claim 1, wherein one or more of the plurality of nodes of the plurality of word alignment paths includes one of the plurality of first words and a blank space in response to the first transcription including more words than the second transcription.
3. The method of claim 1, wherein the criteria includes: words in one or more nodes of the one or more word alignment paths matching, a time difference between words in the window where the time difference is based on when the words are output in one of the first transcription and the second transcription, a number of words in the window along an axis of the coordinate system satisfying a threshold, a number of the plurality of word alignment paths that pass through the second face satisfying a threshold, a number of the plurality of word alignment paths that intersect a point within the window satisfying a threshold, the one or more word alignment paths having a lowest cost and including a node with matching words, or the one or more word alignment paths being stable for a time period that satisfies a threshold.
4. The method of claim 1, wherein dynamically increasing the size of the window includes adjusting a location of a first face of the window in the coordinate system in response to obtaining a new word in the first transcription.
5. The method of claim 4, wherein dynamically increasing the size of the window includes adjusting a location of a third face of the window in the coordinate system in response to obtaining a new word in the second transcription.
6. The method of claim 4, wherein generating the plurality of word alignment paths further includes dynamically decreasing the size of the window by adjusting a location of a fourth face of the window in the coordinate system in response to the one or more word alignment paths satisfying a second criteria.
7. The method of claim 1, wherein generating the output text string includes: selecting one of the plurality of word alignment paths; selecting words from the selected word alignment path to include in the output text string; and generating the output text string using the selected words.
8. The method of claim 1, wherein the selected word alignment path is selected based on the selected word alignment path including one or more nodes within the window.
9. The method of claim 1, wherein generating the plurality of word alignment paths includes in response to obtaining one or more words in the first transcription and/or the second transcription, extending only word alignment paths that pass through a particular face or vertex of the window.
10. The method of claim 1, further comprising obtaining a third transcription of the audio data, the third transcription including a plurality of third words and being generated by a third automatic speech recognition technology using the audio data, wherein the coordinate system is based on the first transcription, the second transcription, and the third transcription, and one or more of the nodes of each of the plurality of word alignment paths including one of the plurality of first words, one of the plurality of second words, and one of the plurality of third words.
11. One or more non-transitory computer-readable medias configured to store instructions that when executed by a system are configured to perform the method of claim 1.
12. A system comprising: at least one computer-readable media configured to store instructions; and at least one processor coupled to the at least one computer-readable media, the processor configured to execute the instructions to cause the system to perform operations, the operations comprising: obtaining real-time audio data that includes speech; obtaining, in real-time as the audio data is obtained, a first transcription of the audio data, the first transcription including a plurality of first words and being generated by a first automatic speech recognition technology using the audio data; obtaining, in real-time as the audio data is obtained, a second transcription of the audio data, the second transcription including a plurality of second words and being generated by a second automatic speech recognition technology using the audio data; aligning, in real-time, words from the first transcription that are similar to words from the second transcription by generating in real-time a plurality of word alignment paths that each include a plurality of nodes within a coordinate system based on the first transcription and the second transcription, one or more of the nodes of each of the plurality of word alignment paths corresponding to one word from the first transcription and/or one word from the second transcription that are considered to be aligned for a respective word alignment path, generating the plurality of word alignment paths including: setting a window in the coordinate system that includes one or more of the plurality of nodes of the plurality of word alignment paths; dynamically increasing in real-time a size of the window as additional words are added to the plurality of word alignment paths in response to words being added to the first transcription and/or the second transcription; dynamically decreasing in real-time the size of the window by adjusting a location of a second face of the window in the coordinate system to be further from an origin of the coordinate
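The windowed alignment recited in claim 1 can be approximated with a short sketch. It is a simplification under stated assumptions, not the claimed method: it tracks a single lowest-cost path (standard edit-distance alignment) rather than a plurality of paths, it treats the "window" as the words not yet committed, and it uses only one of the criteria listed in claim 3 (matching words) to shrink the window. The names `align`, `WindowedAligner`, and `output_text` are hypothetical.

```python
def align(a, b):
    """Lowest-cost alignment of word lists a and b; None marks a blank."""
    rows, cols = len(a) + 1, len(b) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i
    for j in range(cols):
        cost[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            sub = cost[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the far corner to recover the aligned pairs.
    pairs, i, j = [], len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


class WindowedAligner:
    """Realigns only the uncommitted words; committed pairs stay fixed."""

    def __init__(self):
        self.stable = []   # pairs outside the window, never changed again
        self.tail_a = []   # first-transcription words inside the window
        self.tail_b = []   # second-transcription words inside the window

    def add(self, word, source):
        """Add a word from transcription 1 or 2 and return the full alignment."""
        (self.tail_a if source == 1 else self.tail_b).append(word)
        window = align(self.tail_a, self.tail_b)
        # Assumed criterion: commit everything up to the last matching pair.
        last = max((k for k, (x, y) in enumerate(window)
                    if x is not None and x == y), default=-1)
        if last >= 0:
            committed = window[:last + 1]
            self.stable.extend(committed)
            self.tail_a = self.tail_a[sum(x is not None for x, _ in committed):]
            self.tail_b = self.tail_b[sum(y is not None for _, y in committed):]
        return self.stable + align(self.tail_a, self.tail_b)


def output_text(pairs):
    """Build an output string, preferring the first transcription's word."""
    return " ".join(x if x is not None else y for x, y in pairs)


if __name__ == "__main__":
    aligner = WindowedAligner()
    result = []
    for w1, w2 in zip("the cat sat down".split(), "the cat sad down".split()):
        aligner.add(w1, 1)
        result = aligner.add(w2, 2)
    print(output_text(result))
```

Committing pairs as soon as a match is found keeps the realigned region small, which is the practical benefit of shrinking the window: the cost of each update depends on the window size rather than on the full length of the transcriptions.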
involving random numbers or seeds · CPC title
for comparison or discrimination · CPC title
Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules · CPC title
Training · CPC title
Providing cryptographic facilities or services · CPC title