US9558741B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9558741-B2 |
| Application number | US-201414291138-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 30, 2014 |
| Priority date | May 14, 2013 |
| Publication date | Jan 31, 2017 |
| Grant date | Jan 31, 2017 |
Official abstract text for this publication.
Systems and methods are provided for speech recognition. For example, audio characteristics are extracted from acquired voice signals; a syllable confusion network is identified based on at least information associated with the audio characteristics; a word lattice is generated based on at least information associated with the syllable confusion network and a predetermined phonetic dictionary; and an optimal character sequence is calculated in the word lattice as a speech recognition result.
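The word-lattice step can be illustrated with a minimal Python sketch. It follows the pairing rule recited in claim 1: a candidate character from the current slice and one from the next slice yield a word node if they form a word, and otherwise a single-character node. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `LatticeNode` type, the `build_lattice_nodes` function, the dictionary and word-list layout, the toy pinyin data, and the choice to multiply the two syllable scores for a word node (the claim only says the node score is "based on" them).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatticeNode:
    text: str     # a single character or a two-character word
    start: int    # index of the first slice the node covers
    end: int      # index one past the last slice the node covers
    score: float  # node score derived from the syllable score(s)


def build_lattice_nodes(slices, phonetic_dict, words):
    """Generate lattice nodes from a syllable confusion network.

    slices: list of dicts mapping syllable -> score, one dict per slice.
    phonetic_dict: dict mapping syllable -> list of candidate characters.
    words: set of two-character words (hypothetical word list).
    """
    nodes = set()
    for i, current in enumerate(slices):
        nxt = slices[i + 1] if i + 1 < len(slices) else {}
        # Candidate characters reachable from the next slice, with scores.
        next_candidates = [
            (char2, score2)
            for syl2, score2 in nxt.items()
            for char2 in phonetic_dict.get(syl2, ())
        ]
        for syl1, score1 in current.items():
            for char1 in phonetic_dict.get(syl1, ()):
                if not next_candidates:
                    # Last slice (or nothing to pair with): character node.
                    nodes.add(LatticeNode(char1, i, i + 1, score1))
                for char2, score2 in next_candidates:
                    if char1 + char2 in words:
                        # The pair forms a word: node scored from both syllables.
                        # Multiplication is an assumption, not the patent's rule.
                        nodes.add(LatticeNode(char1 + char2, i, i + 2, score1 * score2))
                    else:
                        # The pair forms no word: node for the first character only.
                        nodes.add(LatticeNode(char1, i, i + 1, score1))
    return sorted(nodes, key=lambda n: (n.start, n.end, n.text))


# Toy data, invented for illustration.
slices = [{"zhong": 0.9, "zong": 0.1}, {"guo": 0.8, "gou": 0.2}]
phonetic_dict = {"zhong": ["中", "种"], "zong": ["总"], "guo": ["国", "果"], "gou": ["狗"]}
words = {"中国"}

nodes = build_lattice_nodes(slices, phonetic_dict, words)
for node in nodes:
    print(node)
```

Connecting the nodes between a beginning and an ending lattice node (claim 3) is left out of this sketch; the `start` and `end` indices carry enough information to link a node to any node whose `start` equals its `end`.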
Opening claim text (preview).
What is claimed is:
1. A method for speech recognition, the method comprising: extracting, by one or more data processors, audio characteristics from acquired voice signals; identifying, by the one or more data processors, a syllable confusion network based on at least information associated with the audio characteristics; generating, by the one or more data processors, a word lattice based on at least information associated with the syllable confusion network and a predetermined phonetic dictionary; and calculating, by the one or more data processors, an optimal character sequence in the word lattice as a speech recognition result of the acquired voice signals; wherein the syllable confusion network includes one or more sorted slices, wherein each of the one or more sorted slices includes a set of syllables, and wherein each syllable in the set of syllables is associated with a score; and wherein the generating a word lattice based on at least information associated with the syllable confusion network and a predetermined phonetic dictionary includes: traversing candidate characters in the predetermined phonetic dictionary corresponding to the slices in the syllable confusion network; in response to a first candidate character corresponding to a first syllable in a current slice and a second candidate character corresponding to a second syllable in a next slice forming a word, generating a first lattice node based on at least information associated with the word; and determining a first node score for the first lattice node based on a first score corresponding to the first syllable in the current slice and a second score corresponding to the second syllable in the next slice; in response to the first candidate character corresponding to the first syllable in the current slice and the second candidate character corresponding to the second syllable in the next slice not forming a word, generating a second lattice node based on at least information associated with the first candidate character; and determining a second node score for the second lattice node based on the first score.
2. The method of claim 1, wherein the identifying a syllable confusion network based on at least information associated with the audio characteristics includes: identifying the syllable confusion network that includes two or more syllable paths based on at least information associated with the audio characteristics; or identifying the syllable confusion network that includes an optimal syllable path based on at least information associated with the audio characteristics.
3. The method of claim 1, wherein: the generating a word lattice based on at least information associated with the syllable confusion network and a predetermined phonetic dictionary includes: connecting the first lattice node and the second lattice node based on at least information associated with a sequence related to the first syllable and the second syllable; and generating the word lattice based on at least information associated with the first lattice node, the second lattice node, a beginning lattice node and an ending lattice node.
4. The method of claim 1, wherein: the word lattice includes a beginning lattice node, an ending lattice node, and one or more node paths located between the beginning lattice node and the ending lattice node; and the calculating an optimal character sequence in the word lattice includes: for each node path of the one or more node paths, setting a token on the node path between the beginning lattice node and the ending lattice node; moving the token from the beginning lattice node to the ending lattice node along the node path; and calculating a token score of the token based on at least information associated with one or more node scores related to one or more lattice nodes on the node path and a probability related to a predetermined language model; selecting a final token with a highest token score; and selecting a combination of final candidate characters corresponding to one or more final lattice nodes on a final node path related to the final token as the optimal character sequence.
5. The method of claim 4, wherein the calculating a token score of the token based on at least information associated with one or more node scores related to one or more lattice nodes on the node path and a probability related to a predetermined language model includes: calculating the token score of the token based on at least information associated with a current node score related to a current lattice node and the probability of the predetermined language model; detecting whether the token score is smaller than a predetermined threshold; and in response to the token score being no smaller than the predetermined threshold, moving the token to a next lattice node; and repeating the calculating the token score of the token based on at least information associated with a current node score related to a current lattice node and the probability of the predetermined language model, and the detecting whether the token score is smaller than a predetermined threshold.
6. The method of claim 4, further comprising: generating a language model database including one or more original language models based on at least information associated with a dictionary database including one or more original dictionaries; in response to a first dictionary being added to the dictionary database, generating a first language model based on at least information associated with the first dictionary; and adding the first language model to the language model database; in response to a second dictionary being deleted from the dictionary database, deleting a second language model corresponding to the second dictionary from the language model database; and in response to a third dictionary being modified, generating a third language model based on at least information associated with the third dictionary; and adding the third language model to the language model database; or modifying a fourth language model corresponding to the third dictionary in the language model database.
7. A device for speech recognition, includes: one or more data processors; and a computer-readable storage medium storing a characteristic-extraction module, a syllable-identification module, a lattice-generation module, and a character-identification module configured to be executed by the one or more data processors; wherein: the characteristic-extraction module configured to extract audio characteristics from acquired voice signals; the syllable-identification module configured to identify a syllable confusion network based on at least information associated with the audio characteristics; the lattice-generation module configured to generate a word lattice based on at least information associated with the syllable confusion network and a predetermined phonetic dictionary; and the character-identification module configured to calculating an optimal character sequence in the word lattice as a speech recognition result of the acquired voice signals; wherein the syllable confusion network includes one or more sorted slices, wherein each of the one or more sorted slices includes a set of syllables, and wherein each syllable in the set of syllables is associated with a score; and wherein the lattice-generation module includes: a network-traversal unit configured to traverse candidate characters in the predetermined phonetic dictionary corresponding to the slices in the syllable confusion network; a first generation unit configured to, in response to a first candidate character corresponding to a first syllable in a current slice and a second candidate character corresponding to a second syllabl
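The token-passing search recited in claims 4 and 5 can be sketched roughly as follows. A token starts at the beginning lattice node, accumulates node scores and language-model probabilities as it moves, is dropped when its score falls below a threshold, and the highest-scoring token that reaches the ending node gives the character sequence. The names `best_sequence` and `bigram_logprob`, the `<s>` and `</s>` node ids, the use of log-probabilities, the bigram model, and the toy lattice are all assumptions for illustration; the claims do not fix how scores and language-model probabilities are combined.

```python
import math


def best_sequence(nodes, edges, bigram_logprob, threshold=-math.inf,
                  start="<s>", end="</s>"):
    """Token passing over an acyclic word lattice.

    nodes: dict mapping node id -> (text, node_score), including start and end.
    edges: dict mapping node id -> list of successor node ids.
    bigram_logprob: callable (prev_text, text) -> log-probability (hypothetical LM).
    threshold: tokens whose score drops below this value are discarded.
    """
    tokens = [(0.0, [start])]  # each token: (accumulated score, visited node ids)
    finals = []
    while tokens:
        score, path = tokens.pop()
        current = path[-1]
        if current == end:
            finals.append((score, path))
            continue
        for nxt in edges.get(current, ()):
            text, node_score = nodes[nxt]
            prev_text = nodes[current][0]
            # Assumed combination: sum of log node score and log LM probability.
            new_score = score + math.log(node_score) + bigram_logprob(prev_text, text)
            if new_score < threshold:
                continue  # token score smaller than the threshold: stop moving it
            tokens.append((new_score, path + [nxt]))
    if not finals:
        return None, -math.inf
    best_score, best_path = max(finals, key=lambda t: t[0])
    # Concatenate the texts of the nodes between the start and end nodes.
    return "".join(nodes[n][0] for n in best_path[1:-1]), best_score


# Toy lattice with two node paths, invented for illustration.
nodes = {
    "<s>": ("<s>", 1.0),
    "w1": ("中国", 0.72),
    "c1": ("中", 0.9),
    "c2": ("果", 0.8),
    "</s>": ("</s>", 1.0),
}
edges = {"<s>": ["w1", "c1"], "w1": ["</s>"], "c1": ["c2"], "c2": ["</s>"]}
probs = {
    ("<s>", "中国"): 0.5, ("中国", "</s>"): 0.5,
    ("<s>", "中"): 0.3, ("中", "果"): 0.01, ("果", "</s>"): 0.2,
}


def bigram_logprob(prev_text, text):
    # Unseen pairs get a small floor probability (an assumption).
    return math.log(probs.get((prev_text, text), 1e-6))


text, score = best_sequence(nodes, edges, bigram_logprob)
print(text, score)
```

The sketch enumerates every surviving path, which is fine for a toy lattice; a practical decoder would merge tokens that meet at the same node to keep the search tractable.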
Recognition networks (G10L15/142, G10L15/16 take precedence) · CPC title
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title
using context dependencies, e.g. language models · CPC title