What technology area does this patent fall under?

Primary CPC classification G10L15/083. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 31, 2017 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for speech recognition

US9558741B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9558741-B2
Application number	US-201414291138-A
Country	US
Kind code	B2
Filing date	May 30, 2014
Priority date	May 14, 2013
Publication date	Jan 31, 2017
Grant date	Jan 31, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided for speech recognition. For example, audio characteristics are extracted from acquired voice signals; a syllable confusion network is identified based on at least information associated with the audio characteristics; a word lattice is generated based on at least information associated with the syllable confusion network and a predetermined phonetic dictionary; and an optimal character sequence is calculated in the word lattice as a speech recognition result.
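The lattice-generation step named in the abstract, and spelled out in the first claim on this page, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the names `LatticeNode` and `build_nodes`, the toy pinyin data, the choice to add the two syllable scores for a word node (the claim only says the node score is "based on" them), the limit to two-character words, and the rule of keeping the higher score when the same node arises twice. It is not the patented implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatticeNode:
    text: str     # candidate character or two-character word
    start: int    # index of the first slice the node covers
    end: int      # index one past the last slice the node covers
    score: float


def build_nodes(slices, char_dict, words):
    """Build lattice nodes from sorted slices of (syllable, score) pairs."""
    nodes = {}

    def keep_best(node):
        # Assumption: when the same node arises twice, keep the higher score.
        key = (node.text, node.start, node.end)
        if key not in nodes or node.score > nodes[key].score:
            nodes[key] = node

    for i, current in enumerate(slices):
        nxt = slices[i + 1] if i + 1 < len(slices) else []
        # Candidate characters (with syllable scores) for the next slice.
        followers = [(ch2, s2) for syl2, s2 in nxt for ch2 in char_dict.get(syl2, [])]
        for syl1, s1 in current:
            for ch1 in char_dict.get(syl1, []):
                if not followers:
                    # Last slice: only single-character nodes are possible.
                    keep_best(LatticeNode(ch1, i, i + 1, s1))
                for ch2, s2 in followers:
                    if ch1 + ch2 in words:
                        # Word node: score uses both syllable scores (sum is an assumption).
                        keep_best(LatticeNode(ch1 + ch2, i, i + 2, s1 + s2))
                    else:
                        # Character node: score uses the first syllable's score only.
                        keep_best(LatticeNode(ch1, i, i + 1, s1))
    return list(nodes.values())


# Hypothetical toy data, not taken from the patent.
slices = [[("ni", 0.9), ("li", 0.1)], [("hao", 0.8), ("gao", 0.2)]]
char_dict = {"ni": ["你"], "li": ["里"], "hao": ["好"], "gao": ["高"]}
words = {"你好"}

for node in build_nodes(slices, char_dict, words):
    print(node)
```

With this toy input the sketch yields one word node spanning both slices and four single-character nodes; linking them between a beginning and an ending node would give the word lattice that the later search step operates on.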

First claim

Opening claim text (preview).

What is claimed is:

1. A method for speech recognition, the method comprising: extracting, by one or more data processors, audio characteristics from acquired voice signals; identifying, by the one or more data processors, a syllable confusion network based on at least information associated with the audio characteristics; generating, by the one or more data processors, a word lattice based on at least information associated with the syllable confusion network and a predetermined phonetic dictionary; and calculating, by the one or more data processors, an optimal character sequence in the word lattice as a speech recognition result of the acquired voice signals; wherein the syllable confusion network includes one or more sorted slices, wherein each of the one or more sorted slices includes a set of syllables, and wherein each syllable in the set of syllables is associated with a score; and wherein the generating a word lattice based on at least information associated with the syllable confusion network and a predetermined phonetic dictionary includes: traversing candidate characters in the predetermined phonetic dictionary corresponding to the slices in the syllable confusion network; in response to a first candidate character corresponding to a first syllable in a current slice and a second candidate character corresponding to a second syllable in a next slice forming a word, generating a first lattice node based on at least information associated with the word; and determining a first node score for the first lattice node based on a first score corresponding to the first syllable in the current slice and a second score corresponding to the second syllable in the next slice; in response to the first candidate character corresponding to the first syllable in the current slice and the second candidate character corresponding to the second syllable in the next slice not forming a word, generating a second lattice node based on at least information associated with the first candidate character; and determining a second node score for the second lattice node based on the first score.

2. The method of claim 1, wherein the identifying a syllable confusion network based on at least information associated with the audio characteristics includes: identifying the syllable confusion network that includes two or more syllable paths based on at least information associated with the audio characteristics; or identifying the syllable confusion network that includes an optimal syllable path based on at least information associated with the audio characteristics.

3. The method of claim 1, wherein: the generating a word lattice based on at least information associated with the syllable confusion network and a predetermined phonetic dictionary includes: connecting the first lattice node and the second lattice node based on at least information associated with a sequence related to the first syllable and the second syllable; and generating the word lattice based on at least information associated with the first lattice node, the second lattice node, a beginning lattice node and an ending lattice node.

4. The method of claim 1, wherein: the word lattice includes a beginning lattice node, an ending lattice node, and one or more node paths located between the beginning lattice node and the ending lattice node; and the calculating an optimal character sequence in the word lattice includes: for each node path of the one or more node paths, setting a token on the node path between the beginning lattice node and the ending lattice node; moving the token from the beginning lattice node to the ending lattice node along the node path; and calculating a token score of the token based on at least information associated with one or more node scores related to one or more lattice nodes on the node path and a probability related to a predetermined language model; selecting a final token with a highest token score; and selecting a combination of final candidate characters corresponding to one or more final lattice nodes on a final node path related to the final token as the optimal character sequence.

5. The method of claim 4, wherein the calculating a token score of the token based on at least information associated with one or more node scores related to one or more lattice nodes on the node path and a probability related to a predetermined language model includes: calculating the token score of the token based on at least information associated with a current node score related to a current lattice node and the probability of the predetermined language model; detecting whether the token score is smaller than a predetermined threshold; and in response to the token score being no smaller than the predetermined threshold, moving the token to a next lattice node; and repeating the calculating the token score of the token based on at least information associated with a current node score related to a current lattice node and the probability of the predetermined language model, and the detecting whether the token score is smaller than a predetermined threshold.

6. The method of claim 4, further comprising: generating a language model database including one or more original language models based on at least information associated with a dictionary database including one or more original dictionaries; in response to a first dictionary being added to the dictionary database, generating a first language model based on at least information associated with the first dictionary; and adding the first language model to the language model database; in response to a second dictionary being deleted from the dictionary database, deleting a second language model corresponding to the second dictionary from the language model database; and in response to a third dictionary being modified, generating a third language model based on at least information associated with the third dictionary; and adding the third language model to the language model database; or modifying a fourth language model corresponding to the third dictionary in the language model database.

7. A device for speech recognition, includes: one or more data processors; and a computer-readable storage medium storing a characteristic-extraction module, a syllable-identification module, a lattice-generation module, and a character-identification module configured to be executed by the one or more data processors; wherein: the characteristic-extraction module configured to extract audio characteristics from acquired voice signals; the syllable-identification module configured to identify a syllable confusion network based on at least information associated with the audio characteristics; the lattice-generation module configured to generate a word lattice based on at least information associated with the syllable confusion network and a predetermined phonetic dictionary; and the character-identification module configured to calculating an optimal character sequence in the word lattice as a speech recognition result of the acquired voice signals; wherein the syllable confusion network includes one or more sorted slices, wherein each of the one or more sorted slices includes a set of syllables, and wherein each syllable in the set of syllables is associated with a score; and wherein the lattice-generation module includes: a network-traversal unit configured to traverse candidate characters in the predetermined phonetic dictionary corresponding to the slices in the syllable confusion network; a first generation unit configured to, in response to a first candidate character corresponding to a first syllable in a current slice and a second candidate character corresponding to a second syllabl
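The token-based search described in claims 4 and 5 can be sketched roughly as follows: a token walks each node path from the beginning to the ending of the lattice, accumulates node scores plus a language-model probability, is dropped when its score falls below a threshold, and the highest-scoring finished token gives the character sequence. The names `Node` and `best_sequence`, the log-domain scores, the bigram table with a fixed `unseen` penalty, and the toy values are all assumptions for illustration; the claims do not specify the score formula or the language-model type.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    text: str
    start: int    # slice index where the node begins
    end: int      # slice index one past where it ends (assumed > start)
    score: float  # log-domain node score (assumption)


def best_sequence(nodes, n_slices, bigram_logp, threshold=-20.0, unseen=-5.0):
    """Return (text, score) of the best path, or None if every token is pruned."""
    by_start = {}
    for node in nodes:
        by_start.setdefault(node.start, []).append(node)

    # A token is (score so far, position, previous text, path of node texts).
    tokens = [(0.0, 0, "<s>", ())]
    finished = []
    while tokens:
        score, pos, prev, path = tokens.pop()
        if pos == n_slices:
            finished.append((score, path))  # token reached the ending node
            continue
        for node in by_start.get(pos, []):
            lm = bigram_logp.get((prev, node.text), unseen)
            new_score = score + node.score + lm
            if new_score < threshold:
                continue  # token score below the threshold: drop this token
            tokens.append((new_score, node.end, node.text, path + (node.text,)))

    if not finished:
        return None
    best_score, best_path = max(finished, key=lambda item: item[0])
    return "".join(best_path), best_score


# Hypothetical toy lattice and bigram table, not taken from the patent.
nodes = [
    Node("你好", 0, 2, math.log(0.9) + math.log(0.8)),
    Node("你", 0, 1, math.log(0.9)),
    Node("里", 0, 1, math.log(0.1)),
    Node("好", 1, 2, math.log(0.8)),
    Node("高", 1, 2, math.log(0.2)),
]
bigram_logp = {
    ("<s>", "你好"): math.log(0.5),
    ("<s>", "你"): math.log(0.2),
    ("你", "好"): math.log(0.3),
}

print(best_sequence(nodes, 2, bigram_logp))
```

Pruning at each step keeps the number of live tokens small on larger lattices; setting the threshold too high, as the sketch allows, can discard every path and return no result.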

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

G10L15/083 · Primary
Recognition networks (G10L15/142, G10L15/16 take precedence) · CPC title
G10L15/1815
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title
G10L15/183
using context dependencies, e.g. language models · CPC title

Patent family

Related publications grouped by family.

View patent family 51882768

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9558741B2 cover?: Systems and methods are provided for speech recognition. For example, audio characteristics are extracted from acquired voice signals; a syllable confusion network is identified based on at least information associated with the audio characteristics; a word lattice is generated based on at least information associated with the syllable confusion network and a predetermined phonetic dictionary; …
Who is the assignee on this patent?: Tencent Tech Shenzhen Co Ltd