Mimicking user speech patterns
US-9129602-B1 · Sep 8, 2015 · US
US9336781B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9336781-B2 |
| Application number | US-201414264916-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 29, 2014 |
| Priority date | Oct 17, 2013 |
| Publication date | May 10, 2016 |
| Grant date | May 10, 2016 |
Abstract
A content-aware speaker recognition system includes technologies to, among other things, analyze phonetic content of a speech sample, incorporate phonetic content of the speech sample into a speaker model, and use the phonetically-aware speaker model for speaker recognition.
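The abstract describes folding phonetic content into the speaker model so that comparisons are made sound-for-sound. A minimal sketch of the comparison step recited in claims 1 to 3 follows. It assumes each speaker model is a dictionary from a tri-phone label to a mean feature vector, and that a "similar" tri-phone is one with the same label. The names `SpeakerModel`, `euclidean` and `triphone_distance`, the Euclidean metric, and the averaging over shared tri-phones are all hypothetical choices for illustration, not the patent's method.

```python
import math
from typing import Dict, List

# Hypothetical model layout: tri-phone label (e.g. "s-ih+t") -> mean feature vector.
SpeakerModel = Dict[str, List[float]]


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triphone_distance(current: SpeakerModel, stored: SpeakerModel) -> float:
    """Average distance over tri-phones that appear in both models.

    Tri-phones in the current sample with no counterpart in the stored
    model are skipped, mirroring the "disregard" step of claim 3.
    Returns infinity when the two models share no tri-phones.
    """
    shared = [tp for tp in current if tp in stored]
    if not shared:
        return math.inf
    total = sum(euclidean(current[tp], stored[tp]) for tp in shared)
    return total / len(shared)


# Usage: "z-uw+m" has no enrolled counterpart, so it is ignored.
enrolled = {"s-ih+t": [1.0, 2.0], "k-ae+t": [0.5, 0.5]}
probe = {"s-ih+t": [1.0, 2.0], "k-ae+t": [0.5, 1.5], "z-uw+m": [9.0, 9.0]}
score = triphone_distance(probe, enrolled)  # (0.0 + 1.0) / 2 = 0.5
```

A lower score means the probe pronounces the shared tri-phones more like the enrolled speaker; a real system would calibrate a decision threshold on held-out data.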
Claims (preview)
The invention claimed is:
1. A text-independent speaker recognition system comprising: a front end module embodied in one or more non-transitory computer readable media and executable by at least one computer device to: process an audio signal comprising a current sample of natural language speech; identify a speech segment in the current sample of natural language speech; and create a phonetic representation of the speech segment of the current speech sample; and a back end module embodied in one or more non-transitory computer readable media and executable by at least one computer device to: create a current speaker model based on the phonetic representation of the speech segment of the current speech sample, the current speaker model mathematically representing at least one speaker-specific phonemic characteristic of the current speech sample; and compare the current speaker model to a stored speaker model, the stored speaker model mathematically associating phonetic content with one or more other speech samples; wherein the front end module is to apply a neural network-based acoustic model to associate the speech segment with phonetic content; wherein the front end module is to align the phonetic content of the speech segment with time; and wherein the front end module is to align the phonetic content of the speech segment in lexical units, and the back end module is to compute a distance between at least one of the lexical units of the phonetic content with a similar lexical unit of the stored speaker model.
2. A text-independent speaker recognition system comprising: a front end module embodied in one or more non-transitory computer readable media and executable by at least one computer device to: process an audio signal comprising a current sample of natural language speech; identify a speech segment in the current sample of natural language speech; and create a phonetic representation of the speech segment of the current speech sample; and a back end module embodied in one or more non-transitory computer readable media and executable by at least one computer device to: create a current speaker model based on the phonetic representation of the speech segment of the current speech sample, the current speaker model mathematically representing at least one speaker-specific phonemic characteristic of the current speech sample; and compare the current speaker model to a stored speaker model, the stored speaker model mathematically associating phonetic content with one or more other speech samples; wherein the front end module is to apply a neural network-based acoustic model to associate the speech segment with phonetic content; wherein the front end module is to align the phonetic content of the speech segment with time; and wherein the front end module is to align the phonetic content of the speech segment in tri-phones, and the back end module is to compute a distance between at least one of the tri-phones of the phonetic content with a similar tri-phone of the stored speaker model.
3. A text-independent speaker recognition system comprising: a front end module embodied in one or more non-transitory computer readable media and executable by at least one computer device to: process an audio signal comprising a current sample of natural language speech; identify a speech segment in the current sample of natural language speech; and create a phonetic representation of the speech segment of the current speech sample; and a back end module embodied in one or more non-transitory computer readable media and executable by at least one computer device to: create a current speaker model based on the phonetic representation of the speech segment of the current speech sample, the current speaker model mathematically representing at least one speaker-specific phonemic characteristic of the current speech sample; and compare the current speaker model to a stored speaker model, the stored speaker model mathematically associating phonetic content with one or more other speech samples; wherein the front end module is to apply a neural network-based acoustic model to associate the speech segment with phonetic content; wherein the front end module is to align the phonetic content of the speech segment with time; wherein the front end module is to align the phonetic content of the speech segment in tri-phones, and the back end module is to compute a distance between at least one of the tri-phones of the phonetic content with a similar tri-phone of the stored speaker model; and wherein the back end module is to disregard tri-phones of the speech segment that do not have similar tri-phones in the stored speaker model.
4. A front end module for a text-independent speaker recognition system, the front end module embodied in one or more non-transitory computer readable media and executable by at least one computer device comprising a plurality of instructions embodied in one or more computer accessible storage media and executable by a processor to: process an audio signal comprising a sample of natural language speech; identify a plurality of temporal speech segments in the natural language speech sample; assign a phonetic unit of a plurality of different phonetic units to each of the speech segments of the speech sample, wherein each of the phonetic units is associated with a class of phonetic content of a plurality of classes of phonetic content; and mathematically determine speaker-specific information about the pronunciation of the speech segments in comparison to the pronunciation of the speech segments by a general population; wherein the front end module is to apply a partial speech recognition system comprising a hidden Markov model and a deep neural network to associate different speech segments with different phonetic units; and wherein the front end module is to determine the phonetic unit to associate with a current speech segment based on a previously-determined association of a phonetic unit with another speech segment that is temporally adjacent the current speech segment in the natural language speech sample.
5. A front end module for a text-independent speaker recognition system, the front end module embodied in one or more non-transitory computer readable media and executable by at least one computer device comprising a plurality of instructions embodied in one or more computer accessible storage media and executable by a processor to: process an audio signal comprising a sample of natural language speech; identify a plurality of temporal speech segments in the natural language speech sample; assign a phonetic unit of a plurality of different phonetic units to each of the speech segments of the speech sample, wherein each of the phonetic units is associated with a class of phonetic content of a plurality of classes of phonetic content; and mathematically determine speaker-specific information about the pronunciation of the speech segments in comparison to the pronunciation of the speech segments by a general population; wherein the front end module is to compute a plurality of statistics to determine the speaker-specific information, and the front end module is to compute the plurality of statistics using posterior probabilities of the phonetic classes.
6. A method for text-independent speaker recognition, the method comprising, with code embodied in one or more non-transitory computer readable media and executable by at least one computing device: processing an audio signal comprising a current sample of natural language speech and speaker-specific information about the speaker of the current speech sample; executing a speech recognizer on the current speech sample to: identify a speech segment in the current speech sample; and create a phonetic representation of the speech s
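Claim 4 has the front end choose the phonetic unit for a segment partly from the unit already assigned to the temporally adjacent segment, using a hidden Markov model together with a deep neural network. One standard way an HMM makes that dependence concrete is Viterbi decoding over per-segment scores such as network posteriors. The sketch assumes log posteriors and a hand-set transition matrix; the function name `viterbi` and all numbers are hypothetical, and the patent does not specify this decoder.

```python
import math
from typing import List, Sequence


def viterbi(log_post: Sequence[Sequence[float]],
            log_trans: Sequence[Sequence[float]],
            log_init: Sequence[float]) -> List[int]:
    """Most likely phonetic-unit index for each speech segment.

    log_post[t][j]:  log score that segment t belongs to unit j (e.g. from a DNN).
    log_trans[i][j]: log probability that unit j follows unit i.
    log_init[j]:     log probability that the first segment is unit j.
    """
    n_units = len(log_init)
    # best[j]: best log score of any unit sequence ending in unit j so far.
    best = [log_init[j] + log_post[0][j] for j in range(n_units)]
    back: List[List[int]] = []  # back[t-1][j]: best predecessor of unit j at segment t
    for t in range(1, len(log_post)):
        prev_best = best
        best, ptrs = [], []
        for j in range(n_units):
            # The choice for segment t depends on the adjacent segment t-1.
            i_star = max(range(n_units),
                         key=lambda i: prev_best[i] + log_trans[i][j])
            ptrs.append(i_star)
            best.append(prev_best[i_star] + log_trans[i_star][j] + log_post[t][j])
        back.append(ptrs)
    # Trace back from the best-scoring final unit.
    path = [max(range(n_units), key=lambda j: best[j])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path


# Usage: the middle segment slightly favors unit 1 on its own,
# but "sticky" transitions pull it back to unit 0.
post = [[0.8, 0.2], [0.45, 0.55], [0.9, 0.1]]
trans = [[0.9, 0.1], [0.1, 0.9]]
init = [0.5, 0.5]
units = viterbi([[math.log(p) for p in row] for row in post],
                [[math.log(p) for p in row] for row in trans],
                [math.log(p) for p in init])
```

Here `units` is `[0, 0, 0]`, whereas a segment-by-segment argmax would give `[0, 1, 0]`; the difference is exactly the neighbor dependence the claim describes.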
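Claim 5 states only that the front end computes statistics from posterior probabilities of the phonetic classes. A common choice in DNN-based speaker recognition, assumed here rather than taken from the patent, is a zeroth-order count and a first-order sum per class, each weighted by those posteriors and centered on general-population class means (echoing the comparison with "a general population" in claims 4 and 5). The function `posterior_stats` and its inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def posterior_stats(features: Sequence[Sequence[float]],
                    posteriors: Sequence[Sequence[float]],
                    population_means: Sequence[Sequence[float]],
                    ) -> Tuple[List[float], List[List[float]]]:
    """Zeroth- and centered first-order statistics per phonetic class.

    features[t]:         acoustic feature vector for frame t (dimension D).
    posteriors[t][c]:    posterior probability of phonetic class c at frame t.
    population_means[c]: mean feature vector of class c over a general population.

    zeroth[c]   = sum_t posteriors[t][c]
    first[c][d] = sum_t posteriors[t][c] * (features[t][d] - population_means[c][d])
    """
    n_classes = len(population_means)
    dim = len(population_means[0])
    zeroth = [0.0] * n_classes
    first = [[0.0] * dim for _ in range(n_classes)]
    for x, gamma in zip(features, posteriors):
        for c in range(n_classes):
            zeroth[c] += gamma[c]
            for d in range(dim):
                # Posterior-weighted deviation of this speaker from the population.
                first[c][d] += gamma[c] * (x[d] - population_means[c][d])
    return zeroth, first


# Usage: two frames, two phonetic classes, 2-dimensional features.
features = [[1.0, 0.0], [3.0, 2.0]]
posteriors = [[1.0, 0.0], [0.5, 0.5]]
means = [[1.0, 1.0], [2.0, 2.0]]
zeroth, first = posterior_stats(features, posteriors, means)
```

This yields `zeroth == [1.5, 0.5]` and `first == [[1.0, -0.5], [0.5, 0.0]]`. In i-vector style systems such statistics feed a later model that maps them to a fixed-length speaker representation.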
Use of phonemic categorisation or speech recognition prior to speaker recognition or verification · CPC title