US9418662B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9418662-B2 |
| Application number | US-35681409-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 21, 2009 |
| Priority date | Jan 21, 2009 |
| Publication date | Aug 16, 2016 |
| Grant date | Aug 16, 2016 |
Official abstract text for this publication.
An apparatus for providing compound models for speech recognition adaptation. The apparatus may include a processor and a memory including computer program code, the memory and the computer program code being configured, with the processor, to cause the apparatus to at least receive a speech signal corresponding to a particular speaker. The apparatus may further be configured to select a cluster model including both a speaker independent portion and a speaker dependent portion based at least in part on a characteristic of speech of the particular speaker. The apparatus may be further configured to process the speech using the selected cluster model. The apparatus may be further configured to cause at least a speaker dependent portion of one or more non-selected cluster models to be stored remotely. A corresponding method and computer program product are also provided.
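The arrangement in the abstract can be pictured as one shared speaker independent portion combined with interchangeable speaker dependent portions, only one of which is held locally at a time. The following is a minimal sketch under stated assumptions, not the patented implementation: the class names, field layouts, and the two in-memory dictionaries standing in for local and remote storage are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SharedStateNetwork:
    """Speaker independent portion: states plus state tyings (hypothetical layout)."""
    states: List[str]
    state_tyings: Dict[str, str]  # context-dependent unit -> tied state


@dataclass
class SpeakerDependentPortion:
    """Speaker dependent portion for one cluster (hypothetical layout)."""
    cluster: str
    parameters: List[float]  # stand-in for the acoustic parameters


@dataclass
class ClusterModelStore:
    """Keeps the selected speaker dependent portion local and the rest remote."""
    shared: SharedStateNetwork
    local: Dict[str, SpeakerDependentPortion] = field(default_factory=dict)
    remote: Dict[str, SpeakerDependentPortion] = field(default_factory=dict)

    def select(self, cluster: str) -> Tuple[SharedStateNetwork, SpeakerDependentPortion]:
        # Fetch the requested portion from the remote store if it is not local yet.
        if cluster in self.remote:
            self.local[cluster] = self.remote.pop(cluster)
        if cluster not in self.local:
            raise KeyError(f"unknown cluster: {cluster}")
        # Move every non-selected portion back to the remote store.
        for name in [n for n in self.local if n != cluster]:
            self.remote[name] = self.local.pop(name)
        # The compound model is the shared network plus the selected portion.
        return self.shared, self.local[cluster]


def build_demo_store() -> ClusterModelStore:
    """Builds a toy store with two clusters; all values are illustrative."""
    shared = SharedStateNetwork(states=["s1", "s2"], state_tyings={"a-b+c": "s1"})
    remote = {
        "female": SpeakerDependentPortion("female", [0.1, 0.2]),
        "male": SpeakerDependentPortion("male", [0.3, 0.4]),
    }
    return ClusterModelStore(shared=shared, remote=remote)
```

Calling `build_demo_store().select("female")` returns the shared network together with the `female` portion and leaves the `male` portion in the remote dictionary. Because the state network is shared, switching clusters only swaps the speaker dependent part.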
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, via a microphone, a speech signal corresponding to a particular speaker; selecting, via a processor, a cluster model to be stored in at least one memory device, the cluster model comprises both a speaker independent portion that defines a plurality of states and a plurality of state tyings and a speaker dependent portion, wherein the speaker independent portion is selected based at least in part on a recognition result of a recognition operation and wherein the speaker dependent portion is a subspace Hidden Markov Model, wherein the subspace Hidden Markov Model is selected based at least in part on a characteristic of speech of the particular speaker, the microphone, and on a rescoring of the recognition result; storing the speaker dependent portion of the selected cluster model locally and storing different speaker dependent portions remotely; and processing the speech signal, using the selected cluster model, to convert the speech signal into text.
2. The method of claim 1, wherein selecting the cluster model comprises performing a recognition operation with respect to the particular speaker for each of a plurality of cluster models and selecting one of the cluster models based on a likelihood score for the selected cluster model indicative of a degree of matching between the particular speaker and the selected cluster model.
3. The method of claim 1, wherein selecting the cluster model comprises selecting the speaker dependent portion among a plurality of different speaker dependent portions in which each speaker dependent portion is associated with a corresponding speaker characteristic based on a comparison of the corresponding speaker characteristic of each speaker dependent portion to the characteristic of speech of the particular speaker.
4. The method of claim 3, wherein selecting the cluster model comprises forming a compound cluster model by utilizing the selected speaker dependent portion and a speaker independent state network defining the speaker independent portion that is shared among a plurality of speaker dependent portions.
5. The method of claim 1, wherein selecting the cluster model comprises selecting the speaker dependent portion of the cluster model based on speaker characteristics indicative of gender, accent, age or language.
6. A computer program product comprising at least one computer-readable non-transitory storage medium having computer-executable program code instructions stored therein, the computer-executable program code instructions comprising: program code instructions for receiving, via a microphone, a speech signal corresponding to a particular speaker; program code instructions for selecting a cluster model to be stored in at least one memory device, the cluster model comprises both a speaker independent portion that defines a plurality of states and a plurality of state tyings and a speaker dependent portion, wherein the speaker independent portion is selected based at least in part on a recognition result of a recognition operation and wherein the speaker dependent portion is a subspace Hidden Markov Model, wherein the subspace Hidden Markov Model is selected based at least in part on a characteristic of speech of the particular speaker, the microphone, and on a rescoring of the recognition result; program code instructions for storing the speaker dependent portion of the selected cluster model locally and storing different speaker dependent portions remotely; and program code instructions for processing the speech signal using the selected cluster model, to convert the speech signal into text.
7. The computer program product of claim 6, wherein program code instructions for selecting the cluster model include instructions for performing a recognition operation with respect to the particular speaker for each of a plurality of cluster models and selecting one of the cluster models based on a likelihood score for the selected cluster model indicative of a degree of matching between the particular speaker and the selected cluster model.
8. The computer program product of claim 6, wherein program code instructions for selecting the cluster model include instructions for selecting the speaker dependent portion among a plurality of different speaker dependent portions in which each speaker dependent portion is associated with a corresponding speaker characteristic based on a comparison of the corresponding speaker characteristic of each speaker dependent portion to the characteristic of speech of the particular speaker.
9. The computer program product of claim 8, wherein program code instructions for selecting the cluster model include instructions for forming a compound cluster model by utilizing the selected speaker dependent portion and a speaker independent state network defining the speaker independent portion that is shared among a plurality of speaker dependent portions.
10. The computer program product of claim 6, wherein program code instructions for selecting the cluster model include instructions for selecting the speaker dependent portion of the cluster model based on speaker characteristics indicative of gender, accent, age or language.
11. An apparatus comprising: a processor; and a memory including computer program code, the memory and the computer program code configured to, with the processor, cause the apparatus to at least: receive, via a microphone, a speech signal corresponding to a particular speaker; select a cluster model to be stored in the memory, the cluster model comprises both a speaker independent portion that defines a plurality of states and a plurality of state tyings and a speaker dependent portion, wherein the speaker independent portion is selected based at least in part on a recognition result of a recognition operation and wherein the speaker dependent portion is a subspace Hidden Markov Model, wherein the subspace Hidden Markov Model is selected based at least in part on a characteristic of speech of the particular speaker, the microphone, and on a rescoring of the recognition result; store the speaker dependent portion of the selected cluster model locally and storing different speaker dependent portions remotely; and process the speech signal using the selected cluster model, to convert the speech signal into text.
12. The apparatus of claim 11, wherein the memory including the computer program code is further configured to, with the processor, cause the apparatus to select the cluster model by performing a recognition operation with respect to the particular speaker for each of a plurality of cluster models and selecting one of the cluster models based on a likelihood score for the selected cluster model indicative of a degree of matching between the particular speaker and the selected cluster model.
13. The apparatus of claim 11, wherein the memory including the computer program code is further configured to, with the processor, cause the apparatus to select the cluster model by selecting the speaker dependent portion among a plurality of different speaker dependent portions in which each speaker dependent portion is associated with a corresponding speaker characteristic based on a comparison of the corresponding speaker characteristic of each speaker dependent portion to the characteristic of speech of the particular speaker.
14. The apparatus of claim 13, wherein the memory including the computer program code is further configured to, with the processor, cause the apparatus to select the cluster model by forming a compound cluster model by utilizing the selected speaker dependent portion and a
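
Claims 2, 7, and 12 describe running a recognition operation once per cluster model and keeping the model with the best likelihood score. The sketch below illustrates only that selection step, under loud assumptions: a single diagonal Gaussian per cluster stands in for a full recognition pass (the claims recite a subspace Hidden Markov Model, which is not implemented here), and the function names and feature layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A cluster is represented by (mean, variance) per feature dimension.
# This is a stand-in for a real acoustic model, chosen for brevity.
ClusterParams = Tuple[Sequence[float], Sequence[float]]


def gaussian_log_likelihood(
    frames: List[Sequence[float]],
    mean: Sequence[float],
    var: Sequence[float],
) -> float:
    """Sum of diagonal-Gaussian log densities over all frames and dimensions."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def select_cluster(
    frames: List[Sequence[float]],
    clusters: Dict[str, ClusterParams],
) -> Tuple[str, Dict[str, float]]:
    """Scores the speech against every cluster and returns the best match."""
    scores = {
        name: gaussian_log_likelihood(frames, mean, var)
        for name, (mean, var) in clusters.items()
    }
    # The highest log-likelihood indicates the closest match to the speaker.
    best = max(scores, key=scores.get)
    return best, scores
```

With frames clustered around `[1.0, 1.0]` and two candidate clusters centered at `[0.0, 0.0]` and `[1.0, 1.0]` with equal variances, `select_cluster` picks the second, since its log-likelihood is higher.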