Method, apparatus and computer program product for providing compound models for speech recognition adaptation

US9418662B2 · US · B2

Patent metadata
Publication number: US-9418662-B2
Application number: US-35681409-A
Country: US
Kind code: B2
Filing date: Jan 21, 2009
Priority date: Jan 21, 2009
Publication date: Aug 16, 2016
Grant date: Aug 16, 2016

Abstract

Official abstract text for this publication.

An apparatus for providing compound models for speech recognition adaptation. The apparatus may include a processor and a memory including computer program code, with the memory and the computer program code being configured, with the processor, to cause the apparatus to at least receive a speech signal corresponding to a particular speaker. The apparatus may further be configured to select a cluster model including both a speaker independent portion and a speaker dependent portion based at least in part on a characteristic of speech of the particular speaker. The apparatus may be further configured to process the speech using the selected cluster model. The apparatus may be further configured to cause at least a speaker dependent portion of one or more non-selected cluster models to be stored remotely. A corresponding method and computer program product are also provided.
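The arrangement in the abstract can be pictured as one shared speaker independent part combined with whichever speaker dependent part is selected, with the non-selected parts kept off the device. The following is only a minimal sketch under stated assumptions: the class names, the field layout, and the in-memory `remote` dictionary (standing in for remote storage) are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class SpeakerIndependentPart:
    """Shared state network (hypothetical layout): states plus state tyings."""
    states: Tuple[str, ...]
    state_tyings: Tuple[Tuple[str, str], ...]  # (state, tied_to) pairs


@dataclass(frozen=True)
class SpeakerDependentPart:
    """Per-cluster parameters; a plain tuple stands in for the real model."""
    cluster_id: str
    parameters: Tuple[float, ...]


@dataclass
class CompoundModelStore:
    """Assumed storage policy: one part held locally, all others remotely."""
    shared: SpeakerIndependentPart
    local: Dict[str, SpeakerDependentPart] = field(default_factory=dict)
    remote: Dict[str, SpeakerDependentPart] = field(default_factory=dict)

    def select(self, cluster_id: str):
        """Keep the chosen speaker dependent part locally, the rest remotely."""
        available = {**self.remote, **self.local}
        if cluster_id not in available:
            raise KeyError(f"unknown cluster: {cluster_id}")
        chosen = available.pop(cluster_id)
        self.local = {cluster_id: chosen}
        self.remote = available
        # The compound model pairs the shared part with the chosen part.
        return self.shared, chosen


# Illustrative data only: three made-up clusters, all starting out remote.
shared = SpeakerIndependentPart(states=("s1", "s2"), state_tyings=(("s2", "s1"),))
store = CompoundModelStore(
    shared=shared,
    remote={
        cid: SpeakerDependentPart(cid, (0.0,))
        for cid in ("cluster_a", "cluster_b", "cluster_c")
    },
)
si_part, sd_part = store.select("cluster_b")
```

After `select("cluster_b")`, only that speaker dependent part remains in `store.local`, while the shared part is reused unchanged for any cluster.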

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, via a microphone, a speech signal corresponding to a particular speaker; selecting, via a processor, a cluster model to be stored in at least one memory device, the cluster model comprises both a speaker independent portion that defines a plurality of states and a plurality of state tyings and a speaker dependent portion, wherein the speaker independent portion is selected based at least in part on a recognition result of a recognition operation and wherein the speaker dependent portion is a subspace Hidden Markov Model, wherein the subspace Hidden Markov Model is selected based at least in part on a characteristic of speech of the particular speaker, the microphone, and on a rescoring of the recognition result; storing the speaker dependent portion of the selected cluster model locally and storing different speaker dependent portions remotely; and processing the speech signal, using the selected cluster model, to convert the speech signal into text.

2. The method of claim 1, wherein selecting the cluster model comprises performing a recognition operation with respect to the particular speaker for each of a plurality of cluster models and selecting one of the cluster models based on a likelihood score for the selected cluster model indicative of a degree of matching between the particular speaker and the selected cluster model.

3. The method of claim 1, wherein selecting the cluster model comprises selecting the speaker dependent portion among a plurality of different speaker dependent portions in which each speaker dependent portion is associated with a corresponding speaker characteristic based on a comparison of the corresponding speaker characteristic of each speaker dependent portion to the characteristic of speech of the particular speaker.

4. The method of claim 3, wherein selecting the cluster model comprises forming a compound cluster model by utilizing the selected speaker dependent portion and a speaker independent state network defining the speaker independent portion that is shared among a plurality of speaker dependent portions.

5. The method of claim 1, wherein selecting the cluster model comprises selecting the speaker dependent portion of the cluster model based on speaker characteristics indicative of gender, accent, age or language.

6. A computer program product comprising at least one computer-readable non-transitory storage medium having computer-executable program code instructions stored therein, the computer-executable program code instructions comprising: program code instructions for receiving, via a microphone, a speech signal corresponding to a particular speaker; program code instructions for selecting a cluster model to be stored in at least one memory device, the cluster model comprises both a speaker independent portion that defines a plurality of states and a plurality of state tyings and a speaker dependent portion, wherein the speaker independent portion is selected based at least in part on a recognition result of a recognition operation and wherein the speaker dependent portion is a subspace Hidden Markov Model, wherein the subspace Hidden Markov Model is selected based at least in part on a characteristic of speech of the particular speaker, the microphone, and on a rescoring of the recognition result; program code instructions for storing the speaker dependent portion of the selected cluster model locally and storing different speaker dependent portions remotely; and program code instructions for processing the speech signal using the selected cluster model, to convert the speech signal into text.

7. The computer program product of claim 6, wherein program code instructions for selecting the cluster model include instructions for performing a recognition operation with respect to the particular speaker for each of a plurality of cluster models and selecting one of the cluster models based on a likelihood score for the selected cluster model indicative of a degree of matching between the particular speaker and the selected cluster model.

8. The computer program product of claim 6, wherein program code instructions for selecting the cluster model include instructions for selecting the speaker dependent portion among a plurality of different speaker dependent portions in which each speaker dependent portion is associated with a corresponding speaker characteristic based on a comparison of the corresponding speaker characteristic of each speaker dependent portion to the characteristic of speech of the particular speaker.

9. The computer program product of claim 8, wherein program code instructions for selecting the cluster model include instructions for forming a compound cluster model by utilizing the selected speaker dependent portion and a speaker independent state network defining the speaker independent portion that is shared among a plurality of speaker dependent portions.

10. The computer program product of claim 6, wherein program code instructions for selecting the cluster model include instructions for selecting the speaker dependent portion of the cluster model based on speaker characteristics indicative of gender, accent, age or language.

11. An apparatus comprising: a processor; and a memory including computer program code, the memory and the computer program code configured to, with the processor, cause the apparatus to at least: receive, via a microphone, a speech signal corresponding to a particular speaker; select a cluster model to be stored in the memory, the cluster model comprises both a speaker independent portion that defines a plurality of states and a plurality of state tyings and a speaker dependent portion, wherein the speaker independent portion is selected based at least in part on a recognition result of a recognition operation and wherein the speaker dependent portion is a subspace Hidden Markov Model, wherein the subspace Hidden Markov Model is selected based at least in part on a characteristic of speech of the particular speaker, the microphone, and on a rescoring of the recognition result; store the speaker dependent portion of the selected cluster model locally and storing different speaker dependent portions remotely; and process the speech signal using the selected cluster model, to convert the speech signal into text.

12. The apparatus of claim 11, wherein the memory including the computer program code is further configured to, with the processor, cause the apparatus to select the cluster model by performing a recognition operation with respect to the particular speaker for each of a plurality of cluster models and selecting one of the cluster models based on a likelihood score for the selected cluster model indicative of a degree of matching between the particular speaker and the selected cluster model.

13. The apparatus of claim 11, wherein the memory including the computer program code is further configured to, with the processor, cause the apparatus to select the cluster model by selecting the speaker dependent portion among a plurality of different speaker dependent portions in which each speaker dependent portion is associated with a corresponding speaker characteristic based on a comparison of the corresponding speaker characteristic of each speaker dependent portion to the characteristic of speech of the particular speaker.

14. The apparatus of claim 13, wherein the memory including the computer program code is further configured to, with the processor, cause the apparatus to select the cluster model by forming a compound cluster model by utilizing the selected speaker dependent portion and a
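Claims 2, 7 and 12 describe running a recognition operation against each of several cluster models and keeping the one whose likelihood score best matches the speaker. A rough sketch of that selection loop follows. It makes a strong simplifying assumption: a single diagonal-covariance Gaussian per cluster stands in for the recognition operation, whereas the claims refer to subspace Hidden Markov Models. The function names, cluster names, and feature values are hypothetical.

```python
import math


def gaussian_log_likelihood(frames, mean, var):
    """Sum of diagonal-Gaussian log-densities over all feature frames."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


def select_cluster(frames, clusters):
    """Score the speech against every cluster and return the best match."""
    scores = {
        name: gaussian_log_likelihood(frames, params["mean"], params["var"])
        for name, params in clusters.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Illustrative data only: two made-up clusters and two feature frames.
clusters = {
    "cluster_a": {"mean": [0.0, 0.0], "var": [1.0, 1.0]},
    "cluster_b": {"mean": [5.0, 5.0], "var": [1.0, 1.0]},
}
frames = [[4.8, 5.1], [5.2, 4.9]]
best, scores = select_cluster(frames, clusters)
```

Here the frames lie close to the mean of `cluster_b`, so that cluster receives the higher score and is selected. In a real recognizer the score would come from decoding with each compound model rather than from a single Gaussian.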

Assignees

Inventors

Classifications

  • G10L17/20 · Primary

    Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions · CPC title

  • to the speaker · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9418662B2 cover?
An apparatus for providing compound models for speech recognition adaptation. The apparatus may include a processor and a memory including computer program code, with the memory and the computer program code being configured, with the processor, to cause the apparatus to at least receive a speech signal corresponding to a particular speaker. The apparatus may further be configured to select a cluster m…
Who is the assignee on this patent?
Olsen Jesper, Nokia Technologies Oy
What technology area does this patent fall under?
Primary CPC classification G10L17/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 16, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).