Differential acoustic model representation and linear transform-based adaptation for efficient user profile update techniques in automatic speech recognition

US9406299B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9406299-B2
Application number: US-201214399867-A
Country: US
Kind code: B2
Filing date: Mar 8, 2012
Priority date: May 8, 2012
Publication date: Aug 2, 2016
Grant date: Aug 2, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method is described for speaker adaptation in automatic speech recognition. Speech recognition data from a particular speaker is used for adaptation of an initial speech recognition acoustic model to produce a speaker adapted acoustic model. A speaker dependent differential acoustic model is determined that represents differences between the initial speech recognition acoustic model and the speaker adapted acoustic model. In addition, an approach is also disclosed to estimate speaker-specific feature or model transforms over multiple sessions. This is achieved by updating the previously estimated transform using only adaptation statistics of the current session.
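The differential-model idea in the abstract can be illustrated with a minimal sketch. It assumes the acoustic model is flattened to a plain list of parameters and uses a single uniform quantization step plus a "keep the largest differences" rule; the patent itself describes optimized quantization ranges and a scoring function, which are not reproduced here. The names `build_differential`, `reconstruct`, `step`, and `keep` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def build_differential(initial: List[float], adapted: List[float],
                       step: float, keep: int) -> Dict[int, int]:
    """Return a sparse {parameter index: quantized delta} map.

    Assumption: differences are ranked by absolute size and only the
    `keep` largest are stored, each rounded to a multiple of `step`.
    """
    deltas = [(i, a - b) for i, (b, a) in enumerate(zip(initial, adapted))]
    ranked = sorted(deltas, key=lambda item: abs(item[1]), reverse=True)
    diff: Dict[int, int] = {}
    for index, delta in ranked[:keep]:
        quantized = round(delta / step)
        if quantized != 0:  # zero deltas need no storage
            diff[index] = quantized
    return diff


def reconstruct(initial: List[float], diff: Dict[int, int],
                step: float) -> List[float]:
    """Derive an approximate speaker adapted model from the initial model
    and the stored differential."""
    adapted = list(initial)
    for index, quantized in diff.items():
        adapted[index] += quantized * step
    return adapted


if __name__ == "__main__":
    initial_model = [0.0, 1.0, 2.0, 3.0]
    adapted_model = [0.0, 1.5, 2.01, 2.0]
    stored = build_differential(initial_model, adapted_model, step=0.25, keep=2)
    print(stored)
    print(reconstruct(initial_model, stored, step=0.25))
```

Only the sparse map needs to be kept in the user profile; the small 0.01 difference in the third parameter is dropped, which is the kind of storage-versus-accuracy trade-off the claims address with a scoring function.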

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: adapting, by a computing device, an initial speech recognition acoustic model to produce a speaker adapted acoustic model using speech recognition data from a particular speaker; developing, by the computing device, a speaker differential acoustic model representing one or more differences between the initial speech recognition acoustic model and the speaker adapted acoustic model; optimizing different quantization ranges for parameter subsets of the speaker adapted acoustic model using a scoring function; using the different quantization ranges to minimize an acoustic model difference measure of the speaker adapted acoustic model against the initial speech recognition acoustic model; and storing the speaker differential acoustic model for subsequent speech recognition with the particular speaker.

2. The method of claim 1, comprising: retrieving the stored speaker differential acoustic model; and processing the speaker differential acoustic model and the initial speech recognition acoustic model to derive the speaker adapted acoustic model.

3. The method of claim 2, comprising: using the derived speaker adapted acoustic model to perform speech recognition for the particular speaker.

4. The method of claim 1, comprising: ranking the one or more differences based on stored sizes of the one or more differences.

5. The method of claim 1, comprising: ranking the one or more differences based on most important differences of the one or more differences.

6. The method of claim 1, comprising: ranking the one or more differences using the scoring function; selecting a subset of the one or more differences, the subset selected based on the ranking the one or more differences; determining storage-efficient Gaussian component clustering of the speaker differential acoustic model using the scoring function; and using the subset of the one or more differences and the storage-efficient Gaussian component clustering to minimize the acoustic model difference measure of the speaker adapted acoustic model against the initial speech recognition acoustic model.

7. The method of claim 3, wherein using the derived speaker adapted acoustic model to perform the speech recognition comprises: producing a representative recognition output of the speech recognition data from the particular speaker, wherein the representative recognition output is constrained by a recognition language model and search algorithm; and comparing the representative recognition output to the derived speaker adapted acoustic model.

8. The method of claim 1, comprising: determining a session update transform for a linear transform-based speaker adaptation of the speaker differential acoustic model based on speech recognition data from speech utterances by the particular speaker, wherein the linear transform-based speaker adaptation uses relevance smoothing of the speech recognition data; and producing an updated speaker differential acoustic model by combining the speaker differential acoustic model and the session update transform.

9. A method comprising: loading a user profile for a particular speaker including an initial user feature transform representing speaker adapted speech recognition acoustic models; performing speech recognition, by a computing device, for a session of speech utterances from the particular speaker using the initial user feature transform and a plurality of speaker independent speech recognition acoustic models; determining a session update transform for a linear transform-based speaker adaptation of the initial user feature transform based on speech recognition data from the session; producing an updated user feature transform by combining the initial user feature transform and the session update transform; and storing in the user profile the updated user feature transform for subsequent speech recognition with the particular speaker.

10. The method of claim 9, wherein producing the updated user feature transform by combining the initial user feature transform and the session update transform is performed without a bias vector.

11. The method of claim 9, wherein producing the updated user feature transform by combining the initial user feature transform and the session update transform is performed with a bias vector.

12. The method of claim 11, wherein the bias vector is an acoustic observation vector for constrained maximum likelihood linear regression (CMLLR) adaptation.

13. The method of claim 11, wherein the bias vector is a component mean vector for maximum likelihood linear regression (MLLR) adaptation.

14. The method of claim 9, wherein the linear transform-based speaker adaptation uses relevance smoothing of the speech recognition data from the session.

15. A non-transitory computer-readable medium storing computer-readable instructions that, when executed by a processor, cause a device to: load a user profile for a particular speaker including an initial user feature transform representing speaker adapted speech recognition acoustic models; perform speech recognition for a session of speech utterances from the particular speaker using the initial user feature transform and a plurality of speaker independent speech recognition acoustic models; determine a session update transform for a linear transform-based speaker adaptation of the initial user feature transform based on speech recognition data from the session; produce an updated user feature transform by combining the initial user feature transform and the session update transform; and store in the user profile the updated user feature transform for subsequent speech recognition with the particular speaker.

16. The non-transitory computer-readable medium of claim 15, wherein producing the updated user feature transform by combining the initial user feature transform and the session update transform is performed without a bias vector.

17. The non-transitory computer-readable medium of claim 16, wherein producing the updated user feature transform by combining the initial user feature transform and the session update transform is performed with a bias vector.

18. The non-transitory computer-readable medium of claim 17, wherein the bias vector is an acoustic observation vector for constrained maximum likelihood linear regression (CMLLR) adaptation.

19. The non-transitory computer-readable medium of claim 17, wherein the bias vector is a component mean vector for maximum likelihood linear regression (MLLR) adaptation.

20. The non-transitory computer-readable medium of claim 15, wherein the linear transform-based speaker adaptation uses relevance smoothing of the speech recognition data from the session.
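The step of "combining the initial user feature transform and the session update transform" can be read, under one plausible interpretation, as composing two affine feature transforms. The sketch below assumes the session update is applied on top of features already mapped by the stored user transform, so the combined transform is x -> A_sess (A_user x + b_user) + b_sess. This is an illustrative assumption, not the patent's stated formula, and the helper names (`combine`, `apply_transform`, `matmul`, `matvec`) are hypothetical. It does not show how the session transform is estimated from adaptation statistics.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Multiply matrix a by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply matrix a by matrix b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def apply_transform(a: Matrix, b: Vector, x: Vector) -> Vector:
    """Apply the affine feature transform x -> a x + b."""
    return [v + bi for v, bi in zip(matvec(a, x), b)]


def combine(a_user: Matrix, b_user: Vector,
            a_sess: Matrix, b_sess: Vector) -> Tuple[Matrix, Vector]:
    """Fold a session update into the stored user transform.

    Assumption: the session transform acts on features that were already
    transformed by the user transform, so the two maps compose.
    """
    a_new = matmul(a_sess, a_user)
    b_new = [v + bi for v, bi in zip(matvec(a_sess, b_user), b_sess)]
    return a_new, b_new


if __name__ == "__main__":
    a_user, b_user = [[2.0, 0.0], [0.0, 1.0]], [1.0, 0.0]
    a_sess, b_sess = [[1.0, 1.0], [0.0, 1.0]], [0.0, -1.0]
    a_new, b_new = combine(a_user, b_user, a_sess, b_sess)
    x = [3.0, 4.0]
    # Applying the combined transform once matches applying both in sequence.
    print(apply_transform(a_new, b_new, x))
    print(apply_transform(a_sess, b_sess, apply_transform(a_user, b_user, x)))
```

Under this reading, only the combined matrix and bias need to be written back to the user profile after each session. The variant "without a bias vector" would correspond to setting both bias vectors to zero.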

Assignees

Inventors

Classifications

  • of the speaker; Human-factor methodology · CPC title

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title

  • to the speaker · CPC title

  • G10L15/285 · Primary

    Memory allocation or algorithm optimisation to reduce hardware requirements · CPC title

  • G10L17/06 · Primary

    Decision making techniques; Pattern matching strategies · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9406299B2 cover?
A computer-implemented method is described for speaker adaptation in automatic speech recognition. Speech recognition data from a particular speaker is used for adaptation of an initial speech recognition acoustic model to produce a speaker adapted acoustic model. A speaker dependent differential acoustic model is determined that represents differences between the initial speech recognition aco…
Who is the assignee on this patent?
Gollan Christian, Willett Daniel, Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/285. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 2, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).