Automatic speech recognition confidence classifier

US9997161B2 · US · B2

Patent metadata
Publication number: US-9997161-B2
Application number: US-201514852083-A
Country: US
Kind code: B2
Filing date: Sep 11, 2015
Priority date: Sep 11, 2015
Publication date: Jun 12, 2018
Grant date: Jun 12, 2018

Abstract

The described technology provides normalization of speech recognition confidence classifier (CC) scores that maintains the accuracy of acceptance metrics. A speech recognition CC score quantitatively represents the correctness of a decoded utterance within a defined range (e.g., [0,1]). An operating threshold is associated with a confidence classifier, such that utterance recognitions with scores exceeding the operating threshold are deemed acceptable. However, when the platform updates a speech recognition engine, an acoustic model, and/or other parameters, the correct-accept (CA) versus false-accept (FA) profile can change, so that the application software's operating threshold is no longer valid or is less accurate. Normalizing speech recognition CC scores so that they map to the same or better CA and/or FA profiles at the previously set operating thresholds allows preset operating thresholds to remain valid and accurate, even after a speech recognition engine, acoustic model, and/or other parameters are changed.
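
To make the threshold problem concrete, the sketch below computes correct-accept (CA) and false-accept (FA) rates at one fixed operating threshold for an old and an updated classifier scoring the same utterances. It is a minimal illustration, not the patented method: the rate definitions (CA over all correctly recognized utterances, FA over all misrecognized ones), the function name `acceptance_metrics`, and the toy scores are hypothetical.

```python
from typing import Sequence, Tuple


def acceptance_metrics(
    scores: Sequence[float], correct: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Return (CA rate, FA rate) for recognitions whose score exceeds `threshold`.

    Assumed definitions (hypothetical, for illustration only):
      CA rate = accepted correct recognitions / all correct recognitions
      FA rate = accepted incorrect recognitions / all incorrect recognitions
    """
    n_correct = sum(1 for c in correct if c)
    n_incorrect = len(correct) - n_correct
    accepted_correct = sum(1 for s, c in zip(scores, correct) if c and s > threshold)
    accepted_incorrect = sum(1 for s, c in zip(scores, correct) if not c and s > threshold)
    ca_rate = accepted_correct / n_correct if n_correct else 0.0
    fa_rate = accepted_incorrect / n_incorrect if n_incorrect else 0.0
    return ca_rate, fa_rate


# Toy data: whether each recognition was actually correct, plus scores from an
# old classifier and a hypothetical updated classifier that scores everything higher.
LABELS = [True, True, True, True, False, False, False, False]
OLD_SCORES = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
NEW_SCORES = [0.95, 0.9, 0.85, 0.7, 0.8, 0.6, 0.55, 0.3]
THRESHOLD = 0.5  # operating threshold the application tuned for the old classifier

if __name__ == "__main__":
    print("old:", acceptance_metrics(OLD_SCORES, LABELS, THRESHOLD))  # (0.75, 0.25)
    print("new:", acceptance_metrics(NEW_SCORES, LABELS, THRESHOLD))  # (1.0, 0.75)
```

With these toy numbers the same threshold of 0.5 yields an FA rate of 0.25 for the old classifier and 0.75 for the updated one, which is the kind of profile shift that score normalization is meant to absorb.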

Claims

Claim text (preview; the excerpt ends partway through claim 12).

What is claimed is:

1. A speech recognition device for accurate transformation of acoustic utterances into text, the speech recognition device comprising: an acoustic sensor configured to receive one or more acoustic utterances; one or more memory devices configured to receive and store a set of one or more acoustic models having trained one or more confidence classifiers and to store one or more acceptance metrics defining at least one recognition acceptance condition; automatic speech recognition circuitry including at least one processor unit for executing confidence classifier circuitry, the confidence classifier circuitry being configured to generate a first speech recognition confidence classifier score corresponding to the one or more received acoustic utterances and recognized text based on a first confidence classifier and to generate a second speech recognition confidence classifier score corresponding to the one or more received acoustic utterances and the recognized text based on a second confidence classifier; normalization circuitry connected to the automatic speech recognition circuitry to receive the first and second speech recognition confidence classifier scores from the confidence classifier circuitry and to map a distribution within an output range of the second confidence classifier to a distribution within an output range of the first confidence classifier, the mapped distribution including a mapped speech recognition confidence classifier score for the second confidence classifier that more accurately satisfies the recognition acceptance condition than a corresponding score from the first confidence classifier; and a text output interface connected to receive new recognized text from the automatic speech recognition circuitry for a newly-received acoustic utterance and to output a signal representing the new recognized text as accepted text responsive to a determination that a mapped speech recognition confidence classifier score of the second confidence classifier for the newly-received acoustic utterance satisfies the recognition acceptance condition.

2. The speech recognition device of claim 1 wherein the normalization circuitry executes a histogram-based mapping generating the mapped speech recognition confidence classifier score that equally or more accurately satisfies the recognition acceptance condition than the first speech recognition confidence classifier score.

3. The speech recognition device of claim 2 wherein the normalization circuitry executes the histogram-based mapping by generating probability mass functions for confidence scores from the first and second confidence classifiers, generating a cumulative mass functions corresponding to the probability mass functions for confidence scores from the first and second confidence classifiers, respectively, and generating an acceptance criteria map in which the cumulative mass function for the second classifier for each confidence score in the acceptance criteria map equals the cumulative mass function for the first classifier for each confidence score within a preset resolution.

4. The speech recognition device of claim 1 wherein the normalization circuitry executes a polynomial-based mapping generating the mapped speech recognition confidence classifier score that equally or more accurately satisfies the recognition acceptance condition than the first speech recognition confidence classifier score.

5. The speech recognition device of claim 4 wherein the normalization circuitry executes the polynomial-based mapping by collecting a set of acceptance metrics from the first confidence classifier and a set of acceptance metrics from the second confidence classifier, sampling the sets of acceptance metrics at a specified sampling interval to obtain a sampled set of confidence threshold for the first confidence classifier and a sampled set of confidence thresholds for the first confidence classifier, and learning a polynomial that represents a set of confidence thresholds for the first and second confidence classifiers with a preset resolution.

6. The speech recognition device of claim 1 wherein the normalization circuitry executes a tanh-based mapping generating the mapped speech recognition confidence classifier score that equally or more accurately satisfies the recognition acceptance condition than the first speech recognition confidence classifier score.

7. The speech recognition device of claim 6 wherein the normalization circuitry executes the tanh-based mapping by collecting a set of confidence scores representing acceptance metrics from the first confidence classifier and a set of confidence scores representing acceptance metrics from the second confidence classifier, learning a bias parameter and a scale parameter such that a tanh of the confidence scores representing acceptance metrics from the first confidence classifier equals the bias parameter plus a product of the scale parameter and a tanh of the confidence scores representing acceptance metrics from the first confidence classifier.

8. The speech recognition device of claim 1 wherein the text output interface outputs the signal representing the accepted text to a display.

9. A method of transforming acoustic utterances into text in a speech recognition device, the method comprising: receiving one or more acoustic utterances via an acoustic sensor configured of the speech recognition device; storing a set of one or more acoustic models having trained one or more confidence classifiers and one or more acceptance metrics defining at least one recognition acceptance condition; generating a first speech recognition confidence classifier score corresponding to the one or more received acoustic utterances and recognized text based on a first confidence classifier; generating a second speech recognition confidence classifier score corresponding to the one or more received acoustic utterances and the recognized text based on a second confidence classifier; mapping a distribution within an output range of the second confidence classifier to a distribution within an output range of the first confidence classifier, the mapped distribution including a mapped speech recognition confidence classifier score for the second confidence classifier that more accurately satisfies the recognition acceptance condition than a corresponding score from the first confidence classifier; and outputting a signal representing new recognized text for a newly-received acoustic utterance as accepted text responsive to a determination that a mapped speech recognition confidence classifier score of the second confidence classifier for the newly-received acoustic utterance satisfies the recognition acceptance condition.

10. The method of claim 9 wherein the mapping operation comprises: histogram-based mapping generating the mapped speech recognition confidence classifier score that equally or more accurately satisfies the recognition acceptance condition than the first speech recognition confidence classifier score.

11. The method of claim 9 wherein the histogram-mapping operation comprises: generating probability mass functions for confidence scores from the first and second confidence classifiers, generating a cumulative mass functions corresponding to the probability mass functions for confidence scores from the first and second confidence classifiers, respectively, and generating an acceptance criteria map in which the cumulative mass function for the second classifier for each confidence score in the acceptance criteria map equals the cumulative mass function for the first classifier for each confidence score within a preset resolution.

12. The method of
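
Claims 3 and 11 describe the histogram-based mapping as three steps: probability mass functions (PMFs) of both classifiers' scores, the corresponding cumulative mass functions (CMFs), and an acceptance criteria map pairing scores whose CMFs agree within a preset resolution. The sketch below is one possible realization, essentially histogram specification over equal-width bins on [0, 1]. The bin count stands in for the "preset resolution"; the function names, the choice of mapping each bin to the upper edge of its matched bin, and the toy data are assumptions, not details taken from the patent.

```python
from bisect import bisect_left
from typing import List, Sequence


def pmf(scores: Sequence[float], n_bins: int) -> List[float]:
    """Probability mass function of scores in [0, 1] over equal-width bins."""
    counts = [0] * n_bins
    for s in scores:
        # Bins are [k/n, (k+1)/n); a score of exactly 1.0 goes into the last bin.
        counts[min(int(s * n_bins), n_bins - 1)] += 1
    return [c / len(scores) for c in counts]


def cmf(p: Sequence[float]) -> List[float]:
    """Cumulative mass function (running sum of a PMF)."""
    out: List[float] = []
    total = 0.0
    for x in p:
        total += x
        out.append(total)
    return out


def build_acceptance_map(
    first_scores: Sequence[float], second_scores: Sequence[float], n_bins: int
) -> List[float]:
    """For each second-classifier bin, the first-classifier score with a matching CMF."""
    c1 = cmf(pmf(first_scores, n_bins))
    c2 = cmf(pmf(second_scores, n_bins))
    mapping = []
    for b in range(n_bins):
        # Smallest first-classifier bin whose CMF reaches the second classifier's CMF
        # at bin b (the tolerance absorbs floating-point rounding in the running sums).
        j = min(bisect_left(c1, c2[b] - 1e-12), n_bins - 1)
        mapping.append((j + 1) / n_bins)  # upper edge of the matched bin
    return mapping


def normalize(score: float, mapping: Sequence[float]) -> float:
    """Map a second-classifier score onto the first classifier's scale."""
    n_bins = len(mapping)
    return mapping[min(int(score * n_bins), n_bins - 1)]


OLD_SCORES = [0.1, 0.3, 0.6, 0.9]    # first (previous) classifier
NEW_SCORES = [0.55, 0.6, 0.8, 0.95]  # second (updated) classifier, skewed high

if __name__ == "__main__":
    acceptance_map = build_acceptance_map(OLD_SCORES, NEW_SCORES, n_bins=4)
    mapped = [normalize(s, acceptance_map) for s in NEW_SCORES]
    print(acceptance_map)  # [0.25, 0.25, 0.5, 1.0]
    print(mapped)          # [0.5, 0.5, 1.0, 1.0]
```

With four bins, all four unmapped new scores exceed an old threshold of 0.5, while after mapping two of four do, matching the old classifier's acceptance rate at that threshold. Because every score in a bin maps to one value, the match holds only to within one bin width.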

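Claims 6 and 7 describe a tanh-based mapping that learns a bias parameter and a scale parameter relating the two classifiers' confidence scores. As printed, claim 7 names the first confidence classifier on both sides of that relation; the sketch below assumes the intended form is tanh(t1) = bias + scale · tanh(t2), where t1 and t2 are scores (for example, thresholds at matched operating points) from the first and second classifiers. Fitting by ordinary least squares, inverting with atanh, the clipping constant, and all names are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def fit_tanh_map(
    first_thresholds: Sequence[float], second_thresholds: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of (bias, scale) in tanh(t1) ~= bias + scale * tanh(t2)."""
    xs = [math.tanh(t) for t in second_thresholds]
    ys = [math.tanh(t) for t in first_thresholds]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    scale = sxy / sxx
    bias = mean_y - scale * mean_x
    return bias, scale


def apply_tanh_map(score: float, bias: float, scale: float, eps: float = 1e-9) -> float:
    """Map a second-classifier score onto the first classifier's scale."""
    z = bias + scale * math.tanh(score)
    # atanh is only defined on (-1, 1), so clip the linear combination first.
    z = max(-1.0 + eps, min(1.0 - eps, z))
    return math.atanh(z)


# Synthetic matched operating points generated from bias = 0.1, scale = 0.5,
# so the fit should recover those values.
SECOND_THRESHOLDS = [0.2, 0.4, 0.6, 0.8]
FIRST_THRESHOLDS = [math.atanh(0.1 + 0.5 * math.tanh(t)) for t in SECOND_THRESHOLDS]

if __name__ == "__main__":
    bias, scale = fit_tanh_map(FIRST_THRESHOLDS, SECOND_THRESHOLDS)
    print(round(bias, 6), round(scale, 6))  # 0.1 0.5
    print(apply_tanh_map(0.7, bias, scale))
```

Because atanh diverges near ±1, the clipping step keeps extreme mapped scores finite; how the patent handles that edge case is not described in this excerpt.
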
Assignees

Microsoft Technology Licensing, LLC

Classifications

  • G10L15/32 (primary): Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems

  • Using distance or distortion measures between unknown speech and reference templates

  • Speech to text systems (G10L15/08 takes precedence)

Frequently asked questions

What does patent US9997161B2 cover?
It covers normalizing speech recognition confidence classifier (CC) scores so that an operating threshold set by application software keeps the same or better correct-accept (CA) and false-accept (FA) profile after the platform updates the speech recognition engine, acoustic model, or other parameters.
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/32. Mapped technology areas include Physics.
When was this patent published?
Published June 12, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.