Speaker verification
US-2015301796-A1 · Oct 22, 2015 · US
US9997161B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9997161-B2 |
| Application number | US-201514852083-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 11, 2015 |
| Priority date | Sep 11, 2015 |
| Publication date | Jun 12, 2018 |
| Grant date | Jun 12, 2018 |
Official abstract text for this publication.
The described technology provides normalization of speech recognition confidence classifier (CC) scores that maintains the accuracy of acceptance metrics. A speech recognition CC score quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]). An operating threshold is associated with a confidence classifier, such that utterance recognitions having scores exceeding the operating threshold are deemed acceptable. However, when a speech recognition engine, an acoustic model, and/or other parameters are updated by the platform, the correct-accept (CA) versus false-accept (FA) profile can change such that the application software's operating threshold is no longer valid or as accurate. Normalizing of speech recognition CC scores to map to the same or better CA and/or FA profiles at the previously-set operating thresholds allows preset operating thresholds to remain valid and accurate, even after a speech recognition engine, acoustic model, and/or other parameters are changed.
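The effect the abstract describes, a fixed operating threshold producing a different CA/FA profile after a model update, can be illustrated with a minimal Python sketch. The function name `accept_rates` is hypothetical, and the sketch assumes that CA and FA are measured as fractions of all utterances and that an utterance is accepted when its score strictly exceeds the threshold; the patent record shown here does not spell out those details.

```python
from typing import Sequence, Tuple


def accept_rates(
    scores: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (CA, FA) at a given operating threshold.

    Assumption (not from the source): CA is the fraction of all utterances
    that are correct and accepted, FA the fraction that are incorrect and
    accepted. "Accepted" means score > threshold.
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    if not scores:
        raise ValueError("at least one utterance is required")
    total = len(scores)
    ca = sum(1 for s, ok in zip(scores, correct) if s > threshold and ok)
    fa = sum(1 for s, ok in zip(scores, correct) if s > threshold and not ok)
    return ca / total, fa / total
```

Evaluating the same labelled utterances with the scores of the old and the updated classifier at one threshold shows whether the application's preset threshold still corresponds to the CA/FA trade-off it was originally tuned for.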
Opening claim text (preview).
What is claimed is:
1. A speech recognition device for accurate transformation of acoustic utterances into text, the speech recognition device comprising: an acoustic sensor configured to receive one or more acoustic utterances; one or more memory devices configured to receive and store a set of one or more acoustic models having trained one or more confidence classifiers and to store one or more acceptance metrics defining at least one recognition acceptance condition; automatic speech recognition circuitry including at least one processor unit for executing confidence classifier circuitry, the confidence classifier circuitry being configured to generate a first speech recognition confidence classifier score corresponding to the one or more received acoustic utterances and recognized text based on a first confidence classifier and to generate a second speech recognition confidence classifier score corresponding to the one or more received acoustic utterances and the recognized text based on a second confidence classifier; normalization circuitry connected to the automatic speech recognition circuitry to receive the first and second speech recognition confidence classifier scores from the confidence classifier circuitry and to map a distribution within an output range of the second confidence classifier to a distribution within an output range of the first confidence classifier, the mapped distribution including a mapped speech recognition confidence classifier score for the second confidence classifier that more accurately satisfies the recognition acceptance condition than a corresponding score from the first confidence classifier; and a text output interface connected to receive new recognized text from the automatic speech recognition circuitry for a newly-received acoustic utterance and to output a signal representing the new recognized text as accepted text responsive to a determination that a mapped speech recognition confidence classifier score of the second confidence classifier for the newly-received acoustic utterance satisfies the recognition acceptance condition.
2. The speech recognition device of claim 1 wherein the normalization circuitry executes a histogram-based mapping generating the mapped speech recognition confidence classifier score that equally or more accurately satisfies the recognition acceptance condition than the first speech recognition confidence classifier score.
3. The speech recognition device of claim 2 wherein the normalization circuitry executes the histogram-based mapping by generating probability mass functions for confidence scores from the first and second confidence classifiers, generating a cumulative mass functions corresponding to the probability mass functions for confidence scores from the first and second confidence classifiers, respectively, and generating an acceptance criteria map in which the cumulative mass function for the second classifier for each confidence score in the acceptance criteria map equals the cumulative mass function for the first classifier for each confidence score within a preset resolution.
4. The speech recognition device of claim 1 wherein the normalization circuitry executes a polynomial-based mapping generating the mapped speech recognition confidence classifier score that equally or more accurately satisfies the recognition acceptance condition than the first speech recognition confidence classifier score.
5. The speech recognition device of claim 4 wherein the normalization circuitry executes the polynomial-based mapping by collecting a set of acceptance metrics from the first confidence classifier and a set of acceptance metrics from the second confidence classifier, sampling the sets of acceptance metrics at a specified sampling interval to obtain a sampled set of confidence threshold for the first confidence classifier and a sampled set of confidence thresholds for the first confidence classifier, and learning a polynomial that represents a set of confidence thresholds for the first and second confidence classifiers with a preset resolution.
6. The speech recognition device of claim 1 wherein the normalization circuitry executes a tan h-based mapping generating the mapped speech recognition confidence classifier score that equally or more accurately satisfies the recognition acceptance condition than the first speech recognition confidence classifier score.
7. The speech recognition device of claim 6 wherein the normalization circuitry executes the tan h-based mapping by collecting a set of confidence scores representing acceptance metrics from the first confidence classifier and a set of confidence scores representing acceptance metrics from the second confidence classifier, learning a bias parameter and a scale parameter such that a tan h of the confidence scores representing acceptance metrics from the first confidence classifier equals the bias parameter plus a product of the scale parameter and a tan h of the confidence scores representing acceptance metrics from the first confidence classifier.
8. The speech recognition device of claim 1 wherein the text output interface outputs the signal representing the accepted text to a display.
9. A method of transforming acoustic utterances into text in a speech recognition device, the method comprising: receiving one or more acoustic utterances via an acoustic sensor configured of the speech recognition device; storing a set of one or more acoustic models having trained one or more confidence classifiers and one or more acceptance metrics defining at least one recognition acceptance condition; generating a first speech recognition confidence classifier score corresponding to the one or more received acoustic utterances and recognized text based on a first confidence classifier; generating a second speech recognition confidence classifier score corresponding to the one or more received acoustic utterances and the recognized text based on a second confidence classifier; mapping a distribution within an output range of the second confidence classifier to a distribution within an output range of the first confidence classifier, the mapped distribution including a mapped speech recognition confidence classifier score for the second confidence classifier that more accurately satisfies the recognition acceptance condition than a corresponding score from the first confidence classifier; and outputting a signal representing new recognized text for a newly-received acoustic utterance as accepted text responsive to a determination that a mapped speech recognition confidence classifier score of the second confidence classifier for the newly-received acoustic utterance satisfies the recognition acceptance condition.
10. The method of claim 9 wherein the mapping operation comprises: histogram-based mapping generating the mapped speech recognition confidence classifier score that equally or more accurately satisfies the recognition acceptance condition than the first speech recognition confidence classifier score.
11. The method of claim 9 wherein the histogram-mapping operation comprises: generating probability mass functions for confidence scores from the first and second confidence classifiers, generating a cumulative mass functions corresponding to the probability mass functions for confidence scores from the first and second confidence classifiers, respectively, and generating an acceptance criteria map in which the cumulative mass function for the second classifier for each confidence score in the acceptance criteria map equals the cumulative mass function for the first classifier for each confidence score within a preset resolution.
12. The method of
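The histogram-based mapping recited in claims 3 and 11 (probability mass functions, cumulative mass functions, and a map that aligns the two cumulative functions within a preset resolution) can be sketched roughly as follows. This is an illustrative reading of the claim language, not the patented implementation: the function names, the equal-width binning over [0, 1], the bin count standing in for the "preset resolution", the floating-point tolerance, and the use of bin centers for mapped scores are all assumptions.

```python
from bisect import bisect_left
from typing import List, Sequence


def pmf(scores: Sequence[float], bins: int) -> List[float]:
    """Probability mass function of scores in [0, 1] over equal-width bins."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = [0] * bins
    for s in scores:
        counts[min(int(s * bins), bins - 1)] += 1  # a score of 1.0 goes in the last bin
    return [c / len(scores) for c in counts]


def cmf(masses: Sequence[float]) -> List[float]:
    """Cumulative mass function: running sum of a probability mass function."""
    out, running = [], 0.0
    for m in masses:
        running += m
        out.append(running)
    return out


def build_map(
    first_scores: Sequence[float],
    second_scores: Sequence[float],
    bins: int = 100,
    tol: float = 1e-9,
) -> List[int]:
    """For each second-classifier bin, find the first-classifier bin whose
    cumulative mass is the first to reach the same value (within tol)."""
    cmf_first = cmf(pmf(first_scores, bins))
    cmf_second = cmf(pmf(second_scores, bins))
    return [min(bisect_left(cmf_first, c - tol), bins - 1) for c in cmf_second]


def normalize(score: float, mapping: Sequence[int]) -> float:
    """Map a second-classifier score onto the first classifier's scale,
    returning the center of the matched bin (an illustrative choice)."""
    bins = len(mapping)
    i = min(int(score * bins), bins - 1)
    return (mapping[i] + 0.5) / bins
```

In this sketch the map would be built once from scores that both classifiers produce on the same calibration utterances; at run time, `normalize` is applied to each new second-classifier score before it is compared against the existing operating threshold.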
Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title
using distance or distortion measures between unknown speech and reference templates · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title