Generating dialogue based on verification scores
US-2019027152-A1 · Jan 24, 2019 · US
US10468032B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10468032-B2 |
| Application number | US-201715483246-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 10, 2017 |
| Priority date | Apr 10, 2017 |
| Publication date | Nov 5, 2019 |
| Grant date | Nov 5, 2019 |
Abstract
Techniques related to speaker recognition are discussed. Such techniques include determining context aware confidence values formed of false accept and false reject rates determined by using adaptively updated acoustic environment score distributions matched to current score distributions.
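The abstract and claims 3 and 4 describe turning a current speaker score into false accept and false reject rates through a CDF whose parameters are the mean and standard deviation of the matched context's score distribution. The Python sketch below illustrates this under two assumptions. First, each context is summarized by Gaussian fits of its true-speaker and imposter scores, and the claims' "cumulative density function" is read as the Gaussian cumulative distribution function. Second, context prediction is simplified to a nearest match on estimated SNR, whereas the patent only says the audio data is matched to pre-stored context audio data. The names `ContextParams`, `predict_context` and `confidence`, along with all numeric values, are hypothetical.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class ContextParams:
    """Hypothetical summary of one acoustic environment context.

    Assumes true-speaker and imposter scores are each modeled as Gaussian.
    """
    name: str
    snr_db: float          # representative SNR used for matching (assumption)
    target_mean: float     # true-speaker score distribution
    target_std: float
    imposter_mean: float   # imposter score distribution
    imposter_std: float


def predict_context(input_snr_db, contexts):
    """Pick the stored context whose SNR is closest to the input's estimated SNR."""
    return min(contexts, key=lambda c: abs(c.snr_db - input_snr_db))


def confidence(score, ctx):
    """Return (false_accept_rate, false_reject_rate) if `score` were used as the threshold."""
    # FA rate: share of imposter scores at or above `score`.
    far = 1.0 - NormalDist(ctx.imposter_mean, ctx.imposter_std).cdf(score)
    # FR rate: share of true-speaker scores below `score`.
    frr = NormalDist(ctx.target_mean, ctx.target_std).cdf(score)
    return far, frr


# Made-up context parameters for illustration only.
contexts = [
    ContextParams("quiet room", 30.0, 2.0, 0.5, -1.0, 0.5),
    ContextParams("vehicle", 5.0, 1.0, 0.8, -0.5, 0.8),
]
ctx = predict_context(8.0, contexts)
far, frr = confidence(1.0, ctx)
print(ctx.name, round(far, 4), round(frr, 4))
```

With these invented parameters, an input at about 8 dB SNR matches the "vehicle" context. A score of 1.0 sits exactly at that context's true-speaker mean, so its FR rate is 0.5. The same score evaluated against the "quiet room" context would give much lower FA and FR rates, which is why the indicator is made context aware.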
Claims (preview, truncated)
What is claimed is:
1. A computer-implemented method of speaker recognition comprising: determining at least one current speaker score based on received audio input; predicting the context of the audio input comprising finding a match between data of the audio input and pre-stored context audio data associated with an acoustic environment context of a plurality of acoustic environment contexts, wherein the individual contexts are associated with a speaker score distribution and context parameters characterizing the speaker score distribution; generating at least one context aware confidence indicator comprising a false accept rate-related value or a false reject rate-related value or both based, at least in part, on the current speaker score and the context parameters; forming a decision as to whether a speaker of the audio input is an imposter or a true speaker depending on comparison of a threshold determined, at least in part, by using at least one of the context aware confidence indicators and the at least one current speaker score; and using the decision to adaptively update at least the speaker score distribution associated with the decision.
2. The method of claim 1 comprising using the false accept rate or false reject rate or both to determine a threshold to compare to the at least one current speaker score to decide whether a speaker of the audio input is an imposter or a true speaker.
3. The method of claim 1 wherein the generating comprises using a cumulative density function (CDF) that uses the context parameters and the at least one current speaker score.
4. The method of claim 3 wherein the context parameters comprise the mean and standard deviation of the context score distribution associated with the pre-stored context audio data matched to the data of the audio input.
5. The method of claim 1 wherein individual acoustic environment contexts comprise at least different speech-to-noise ratios (SNRs).
6. The method of claim 1 wherein the acoustic environment contexts each indicate at least one of: a location of the speaker; a location of the speaker comprising at least one of cafeteria noise and noise from inside a vehicle; an emotional state of the speaker; health of the speaker; a gender of the speaker; an age category of the speaker; any one or more of the above at an SNR level.
7. The method of claim 1 wherein at least one of the acoustic environment contexts is associated with at least one reverberation component of the audio input.
8. The method of claim 1 wherein using the decision comprises performing at least a secondary identification to determine the ground truth of the decision.
9. The method of claim 8 wherein the secondary identification comprises at least one of: at least one statement in response to a request for the statement given to a speaker of the audio input; face detection; person detection comprising visual detection of one or more body parts instead of, or in addition to, a face; skin print(s) comprising finger print(s); retinal scan(s), and receiving at least one password.
10. A system for performing speaker recognition comprising: a memory configured to store a received audio input; and a digital signal processor coupled to the memory and to operate by: determining at least one current speaker score based on received audio input; predicting the context of the audio input comprising finding a match between data of the audio input and pre-stored context audio data associated with an acoustic environment context of a plurality of acoustic environment contexts, wherein the individual contexts are associated with a speaker score distribution and context parameters characterizing the speaker score distribution; generating at least one context aware confidence indicator comprising a false accept rate-related value or a false reject rate-related value or both based, at least in part, on the current speaker score and the context parameters; forming a decision as to whether a speaker of the audio input is an imposter or a true speaker; and using the decision to adaptively update at least the context score distribution associated with the decision.
11. The system of claim 10 wherein the digital signal processor is to operate by using the false accept rate or false reject rate or both to determine a threshold to compare to the at least one current speaker score to decide whether a speaker of the audio input is an imposter or a true speaker.
12. The system of claim 10 wherein the generating comprises using a cumulative density function (CDF) that uses the context parameters and the at least one current speaker score.
13. The system of claim 10, wherein at least one score point is added to the context score distribution used to determine the decision to form an updated context score distribution.
14. The system of claim 13, wherein the context parameters of the updated context score distribution are determined and stored in association with the updated score distribution to form further confidence indicators.
15. The system of claim 10, wherein the context score distributions are updated after a certain time period.
16. The system of claim 10, wherein a context score distribution is updated after a certain minimum number of decisions.
17. The system of claim 10, wherein the digital signal processor is to operate by: adding new context score distributions determined by using a threshold during context prediction.
18. The system of claim 10 wherein context scores used to update context score distributions are dropped from the context database after a minimum number of decisions or after a certain time period.
19. The system of claim 10 wherein the plurality of context score distributions comprises context score distributions that indicate an imposter.
20. The system of claim 10 wherein the plurality of context score distributions are stored in a database wherein individual context score distributions are stored with associated context type indicator, and decision type indicating either imposter or true speaker.
21. The system of claim 10 wherein the digital signal processor is to operate by: confirming the ground truth of the decisions; and if confirmed, saving the speaker score, the decision result, and the identification of the context for future updating of the context database when an updating criteria is met.
22. At least one non-transitory machine readable medium comprising a plurality of instructions that, in response to being executed on a computing device, cause the computing device to operate by: determining at least one current speaker score based on received audio input; predicting the context of the audio input comprising finding a match between data of the audio input and pre-stored context audio data associated with an acoustic environment context of a plurality of acoustic environment contexts, wherein the individual contexts are associated with a speaker score distribution and context parameters characterizing the speaker score distribution; generating at least one context aware confidence indicator comprising a false accept rate-related value or a false reject rate-related value or both based, at least in part, on the current speaker score and the context parameters; forming a decision as to whether a speaker of the audio input is an imposter or a true speaker; and using the decision to adaptively update at least the context score distribution associated with the decisi
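Several claims describe a loop: derive a threshold from a false accept or false reject rate (claims 1 and 2), decide true speaker or imposter, and use the decision to update the score distribution tied to it (claims 13 and 14). They also state that updates may wait for a minimum number of decisions (claim 16) and that old scores are dropped (claim 18). The minimal Python sketch below assumes the following:

* A Gaussian imposter model.
* A threshold chosen to meet a target false accept rate.
* A bounded `deque` standing in for dropping old score points.
* A `confirmed` flag standing in for the secondary identification of claims 8 and 21.

All names and defaults (`ContextScoreDistribution`, `threshold_for_far`, `decide`, `max_points`, `min_new_points`, `target_far`) are hypothetical and not taken from the patent.

```python
from collections import deque
from statistics import NormalDist, mean, stdev


class ContextScoreDistribution:
    """Hypothetical store of scores for one context and one decision type.

    Needs at least two initial scores so a standard deviation exists.
    """

    def __init__(self, scores, max_points=500, min_new_points=20):
        # Bounded deque: once full, the oldest score points drop out first.
        self.scores = deque(scores, maxlen=max_points)
        self.min_new_points = min_new_points
        self.pending = []  # confirmed scores awaiting the next update
        self._refit()

    def _refit(self):
        """Recompute the context parameters (mean, std) from the stored scores."""
        self.mu = mean(self.scores)
        self.sigma = max(stdev(self.scores), 1e-6)  # guard against zero spread

    def add_confirmed(self, score):
        """Queue a confirmed score; refit once enough new decisions have accumulated."""
        self.pending.append(score)
        if len(self.pending) >= self.min_new_points:
            self.scores.extend(self.pending)
            self.pending.clear()
            self._refit()


def threshold_for_far(imposter, target_far):
    """Score at which the Gaussian imposter model gives the target false accept rate."""
    return NormalDist(imposter.mu, imposter.sigma).inv_cdf(1.0 - target_far)


def decide(score, imposter, true_speaker, target_far=0.01, confirmed=True):
    """Accept as true speaker if the score clears the threshold, then update the
    distribution matching the decision when its ground truth is confirmed."""
    accept = score >= threshold_for_far(imposter, target_far)
    if confirmed:
        (true_speaker if accept else imposter).add_confirmed(score)
    return accept


# Made-up enrollment scores for one context, for illustration only.
imposter = ContextScoreDistribution([-1.0, -0.5, 0.0, 0.5, 1.0], min_new_points=2)
true_speaker = ContextScoreDistribution([2.0, 2.5, 3.0, 3.5], min_new_points=2)
for s in (3.0, 0.2, 0.1):
    print(s, decide(s, imposter, true_speaker))
```

The sketch gates the refit on `min_new_points`, which loosely mirrors the minimum-number-of-decisions trigger of claim 16. A time-based trigger, as in claim 15, would replace that count with a timestamp check. The sketch handles only decisions whose ground truth confirms them. Routing a disconfirmed score to the opposite distribution would be a further assumption.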
Score normalisation · CPC title
for comparison or discrimination · CPC title
Training, enrolment or model building · CPC title
Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions · CPC title