Method and system of speaker recognition using context aware confidence modeling

US10468032B2 · US · B2

Patent metadata
Publication number: US-10468032-B2
Application number: US-201715483246-A
Country: US
Kind code: B2
Filing date: Apr 10, 2017
Priority date: Apr 10, 2017
Publication date: Nov 5, 2019
Grant date: Nov 5, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques related to speaker recognition are discussed. Such techniques include determining context aware confidence values formed of false accept and false reject rates determined by using adaptively updated acoustic environment score distributions matched to current score distributions.
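
As an illustration of the confidence values the abstract describes, the sketch below derives a false accept rate and a false reject rate from a current speaker score and the mean and standard deviation of a matched context's score distributions. It is a minimal sketch, not the patented implementation: the Gaussian shape of the distributions, the names `ContextParams`, `normal_cdf`, `false_accept_rate`, and `false_reject_rate`, and the example numbers are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ContextParams:
    """Mean and standard deviation of one context score distribution (hypothetical type)."""
    mean: float
    std: float


def normal_cdf(x: float, mean: float, std: float) -> float:
    """Cumulative distribution function of a Gaussian (assumed distribution shape)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def false_accept_rate(score: float, imposter: ContextParams) -> float:
    """Fraction of imposter scores at or above `score` in this context."""
    return 1.0 - normal_cdf(score, imposter.mean, imposter.std)


def false_reject_rate(score: float, target: ContextParams) -> float:
    """Fraction of true-speaker scores below `score` in this context."""
    return normal_cdf(score, target.mean, target.std)


# Illustrative parameters for a single acoustic context (made-up values).
imposter = ContextParams(mean=0.0, std=1.0)
target = ContextParams(mean=4.0, std=1.0)

current_score = 2.0
far = false_accept_rate(current_score, imposter)
frr = false_reject_rate(current_score, target)
print(f"FAR={far:.4f} FRR={frr:.4f}")
```

With these made-up parameters the score sits midway between the two distributions, so the two rates come out equal. A noisier context would have different stored parameters, and the same score would yield different rates.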

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of speaker recognition comprising: determining at least one current speaker score based on received audio input; predicting the context of the audio input comprising finding a match between data of the audio input and pre-stored context audio data associated with an acoustic environment context of a plurality of acoustic environment contexts, wherein the individual contexts are associated with a speaker score distribution and context parameters characterizing the speaker score distribution; generating at least one context aware confidence indicator comprising a false accept rate-related value or a false reject rate-related value or both based, at least in part, on the current speaker score and the context parameters; forming a decision as to whether a speaker of the audio input is an imposter or a true speaker depending on comparison of a threshold determined, at least in part, by using at least one of the context aware confidence indicators and the at least one current speaker score; and using the decision to adaptively update at least the speaker score distribution associated with the decision.

2. The method of claim 1 comprising using the false accept rate or false reject rate or both to determine a threshold to compare to the at least one current speaker score to decide whether a speaker of the audio input is an imposter or a true speaker.

3. The method of claim 1 wherein the generating comprises using a cumulative density function (CDF) that uses the context parameters and the at least one current speaker score.

4. The method of claim 3 wherein the context parameters comprise the mean and standard deviation of the context score distribution associated with the pre-stored context audio data matched to the data of the audio input.

5. The method of claim 1 wherein individual acoustic environment contexts comprise at least different speech-to-noise ratios (SNRs).

6. The method of claim 1 wherein the acoustic environment contexts each indicate at least one of: a location of the speaker; a location of the speaker comprising at least one of cafeteria noise and noise from inside a vehicle; an emotional state of the speaker; health of the speaker; a gender of the speaker; an age category of the speaker; any one or more of the above at an SNR level.

7. The method of claim 1 wherein at least one of the acoustic environment contexts is associated with at least one reverberation component of the audio input.

8. The method of claim 1 wherein using the decision comprises performing at least a secondary identification to determine the ground truth of the decision.

9. The method of claim 8 wherein the secondary identification comprises at least one of: at least one statement in response to a request for the statement given to a speaker of the audio input; face detection; person detection comprising visual detection of one or more body parts instead of, or in addition to, a face; skin print(s) comprising finger print(s); retinal scan(s), and receiving at least one password.

10. A system for performing speaker recognition comprising: a memory configured to store a received audio input; and a digital signal processor coupled to the memory and to operate by: determining at least one current speaker score based on received audio input; predicting the context of the audio input comprising finding a match between data of the audio input and pre-stored context audio data associated with an acoustic environment context of a plurality of acoustic environment contexts, wherein the individual contexts are associated with a speaker score distribution and context parameters characterizing the speaker score distribution; generating at least one context aware confidence indicator comprising a false accept rate-related value or a false reject rate-related value or both based, at least in part, on the current speaker score and the context parameters; forming a decision as to whether a speaker of the audio input is an imposter or a true speaker; and using the decision to adaptively update at least the context score distribution associated with the decision.

11. The system of claim 10 wherein the digital signal processor is to operate by using the false accept rate or false reject rate or both to determine a threshold to compare to the at least one current speaker score to decide whether a speaker of the audio input is an imposter or a true speaker.

12. The system of claim 10 wherein the generating comprises using a cumulative density function (CDF) that uses the context parameters and the at least one current speaker score.

13. The system of claim 10, wherein at least one score point is added to the context score distribution used to determine the decision to form an updated context score distribution.

14. The system of claim 13, wherein the context parameters of the updated context score distribution are determined and stored in association with the updated score distribution to form further confidence indicators.

15. The system of claim 10, wherein the context score distributions are updated after a certain time period.

16. The system of claim 10, wherein a context score distribution is updated after a certain minimum number of decisions.

17. The system of claim 10, wherein the digital signal processor is to operate by: adding new context score distributions determined by using a threshold during context prediction.

18. The system of claim 10 wherein context scores used to update context score distributions are dropped from the context database after a minimum number of decisions or after a certain time period.

19. The system of claim 10 wherein the plurality of context score distributions comprises context score distributions that indicate an imposter.

20. The system of claim 10 wherein the plurality of context score distributions are stored in a database wherein individual context score distributions are stored with associated context type indicator, and decision type indicating either imposter or true speaker.

21. The system of claim 10 wherein the digital signal processor is to operate by: confirming the ground truth of the decisions; and if confirmed, saving the speaker score, the decision result, and the identification of the context for future updating of the context database when an updating criteria is met.

22. At least one non-transitory machine readable medium comprising a plurality of instructions that, in response to being executed on a computing device, cause the computing device to operate by: determining at least one current speaker score based on received audio input; predicting the context of the audio input comprising finding a match between data of the audio input and pre-stored context audio data associated with an acoustic environment context of a plurality of acoustic environment contexts, wherein the individual contexts are associated with a speaker score distribution and context parameters characterizing the speaker score distribution; generating at least one context aware confidence indicator comprising a false accept rate-related value or a false reject rate-related value or both based, at least in part, on the current speaker score and the context parameters; forming a decision as to whether a speaker of the audio input is an imposter or a true speaker; and using the decision to adaptively update at least the context score distribution associated with the decisi
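
The adaptive update recited in the claims (adding score points to a context score distribution, recomputing its parameters, and updating only after a minimum number of decisions) can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the claimed system: the class `ContextScoreDistribution`, its field names, the `record` method, and the use of a population standard deviation are hypothetical choices, and the scores passed in are assumed to come from decisions whose ground truth has already been confirmed.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class ContextScoreDistribution:
    """One stored score distribution for a context and decision type (hypothetical type)."""
    context_type: str          # e.g. "car_noise" (illustrative label)
    decision_type: str         # "true_speaker" or "imposter"
    scores: list               # score points currently in the distribution
    min_decisions: int = 3     # assumed update criterion: batch size before refresh
    pending: list = field(default_factory=list)
    mean: float = 0.0
    std: float = 0.0

    def __post_init__(self) -> None:
        self._refresh()

    def _refresh(self) -> None:
        # Recompute the context parameters from the stored score points.
        self.mean = statistics.fmean(self.scores)
        self.std = statistics.pstdev(self.scores)

    def record(self, score: float) -> bool:
        """Queue a confirmed score; fold the batch in once enough decisions accumulate.

        Returns True when the distribution and its parameters were updated.
        """
        self.pending.append(score)
        if len(self.pending) < self.min_decisions:
            return False
        self.scores.extend(self.pending)
        self.pending.clear()
        self._refresh()
        return True


# Illustrative usage with made-up scores.
dist = ContextScoreDistribution("car_noise", "true_speaker", [1.0, 3.0], min_decisions=2)
first = dist.record(5.0)    # queued only, parameters unchanged
second = dist.record(7.0)   # batch complete, parameters recomputed
print(first, second, dist.mean, round(dist.std, 3))
```

Batching the update keeps the stored parameters stable between refreshes; a time-based criterion, which the claims also mention, could replace the count check in `record`.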

Assignees

Intel Corp

Inventors

Classifications

  • G10L17/12 · Primary

    Score normalisation · CPC title

  • for comparison or discrimination · CPC title

  • Training, enrolment or model building · CPC title

  • Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10468032B2 cover?
Techniques related to speaker recognition are discussed. Such techniques include determining context aware confidence values formed of false accept and false reject rates determined by using adaptively updated acoustic environment score distributions matched to current score distributions.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G10L17/12. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 5, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).