Reverberation compensation for far-field speaker recognition

US10096321B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10096321-B2
Application number: US-201615242882-A
Country: US
Kind code: B2
Filing date: Aug 22, 2016
Priority date: Aug 22, 2016
Publication date: Oct 9, 2018
Grant date: Oct 9, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are provided for reverberation compensation for far-field speaker recognition. A methodology implementing the techniques according to an embodiment includes receiving an authentication audio signal associated with speech of a user and extracting features from the authentication audio signal. The method also includes scoring results of application of one or more speaker models to the extracted features. Each of the speaker models is trained based on a training audio signal processed by a reverberation simulator to simulate selected far-field environmental effects to be associated with that speaker model. The method further includes selecting one of the speaker models, based on the score, and mapping the selected speaker model to a known speaker identification or label that is associated with the user.
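The recognition flow in the abstract (extract features, score every speaker model, pick the best score, map it to a speaker ID) can be sketched as follows. This is a minimal illustration, not the patented implementation: the two toy features (energy and zero-crossing rate), the negative squared-distance score, and the names `extract_features`, `score`, and `recognize` are all assumptions made for the example. Each model is keyed by a hypothetical (speaker ID, simulated environment) pair, reflecting that one speaker may have several models, one per simulated far-field environment.

```python
from typing import Dict, List, Sequence, Tuple

Features = List[float]
ModelKey = Tuple[str, str]  # (speaker ID, simulated environment label) - hypothetical layout


def extract_features(signal: Sequence[float]) -> Features:
    """Toy features (an assumption): mean energy and zero-crossing rate."""
    n = len(signal)
    energy = sum(v * v for v in signal) / n
    zcr = sum((a < 0) != (b < 0) for a, b in zip(signal, signal[1:])) / (n - 1)
    return [energy, zcr]


def score(model: Features, feats: Features) -> float:
    """Toy score (an assumption): negative squared distance, higher is better."""
    return -sum((m - f) ** 2 for m, f in zip(model, feats))


def recognize(signal: Sequence[float], models: Dict[ModelKey, Features]) -> str:
    """Score every model, select the highest-scoring one, return its speaker ID."""
    feats = extract_features(signal)
    best_key = max(models, key=lambda key: score(models[key], feats))
    return best_key[0]


# Hypothetical models: one speaker can own several environment-specific models.
models = {
    ("alice", "near_field"): [1.0, 0.1],
    ("alice", "large_room"): [0.5, 0.1],
    ("bob", "near_field"): [1.0, 0.9],
}
print(recognize([0.7] * 8, models))
```

Because every environment-specific model maps back to the same speaker ID, a far-field utterance only needs to match one of that speaker's simulated environments to be recognized.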

First claim

Opening claim text (preview).

What is claimed is:

1. A processor-implemented method for speaker recognition, the method comprising: receiving, by a processor, an authentication audio signal associated with speech of a user; extracting, by the processor, features from the authentication audio signal; scoring, by the processor, results of application of one or more speaker models to the extracted features, wherein each of the speaker models is trained based on a training audio signal, the training audio signal processed by a reverberation simulator to simulate selected far-field environmental effects to be associated with the speaker model; selecting, by the processor, one of the speaker models, the selected speaker model associated with the highest of the scores; and recognizing, by the processor, an identity of the user based on a known speaker identification (ID) associated with the selected speaker model, the recognized identity for use to authenticate the user.

2. The method of claim 1, wherein the training of the speaker models further comprises: capturing a plurality of the training audio signals from a plurality of users; receiving a speaker ID for each of the users; and processing each of the plurality of training audio signals by the reverberation simulator to generate a plurality of reverberation processed training audio signals for each of the training audio signals, wherein each of the reverberation processed training audio signals is associated with a unique far-field environmental effect.

3. The method of claim 2, wherein the training of the speaker models further comprises: generating feature sets of extracted features from each of the training audio signals and from each of the reverberation processed training audio signals; generating speaker models based on each feature set; and assigning the speaker ID as the known speaker ID associated with the generated speaker model.

4. The method of claim 1, wherein the authentication audio signal is captured in a far-field of the microphone and the training audio signal is captured in a near-field of the microphone.

5. The method of claim 4, wherein the far-field is a distance greater than three feet from the microphone and the near-field is a distance closer than three feet from the microphone.

6. A processor-implemented method for configuring a reverberation simulator for speaker recognition, the method comprising: receiving, by a processor, a first audio signal associated with speech of a user, the first audio signal captured at a first distance from a microphone; selecting, by the processor, a trial set of parameters for a reverberation simulator; generating, by the processor, a speaker model based on extracted features of an application of the reverberation simulator to the first audio signal; receiving, by the processor, one or more additional audio signals associated with speech of the user, the additional audio signals captured at a second distance from the microphone, the second distance greater than the first distance; scoring, by the processor, results of application of the speaker model to extracted features of each of the additional audio signals; and associating, by the processor, a summation of the scores with the trial set of parameters, the summation of the scores to indicate a relative effectiveness of the trial set of parameters for modeling a far-field environment of the microphone at the second distance.

7. The method of claim 6, further comprising selecting the trial set of parameters as an operational set of parameters based on the summation of scores associated with the trial set of parameters.

8. The method of claim 6, further comprising generating an updated trial set of parameters for the reverberation simulator using an optimization algorithm based on the summation of scores.

9. The method of claim 8, wherein the optimization algorithm is one of a genetic algorithm or a gradient descent algorithm.

10. The method of claim 6, wherein the reverberation simulator is a Schroeder reverberator and the reverberation parameters comprise one or more of an effect mix parameter, a room size parameter, a damping parameter, and a stereo width parameter.

11. The method of claim 6, wherein the second distance is in the far-field of the microphone and the first distance is in the near-field of the microphone.

12. At least one non-transitory computer readable storage medium having instructions encoded thereon that, when executed by one or more processors, result in the following operations for speaker recognition, the operations comprising: receiving an authentication audio signal associated with speech of a user; extracting features from the authentication audio signal; scoring results of application of one or more speaker models to the extracted features, wherein each of the speaker models is trained based on a training audio signal, the training audio signal processed by a reverberation simulator to simulate selected far-field environmental effects to be associated with the speaker model; selecting one of the speaker models, the selected speaker model associated with the highest of the scores; and recognizing an identity of the user based on a known speaker identification (ID) associated with the selected speaker model, the recognized identity for use to authenticate the user.

13. The computer readable storage medium of claim 12, wherein the training of the speaker models further comprises the operations: capturing a plurality of the training audio signals from a plurality of users; receiving a speaker ID for each of the users; and processing each of the plurality of training audio signals by the reverberation simulator to generate a plurality of reverberation processed training audio signals for each of the training audio signals, wherein each of the reverberation processed training audio signals is associated with a unique far-field environmental effect.

14. The computer readable storage medium of claim 13, wherein the training of the speaker models further comprises the operations: generating feature sets of extracted features from each of the training audio signals and from each of the reverberation processed training audio signals; generating speaker models based on each feature set; and assigning the speaker ID as the known speaker ID associated with the generated speaker model.

15. The computer readable storage medium of claim 12, wherein the authentication audio signal is captured in a far-field of the microphone and the training audio signal is captured in a near-field of the microphone.

16. The computer readable storage medium of claim 15, wherein the far-field is a distance greater than three feet from the microphone and the near-field is a distance closer than three feet from the microphone.

17. At least one non-transitory computer readable storage medium having instructions encoded thereon that, when executed by one or more processors, result in the following operations for configuring a reverberation simulator for speaker recognition, the operations comprising: receiving a first audio signal associated with speech of a user, the first audio signal captured at a first distance from a microphone; selecting a trial set of parameters for a reverberation simulator; generating a speaker model based on extracted features of an application of the reverberation simulator to the first audio signal; receiving one or more additional audio signals associated with speech of the user, the additional audio signals captured at a second distance from the microphone, the second distance greate
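The simulator-configuration method of claims 6 to 10 can be sketched as follows, under several loudly stated assumptions. The reverberator is a small mono Schroeder-style design (parallel feedback comb filters followed by series all-pass filters); the mapping from `room_size` and `damping` to filter coefficients, the delay lengths, and the toy features and score are all hypothetical choices for illustration, and the stereo width parameter is omitted because the sketch is mono. Claim 9 names a genetic algorithm or gradient descent for updating the trial parameters; this sketch swaps in a plain search over a fixed list of trial sets to keep the example short.

```python
from typing import Dict, List, Sequence


def comb(x: Sequence[float], delay: int, feedback: float, damping: float) -> List[float]:
    """Feedback comb filter with a one-pole low-pass (damping) in the loop."""
    buf, lp, idx, out = [0.0] * delay, 0.0, 0, []
    for s in x:
        y = buf[idx]
        lp = y * (1.0 - damping) + lp * damping
        buf[idx] = s + lp * feedback
        idx = (idx + 1) % delay
        out.append(y)
    return out


def allpass(x: Sequence[float], delay: int, gain: float = 0.5) -> List[float]:
    """Schroeder all-pass: v[n] = x[n] + g*v[n-D], y[n] = v[n-D] - g*v[n]."""
    buf, idx, out = [0.0] * delay, 0, []
    for s in x:
        delayed = buf[idx]
        v = s + gain * delayed
        buf[idx] = v
        idx = (idx + 1) % delay
        out.append(delayed - gain * v)
    return out


def reverberate(x: Sequence[float], mix: float, room_size: float, damping: float) -> List[float]:
    """Mono Schroeder-style reverb; coefficient mapping and delays are assumptions."""
    feedback = 0.70 + 0.28 * room_size  # larger room -> longer decay (assumed mapping)
    comb_delays = (23, 29, 31, 37)      # hypothetical delays, in samples
    wet = [0.0] * len(x)
    for d in comb_delays:
        wet = [w + c for w, c in zip(wet, comb(x, d, feedback, damping))]
    wet = [w / len(comb_delays) for w in wet]
    for d in (5, 7):
        wet = allpass(wet, d)
    return [(1.0 - mix) * dry + mix * w for dry, w in zip(x, wet)]


def extract_features(signal: Sequence[float]) -> List[float]:
    """Toy features (an assumption): mean energy and zero-crossing rate."""
    n = len(signal)
    energy = sum(v * v for v in signal) / n
    zcr = sum((a < 0) != (b < 0) for a, b in zip(signal, signal[1:])) / (n - 1)
    return [energy, zcr]


def score(model: List[float], feats: List[float]) -> float:
    """Toy score (an assumption): negative squared distance, higher is better."""
    return -sum((m - f) ** 2 for m, f in zip(model, feats))


def summed_score(params: Dict[str, float], near: Sequence[float],
                 far_signals: Sequence[Sequence[float]]) -> float:
    """Build a model from the reverberated near-field signal, then sum its scores
    over the far-field signals (the per-trial summation of claim 6)."""
    model = extract_features(reverberate(near, **params))
    return sum(score(model, extract_features(f)) for f in far_signals)


def pick_operational(trials: Sequence[Dict[str, float]], near: Sequence[float],
                     far_signals: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Select the trial set with the highest summed score (in the spirit of claim 7)."""
    return max(trials, key=lambda p: summed_score(p, near, far_signals))
```

A call such as `pick_operational(trials, near, far_signals)` takes a near-field recording, a list of far-field recordings of the same user, and candidate parameter dictionaries with the keys `mix`, `room_size`, and `damping`. An optimizer such as those named in claim 9 would replace the fixed `trials` list with parameter sets generated from the previous summed scores.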

Assignees

Intel Corp

Inventors

Classifications

  • G10L17/04 · Primary

    Training, enrolment or model building · CPC title

  • Decision making techniques; Pattern matching strategies · CPC title

  • Noise filtering · CPC title

  • the noise being echo, reverberation of the speech · CPC title

  • Score normalisation · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10096321B2 cover?
It covers reverberation compensation for far-field speaker recognition. Speaker models are trained on audio processed by a reverberation simulator to simulate far-field environmental effects; features extracted from an authentication audio signal are scored against those models, and the selected model is mapped to a known speaker identification associated with the user. See the Abstract for the full text.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G10L17/04. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 9, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).