Channel-Compensated Low-Level Features For Speaker Recognition
US-2018082692-A1 · Mar 22, 2018 · US
US10096321B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10096321-B2 |
| Application number | US-201615242882-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 22, 2016 |
| Priority date | Aug 22, 2016 |
| Publication date | Oct 9, 2018 |
| Grant date | Oct 9, 2018 |
Official abstract text for this publication.
Techniques are provided for reverberation compensation for far-field speaker recognition. A methodology implementing the techniques according to an embodiment includes receiving an authentication audio signal associated with speech of a user and extracting features from the authentication audio signal. The method also includes scoring results of application of one or more speaker models to the extracted features. Each of the speaker models is trained based on a training audio signal processed by a reverberation simulator to simulate selected far-field environmental effects to be associated with that speaker model. The method further includes selecting one of the speaker models, based on the score, and mapping the selected speaker model to a known speaker identification or label that is associated with the user.
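The recognition flow in the abstract (score every speaker model against the extracted features, pick the highest score, map the winner to a known speaker ID) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `SpeakerModel` class, the centroid-based model, the cosine score, the environment labels, and the feature vectors are all hypothetical stand-ins chosen for brevity, and feature extraction is assumed to have already happened.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerModel:
    speaker_id: str    # known speaker ID assigned at enrollment
    environment: str   # label of the simulated far-field effect (hypothetical)
    centroid: tuple    # toy model: a mean feature vector (assumption)


def cosine_score(features, model):
    """Toy score: cosine similarity between features and the model centroid."""
    dot = sum(f * c for f, c in zip(features, model.centroid))
    norm = math.sqrt(sum(f * f for f in features)) * math.sqrt(
        sum(c * c for c in model.centroid)
    )
    return dot / norm if norm else 0.0


def recognize(features, models):
    """Select the highest-scoring model and return its speaker ID and the model."""
    best = max(models, key=lambda m: cosine_score(features, m))
    return best.speaker_id, best


# Several models per speaker, one per simulated far-field environment.
models = [
    SpeakerModel("alice", "small_room", (1.0, 0.2, 0.1)),
    SpeakerModel("alice", "large_hall", (0.8, 0.5, 0.3)),
    SpeakerModel("bob", "small_room", (0.1, 1.0, 0.2)),
    SpeakerModel("bob", "large_hall", (0.3, 0.9, 0.5)),
]

# Features assumed to be extracted from the authentication audio signal.
speaker_id, model = recognize((0.75, 0.55, 0.3), models)
print(speaker_id, model.environment)
```

Keeping one model per speaker and per simulated environment means the winning model also hints at which far-field condition best matched the authentication audio, while only the speaker ID is used for the identity decision.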
Opening claim text (preview).
What is claimed is:
1. A processor-implemented method for speaker recognition, the method comprising: receiving, by a processor, an authentication audio signal associated with speech of a user; extracting, by the processor, features from the authentication audio signal; scoring, by the processor, results of application of one or more speaker models to the extracted features, wherein each of the speaker models is trained based on a training audio signal, the training audio signal processed by a reverberation simulator to simulate selected far-field environmental effects to be associated with the speaker model; selecting, by the processor, one of the speaker models, the selected speaker model associated with the highest of the scores; and recognizing, by the processor, an identity of the user based on a known speaker identification (ID) associated with the selected speaker model, the recognized identity for use to authenticate the user.
2. The method of claim 1, wherein the training of the speaker models further comprises: capturing a plurality of the training audio signals from a plurality of users; receiving a speaker ID for each of the users; and processing each of the plurality of training audio signals by the reverberation simulator to generate a plurality of reverberation processed training audio signals for each of the training audio signals, wherein each of the reverberation processed training audio signals is associated with a unique far-field environmental effect.
3. The method of claim 2, wherein the training of the speaker models further comprises: generating feature sets of extracted features from each of the training audio signals and from each of the reverberation processed training audio signals; generating speaker models based on each feature set; and assigning the speaker ID as the known speaker ID associated with the generated speaker model.
4. The method of claim 1, wherein the authentication audio signal is captured in a far-field of the microphone and the training audio signal is captured in a near-field of the microphone.
5. The method of claim 4, wherein the far-field is a distance greater than three feet from the microphone and the near-field is a distance closer than three feet from the microphone.
6. A processor-implemented method for configuring a reverberation simulator for speaker recognition, the method comprising: receiving, by a processor, a first audio signal associated with speech of a user, the first audio signal captured at a first distance from a microphone; selecting, by the processor, a trial set of parameters for a reverberation simulator; generating, by the processor, a speaker model based on extracted features of an application of the reverberation simulator to the first audio signal; receiving, by the processor, one or more additional audio signals associated with speech of the user, the additional audio signals captured at a second distance from the microphone, the second distance greater than the first distance; scoring, by the processor, results of application of the speaker model to extracted features of each of the additional audio signals; and associating, by the processor, a summation of the scores with the trial set of parameters, the summation of the scores to indicate a relative effectiveness of the trial set of parameters for modeling a far-field environment of the microphone at the second distance.
7. The method of claim 6, further comprising selecting the trial set of parameters as an operational set of parameters based on the summation of scores associated with the trial set of parameters.
8. The method of claim 6, further comprising generating an updated trial set of parameters for the reverberation simulator using an optimization algorithm based on the summation of scores.
9. The method of claim 8, wherein the optimization algorithm is one of a genetic algorithm or a gradient descent algorithm.
10. The method of claim 6, wherein the reverberation simulator is a Schroeder reverberator and the reverberation parameters comprise one or more of an effect mix parameter, a room size parameter, a damping parameter, and a stereo width parameter.
11. The method of claim 6, wherein the second distance is in the far-field of the microphone and the first distance is in the near-field of the microphone.
12. At least one non-transitory computer readable storage medium having instructions encoded thereon that, when executed by one or more processors, result in the following operations for speaker recognition, the operations comprising: receiving an authentication audio signal associated with speech of a user; extracting features from the authentication audio signal; scoring results of application of one or more speaker models to the extracted features, wherein each of the speaker models is trained based on a training audio signal, the training audio signal processed by a reverberation simulator to simulate selected far-field environmental effects to be associated with the speaker model; selecting one of the speaker models, the selected speaker model associated with the highest of the scores; and recognizing an identity of the user based on a known speaker identification (ID) associated with the selected speaker model, the recognized identity for use to authenticate the user.
13. The computer readable storage medium of claim 12, wherein the training of the speaker models further comprises the operations: capturing a plurality of the training audio signals from a plurality of users; receiving a speaker ID for each of the users; and processing each of the plurality of training audio signals by the reverberation simulator to generate a plurality of reverberation processed training audio signals for each of the training audio signals, wherein each of the reverberation processed training audio signals is associated with a unique far-field environmental effect.
14. The computer readable storage medium of claim 13, wherein the training of the speaker models further comprises the operations: generating feature sets of extracted features from each of the training audio signals and from each of the reverberation processed training audio signals; generating speaker models based on each feature set; and assigning the speaker ID as the known speaker ID associated with the generated speaker model.
15. The computer readable storage medium of claim 12, wherein the authentication audio signal is captured in a far-field of the microphone and the training audio signal is captured in a near-field of the microphone.
16. The computer readable storage medium of claim 15, wherein the far-field is a distance greater than three feet from the microphone and the near-field is a distance closer than three feet from the microphone.
17. At least one non-transitory computer readable storage medium having instructions encoded thereon that, when executed by one or more processors, result in the following operations for configuring a reverberation simulator for speaker recognition, the operations comprising: receiving a first audio signal associated with speech of a user, the first audio signal captured at a first distance from a microphone; selecting a trial set of parameters for a reverberation simulator; generating a speaker model based on extracted features of an application of the reverberation simulator to the first audio signal; receiving one or more additional audio signals associated with speech of the user, the additional audio signals captured at a second distance from the microphone, the second distance greate
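The configuration procedure recited in claims 6 to 8 (apply the reverberation simulator with a trial parameter set to a near-field signal, build a speaker model, score it against far-field signals, and associate the summed score with that trial set) can be illustrated with the sketch below. Everything concrete in it is an assumption made for illustration: the simulator is a single feedback comb filter rather than a full Schroeder reverberator, the parameters `delay`, `feedback`, and `mix` are hypothetical stand-ins for the room size, damping, and effect mix parameters, the features are a toy autocorrelation vector, the "recordings" are synthetic, and an exhaustive grid search replaces the genetic or gradient descent optimizers named in claim 9.

```python
import itertools
import math
import random


def comb_reverb(signal, delay, feedback, mix):
    """Toy reverberation simulator: one feedback comb filter mixed with the dry signal."""
    wet = [0.0] * len(signal)
    for n, x in enumerate(signal):
        echo = wet[n - delay] if n >= delay else 0.0
        wet[n] = x + feedback * echo
    return [(1.0 - mix) * x + mix * w for x, w in zip(signal, wet)]


def extract_features(signal, lags=(1, 5, 10, 20, 40)):
    """Toy features: autocorrelation at a few lags, normalized by signal energy."""
    energy = sum(x * x for x in signal) or 1.0
    return [
        sum(signal[n] * signal[n - lag] for n in range(lag, len(signal))) / energy
        for lag in lags
    ]


def score(model, features):
    """Higher is better: negative Euclidean distance between model and features."""
    return -math.sqrt(sum((m - f) ** 2 for m, f in zip(model, features)))


def evaluate_trial(params, near_signal, far_signals):
    """Build a model from the simulated near-field signal, then sum far-field scores."""
    model = extract_features(comb_reverb(near_signal, *params))
    return sum(score(model, extract_features(far)) for far in far_signals)


# Synthetic stand-ins for recordings (assumption: real use would load captured audio).
rng = random.Random(0)
near_signal = [rng.uniform(-1.0, 1.0) for _ in range(800)]
TRUE_PARAMS = (20, 0.6, 0.5)  # hidden "room" used only to fabricate far-field data
reverberant = comb_reverb(near_signal, *TRUE_PARAMS)
far_signals = [[gain * x for x in reverberant] for gain in (0.5, 0.25)]

# Exhaustive grid search over trial parameter sets (delay, feedback, mix).
trials = itertools.product((10, 20, 40), (0.3, 0.6), (0.25, 0.5))
totals = {p: evaluate_trial(p, near_signal, far_signals) for p in trials}
best_params = max(totals, key=totals.get)
print(best_params, totals[best_params])
```

Because the far-field signals here are fabricated with the same toy simulator, the search recovers `TRUE_PARAMS`; with real recordings the best summed score would only indicate which trial set models the room most closely, and the trial set with the highest summation would be the natural candidate for the operational parameters of claim 7.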
Training, enrolment or model building · CPC title
Decision making techniques; Pattern matching strategies · CPC title
Noise filtering · CPC title
the noise being echo, reverberation of the speech · CPC title
Score normalisation · CPC title