Speaker recognition

US9626971B2 · US · B2

Patent metadata
Publication number: US-9626971-B2
Application number: US-201314119156-A
Country: US
Kind code: B2
Filing date: Sep 20, 2013
Priority date: Sep 28, 2012
Publication date: Apr 18, 2017
Grant date: Apr 18, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Method for text-dependent Speaker Recognition using a speaker adapted Universal Background Model, wherein the speaker adapted Universal Background Model is a speaker adapted Hidden Markov Model comprising channel correction.
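To make "speaker adapted" concrete, here is a minimal sketch of relevance-MAP adaptation of mean vectors, the kind of Maximum A Posteriori update that claim 15 names. It is an illustration under stated assumptions, not the patent's procedure: the function name, the relevance factor, and the use of precomputed occupation probabilities are all hypothetical, and the sketch adapts means only (claim 15 also covers transition probabilities).

```python
def map_adapt_means(ubm_means, frames, posteriors, relevance=16.0):
    """Relevance-MAP update of component mean vectors (illustrative only).

    ubm_means:  list of K mean vectors (each a list of D floats)
    frames:     list of T feature vectors (each a list of D floats)
    posteriors: T x K occupation probabilities, assumed to be precomputed
    relevance:  hypothetical relevance factor; larger values keep the
                adapted means closer to the background model
    """
    adapted = []
    for k, mu in enumerate(ubm_means):
        # Zeroth-order statistic: soft count of frames assigned to component k.
        n_k = sum(post[k] for post in posteriors)
        # First-order statistic: posterior-weighted sum of the frames.
        f_k = [sum(post[k] * x[d] for post, x in zip(posteriors, frames))
               for d in range(len(mu))]
        # Interpolation weight: approaches 1 as more data supports component k.
        alpha = n_k / (n_k + relevance)
        # Fall back to the background mean when the component saw no data.
        data_mean = [f / n_k for f in f_k] if n_k > 0 else list(mu)
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(data_mean, mu)])
    return adapted
```

Components that receive little enrolment data stay close to the background model, which is the usual motivation for this style of adaptation when enrolment utterances are short.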

First claim

Opening claim text (preview).

What is claimed is:

1. A method for text-dependent Speaker Recognition using a speaker model obtained by adaptation of a Universal Background Model, wherein the speaker model is a speaker adapted Hidden Markov Model, wherein the speaker model uses Bayesian inference to link observed parameters and hidden parameters, wherein the observed parameters are the feature vectors x_nmt of utterance m of speaker n and time index t, and wherein the hidden parameters are at least one of a group of: the speaker factor y_n for each speaker n, the channel factors u_nm of the utterance m of speaker n, the active state s_nmt generating the feature vector x_nmt, and the active component z_nmt generating the feature vector x_nmt.

2. The method for text-dependent Speaker Recognition according to claim 1, wherein the Universal Background Model is unsupervised adapted based on enrolment utterances of the speaker.

3. The method for text-dependent Speaker Recognition according to claim 1, wherein only mean vectors and transition probabilities are adapted in the speaker model or wherein all parameters are adapted in the speaker model.

4. The method for text-dependent Speaker Recognition according to claim 1, wherein the Universal Background Model of the text-dependent system is trained in an unsupervised training before it is adapted.

5. The method for text-dependent Speaker Recognition according to claim 1, wherein utterances of a plurality of speakers, which may speak more than 5 different languages are used for an unsupervised training of the Universal Background Model of the text dependent system.

6. The method for text-dependent Speaker Recognition according to claim 1, wherein the topology of the Universal Background Model of the text-dependent system is selected to comprise a transition possibility from each possible state to itself and each possible other state.

7. The method for text-dependent Speaker Recognition according to claim 1, wherein the number of states is set to a number estimated by an analysis of the spectral properties of a signal.

8. The method for text-dependent Speaker Recognition according to claim 1, further comprising adapting one or more parameters to a lexical content.

9. The method for text-dependent Speaker Recognition according to claim 1, wherein the eigenvoices matrix and eigenchannel matrix are trained from the generic Universal Background Model in a development session.

10. The method for text-dependent Speaker Recognition according to claim 1, further comprising the step of verifying in an unsupervised way whether a test signal was spoken by a target person.

11. The method for text-dependent Speaker Recognition according to claim 1, wherein the speaker adapted model is used only to determine the most likely path, but not to compute the statistics, which are useable to extract the log likelihood ratios, wherein the channel may be compensated.

12. The method for text-dependent Speaker Recognition according to claim 1, wherein verifying whether the test signal was spoken by the targeted person comprises calculating the difference between the two terms of the log likelihood of the testing audio and the speaker model and the log product of the transition probabilities of the most likely path obtained with the speaker model and the log likelihood of the testing audio and the generic Universal Background Model and the log product of the transition probabilities of the most likely path obtained with the generic Universal Background Model.

13. The method for text-dependent Speaker Recognition according to claim 1, wherein the method further comprises identifying a target person by identifying the speaker adapted model with the highest likelihood score.

14. The method for text-dependent Speaker Recognition according to claim 1, wherein the Universal Background Model is a Hidden Markov Model.

15. The method for text-dependent Speaker Recognition according to claim 1, wherein the mean vectors and the transition probabilities of the Universal Background Model are adapted for the speaker model using a Maximum A Posteriori adaptation.

16. The method for text-dependent Speaker Recognition according to claim 1, wherein the channel factors are compensated in the speaker adapted model.

17. The method for text-dependent Speaker Recognition according to claim 1, wherein the following variables are used in the complete model: a sequence of speaker factors Y; a sequence of channel factors U; a sequence of the feature vectors X; a sequence of Hidden Markov Model states S; a sequence of Gaussian components Z.

18. The method for text-dependent Speaker Recognition according to claim 1, wherein the dependencies of the variables are described by a Bayesian network.

19. The method for text-dependent Speaker Recognition according to claim 1, wherein an iterative Expectation Maximization algorithm is applied for the training of the Universal Background Model given the development data.

20. The method for text-dependent Speaker Recognition according to claim 19, wherein in the iterative algorithm in some of the iterations an additional step is introduced for maintaining boundary conditions or a step is replaced by a step for maintaining boundary conditions.

21. The method for text-dependent Speaker Recognition according to claim 1, wherein a speaker dependent Hidden Markov Model is created by adapting the mean vectors and the eigenvoice matrix of the Universal Background Model according to the enrollment data.

22. The method for text-dependent Speaker Recognition according to claim 1, wherein for the training of the Universal Background Model the model is initialized with values found by training a full covariance Universal Background Model.

23. The method for text-dependent Speaker Recognition according to claim 1, wherein the method is used for speaker verification.

24. A method for text-dependent Speaker Recognition using a text-dependent and a text-independent system, wherein a model for the text-dependent system is adapted in an unsupervised way, and wherein, in addition, a model for the text-independent system for the speaker and the phrase is built, wherein the model uses Bayesian inference to link observed parameters and hidden parameters, wherein the observed parameters are the feature vectors x_nmt of utterance m of speaker n and time index t, and wherein the hidden parameters are at least one of a group of: the speaker factor y_n for each speaker n, the channel factors u_nm of the utterance m of speaker n, the active state s_nmt generating the feature vector x_nmt, and the active component z_nmt generating the feature vector x_nmt.

25. The method for text-dependent Speaker Recognition according to claim 24, wherein text-dependent speaker recognition according to claim 1 is used.

26. The method for text-dependent Speaker Recognition according to claim 24, further comprising the step of verifying in an unsupervised way whether a test signal was spoken by the target person.

27. The method for text-dependent Speaker Recognition according to claim 24, wherein the method further comprises a step of identifying a target person by identifying the speaker adapted model with the highest likelihood score.

28. The method for text-dependent Speaker Recognition according to claim 24, wherein the scalar weights f
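The verification score described in claim 12 can be sketched as follows. This is one possible reading of the claim wording, not a definitive implementation: for each model, the sketch finds the most likely state path (Viterbi), adds the log likelihood of the audio along that path to the log product of the transition probabilities on that path, and subtracts the Universal Background Model total from the speaker model total. Everything here is an assumption made for illustration: the function names, the single diagonal-covariance Gaussian per state (the claims also mention Gaussian components within states), and the model tuple layout. Channel compensation and the speaker and channel factors are not modelled.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def viterbi(frames, log_init, log_trans, means, variances):
    """Most likely state path for an HMM with one Gaussian per state.

    Returns (emission log likelihood along the path,
             log product of the transition probabilities along the path,
             the path itself as a list of state indices).
    """
    n_states = len(means)
    emit = [[log_gauss(x, means[s], variances[s]) for s in range(n_states)]
            for x in frames]
    score = [log_init[s] + emit[0][s] for s in range(n_states)]
    back = []  # back[t - 1][s] = best predecessor of state s at time t
    for t in range(1, len(frames)):
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + emit[t][s])
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    log_emit = sum(emit[t][s] for t, s in enumerate(path))
    log_tr = sum(log_trans[a][b] for a, b in zip(path, path[1:]))
    return log_emit, log_tr, path


def path_score_difference(frames, speaker_hmm, ubm_hmm):
    """Speaker-model term minus UBM term, under the reading described above.

    Each model is a hypothetical tuple (log_init, log_trans, means, variances).
    """
    spk_emit, spk_tr, _ = viterbi(frames, *speaker_hmm)
    ubm_emit, ubm_tr, _ = viterbi(frames, *ubm_hmm)
    return (spk_emit + spk_tr) - (ubm_emit + ubm_tr)
```

A positive difference favours the speaker model over the background model; the decision threshold is application-specific and is not given in the claim preview. A transition matrix with nonzero entries everywhere corresponds to the fully connected topology of claim 6, and identification in the sense of claim 13 could be sketched by scoring the same audio against several speaker models and choosing the highest score.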

Assignees

Inventors

Classifications

  • the user being prompted to utter a password or a predefined phrase

  • Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems

  • Training, enrolment or model building

  • Phonemes, fenemes or fenones being the recognition units

  • Use of phonemic categorisation or speech recognition prior to speaker recognition or verification

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9626971B2 cover?
Method for text-dependent Speaker Recognition using a speaker adapted Universal Background Model, wherein the speaker adapted Universal Background Model is a speaker adapted Hidden Markov Model comprising channel correction.
Who is the assignee on this patent?
Agnitio S L, Cirrus Logic Int Semiconductor Ltd
What technology area does this patent fall under?
Primary CPC classification G10L17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 18, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).