Who is the assignee on this patent?

St Microelectronics Asia Pacific Pte Ltd

What technology area does this patent fall under?

Primary CPC classification G10L17/04. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 27, 2016 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Methods, systems, and circuits for text independent speaker recognition with automatic learning features

US9530417B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9530417-B2
Application number	US-201313854134-A
Country	US
Kind code	B2
Filing date	Apr 1, 2013
Priority date	Jan 4, 2013
Publication date	Dec 27, 2016
Grant date	Dec 27, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems of text independent speaker recognition provide a complexity comparable to text dependent speaker recognition system. These methods and systems exploit the fact that speech is a quasi-stationary signal and simplify the recognition process based on this theory. The speaker modeling allows a speaker profile to be updated progressively with new speech samples that are acquired during usage over time by the speaker.
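The progressive profile update mentioned in the abstract can be illustrated with a running estimate of a cluster's mean and variance that absorbs one feature vector at a time. This is only a minimal sketch: the page does not show the patent's actual update rule, so Welford's online algorithm is swapped in here as a stand-in, and the class name `RunningGaussian` and its methods are hypothetical.

```python
class RunningGaussian:
    """Diagonal Gaussian whose statistics are updated one sample at a time.

    Assumption: Welford's online algorithm is used as a stand-in for the
    update rule, which this page does not describe.
    """

    def __init__(self, dim):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations per dimension

    def update(self, x):
        """Fold one new feature vector into the running statistics."""
        self.count += 1
        for d, xd in enumerate(x):
            delta = xd - self.mean[d]
            self.mean[d] += delta / self.count
            self.m2[d] += delta * (xd - self.mean[d])

    def variance(self):
        """Sample variance per dimension (zeros until two samples are seen)."""
        if self.count < 2:
            return [0.0] * len(self.mean)
        return [m / (self.count - 1) for m in self.m2]


profile = RunningGaussian(dim=2)
for vector in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    profile.update(vector)
```

Because only the count, mean, and squared-deviation sums are stored, earlier speech samples do not need to be kept in order to refine the profile later.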

First claim

Opening claim text (preview).

What is claimed is:

1. A method of text independent speaker recognition, comprising: extracting feature vectors from initial frames of speech generated responsive to text-independent speech from a user; clustering the extracted feature vectors to generate a plurality of clusters; modelling each of the plurality of clusters as a Gaussian Mixture Model that collectively form a speaker profile for the user; setting a different-state transition probability and a same-state transition probability for each of the Gaussian Mixture Models, the same-state transition probability having a much greater value than the different-state transition probability; capturing additional frames of speech from additional text-independent speech from a speaker; extracting feature vectors from the additional frames; and determining likelihoods that each of the additional frames belongs to each of the Gaussian Mixture Models based on the same-state and different-state transition probabilities, wherein the determining likelihoods includes, for each additional frame of speech, determining a log likelihood (loglk) variable for each cluster to determine the probability that the additional frame of speech belongs to that particular cluster; and determining whether the speaker is an authorized user from the determined likelihoods.

2. The method of claim 1 wherein setting a different-state transition probability and a same-state transition probability for each of the Gaussian Mixture Models comprises setting the different-state transition probability between Gaussian Mixture Models to a value of 0.05 and setting the same-state transition probability for each Gaussian Mixture Model to a value of 0.95.

3. The method of claim 1 wherein determining whether the speaker is an authorized user from the determined likelihoods comprises is taken at any point in time by selecting the speaker having the highest value of the loglk variable from among a plurality of loglk variables for a plurality of speakers.

4. The method of claim 3 wherein a confidence value can be calculated by evaluating a difference between loglk variables for two speakers having the highest loglk variables.

5. The method of claim 4 wherein the confidence value can be the set as a threshold to accept or reject a speaker, and the threshold value is based on a desired reliability of the system.

6. The method of claim 4 wherein the value of the loglk variable for each speaker is reset to zero upon encountering a period of silence in the frames of speech.

7. The method of claim 4 wherein the value of the loglk variable is reset upon encountering a period of silence in the frames of speech.

8. The method of claim 7 wherein the Gaussian Mixture Model of each of the clusters is continuously updated using frames of speech that have high confidence values to belong to the selected speaker.

9. The method of claim 1, wherein extracting feature vectors from frames of speech generated responsive to text-independent speech from a user comprises generating a Mel Frequency Cepstral Coefficient (MFCC) for each frame.

10. The method of claim 1, wherein clustering the extracted feature vectors to generate a plurality of clusters that collectively represent a speaker profile comprises clustering the feature vectors using a K-means clustering algorithm.
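The scoring steps in claims 1 to 5 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: each cluster is reduced to a single diagonal Gaussian rather than a full Gaussian Mixture Model, the per-frame loglk update is written as a log-domain forward recursion (the page does not give the exact update formula), and all function names are hypothetical. Only the 0.95 and 0.05 transition values come from claim 2.

```python
import math

SAME_STATE = 0.95  # same-state transition probability (value from claim 2)
DIFF_STATE = 0.05  # different-state transition probability (value from claim 2)


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def update_loglk(loglk, frame, clusters):
    """Advance the per-cluster loglk values by one frame.

    Assumption: a forward recursion in which staying in the same cluster is
    weighted by SAME_STATE and moving to another one by DIFF_STATE.
    """
    new = []
    for j, (mean, var) in enumerate(clusters):
        paths = [
            loglk[i] + math.log(SAME_STATE if i == j else DIFF_STATE)
            for i in range(len(clusters))
        ]
        new.append(logsumexp(paths) + log_gauss(frame, mean, var))
    return new


def score_speaker(frames, clusters):
    """Best per-cluster loglk for one speaker, starting from the zero reset."""
    loglk = [0.0] * len(clusters)  # claim 6: loglk is reset to zero after silence
    for frame in frames:
        loglk = update_loglk(loglk, frame, clusters)
    return max(loglk)


def identify(frames, profiles, threshold=0.0):
    """Return (best speaker, confidence, accepted) for a run of frames."""
    scores = {name: score_speaker(frames, c) for name, c in profiles.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Claim 4: confidence is the gap between the two highest loglk values.
    confidence = (
        scores[ranked[0]] - scores[ranked[1]] if len(ranked) > 1 else math.inf
    )
    # Claim 5: accept or reject by comparing the confidence with a threshold.
    return ranked[0], confidence, confidence >= threshold
```

A toy call with two made-up profiles, each cluster given as a `(mean, variance)` pair, might look like this:

```python
profiles = {
    "alice": [([0.0, 0.0], [1.0, 1.0]), ([1.0, 1.0], [1.0, 1.0])],
    "bob": [([5.0, 5.0], [1.0, 1.0]), ([6.0, 6.0], [1.0, 1.0])],
}
frames = [[0.1, -0.1], [0.0, 0.2], [1.1, 0.9]]
speaker, confidence, accepted = identify(frames, profiles, threshold=5.0)
```

In a fuller version the feature vectors would be MFCCs (claim 9) and the cluster means would come from K-means (claim 10); both are left out here to keep the sketch short.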

Assignees

St Microelectronics Asia Pacific Pte Ltd

Inventors

Classifications

G10L15/144
Training of HMMs · CPC title
G10L15/063
Training · CPC title
G10L17/06
Decision making techniques; Pattern matching strategies · CPC title
G10L17/02
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title
G10L17/04 · Primary
Training, enrolment or model building · CPC title

Patent family

Related publications grouped by family.

View patent family 51061666
