What technology area does this patent fall under?

Primary CPC classification G10L17/04. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 20, 2018 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Context-aware enrollment for text independent speaker recognition

US2018366124A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2018366124-A1
Application number	US-201715626828-A
Country	US
Kind code	A1
Filing date	Jun 19, 2017
Priority date	Jun 19, 2017
Publication date	Dec 20, 2018
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are provided for training of a text independent (TI) speaker recognition (SR) model. A methodology implementing the techniques according to an embodiment includes measuring context data associated with collected TI speech utterances from a user and identifying the user based on received identity measurements. The method further includes performing a speech quality analysis and a speaker state analysis based on the utterances, and evaluating a training merit value of the utterances, based on the speech quality analysis and the speaker state analysis. If the training merit value exceeds a threshold value, the utterances are stored as training data in a training database. The database is indexed by the user identity and the context data. The method further includes determining whether the stored training data has achieved a sufficiency level for enrollment of a TI SR model, and training the TI SR model for the identified user and context.
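The gating and indexing steps described in the abstract can be illustrated with a minimal Python sketch. The publication does not give a formula for the training merit value, so the scoring rule below (a clipped SNR term multiplied by a speaker-state score), the thresholds, and every name (`Utterance`, `training_merit`, `TrainingDatabase`, `offer`) are assumptions made for illustration, not the patented method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    features: tuple     # feature vectors, flattened here for brevity
    snr_db: float       # input to the speech quality analysis (assumed)
    num_frames: int     # input to the speech quality analysis (assumed)
    state_score: float  # speaker state result in [0, 1]; 1 = typical state (assumed)


def training_merit(utt, min_frames=100, snr_floor_db=5.0, snr_ceiling_db=30.0):
    """Hypothetical merit score in [0, 1]; the source does not define one."""
    if utt.num_frames < min_frames:
        return 0.0
    # Map SNR linearly onto [0, 1] between the assumed floor and ceiling.
    snr_term = (utt.snr_db - snr_floor_db) / (snr_ceiling_db - snr_floor_db)
    snr_term = min(1.0, max(0.0, snr_term))
    return snr_term * utt.state_score


class TrainingDatabase:
    """Stores accepted utterances indexed by (user identity, context)."""

    def __init__(self, merit_threshold=0.5):
        self.merit_threshold = merit_threshold
        self._store = defaultdict(list)

    def offer(self, user_id, context, utt):
        """Store the utterance only if its merit exceeds the threshold."""
        if training_merit(utt) > self.merit_threshold:
            self._store[(user_id, context)].append(utt)
            return True
        return False

    def utterances(self, user_id, context):
        return list(self._store.get((user_id, context), []))


db = TrainingDatabase()
clean = Utterance(features=(0.1, 0.2), snr_db=25.0, num_frames=300, state_score=0.9)
noisy = Utterance(features=(0.3, 0.4), snr_db=8.0, num_frames=300, state_score=0.9)
db.offer("alice", "kitchen", clean)  # accepted under the assumed scoring
db.offer("alice", "kitchen", noisy)  # rejected under the assumed scoring
```

Keying the store on the `(user_id, context)` pair mirrors the abstract's statement that the database is indexed by user identity and context data; a real system would likely use a richer context descriptor than a single string.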

First claim

Opening claim text (preview).

What is claimed is:

1. A processor-implemented method for training of a text independent (TI) speaker recognition model, the method comprising: measuring, by a processor-based system, context data associated with collected TI speech utterances from a user in a context, the collected TI speech collected during a first time interval; identifying, by the processor-based system, an identity of the user based on received identity measurements; performing, by the processor-based system, a speech quality analysis of the TI speech utterances; performing, by the processor-based system, a state analysis of the user based on the TI speech utterances; evaluating, by the processor-based system, a training merit value associated with the TI speech utterances, based on the speech quality analysis and the state analysis; and storing, by the processor-based system, the TI speech utterances as training data in a training database, if the training merit value exceeds a threshold value, the stored utterances indexed by the user identity and the context data.

2. The method of claim 1, further comprising: determining a sufficiency of the stored training data for enrollment of a TI speaker recognition model; and training the TI speaker recognition model, associated with the user and the context, based on the stored training data.

3. The method of claim 2, wherein the enrollment further comprises: adding the trained TI speaker recognition model to a database of TI speaker recognition models, based on the sufficiency of the stored training data, the database indexed by the user identity and the context data; and enabling a TI speaker recognizer for the user in the context based on the added TI speaker recognition model.

4. The method of claim 2, further comprising: collecting additional TI speech utterances from the user in the context, during a second time interval; evaluating an adaptation merit value associated with the additional TI speech utterances, the adaptation merit value based on at least one of the elapsed time between the first time interval and the second time interval, and an estimate of improvement of the TI speaker recognition model due to adaptation based on the additional TI speech utterances; and adapting the TI speaker recognition model based on the additional TI speech utterances, if the adaptation merit value exceeds a threshold.

5. The method of claim 2, wherein the determination of sufficiency further comprises: measuring variance of phonemes of the collected TI speech utterances; and estimating future performance of a TI speaker recognition model trained on the stored training data.

6. The method of claim 1, wherein the identity measurements comprise at least one of a result of text dependent (TD) speaker recognition, facial recognition, lip movement detection, skeletal recognition, fingerprint recognition, and biometric factor measurement.

7. The method of claim 1, wherein the speech quality analysis comprises measuring at least one of a number of frames of the TI speech utterances, a speech to noise ratio (SNR) of the TI speech utterances, noise characteristics of the TI speech utterances, and reverberation characteristics of the TI speech utterances; and the state analysis comprises predicting health and emotional state of the user.

8. The method of claim 1, wherein the context data includes at least one of a location of the collected TI speech utterances, a date of the collection, properties of a microphone used for the collection, SNR, noise characteristics, reverberation characteristics, and health and emotional state of the user.

9. The method of claim 1, wherein the speech utterances are represented as feature vectors.

10. A system for training of a text independent (TI) speaker recognition model, the system comprising: a context determination circuit to measure context data associated with collected TI speech utterances from a user in a context, the collected TI speech collected during a first time interval; an identity evidence collection circuit to identify the user based on received identity measurements; a speech quality analysis circuit to perform a speech quality analysis of the TI speech utterances; a speaker state analysis circuit to perform a state analysis of the user based on the TI speech utterances; a training merit evaluation circuit to estimate a training merit value associated with the TI speech utterances, based on the speech quality analysis and the state analysis; an utterance cataloging circuit to store the TI speech utterances as training data in a training database, if the training merit value exceeds a threshold value, the stored utterances indexed by the user identity and the context data; a training data sufficiency determination circuit to evaluate a sufficiency of the stored training data for enrollment of a TI speaker recognition model; a TI speaker recognition training circuit to train the TI speaker recognition model, associated with the user and the context, based on the stored training data.

11. The system of claim 10, wherein the TI speaker recognition training circuit is further to add the trained TI speaker recognition model to a database of TI speaker recognition models, based on the sufficiency of the stored training data, the database indexed by the user identity and the context data; and to enable a TI speaker recognition circuit to recognize the user in the context based on the added TI speaker recognition model.

12. The system of claim 10, further comprising a TI speaker recognition adaptation circuit to: collect additional TI speech utterances from the user in the context, during a second time interval; evaluate an adaptation merit value associated with the additional TI speech utterances, the adaptation merit value based on at least one of the elapsed time between the first time interval and the second time interval, and an estimate of improvement of the TI speaker recognition model due to adaptation based on the additional TI speech utterances; and adapt the TI speaker recognition model based on the additional TI speech utterances, if the adaptation merit value exceeds a threshold.

13. The system of claim 10, wherein the training data sufficiency determination circuit is further to: measure variance of phonemes of the collected TI speech utterances; and estimate future performance of a TI speaker recognition model trained on the stored training data.

14. The system of claim 10, wherein the identity measurements comprise at least one of a result of text dependent (TD) speaker recognition, facial recognition, lip movement detection, skeletal recognition, fingerprint recognition, and biometric factor measurement.

15. The system of claim 10, wherein the speech quality analysis circuit is further to measure at least one of a number of frames of the TI speech utterances, a speech to noise ratio (SNR) of the TI speech utterances, noise characteristics of the TI speech utterances, and reverberation characteristics of the TI speech utterances; and the state analysis circuit is further to predict health and emotional state of the user.

16. The system of claim 10, wherein the context data includes at least one of a location of the collected TI speech utterances, a date of the collection, properties of a microphone used for the collection, SNR, noise characteristics, reverberation characteristics, and health and emotional state of the user.

17. The system of claim 10, wherein the speech utterances are represented as feature vectors.
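The adaptation and sufficiency steps in the dependent claims can be sketched in Python as follows. The claims only name the inputs (elapsed time between collection intervals, an estimated model improvement, and phoneme variance); how they are combined is not specified, so the rule below (taking the larger of an age term and a gain term), the 90-day staleness window, the thresholds, and the function names `adaptation_merit`, `should_adapt`, and `phoneme_stats` are all assumptions. Reading "variance of phonemes" as the variance of per-phoneme counts is likewise one possible interpretation.

```python
from collections import Counter
from statistics import pvariance


def adaptation_merit(elapsed_days, estimated_gain, staleness_days=90.0):
    """Hypothetical adaptation merit in [0, 1] (not defined in the source)."""
    # Older models are assumed to benefit more from adaptation.
    age_term = min(1.0, max(0.0, elapsed_days / staleness_days))
    # estimated_gain: assumed normalized estimate of model improvement.
    gain_term = min(1.0, max(0.0, estimated_gain))
    # "At least one of" the two factors: take whichever is stronger.
    return max(age_term, gain_term)


def should_adapt(elapsed_days, estimated_gain, threshold=0.6):
    """Adapt only if the merit value exceeds the (assumed) threshold."""
    return adaptation_merit(elapsed_days, estimated_gain) > threshold


def phoneme_stats(phoneme_labels):
    """Return (distinct phoneme count, population variance of per-phoneme counts)."""
    counts = Counter(phoneme_labels)
    if not counts:
        return 0, 0.0
    return len(counts), pvariance(counts.values())


stale = should_adapt(elapsed_days=120, estimated_gain=0.0)
recent = should_adapt(elapsed_days=10, estimated_gain=0.1)
distinct, spread = phoneme_stats(["a", "a", "b", "c"])
```

A sufficiency check could then require a minimum number of distinct phonemes and a bounded spread before enrollment, though the claims leave those criteria open.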

Assignees

Intel Corp

Inventors

Classifications

G10L2015/025: Phonemes, fenemes or fenones being the recognition units
G10L17/22: Interactive procedures; Man-machine interfaces
G10L17/06: Decision making techniques; Pattern matching strategies
G10L25/60: for measuring the quality of voice signals
G10L15/02: Feature extraction for speech recognition; Selection of recognition unit

Patent family

Related publications grouped by family.

View patent family 64657553

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018366124A1 cover?: Techniques are provided for training of a text independent (TI) speaker recognition (SR) model. A methodology implementing the techniques according to an embodiment includes measuring context data associated with collected TI speech utterances from a user and identifying the user based on received identity measurements. The method further includes performing a speech quality analysis and a spea…
Who is the assignee on this patent?: Intel Corp