Deepfake detection
US-2024355334-A1 · Oct 24, 2024 · US
US2018366124A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2018366124-A1 |
| Application number | US-201715626828-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 19, 2017 |
| Priority date | Jun 19, 2017 |
| Publication date | Dec 20, 2018 |
| Grant date | — |
Official abstract text for this publication.
Techniques are provided for training of a text independent (TI) speaker recognition (SR) model. A methodology implementing the techniques according to an embodiment includes measuring context data associated with collected TI speech utterances from a user and identifying the user based on received identity measurements. The method further includes performing a speech quality analysis and a speaker state analysis based on the utterances, and evaluating a training merit value of the utterances, based on the speech quality analysis and the speaker state analysis. If the training merit value exceeds a threshold value, the utterances are stored as training data in a training database. The database is indexed by the user identity and the context data. The method further includes determining whether the stored training data has achieved a sufficiency level for enrollment of a TI SR model, and training the TI SR model for the identified user and context.
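The abstract describes a gated enrollment pipeline. Utterances are scored, kept only if their training merit clears a threshold, filed under (user, context), and used for training once enough data has accumulated. The Python sketch below shows one way those steps could fit together. Everything concrete in it is an assumption and does not come from the patent:

- the `Utterance` fields
- the mapping from frame count and SNR to a quality score
- the multiplicative merit rule
- the 0.5 threshold
- the frame-count sufficiency test
- the centroid "model", which is only a toy stand-in for a real TI speaker recognition model

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    # Hypothetical representation: one feature vector per frame (cf. claims 9 and 17).
    features: list[list[float]]
    snr_db: float        # estimated speech-to-noise ratio in dB
    state_score: float   # assumed 0..1 score; 1.0 = speaker in a typical health/emotional state


def quality_score(utt: Utterance, min_frames: int = 100, good_snr_db: float = 20.0) -> float:
    """Map frame count and SNR to [0, 1]. Both cut-offs are illustrative assumptions."""
    frame_term = min(len(utt.features) / min_frames, 1.0)
    snr_term = max(0.0, min(utt.snr_db / good_snr_db, 1.0))
    return frame_term * snr_term


def training_merit(utt: Utterance) -> float:
    """Assumed rule: merit is high only when both quality and speaker state are good."""
    return quality_score(utt) * utt.state_score


class TrainingDatabase:
    """Accepted utterances indexed by (user identity, context)."""

    def __init__(self, merit_threshold: float = 0.5, frames_for_enrollment: int = 500):
        self.merit_threshold = merit_threshold
        self.frames_for_enrollment = frames_for_enrollment
        self._store: dict[tuple[str, str], list[Utterance]] = defaultdict(list)

    def offer(self, user_id: str, context: str, utt: Utterance) -> bool:
        """Store the utterance only if its training merit exceeds the threshold."""
        if training_merit(utt) > self.merit_threshold:
            self._store[(user_id, context)].append(utt)
            return True
        return False

    def is_sufficient(self, user_id: str, context: str) -> bool:
        """Simplest possible sufficiency test: enough total frames for this key."""
        stored = self._store.get((user_id, context), [])
        return sum(len(u.features) for u in stored) >= self.frames_for_enrollment

    def enroll(self, user_id: str, context: str) -> Optional[list[float]]:
        """Toy stand-in for TI model training: the mean of all stored frames."""
        if not self.is_sufficient(user_id, context):
            return None
        frames = [f for u in self._store.get((user_id, context), []) for f in u.features]
        if not frames:
            return None
        dim = len(frames[0])
        return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
```

Consider a clean 300-frame utterance with 25 dB SNR and a state score of 0.9. Its merit is 0.9, so it is stored. The same utterance at 5 dB SNR scores 0.225 and is rejected. Two accepted 300-frame utterances for the same (user, context) key pass the 500-frame sufficiency test. A real system would replace the centroid with an actual TI speaker model.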
Opening claim text (preview).
What is claimed is:
1. A processor-implemented method for training of a text independent (TI) speaker recognition model, the method comprising: measuring, by a processor-based system, context data associated with collected TI speech utterances from a user in a context, the collected TI speech collected during a first time interval; identifying, by the processor-based system, an identity of the user based on received identity measurements; performing, by the processor-based system, a speech quality analysis of the TI speech utterances; performing, by the processor-based system, a state analysis of the user based on the TI speech utterances; evaluating, by the processor-based system, a training merit value associated with the TI speech utterances, based on the speech quality analysis and the state analysis; and storing, by the processor-based system, the TI speech utterances as training data in a training database, if the training merit value exceeds a threshold value, the stored utterances indexed by the user identity and the context data.
2. The method of claim 1, further comprising: determining a sufficiency of the stored training data for enrollment of a TI speaker recognition model; and training the TI speaker recognition model, associated with the user and the context, based on the stored training data.
3. The method of claim 2, wherein the enrollment further comprises: adding the trained TI speaker recognition model to a database of TI speaker recognition models, based on the sufficiency of the stored training data, the database indexed by the user identity and the context data; and enabling a TI speaker recognizer for the user in the context based on the added TI speaker recognition model.
4. The method of claim 2, further comprising: collecting additional TI speech utterances from the user in the context, during a second time interval; evaluating an adaptation merit value associated with the additional TI speech utterances, the adaptation merit value based on at least one of the elapsed time between the first time interval and the second time interval, and an estimate of improvement of the TI speaker recognition model due to adaptation based on the additional TI speech utterances; and adapting the TI speaker recognition model based on the additional TI speech utterances, if the adaptation merit value exceeds a threshold.
5. The method of claim 2, wherein the determination of sufficiency further comprises: measuring variance of phonemes of the collected TI speech utterances; and estimating future performance of a TI speaker recognition model trained on the stored training data.
6. The method of claim 1, wherein the identity measurements comprise at least one of a result of text dependent (TD) speaker recognition, facial recognition, lip movement detection, skeletal recognition, fingerprint recognition, and biometric factor measurement.
7. The method of claim 1, wherein the speech quality analysis comprises measuring at least one of a number of frames of the TI speech utterances, a speech to noise ratio (SNR) of the TI speech utterances, noise characteristics of the TI speech utterances, and reverberation characteristics of the TI speech utterances; and the state analysis comprises predicting health and emotional state of the user.
8. The method of claim 1, wherein the context data includes at least one of a location of the collected TI speech utterances, a date of the collection, properties of a microphone used for the collection, SNR, noise characteristics, reverberation characteristics, and health and emotional state of the user.
9. The method of claim 1, wherein the speech utterances are represented as feature vectors.
10. A system for training of a text independent (TI) speaker recognition model, the system comprising: a context determination circuit to measure context data associated with collected TI speech utterances from a user in a context, the collected TI speech collected during a first time interval; an identity evidence collection circuit to identify the user based on received identity measurements; a speech quality analysis circuit to perform a speech quality analysis of the TI speech utterances; a speaker state analysis circuit to perform a state analysis of the user based on the TI speech utterances; a training merit evaluation circuit to estimate a training merit value associated with the TI speech utterances, based on the speech quality analysis and the state analysis; an utterance cataloging circuit to store the TI speech utterances as training data in a training database, if the training merit value exceeds a threshold value, the stored utterances indexed by the user identity and the context data; a training data sufficiency determination circuit to evaluate a sufficiency of the stored training data for enrollment of a TI speaker recognition model; a TI speaker recognition training circuit to train the TI speaker recognition model, associated with the user and the context, based on the stored training data.
11. The system of claim 10, wherein the TI speaker recognition training circuit is further to add the trained TI speaker recognition model to a database of TI speaker recognition models, based on the sufficiency of the stored training data, the database indexed by the user identity and the context data; and to enable a TI speaker recognition circuit to recognize the user in the context based on the added TI speaker recognition model.
12. The system of claim 10, further comprising a TI speaker recognition adaptation circuit to: collect additional TI speech utterances from the user in the context, during a second time interval; evaluate an adaptation merit value associated with the additional TI speech utterances, the adaptation merit value based on at least one of the elapsed time between the first time interval and the second time interval, and an estimate of improvement of the TI speaker recognition model due to adaptation based on the additional TI speech utterances; and adapt the TI speaker recognition model based on the additional TI speech utterances, if the adaptation merit value exceeds a threshold.
13. The system of claim 10, wherein the training data sufficiency determination circuit is further to: measure variance of phonemes of the collected TI speech utterances; and estimate future performance of a TI speaker recognition model trained on the stored training data.
14. The system of claim 10, wherein the identity measurements comprise at least one of a result of text dependent (TD) speaker recognition, facial recognition, lip movement detection, skeletal recognition, fingerprint recognition, and biometric factor measurement.
15. The system of claim 10, wherein the speech quality analysis circuit is further to measure at least one of a number of frames of the TI speech utterances, a speech to noise ratio (SNR) of the TI speech utterances, noise characteristics of the TI speech utterances, and reverberation characteristics of the TI speech utterances; and the state analysis circuit is further to predict health and emotional state of the user.
16. The system of claim 10, wherein the context data includes at least one of a location of the collected TI speech utterances, a date of the collection, properties of a microphone used for the collection, SNR, noise characteristics, reverberation characteristics, and health and emotional state of the user.
17. The system of claim 10, wherein the speech utterances are represented as feature vectors.
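Claims 4 and 5, together with their system counterparts 12 and 13, describe two further decisions. The first is whether newly collected utterances justify adapting an enrolled model. The second is whether the stored data is varied enough for enrollment. The claims do not say how either quantity is computed, so the Python sketch below uses assumed formulas:

- an adaptation merit that blends a staleness term (exponential in elapsed days, with a hypothetical 30-day half-life) with a clipped estimate of improvement
- a phoneme-diversity score (normalized entropy of phoneme counts), used as one possible proxy for "variance of phonemes"

All function names, weights, and thresholds are illustrative and are not the patent's method.

```python
import math
from collections import Counter


def adaptation_merit(elapsed_days: float, est_improvement: float,
                     half_life_days: float = 30.0, w_time: float = 0.5) -> float:
    """Hypothetical merit in [0, 1]: older models and larger expected gains score higher."""
    # Staleness rises from 0 toward 1 as time since the first collection interval grows.
    staleness = 1.0 - 0.5 ** (elapsed_days / half_life_days)
    # The estimated improvement is assumed to be normalized to [0, 1] already; clip to be safe.
    gain = max(0.0, min(est_improvement, 1.0))
    return w_time * staleness + (1.0 - w_time) * gain


def should_adapt(elapsed_days: float, est_improvement: float, threshold: float = 0.4) -> bool:
    """Adapt only if the merit exceeds the (assumed) threshold."""
    return adaptation_merit(elapsed_days, est_improvement) > threshold


def phoneme_diversity(phoneme_labels: list[str], inventory_size: int) -> float:
    """Normalized entropy of phoneme counts: 0 = one phoneme repeated, 1 = uniform over inventory."""
    counts = Counter(phoneme_labels)
    n = sum(counts.values())
    if n == 0 or inventory_size < 2:
        return 0.0
    entropy = sum(-(c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(inventory_size)
```

With the default weights, a model whose data was collected 30 days ago and that has no expected gain scores 0.25, so it is left alone. The same model with an estimated improvement of 0.6 scores 0.55 and is adapted. A sufficiency test could then require the diversity score to exceed some minimum before training, alongside any estimate of future recognition performance.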
Phonemes, fenemes or fenones being the recognition units · CPC title
Interactive procedures; Man-machine interfaces · CPC title
Decision making techniques; Pattern matching strategies · CPC title
for measuring the quality of voice signals · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title