US9373330B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9373330-B2 |
| Application number | US-201414454169-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 7, 2014 |
| Priority date | Aug 7, 2014 |
| Publication date | Jun 21, 2016 |
| Grant date | Jun 21, 2016 |
Abstract
A method for performing speaker recognition comprises: estimating respective uncertainties of acoustic coverage of respective speech utterance(s) by first and second speakers, the acoustic coverage representing respective sounds used by the speakers when speaking; representing the respective uncertainties of acoustic coverage in a manner that allows for efficient memory usage by discarding dependencies between uncertainties of different sounds for the speakers; representing the respective uncertainties of acoustic coverage in a manner that allows for efficient computation by representing an inverse of the respective uncertainties of acoustic coverage and then discarding the dependencies between the uncertainties of different sounds for the speakers; and computing a score between the speech utterance(s) by the speakers in a manner that leverages the respective uncertainties of the acoustic coverage during the comparison, the score being indicative of a likelihood that the speakers are the same speaker.
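The abstract distinguishes two approximations of the same uncertainty. For memory, the dependencies (off-diagonal terms) of the uncertainty itself are discarded. For computation, the uncertainty is inverted first and only then are the dependencies discarded. The order matters because the diagonal of an inverse is generally not the inverse of the diagonal. The minimal sketch below illustrates this on a hypothetical 2×2 covariance matrix with correlated dimensions. The helper names (`inverse_2x2`, `diagonal`, `stored_floats`), the matrix values, and the 400-dimension storage figure are illustrative assumptions, not taken from the patent.

```python
from typing import List

Matrix = List[List[float]]


def inverse_2x2(m: Matrix) -> Matrix:
    """Exact inverse of a 2x2 matrix (illustration only; hypothetical helper)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def diagonal(m: Matrix) -> List[float]:
    """Keep only the diagonal, i.e. discard dependencies between dimensions."""
    return [m[i][i] for i in range(len(m))]


def stored_floats(dim: int, diagonal_only: bool) -> int:
    """Unique values stored for a symmetric dim x dim matrix vs. its diagonal."""
    return dim if diagonal_only else dim * (dim + 1) // 2


# Hypothetical uncertainty (covariance) whose two dimensions are correlated.
cov = [[2.0, 1.0], [1.0, 2.0]]

cov_diag = diagonal(cov)                  # memory-saving form: [2.0, 2.0]
prec_diag = diagonal(inverse_2x2(cov))    # invert, then discard: [2/3, 2/3]
naive_prec = [1.0 / v for v in cov_diag]  # discard, then invert: [0.5, 0.5]

print(prec_diag, naive_prec)
print(stored_floats(400, False), stored_floats(400, True))  # 80200 400
```

With correlated dimensions the two orders disagree (2/3 versus 1/2 here). One reading of the abstract is that this is why the precision is approximated after inversion rather than derived from the already-diagonalized covariance.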
Claims (preview, truncated)
What is claimed is:
1. A method of speaker recognition in a speaker recognition system, the method comprising: by a processor, estimating respective uncertainties of acoustic coverage of at least one speech utterance by a first speaker and at least one speech utterance by a second speaker, the acoustic coverage representing respective sounds used by the first speaker and by the second speaker when speaking; by the processor, representing the respective uncertainties of acoustic coverage in a manner that allows for efficient memory usage by discarding dependencies between uncertainties of different sounds for the first speaker and for the second speaker to improve speaker recognition performance of the processor; by the processor, representing the respective uncertainties of acoustic coverage in a manner that allows for efficient computation by representing an inverse of the respective uncertainties of acoustic coverage and then discarding the dependencies between the uncertainties of different sounds for the first speaker and for the second speaker to further improve the speaker recognition performance of the processor; and by the processor, computing a score between the at least one speech utterance by the first speaker and the at least one speech utterance by the second speaker in a manner that leverages the respective uncertainties of the acoustic coverage during the comparison, the score being indicative of a likelihood that the first speaker and the second speaker are the same speaker.
2. A method of claim 1 wherein: representing the respective uncertainties of acoustic coverage in a manner that allows for efficient computation includes: accumulating an inverse of independent uncertainties of acoustic coverage for multiple speech utterances by the first speaker and for multiple speech utterances by the second speaker; transforming accumulated inverses of the independent uncertainties of acoustic coverage; and discarding dependencies between the uncertainties of different sounds represented in the transformed accumulated inverses to produce respective diagonalized, transformed accumulated inverses; and computing the score includes using the respective diagonalized, transformed accumulated inverses.
3. The method of claim 1 further comprising receiving, by a computer system, a set of signals corresponding to the speech utterances; and wherein representing the respective uncertainties of acoustic coverage in memory and computationally efficient manners include: computing, for each speech utterance of the set of speech utterances, a corresponding identity vector (i-vector), a diagonalized approximation of a covariance matrix of the corresponding i-vector, and a diagonalized approximation of an equivalent precision matrix associated with the corresponding i-vector; and further wherein: computing the score is based on the i-vectors, the diagonalized approximations of covariance matrices, and the diagonalized approximations of equivalent precision matrices computed, the score being indicative of a likelihood of a correspondence between the set of utterances received and a speaker.
4. The method of claim 3, wherein computing the score includes computing the score for each speaker of a number of speakers known to the computer system, and the method further comprises: determining an identifier corresponding to the speaker having the highest score.
5. The method of claim 3 further comprising: computing a set of projected first order statistics corresponding to the set of speech utterances, based on the i-vectors and the diagonalized approximations of the equivalent precision matrices computed; computing a diagonalized approximation of a cumulative equivalent precision matrix for the set of speech utterances based on the diagonalized approximations of precision matrices computed for each i-vector; and diagonalizing a transformation of the diagonalized approximation of the cumulative equivalent precision matrix computed; and wherein, computing the score includes computing the score based on the set of projected first order statistics, the diagonalized approximation of the cumulative equivalent precision matrix, and the diagonalized transformation of the diagonalized approximation of the cumulative equivalent precision matrix.
6. The method of claim 5 further comprising: computing, for each i-vector of a set of i-vectors corresponding to a speaker known to the computer system by way of parameters that had been previously stored in an associated database, the diagonalized approximation of a covariance matrix of the i-vector corresponding to the speaker known to the computer system and a diagonalized approximation of an equivalent precision matrix associated with the i-vector corresponding to the speaker known to the computer system; and storing the i-vector and the diagonalized approximation of a covariance matrix of the i-vector in a database.
7. The method of claim 6 further comprising: computing a set of projected first order statistics, for each speaker known to the computer system, based on the set of i-vectors associated with the speaker known to the computer system and the diagonalized approximations of the equivalent precision matrices computed; computing a diagonalized approximation of a cumulative equivalent precision matrix for each speaker known to the computer system based on the diagonalized approximations of precision matrices computed for each i-vector in the set of i-vectors corresponding to the speaker known to the computer system; and diagonalizing a transformation of the diagonalized approximation of the cumulative equivalent precision matrix computed for each speaker known to computer system.
8. The method of claim 7, wherein computing the score includes computing the score based on the set of projected first order statistics, the diagonalized approximation of the cumulative equivalent precision matrix, and the diagonalized transformation of the diagonalized approximation of the cumulative equivalent precision matrix associated with the speaker known to the computer system.
9. The method of claim 3, wherein determining if the set of speech utterances corresponds to one or any of the number of speakers known to the computer system includes comparing the score computed to a threshold.
10. An apparatus for speaker recognition, the apparatus comprising: an estimation module configured to estimate respective uncertainties of acoustic coverage of at least one speech utterance by a first speaker and at least one speech utterance by a second speaker, the acoustic coverage representing respective sounds used by the first speaker and by the second speaker when speaking; a processor; and a memory, with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions stored thereon, are configured to cause the apparatus to: represent the respective uncertainties of acoustic coverage in a manner that allows for efficient memory usage by discarding dependencies between uncertainties of different sounds for the first speaker and for the second speaker to improve speaker recognition performance of the processor; and represent the respective uncertainties of acoustic coverage in a manner that allows for efficient computation by representing an inverse of the respective uncertainties of acoustic coverage and then discarding the dependencies between the uncertainties of different sounds for the first speaker and for the second speaker to further improve the speaker recognition performance of the processor; and a scoring module configured to compute a score between the at least one speech utterance by the first speaker and the at least one spe
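Claims 2 through 9 describe three steps. Per-utterance equivalent precisions and projected first-order statistics are accumulated, a score is computed from diagonalized matrices, and then either the highest-scoring known speaker is chosen (claim 4) or the score is compared with a threshold (claim 9). The minimal sketch below shows one way these pieces can fit together. It assumes a simplified two-covariance Gaussian model in which every matrix is diagonal, so each i-vector dimension is scored independently. The model, the parameter values `B_PREC` and `W_PREC`, the equivalent-precision formula, and all function names are assumptions for illustration, not the patent's exact formulation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical model parameters (assumptions, not from the patent), shared by all dimensions:
B_PREC = 1.0  # between-speaker precision b: speaker variable y ~ N(0, 1/b)
W_PREC = 4.0  # within-speaker precision w of the residual around y


@dataclass
class Utterance:
    ivector: List[float]   # i-vector (posterior mean), assumed centered
    var_diag: List[float]  # diagonal of the i-vector posterior covariance (uncertainty)


def equivalent_precision(u: Utterance) -> List[float]:
    # Diagonal "equivalent precision" per dimension: 1 / (1/w + sigma^2).
    return [1.0 / (1.0 / W_PREC + s) for s in u.var_diag]


def accumulate(utts: Sequence[Utterance]) -> Tuple[List[float], List[float]]:
    """Return projected first-order statistics f and cumulative precision lam."""
    dim = len(utts[0].ivector)
    f, lam = [0.0] * dim, [0.0] * dim
    for u in utts:
        p = equivalent_precision(u)
        for d in range(dim):
            f[d] += p[d] * u.ivector[d]
            lam[d] += p[d]
    return f, lam


def llr(set1: Sequence[Utterance], set2: Sequence[Utterance]) -> float:
    """Log-likelihood ratio: same speaker vs. different speakers."""
    f1, l1 = accumulate(set1)
    f2, l2 = accumulate(set2)
    score = 0.0
    for d in range(len(f1)):
        a1 = B_PREC + l1[d]           # posterior precision of y given set1 only
        a2 = B_PREC + l2[d]           # posterior precision of y given set2 only
        a12 = B_PREC + l1[d] + l2[d]  # posterior precision of y given both sets
        score += 0.5 * math.log(a1 * a2 / (B_PREC * a12))
        score += 0.5 * ((f1[d] + f2[d]) ** 2 / a12 - f1[d] ** 2 / a1 - f2[d] ** 2 / a2)
    return score


def identify(probe: Sequence[Utterance], speakers: Dict[str, List[Utterance]]) -> str:
    # Closed-set identification: return the known speaker with the highest score.
    return max(speakers, key=lambda name: llr(speakers[name], probe))


def verify(enroll: Sequence[Utterance], probe: Sequence[Utterance], threshold: float = 0.0) -> bool:
    # Verification: accept when the score exceeds a threshold.
    return llr(enroll, probe) > threshold


alice = [Utterance([1.0, -0.5], [0.1, 0.2]), Utterance([1.1, -0.4], [0.3, 0.1])]
bob = [Utterance([-1.0, 0.8], [0.1, 0.1])]
probe = [Utterance([0.9, -0.6], [0.2, 0.2])]
print(identify(probe, {"alice": alice, "bob": bob}))  # alice
```

Because every matrix in this toy model is diagonal, the "transformation" of the cumulative precision (adding the between-speaker precision `B_PREC`) stays diagonal automatically. In a full-matrix model that transformation would generally produce a dense matrix, which is presumably why claims 5 and 7 diagonalize it again before scoring.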
CPC titles: Decision making techniques; Pattern matching strategies