Deepfake detection
US-2024355334-A1 · Oct 24, 2024 · US
US2016343377A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016343377-A1 |
| Application number | US-201514977494-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 21, 2015 |
| Priority date | Mar 16, 2000 |
| Publication date | Nov 24, 2016 |
| Grant date | — |
Official abstract text for this publication.
A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors are decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.
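The abstract describes a train-then-authenticate flow. Below is a minimal Python sketch of that flow, assuming NumPy. It is not the patented implementation. The frame length, hop size, Hann window, number of singular values kept (`K`), their unit normalization, the use of a per-component mean and standard deviation as the "distribution value", and the z-score reading of "threshold limit" are all illustrative assumptions. Every function name is hypothetical. The SVD step follows claims 34-35 and 41-43, which derive the recognition and characteristic units from the singular value matrix.

```python
import numpy as np

# Illustrative parameters (assumptions, not values from the patent).
FRAME_LEN = 256   # samples per analysis frame
HOP = 128         # samples between frame starts
K = 8             # number of singular values kept per utterance


def spectral_signatures(signal, frame_len=FRAME_LEN, hop=HOP):
    """Phoneme-independent representation: one magnitude spectrum per frame.

    Returns a (frames x frequency-bins) matrix; assumes len(signal) >= frame_len.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def recognition_unit(rep, k=K):
    """Decompose the representation with an SVD and keep the k largest
    singular values, scaled to unit length so loudness does not dominate."""
    s = np.linalg.svd(rep, compute_uv=False)[:k]  # sorted, largest first
    return s / np.linalg.norm(s)


def enroll(utterances, k=K):
    """Training: the per-component mean and standard deviation of the units
    from several utterances stand in for the 'distribution value'."""
    units = np.stack([recognition_unit(spectral_signatures(u), k)
                      for u in utterances])
    return units.mean(axis=0), units.std(axis=0) + 1e-6  # avoid divide-by-zero


def verify(utterance, model, threshold=3.0, k=K):
    """Authentication: accept only if every component of the input's
    characteristic unit lies within `threshold` standard deviations."""
    mean, std = model
    unit = recognition_unit(spectral_signatures(utterance), k)
    return bool(np.all(np.abs(unit - mean) / std <= threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(8000) / 8000.0  # one second at an assumed 8 kHz rate
    enrolled = [np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(t.size)
                for _ in range(3)]
    model = enroll(enrolled)
    print(verify(enrolled[0], model))                  # → True
    print(verify(rng.standard_normal(t.size), model))  # white-noise impostor
```

The enrollment signals here are synthetic (a noisy 440 Hz tone), so the example only shows the mechanics. An enrolled utterance always passes because, with three samples, no sample lies more than about 1.41 standard deviations from their mean. Normalizing the singular values makes the comparison insensitive to overall loudness, at the cost of discarding absolute energy.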
Opening claim text (preview).
1-32. (canceled)
33. A method for speaker identification, comprising: at a device having one or more processors and memory: receiving a plurality of different spoken utterances from a user; for each of the plurality of different spoken utterances: generating a respective phoneme-independent representation from the spoken utterance, the respective phoneme-independent representation including a respective spectral signature for each of a plurality of frames sampled from the spoken utterance; and decomposing the respective phoneme-independent representation to obtain a respective content-independent recognition unit for the user; calculating a content-independent recognition distribution value for the user based on the respective content-independent recognition units generated from the plurality of different spoken utterances; and providing the content-independent recognition distribution value for use in a speaker identification process.
34. The method of claim 33, wherein decomposing the respective phoneme-independent representation for each of the plurality of different spoken utterances further comprises: applying a singular value decomposition to the respective phoneme-independent representation.
35. The method of claim 33 further comprising: generating the respective content-independent recognition unit from a singular value matrix of a singular value decomposition of the respective phoneme-independent representation for each of the plurality of different spoken utterances.
36. The method of claim 33, wherein the speaker identification process comprises: decomposing at least one spectral signature of an input speech signal into at least one content-independent characteristic unit; comparing the at least one content-independent characteristic unit to at least one of the content-independent recognition distribution values; and determining that the input speech signal is associated with the user if the at least one content-independent characteristic unit is within a threshold limit of the at least one of the content-independent recognition distribution values.
37. The method of claim 36, wherein decomposing the at least one spectral signature of the input speech signal into the at least one content-independent characteristic unit further comprises: applying a singular value decomposition to the at least one spectral signature of the input speech signal.
38. The method of claim 36, wherein: for each of the plurality of different spoken utterances, the respective phoneme-independent representation is decomposed to further obtain a respective content reference sequence; the at least one spectral signature of the input speech signal is further decomposed into at least one content input sequence; and determining that the input speech signal is associated with the user further comprises determining that the input speech signal is associated with the user if the at least one content input sequence is similar to at least one of the respective content reference sequences.
39. The method of claim 38, further comprising: determining similarity based on a distance calculated between the at least one content input sequence and the at least one of the respective content reference sequences.
40. A method for speaker identification, comprising: at a device having one or more processors and memory: receiving a spoken utterance; generating a first phoneme-independent representation based on the spoken utterance; decomposing the first phoneme-independent representation into at least one content-independent characteristic unit; comparing the at least one content-independent characteristic unit to at least one content-independent recognition distribution value associated with a registered user of the device, the at least one content-independent recognition distribution value previously generated by: generating a second phoneme-independent representation based on speech from the registered user; and decomposing the second phoneme-independent representation into a content-independent recognition unit, the at least one content-independent recognition distribution value based on the content-independent recognition unit; and determining that the spoken utterance is spoken by the registered user if the at least one content-independent characteristic unit is within a threshold limit of the at least one content-independent recognition distribution value.
41. The method of claim 40, further comprising: generating the at least one content-independent characteristic unit from a singular value matrix of a singular value decomposition of the first phoneme-independent representation.
42. The method of claim 40, further comprising: computing the at least one content-independent recognition distribution value from the at least one content-independent recognition unit.
43. The method of claim 42, further comprising: generating the at least one content-independent recognition unit from a singular value matrix of a singular value decomposition of the second phoneme-independent representation.
44. The method of claim 42, wherein decomposing the first phoneme-independent representation further comprises: applying a singular value decomposition to the first phoneme-independent representation.
45. The method of claim 40, wherein decomposing the first phoneme-independent representation further comprises: applying a singular value decomposition to the first phoneme-independent representation.
46. The method of claim 40, wherein the first phoneme-independent representation is further decomposed into at least one content input sequence, and wherein determining that the spoken utterance is spoken by the registered user further comprises determining that the spoken utterance is spoken by the registered user if the at least one content input sequence is similar to at least one content reference sequence previously trained by the registered speaker.
47. The method of claim 46, further comprising: determining similarity based on a distance calculated between the at least one content input sequence and the at least one content reference sequence.
48. A non-transitory computer-readable storage medium comprising instructions for causing one or more processor to: receive a plurality of different spoken utterances from a user; for each of the plurality of different spoken utterances: generate a respective phoneme-independent representation from the spoken utterance, the respective phoneme-independent representation including a respective spectral signature for each of a plurality of frames sampled from the spoken utterance; and decompose the respective phoneme-independent representation to obtain a respective content-independent recognition unit for the user; calculate a content-independent recognition distribution value for the user based on the respective content-independent recognition units generated from the plurality of different spoken utterances; and provide the content-independent recognition distribution value for use in a speaker identification process.
49. A non-transitory computer-readable storage medium comprising instructions for causing one or more processor to: receive a spoken utterance; generate a first phoneme-independent representation based on the spoken utterance; decompose the first phoneme-independent representation into at least one content-independent characteristic unit; compare the at least one content-independent characteristic unit to at least one-content-independ
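Claims 38-39 and 46-47 add a content check alongside the speaker check. The decomposition also yields a content sequence, and an input counts as similar to a stored reference when a distance between them is small. The claims do not say which part of the decomposition forms the sequence or which distance is used. The sketch below assumes, purely for illustration, the leading left singular vector of the frames-by-bins matrix (one value per frame), linear resampling to a fixed length, and Euclidean distance. The function names and the 0.5 cutoff are hypothetical. It uses NumPy.

```python
import numpy as np


def content_sequence(rep, length=32):
    """Hypothetical content sequence: the leading left singular vector of a
    (frames x bins) representation, i.e. one value per frame, resampled to a
    fixed length so utterances of different duration can be compared."""
    u, _, _ = np.linalg.svd(rep, full_matrices=False)
    seq = u[:, 0]
    src = np.linspace(0.0, 1.0, seq.size)
    dst = np.linspace(0.0, 1.0, length)
    return np.interp(dst, src, seq)


def sequence_distance(a, b):
    """Euclidean distance between unit-normalized sequences. Singular vectors
    are only defined up to sign, so the smaller of the two sign choices wins."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(min(np.linalg.norm(a - b), np.linalg.norm(a + b)))


def content_matches(input_seq, reference_seqs, max_distance=0.5):
    """Similar if within max_distance of at least one stored reference."""
    return any(sequence_distance(input_seq, r) <= max_distance
               for r in reference_seqs)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reference = content_sequence(np.abs(rng.standard_normal((40, 129))))
    print(content_matches(reference, [reference]))  # → True
    print(sequence_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # → 1.4142135623730951
```

Resampling to a fixed length is the simplest way to compare utterances of different duration; a time-warping distance would tolerate uneven speaking rate better, but the claims specify neither.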
Use of phonemic categorisation or speech recognition prior to speaker recognition or verification · CPC title
Training, enrolment or model building · CPC title
Interactive procedures; Man-machine interfaces · CPC title
Use of distortion metrics or a particular distance between probe pattern and reference templates · CPC title
to the speaker · CPC title