Fast, language-independent method for user authentication by voice

US2016343377A1 · US · A1

Patent metadata
Publication number: US-2016343377-A1
Application number: US-201514977494-A
Country: US
Kind code: A1
Filing date: Dec 21, 2015
Priority date: Mar 16, 2000
Publication date: Nov 24, 2016
Grant date: (not listed)

Abstract

Official abstract text for this publication.

A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors are decomposed into speaker specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.
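The training and authentication flow in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`spectral_frames`, `recognition_unit`, `enroll`, `authenticate`), the log-magnitude FFT front end, the use of normalized singular values as the speaker-specific unit, the mean as the "distribution value", and the Euclidean threshold test are all hypothetical choices made for the example.

```python
import numpy as np


def spectral_frames(signal, frame_len=256, hop=128, n_coeffs=20):
    """Return one log-magnitude spectrum row per overlapping frame (assumed front end)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    spectra = np.abs(np.fft.rfft(frames, axis=1))[:, :n_coeffs]
    return np.log1p(spectra)


def recognition_unit(frames):
    """Assumed speaker-specific unit: singular values of the frame matrix, unit-normalized."""
    s = np.linalg.svd(frames, compute_uv=False)
    return s / np.linalg.norm(s)


def enroll(utterances):
    """Training: average the units of several utterances into one template (assumed statistic)."""
    units = np.stack([recognition_unit(spectral_frames(u)) for u in utterances])
    return units.mean(axis=0)


def authenticate(signal, template, threshold):
    """Accept the signal if its unit lies within `threshold` of the enrolled template."""
    unit = recognition_unit(spectral_frames(signal))
    return float(np.linalg.norm(unit - template)) <= threshold


# Usage with synthetic noise standing in for recorded speech.
rng = np.random.default_rng(0)
enrollment = [rng.standard_normal(4000) for _ in range(3)]
template = enroll(enrollment)
accepted = authenticate(rng.standard_normal(4000), template, threshold=0.05)
```

The threshold value here is arbitrary; in practice it would be tuned on real enrollment data to balance false accepts against false rejects.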

First claim

Opening claim text (preview).

1-32. (canceled)

33. A method for speaker identification, comprising: at a device having one or more processors and memory: receiving a plurality of different spoken utterances from a user; for each of the plurality of different spoken utterances: generating a respective phoneme-independent representation from the spoken utterance, the respective phoneme-independent representation including a respective spectral signature for each of a plurality of frames sampled from the spoken utterance; and decomposing the respective phoneme-independent representation to obtain a respective content-independent recognition unit for the user; calculating a content-independent recognition distribution value for the user based on the respective content-independent recognition units generated from the plurality of different spoken utterances; and providing the content-independent recognition distribution value for use in a speaker identification process.

34. The method of claim 33, wherein decomposing the respective phoneme-independent representation for each of the plurality of different spoken utterances further comprises: applying a singular value decomposition to the respective phoneme-independent representation.

35. The method of claim 33 further comprising: generating the respective content-independent recognition unit from a singular value matrix of a singular value decomposition of the respective phoneme-independent representation for each of the plurality of different spoken utterances.

36. The method of claim 33, wherein the speaker identification process comprises: decomposing at least one spectral signature of an input speech signal into at least one content-independent characteristic unit; comparing the at least one content-independent characteristic unit to at least one of the content-independent recognition distribution values; and determining that the input speech signal is associated with the user if the at least one content-independent characteristic unit is within a threshold limit of the at least one of the content-independent recognition distribution values.

37. The method of claim 36, wherein decomposing the at least one spectral signature of the input speech signal into the at least one content-independent characteristic unit further comprises: applying a singular value decomposition to the at least one spectral signature of the input speech signal.

38. The method of claim 36, wherein: for each of the plurality of different spoken utterances, the respective phoneme-independent representation is decomposed to further obtain a respective content reference sequence; the at least one spectral signature of the input speech signal is further decomposed into at least one content input sequence; and determining that the input speech signal is associated with the user further comprises determining that the input speech signal is associated with the user if the at least one content input sequence is similar to at least one of the respective content reference sequences.

39. The method of claim 38, further comprising: determining similarity based on a distance calculated between the at least one content input sequence and the at least one of the respective content reference sequences.

40. A method for speaker identification, comprising: at a device having one or more processors and memory: receiving a spoken utterance; generating a first phoneme-independent representation based on the spoken utterance; decomposing the first phoneme-independent representation into at least one content-independent characteristic unit; comparing the at least one content-independent characteristic unit to at least one content-independent recognition distribution value associated with a registered user of the device, the at least one content-independent recognition distribution value previously generated by: generating a second phoneme-independent representation based on speech from the registered user; and decomposing the second phoneme-independent representation into a content-independent recognition unit, the at least one content-independent recognition distribution value based on the content-independent recognition unit; and determining that the spoken utterance is spoken by the registered user if the at least one content-independent characteristic unit is within a threshold limit of the at least one content-independent recognition distribution value.

41. The method of claim 40, further comprising: generating the at least one content-independent characteristic unit from a singular value matrix of a singular value decomposition of the first phoneme-independent representation.

42. The method of claim 40, further comprising: computing the at least one content-independent recognition distribution value from the at least one content-independent recognition unit.

43. The method of claim 42, further comprising: generating the at least one content-independent recognition unit from a singular value matrix of a singular value decomposition of the second phoneme-independent representation.

44. The method of claim 42, wherein decomposing the first phoneme-independent representation further comprises: applying a singular value decomposition to the first phoneme-independent representation.

45. The method of claim 40, wherein decomposing the first phoneme-independent representation further comprises: applying a singular value decomposition to the first phoneme-independent representation.

46. The method of claim 40, wherein the first phoneme-independent representation is further decomposed into at least one content input sequence, and wherein determining that the spoken utterance is spoken by the registered user further comprises determining that the spoken utterance is spoken by the registered user if the at least one content input sequence is similar to at least one content reference sequence previously trained by the registered speaker.

47. The method of claim 46, further comprising: determining similarity based on a distance calculated between the at least one content input sequence and the at least one content reference sequence.

48. A non-transitory computer-readable storage medium comprising instructions for causing one or more processor to: receive a plurality of different spoken utterances from a user; for each of the plurality of different spoken utterances: generate a respective phoneme-independent representation from the spoken utterance, the respective phoneme-independent representation including a respective spectral signature for each of a plurality of frames sampled from the spoken utterance; and decompose the respective phoneme-independent representation to obtain a respective content-independent recognition unit for the user; calculate a content-independent recognition distribution value for the user based on the respective content-independent recognition units generated from the plurality of different spoken utterances; and provide the content-independent recognition distribution value for use in a speaker identification process.

49. A non-transitory computer-readable storage medium comprising instructions for causing one or more processor to: receive a spoken utterance; generate a first phoneme-independent representation based on the spoken utterance; decompose the first phoneme-independent representation into at least one content-independent characteristic unit; compare the at least one content-independent characteristic unit to at least one-content-independ
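Claims 38-39 and 46-47 add a second check: a content sequence obtained from the same decomposition is compared with a reference sequence using a distance. The sketch below is one possible reading, with every specific an assumption made for illustration: the content sequence is taken to be the leading scaled left singular vectors of the frame matrix (one row per frame), the distance is dynamic time warping, and the names `content_sequence`, `dtw_distance`, and `content_matches` are hypothetical. The claims only require "a distance" and do not name one.

```python
import numpy as np


def content_sequence(frames, rank=3):
    """Assumed content sequence: first `rank` scaled left singular vectors, one row per frame."""
    u, s, vt = np.linalg.svd(frames, full_matrices=False)
    # SVD is unique only up to sign; make the largest entry of each right
    # singular vector positive so sequences from different utterances align.
    signs = np.sign(vt[np.arange(len(s)), np.abs(vt).argmax(axis=1)])
    signs[signs == 0] = 1.0
    return (u * signs * s)[:, :rank]


def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of row vectors."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def content_matches(input_frames, reference_frames, threshold, rank=3):
    """Treat the input as similar if its DTW distance to the reference is within `threshold`."""
    dist = dtw_distance(
        content_sequence(input_frames, rank), content_sequence(reference_frames, rank)
    )
    return dist <= threshold
```

Dynamic time warping is used here only because it tolerates utterances of different lengths; any other sequence distance would fit the claim language equally well.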

Assignees

Apple Inc

Inventors

Classifications

  • Use of phonemic categorisation or speech recognition prior to speaker recognition or verification

  • G10L17/04 (Primary): Training, enrolment or model building

  • G10L17/22 (Primary): Interactive procedures; Man-machine interfaces

  • Use of distortion metrics or a particular distance between probe pattern and reference templates

  • to the speaker

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016343377A1 cover?
A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors are decomposed into speaker specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units which are…
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G10L17/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 24, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).