Voice profile updating

US11004454B1 · US · B1

Patent metadata
Field               Value
Publication number  US-11004454-B1
Application number  US-201816182021-A
Country             US
Kind code           B1
Filing date         Nov 6, 2018
Priority date       Nov 6, 2018
Publication date    May 11, 2021
Grant date          May 11, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for updating voice profiles used to perform user recognition are described. A system may use clustering techniques to update voice profiles. When the system receives audio data representing a spoken user input, the system may store the audio data. Periodically, the system may recall, from storage, audio data (representing previous user inputs). The system may identify clusters of the audio data, with each cluster including similar or identical speech characteristics. The system may determine a cluster is substantially similar to an existing voice profile. If this occurs, the system may create an updated voice profile using the original voice profile and the cluster of audio data.
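The flow in the abstract (recall stored inputs, cluster them, match a cluster to an existing profile, build an updated profile) can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not name a clustering algorithm, so the greedy centroid clustering, the cosine-similarity measure, the 0.9 threshold, and every function name here are assumptions chosen for brevity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cluster(vectors, threshold=0.9):
    """Greedy clustering (an assumed technique): each vector joins the first
    cluster whose centroid is similar enough, otherwise it starts a new one."""
    clusters = []
    for v in vectors:
        for members in clusters:
            if cosine(v, centroid(members)) >= threshold:
                members.append(v)
                break
        else:
            clusters.append([v])
    return clusters

def update_profile(profile, stored_vectors, threshold=0.9):
    """Return an updated profile built from the original profile and the first
    cluster that resembles it; return the profile unchanged if none does."""
    for members in cluster(stored_vectors, threshold):
        c = centroid(members)
        if cosine(profile, c) >= threshold:
            return centroid([profile, c])
    return profile

# Hypothetical 2-D feature vectors standing in for stored spoken inputs.
profile = [1.0, 0.0]
stored = [[0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]
clusters = cluster(stored)
updated = update_profile(profile, stored)
```

In this toy data the four stored vectors fall into two clusters, and only the cluster near the existing profile contributes to the update. A production system would likely use higher-dimensional speaker embeddings and a more robust clustering method.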

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: generating first voice profile data associated with a group profile identifier and a first user identifier; identifying a first stored representation of a first user input received after the first voice profile data is generated, the first stored representation being associated with a group profile identifier and the first user identifier; identifying a second stored representation of a second user input received after the first voice profile data is generated, the second stored representation being associated with the group profile identifier and the first user identifier; determining the first stored representation is similar to the second stored representation; generating updated first voice profile data using the first voice profile data, the first stored representation, and the second stored representation; and storing an association between the updated first voice profile data and the first user identifier.

2. The method of claim 1, further comprising: determining a first number of stored representations used to generate the first voice profile data; determining a second number corresponding to an amount of stored representations, the second number representing the first number, the first stored representation, and the second stored representation; and based at least in part on the second number, generating the updated first voice profile data.

3. The method of claim 1, further comprising: determining the first stored representation is associated with a first intent indicator; determining the second stored representation is associated with a second intent indicator; determining the first intent indicator is different from the second intent indicator; and based at least in part on determining the first intent indicator is different from the second intent indicator, generating the updated first voice profile data.

4. The method of claim 1, further comprising: determining a third stored representation of a third user input; determining a fourth stored representation of a fourth user input; determining the third stored representation is similar to the fourth stored representation; determining the third stored representation and the fourth stored representation are dissimilar to the first voice profile data; and based at least in part on determining the third stored representation and the fourth stored representation are dissimilar to the first voice profile data, generating second voice profile data using the third stored representation and the fourth stored representation.

5. The method of claim 1, wherein the first stored representation corresponds to first audio data, the second stored representation corresponds to second audio data, and wherein the method further comprises: generating a first user recognition feature vector representing the first audio data; generating a second user recognition feature vector representing the second audio data; and determining the first user recognition feature vector is similar to the second user recognition feature vector.

6. The method of claim 1, wherein the first stored representation is a first user recognition feature vector, the second stored representation is a second user recognition feature vector, and the method further comprises: determining a distance between the first user recognition feature vector and the second user recognition feature vector; and based at least in part on the distance, determining the first user recognition feature vector is similar to the second user recognition feature vector.

7. A method, comprising: generating first voice profile data associated with a first user identifier; identifying a first stored representation of a first user input received after the first voice profile data is generated, the first stored representation being associated with the first user identifier; identifying a second stored representation of a second user input received after the first voice profile data is generated, the second stored representation being associated with the first user identifier; determining the first stored representation is similar to the second stored representation; generating updated first voice profile data using the first voice profile data, the first stored representation, and the second stored representation; and storing an association between the updated first voice profile data and the first user identifier.

8. The method of claim 7, wherein the first stored representation corresponds to first audio data, the second stored representation corresponds to second audio data, and wherein the method further comprises: generating a first user recognition feature vector representing the first audio data; generating a second user recognition feature vector representing the second audio data; and determining the first user recognition feature vector is similar to the second user recognition feature vector.

9. The method of claim 7, further comprising: determining a first number of stored representations used to generate the first voice profile data; determining a second number corresponding to an amount of stored representations, the second number representing the first number, the first stored representation, and the second stored representation; and based at least in part on the second number, generating the updated first voice profile data.

10. The method of claim 7, further comprising: determining the first stored representation is associated with an intent indicator; and based at least in part on the first stored representation being associated with the intent indicator, generating the updated first voice profile data.

11. The method of claim 7, further comprising: identifying a third stored representation of a third user input; determining the third stored representation is dissimilar to the first voice profile data; and generating second voice profile data using the third stored representation.

12. The method of claim 7, further comprising: identifying the first stored representation based at least in part on the first stored representation being associated with a group profile identifier.

13. The method of claim 7, wherein the first stored representation is a first user recognition feature vector, wherein the second stored representation is a second user recognition feature vector, and wherein the method further comprises: determining a distance between the first user recognition feature vector and the second user recognition feature vector; and based at least in part on the distance, determining the first user recognition feature vector is similar to the second user recognition feature vector.

14. The method of claim 7, further comprising: receiving audio data representing a third user input; storing the audio data; after storing the audio data, receiving an indicator representing the audio data is to be associated with a second user identifier; identifying second voice profile data associated with the second user identifier; and using the audio data, generating updated second voice profile data.

15. The method of claim 7, further comprising: receiving first audio data representing a third user input; determining the first audio data corresponds to second voice profile data associated with a second user identifier; determining user profile data, associated with the second user identifier, corresponds to a user name; generating second audio data including the user name; causing a first device to output audio corresponding to the second audio data; receiving, from the first device, third a
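Several of the dependent claims describe decisions that are easier to follow in code: a distance test between feature vectors (claims 6 and 13), an update that accounts for how many stored representations already back the profile (claims 2 and 9), and creation of a second profile when similar inputs do not resemble the first profile (claim 4). The sketch below is one possible reading under stated assumptions. The claims do not specify a distance metric, a threshold, or an update formula, so the Euclidean distance, the 0.5 cutoff, the count-weighted mean, the `VoiceProfile` class, and the `"unassigned"` identifier are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class VoiceProfile:
    user_id: str
    vector: list  # profile feature vector
    count: int    # stored representations used to build the vector

def merge(profile, new_vectors):
    """Count-weighted mean (an assumed formula): the existing vector is
    weighted by the number of representations already behind it."""
    total = profile.count + len(new_vectors)
    sums = [x * profile.count for x in profile.vector]
    for v in new_vectors:
        sums = [s + x for s, x in zip(sums, v)]
    return VoiceProfile(profile.user_id, [s / total for s in sums], total)

def process_pair(profile, first, second, max_dist=0.5):
    """Return (profile, second_profile) after considering two stored
    feature vectors. max_dist is a hypothetical similarity cutoff."""
    # Distance test between the two stored representations.
    if math.dist(first, second) > max_dist:
        return profile, None
    # Similar to each other and to the profile: update the profile.
    if all(math.dist(v, profile.vector) <= max_dist for v in (first, second)):
        return merge(profile, [first, second]), None
    # Similar to each other but not to the profile: build a second profile.
    mean = [(a + b) / 2 for a, b in zip(first, second)]
    return profile, VoiceProfile("unassigned", mean, 2)

# Hypothetical data: a profile built from 8 earlier representations.
original = VoiceProfile("user-1", [1.0, 0.0], 8)
updated, none_created = process_pair(original, [0.8, 0.0], [0.9, 0.0])
unchanged, second = process_pair(original, [0.0, 1.0], [0.1, 1.0])
```

Weighting by count keeps two new inputs from dominating a profile that already reflects many earlier ones, which is one plausible motivation for tracking the numbers described in claims 2 and 9.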

Assignees

Amazon Tech Inc

Inventors

Classifications

  • G10L17/04 (Primary)

    Training, enrolment or model building · CPC title

  • Execution procedure of a spoken command · CPC title

  • Word spotting · CPC title

  • Speaker identification or verification techniques · CPC title

  • Sound input; Sound output (speech processing G10L) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11004454B1 cover?
Techniques for updating voice profiles used to perform user recognition are described. A system may use clustering techniques to update voice profiles. When the system receives audio data representing a spoken user input, the system may store the audio data. Periodically, the system may recall, from storage, audio data (representing previous user inputs). The system may identify clusters of the…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L17/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 11, 2021 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).