US11004454B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11004454-B1 |
| Application number | US-201816182021-A |
| Country | US |
| Kind code | B1 |
| Filing date | Nov 6, 2018 |
| Priority date | Nov 6, 2018 |
| Publication date | May 11, 2021 |
| Grant date | May 11, 2021 |
Official abstract text for this publication.
Techniques for updating voice profiles used to perform user recognition are described. A system may use clustering techniques to update voice profiles. When the system receives audio data representing a spoken user input, the system may store the audio data. Periodically, the system may recall, from storage, audio data (representing previous user inputs). The system may identify clusters of the audio data, with each cluster including similar or identical speech characteristics. The system may determine a cluster is substantially similar to an existing voice profile. If this occurs, the system may create an updated voice profile using the original voice profile and the cluster of audio data.
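The flow in the abstract can be illustrated with a minimal sketch. The abstract does not specify a clustering algorithm, a similarity measure, or how the original profile and the cluster are combined, so everything below is an assumption: stored inputs are represented as plain feature vectors, similarity is cosine similarity against a hypothetical threshold of 0.9, clustering is a simple greedy pass, and the update is an unweighted mean. The function names are illustrative, not taken from the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster_vectors(vectors, threshold=0.9):
    """Greedy clustering (an assumption): each vector joins the first
    cluster whose centroid is similar enough, else starts a new cluster."""
    clusters = []
    for v in vectors:
        for cluster in clusters:
            if cosine_similarity(v, mean_vector(cluster)) >= threshold:
                cluster.append(v)
                break
        else:
            clusters.append([v])
    return clusters


def update_profile(profile, stored_vectors, threshold=0.9):
    """Return an updated profile if some cluster of stored vectors is
    similar to the existing profile; otherwise return the profile as-is."""
    for cluster in cluster_vectors(stored_vectors, threshold):
        if cosine_similarity(profile, mean_vector(cluster)) >= threshold:
            # Combine the original profile with the matching cluster.
            return mean_vector([profile] + cluster)
    return profile


stored = [[0.9, 0.1], [1.0, 0.1], [0.0, 1.0]]
clusters = cluster_vectors(stored)
updated = update_profile([1.0, 0.0], stored)
```

Here the first two stored vectors fall into one cluster that matches the profile, so the profile is averaged with them; the third vector forms its own cluster and is ignored by this update.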
Opening claim text (preview).
What is claimed is:
1. A method, comprising: generating first voice profile data associated with a group profile identifier and a first user identifier; identifying a first stored representation of a first user input received after the first voice profile data is generated, the first stored representation being associated with a group profile identifier and the first user identifier; identifying a second stored representation of a second user input received after the first voice profile data is generated, the second stored representation being associated with the group profile identifier and the first user identifier; determining the first stored representation is similar to the second stored representation; generating updated first voice profile data using the first voice profile data, the first stored representation, and the second stored representation; and storing an association between the updated first voice profile data and the first user identifier.
2. The method of claim 1, further comprising: determining a first number of stored representations used to generate the first voice profile data; determining a second number corresponding to an amount of stored representations, the second number representing the first number, the first stored representation, and the second stored representation; and based at least in part on the second number, generating the updated first voice profile data.
3. The method of claim 1, further comprising: determining the first stored representation is associated with a first intent indicator; determining the second stored representation is associated with a second intent indicator; determining the first intent indicator is different from the second intent indicator; and based at least in part on determining the first intent indicator is different from the second intent indicator, generating the updated first voice profile data.
4. The method of claim 1, further comprising: determining a third stored representation of a third user input; determining a fourth stored representation of a fourth user input; determining the third stored representation is similar to the fourth stored representation; determining the third stored representation and the fourth stored representation are dissimilar to the first voice profile data; and based at least in part on determining the third stored representation and the fourth stored representation are dissimilar to the first voice profile data, generating second voice profile data using the third stored representation and the fourth stored representation.
5. The method of claim 1, wherein the first stored representation corresponds to first audio data, the second stored representation corresponds to second audio data, and wherein the method further comprises: generating a first user recognition feature vector representing the first audio data; generating a second user recognition feature vector representing the second audio data; and determining the first user recognition feature vector is similar to the second user recognition feature vector.
6. The method of claim 1, wherein the first stored representation is a first user recognition feature vector, the second stored representation is a second user recognition feature vector, and the method further comprises: determining a distance between the first user recognition feature vector and the second user recognition feature vector; and based at least in part on the distance, determining the first user recognition feature vector is similar to the second user recognition feature vector.
7. A method, comprising: generating first voice profile data associated with a first user identifier; identifying a first stored representation of a first user input received after the first voice profile data is generated, the first stored representation being associated with the first user identifier; identifying a second stored representation of a second user input received after the first voice profile data is generated, the second stored representation being associated with the first user identifier; determining the first stored representation is similar to the second stored representation; generating updated first voice profile data using the first voice profile data, the first stored representation, and the second stored representation; and storing an association between the updated first voice profile data and the first user identifier.
8. The method of claim 7, wherein the first stored representation corresponds to first audio data, the second stored representation corresponds to second audio data, and wherein the method further comprises: generating a first user recognition feature vector representing the first audio data; generating a second user recognition feature vector representing the second audio data; and determining the first user recognition feature vector is similar to the second user recognition feature vector.
9. The method of claim 7, further comprising: determining a first number of stored representations used to generate the first voice profile data; determining a second number corresponding to an amount of stored representations, the second number representing the first number, the first stored representation, and the second stored representation; and based at least in part on the second number, generating the updated first voice profile data.
10. The method of claim 7, further comprising: determining the first stored representation is associated with an intent indicator; and based at least in part on the first stored representation being associated with the intent indicator, generating the updated first voice profile data.
11. The method of claim 7, further comprising: identifying a third stored representation of a third user input; determining the third stored representation is dissimilar to the first voice profile data; and generating second voice profile data using the third stored representation.
12. The method of claim 7, further comprising: identifying the first stored representation based at least in part on the first stored representation being associated with a group profile identifier.
13. The method of claim 7, wherein the first stored representation is a first user recognition feature vector, wherein the second stored representation is a second user recognition feature vector, and wherein the method further comprises: determining a distance between the first user recognition feature vector and the second user recognition feature vector; and based at least in part on the distance, determining the first user recognition feature vector is similar to the second user recognition feature vector.
14. The method of claim 7, further comprising: receiving audio data representing a third user input; storing the audio data; after storing the audio data, receiving an indicator representing the audio data is to be associated with a second user identifier; identifying second voice profile data associated with the second user identifier; and using the audio data, generating updated second voice profile data.
15. The method of claim 7, further comprising: receiving first audio data representing a third user input; determining the first audio data corresponds to second voice profile data associated with a second user identifier; determining user profile data, associated with the second user identifier, corresponds to a user name; generating second audio data including the user name; causing a first device to output audio corresponding to the second audio data; receiving, from the first device, third a
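Claims 6 and 13 base similarity on a distance between feature vectors, and claims 2 and 9 make the update depend on how many stored representations are involved. The claims do not name a distance metric, a threshold, or a combination formula, so the following is only a sketch under stated assumptions: Euclidean distance, a hypothetical `max_distance` cutoff, and a count-weighted mean in which a profile built from `profile_count` representations keeps proportionally more weight than the new ones.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_similar(a, b, max_distance=0.5):
    """Treat two vectors as similar when their distance is within a
    hypothetical cutoff; the claims only say 'based on the distance'."""
    return euclidean_distance(a, b) <= max_distance


def weighted_update(profile, profile_count, new_vectors):
    """Count-weighted mean (an assumption): the existing profile counts as
    profile_count samples, each new vector counts as one sample.
    Returns the updated profile and the new total count."""
    total = profile_count + len(new_vectors)
    updated = [
        (p * profile_count + sum(v[i] for v in new_vectors)) / total
        for i, p in enumerate(profile)
    ]
    return updated, total


# A profile built from 8 representations, updated with 2 new ones.
updated_profile, total_count = weighted_update(
    [1.0, 0.0], 8, [[0.0, 1.0], [0.0, 1.0]]
)
```

With these inputs the two new vectors shift the profile only a fifth of the way toward themselves, which is one plausible reading of why the count of stored representations matters to the update.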
Training, enrolment or model building · CPC title
Execution procedure of a spoken command · CPC title
Word spotting · CPC title
Speaker identification or verification techniques · CPC title
Sound input; Sound output (speech processing G10L) · CPC title