Information processing apparatus, information processing system, and information processing method, and program
US-2019371296-A1 · Dec 5, 2019 · US
US11200884B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11200884-B1 |
| Application number | US-201816181925-A |
| Country | US |
| Kind code | B1 |
| Filing date | Nov 6, 2018 |
| Priority date | Nov 6, 2018 |
| Publication date | Dec 14, 2021 |
| Grant date | Dec 14, 2021 |
Official abstract text for this publication.
Techniques for labeling user inputs for updating user recognition voice profiles are described. A system may leverage various signals, generated during or after processing of a user input, to retroactively determine which user spoke the user input. For example, after the system receives the user input, the user may provide the system with non-spoken user verification information. Based on such user verification information, the system may label the previously spoken user input as originating from the particular user. The system may also or alternatively use system usage history to retroactively label user inputs.
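The retroactive labeling flow in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class and function names (`VoiceProfile`, `LabelStore`, `record_utterance`, `label_retroactively`), the credential lookup table, and the choice of a running mean over feature vectors as the profile update are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VoiceProfile:
    """Hypothetical voice profile kept as a running mean of feature vectors."""
    mean: list[float]
    count: int = 0

    def update(self, vector: list[float]) -> None:
        # Incremental mean: new_mean = mean + (x - mean) / n
        self.count += 1
        self.mean = [m + (x - m) / self.count for m, x in zip(self.mean, vector)]


@dataclass
class LabelStore:
    """Hypothetical store linking utterances to group and user identifiers."""
    group_of: dict[str, str] = field(default_factory=dict)  # utterance -> group profile
    user_of: dict[str, str] = field(default_factory=dict)   # utterance -> user
    vectors: dict[str, list[float]] = field(default_factory=dict)


def record_utterance(store: LabelStore, utterance_id: str,
                     group_id: str, vector: list[float]) -> None:
    """At input time only the group (e.g. household) is known, so store that."""
    store.group_of[utterance_id] = group_id
    store.vectors[utterance_id] = vector


def label_retroactively(store: LabelStore, profiles: dict[str, VoiceProfile],
                        credentials: dict[str, str], utterance_id: str,
                        verification_info: str) -> str | None:
    """Attach an earlier utterance to a user once verification info arrives."""
    user_id = credentials.get(verification_info)  # assumed lookup table
    if user_id is None:
        return None  # unknown verification info: leave the utterance unlabeled
    vector = store.vectors[utterance_id]
    store.user_of[utterance_id] = user_id
    profile = profiles.setdefault(user_id, VoiceProfile(mean=[0.0] * len(vector)))
    profile.update(vector)
    return user_id
```

In this sketch the utterance is first stored against the group profile only, and the user-level label and profile update happen later, when the non-spoken verification information resolves to a single user identifier.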
Opening claim text (preview).
What is claimed is:
1. A method, comprising: receiving, from a first device corresponding to a first device identifier, first audio data representing a first user input; determining a group profile identifier associated with the first device identifier; storing first data associating the group profile identifier with the first audio data; causing the first device to output content requesting user identifying information; receiving, from the first device, first data representing a second user input; determining the second user input includes first user identifying information; determining the first user identifying information is associated with a first user identifier; storing second data associating the first audio data with the first user identifier; identifying first voice profile data associated with the first user identifier; and generating updated first voice profile data using the first audio data.
2. The method of claim 1, further comprising: receiving, from the first device, second audio data representing a third user input; sending the second audio data to a second device; receiving, from the second device, second data representing a fourth user input; determining the fourth user input represents the third user input is to be associated with a second user identifier; identifying second voice profile data associated with the second user identifier; and generating updated second voice profile data using the second audio data.
3. The method of claim 1, further comprising: receiving, from the first device, second audio data representing a third user input; performing speech processing on the second audio data to determine natural language understanding (NLU) results data including an intent indicator; determining usage history data associated with the group profile identifier; determining the usage history data represents the intent indicator is associated with a second user identifier; identifying second voice profile data associated with the second user identifier; and generating updated second voice profile data using the second audio data.
4. The method of claim 1, further comprising: receiving, from the first device, second audio data representing a third user input; determining the second audio data corresponds to second voice profile data associated with a second user identifier; determining user profile data associated with the second user identifier; determining the user profile data indicates a user name; generating third audio data including the user name; sending the third audio data to the first device; receiving, from the first device, second data representing the user name is incorrect; and based at least in part on the second data representing the user name is incorrect, using the second audio data as a negative utterance with respect to the second voice profile data.
5. A method, comprising: receiving first audio data representing a first user input; determining a first intent indicator representing the first user input; outputting, based at least in part on the first intent indicator, a request for user identifying information to be used to perform an action at least partially responsive to the first user input; receiving, after outputting the request, a second user input indicating first user identifying information; determining the first user identifying information is associated with a first user identifier corresponding to a single user; identifying first voice profile data associated with the first user identifier, the first voice profile data being usable to perform user recognition processing; and generating updated first voice profile data using the first audio data.
6. The method of claim 5, further comprising: receiving, from a first device, second audio data representing a third user input; sending the second audio data to a second device; receiving, from the second device, first data representing a fourth user input; determining the fourth user input represents the third user input is to be associated with a second user identifier; identifying second voice profile data associated with the second user identifier; and generating updated second voice profile data using the second audio data.
7. The method of claim 5, further comprising: receiving second audio data representing a third user input; determining a second intent indicator representing the third user input; determining usage history data represents the second intent indicator is associated with a second user identifier; identifying second voice profile data associated with the second user identifier; and generating updated second voice profile data using the second audio data.
8. The method of claim 5, further comprising: receiving second audio data representing a third user input; processing the second audio data to determine first natural language understanding (NLU) results data representing the third user input; sending the first NLU results data to a first component configured to process NLU results data; receiving, from the first component, a second user identifier associated with the first NLU results data; identifying second voice profile data associated with the second user identifier; and generating updated second voice profile data using the second audio data.
9. The method of claim 5, further comprising: determining a first number of user recognition feature vectors used to generate the first voice profile data; determining a second number of user recognition feature vectors representing the first number of user recognition feature vectors and the first audio data; and generating, based at least in part on the second number of user recognition feature vectors, the updated first voice profile data using the first audio data.
10. The method of claim 5, further comprising: determining user recognition feature vectors used to generate the first voice profile data; determining intent indicators associated with the user recognition feature vectors; determining the first intent indicator is unrepresented in the intent indicators; and generating, based at least in part on determining the first intent indicator is unrepresented in the intent indicators, the updated first voice profile data using the first audio data.
11. The method of claim 5, further comprising: receiving the first audio data from a first device; determining a device identifier corresponding to the first device; determining a group profile identifier associated with the device identifier; and storing first data associating the first audio data with the group profile identifier.
12. A system, comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor, cause the system to: receive first audio data representing a first user input; determine a first intent indicator representing the first user input; determine usage history data representing the first intent indicator is associated with a first user identifier; identify first voice profile data associated with the first user identifier, the first voice profile data being usable to perform user recognition processing; and generate updated first voice profile data using the first audio data.
13. The system of claim 12, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: receive second audio data representing a second user input; determine a second intent indicator representing the second user input; output, based at least in part on the second intent
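Claims 9 and 10 condition a profile update on the number of feature vectors and on whether the new utterance's intent is already represented in the profile. The sketch below shows one way such checks might look. The claims say only that the update is based "at least in part" on these determinations, so the hard cap (`max_vectors`), the function names, and the accept/reject policy are assumptions for illustration.

```python
def within_vector_budget(current_count: int, max_vectors: int) -> bool:
    """Claim 9 style check (assumed policy): count the existing vectors plus
    the one the new audio would add, and compare against a hypothetical cap."""
    proposed_count = current_count + 1  # the claim's "second number"
    return proposed_count <= max_vectors


def adds_intent_coverage(existing_intents: list[str], new_intent: str) -> bool:
    """Claim 10 style check: True when the new utterance's intent is not yet
    represented among the intents tied to the profile's feature vectors."""
    return new_intent not in set(existing_intents)
```

For example, `within_vector_budget(9, 10)` returns `True`, and `adds_intent_coverage(["PlayMusic"], "PlayMusic")` returns `False` because that intent is already represented.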
Training, enrolment or model building · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
updating or merging of old and new templates; Mean values; Weighting · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Execution procedure of a spoken command · CPC title