Providing an ambient assist mode for computing devices
US-2019013025-A1 · Jan 10, 2019 · US
US10685669B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10685669-B1 |
| Application number | US-201815926507-A |
| Country | US |
| Kind code | B1 |
| Filing date | Mar 20, 2018 |
| Priority date | Mar 20, 2018 |
| Publication date | Jun 16, 2020 |
| Grant date | Jun 16, 2020 |
This disclosure describes techniques for identifying a voice-enabled device from a group of voice-enabled devices to respond to a speech utterance of a user. A speech-processing system may receive an audio signal representing the speech utterance captured in an environment of a voice-enabled device, and identify another voice-enabled device located in the environment. The system may analyze the audio signal using a different natural-language-understanding model for each of the voice-enabled devices to identify an intent for each of the voice-enabled devices to respond to the speech utterance. The system may determine confidence scores that the intents are responsive to the speech utterance, and select the intent with the highest confidence score. The system may use the selected intent to generate a command for the corresponding voice-enabled device to respond to the user.
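The abstract describes a selection step: run one NLU model per co-located device, score each resulting intent, and keep the highest-scoring one. The Python below is a minimal sketch of that step under stated assumptions, not the patented implementation. The keyword-matching "models", the device names, the scores, and the command payload shape are all hypothetical stand-ins for trained NLU models and a real device-command format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical per-device NLU "model": maps ASR text to (intent name, confidence).
NluModel = Callable[[str], Tuple[str, float]]


@dataclass
class Candidate:
    device_id: str
    intent: str
    confidence: float


def select_device(text: str, co_located: List[str], models: Dict[str, NluModel]) -> Candidate:
    """Run each co-located device's NLU model on the same text and keep the best intent."""
    candidates = []
    for device_id in co_located:
        intent, confidence = models[device_id](text)
        candidates.append(Candidate(device_id, intent, confidence))
    # max() returns the first maximal element, so ties go to the earlier device in the list.
    return max(candidates, key=lambda c: c.confidence)


def build_command(choice: Candidate) -> dict:
    # Hypothetical command payload for the selected device; real formats will differ.
    return {"device_id": choice.device_id, "intent": choice.intent}


# Toy keyword models standing in for device-specific trained NLU models (assumption).
models: Dict[str, NluModel] = {
    "display": lambda t: ("ShowWeather", 0.8) if "weather" in t else ("Unknown", 0.2),
    "speaker": lambda t: ("VolumeUp", 0.9) if "volume" in t else ("Unknown", 0.2),
}

# The utterance was captured by "display", but "speaker" shares the environment.
choice = select_device("turn up the volume", ["display", "speaker"], models)
print(build_command(choice))  # {'device_id': 'speaker', 'intent': 'VolumeUp'}
```

The capturing device is not necessarily the one that responds. Here the display heard the request, but the speaker's model produced the more confident intent, so the command goes to the speaker.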
What is claimed is:
1. A system comprising: one or more processors; computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving, from a first voice-enabled device, audio data representing a speech utterance; generating, using automatic speech recognition (ASR) processing and the audio data, text data representing the speech utterance; determining a first device profile of the first voice-enabled device; determining the first voice-enabled device and a second voice-enabled device are located in a same physical environment; determining a second device profile of the second voice-enabled device; determining, using a first natural-language-understanding (NLU) model and the text data, first intent data representing the speech utterance, wherein the first NLU model is associated with the first device profile; determining, using a second NLU model and the text data, second intent data representing the speech utterance, wherein the second NLU model is associated with the second device profile; determining a first confidence score that the speech utterance corresponds to the first intent data; determining a second confidence score that the speech utterance corresponds to the second intent data; determining that the second confidence score is greater than the first confidence score; based at least in part on the second confidence score being greater than the first confidence score, using the second intent data to determine a command to cause the second voice-enabled device to perform an action; and sending, to the second voice-enabled device, command data indicating the command.
2. The system of claim 1, the operations further comprising: identifying first device-state data associated with the first voice-enabled device, wherein the first device-state data indicates that a first device state of the first voice-enabled device is idle; identifying second device-state data associated with the second voice-enabled device, wherein the second device-state data indicates that a second device state of the second voice-enabled device is outputting sound using a speaker associated with the second voice-enabled device, and wherein: determining the first confidence score includes determining that the first intent data corresponds to a first action that the first voice-enabled device is unable to perform in the first device state; and determining the second confidence score includes determining that the second intent data corresponds to a second action that the second voice-enabled device is able to perform in the second device state.
3. The system of claim 1, wherein: the first NLU model comprises a first machine-learning model trained to determine that the first intent data corresponds to the text data, wherein the first intent data is associated with a first device capability of the first voice-enabled device; and the second NLU model comprises a second machine-learning model trained to determine that the second intent data corresponds to the text data, wherein the second intent data is associated with a second device capability of the second voice-enabled device, and wherein the first device capability is different than the second device capability.
4. The system of claim 1, wherein the audio data comprises first audio data, and the operations further comprising, prior to receiving the first audio data: receiving, from the first voice-enabled device, second audio data representing first sound captured by one or more microphones of the first voice-enabled device; receiving, from the second voice-enabled device, third audio data representing second sound captured by one or more microphones of the second voice-enabled device; determining the second audio data was received within a threshold period of time of when the third audio data was received; and based at least in part on the second audio data and the third audio being received within the threshold period of time, generating an association between the first device profile and the second device profile indicating that the first voice-enabled device is in the same physical environment as the second voice-enabled device.
5. A method comprising: receiving audio data from a first device in an environment, the audio data representing a speech utterance; generating, using automatic speech recognition (ASR) processing and the audio data, text data representing the speech utterance; determining that a second device is in the environment; determining, using a first natural-language-understanding (NLU) model and the text data, first intent data representing the speech utterance, wherein the first NLU model is associated with the first device; determining, using a second NLU model and the text data, second intent data representing the speech utterance, wherein the second NLU model is associated with the second device; selecting the second intent data instead of the first intent data; using the second intent data to determine a command to cause the second device to perform an action; and sending, to the second device, command data indicating the command.
6. The method of claim 5, further comprising: identifying first device-state data associated with the first device, wherein the first device-state data indicates a first device state of the first device; determining a first confidence score that the speech utterance corresponds to the first intent data by determining that the first intent data corresponds to a first action that the first device is unable to perform in the first device state; identifying second device-state data associated with the second device, wherein the second device-state data indicates a second device state of the second device; determining a second confidence score that the speech utterance corresponds to the second intent data by determining that the second intent data corresponds to a second action that the second device is able to perform in the second device state; and determining that the second confidence score is greater than the first confidence score.
7. The method of claim 5, wherein the audio data comprises first audio data, and the method further comprising: receiving second audio data associated with the second device, the second audio data representing the speech utterance; determining a first signal-to-noise (SNR) value associated with the first audio data; determining a second SNR value associated with the second audio data; and determining a first confidence score that the speech utterance is better represented by the first intent data based at least in part on the first SNR value; determining a second confidence score that the speech utterance is better represented by the second intent data based at least in part on the second SNR value; and determining that the second confidence score is greater than the first confidence score.
8. The method of claim 5, further comprising: identifying a first device profile associated with the first device; determining that the first device profile is associated with the first NLU model, wherein: the first NLU model comprises a first machine-learning model trained to determine that the first intent data corresponds to the text data; and the first intent data is associated with a first device capability of the first device; identifying a second device profile associated with the second device; and determining that the second device profile is associated with the second NLU model, wherein: the second NLU model comprises a second machine-learning model trained to determine that the second intent data corresponds to the text data; and the second intent
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
for comparison or discrimination · CPC title
the extracted parameters being power information · CPC title
characterised by the type of extracted parameters · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title