US10546583B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10546583-B2 |
| Application number | US-201715691460-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 30, 2017 |
| Priority date | Aug 30, 2017 |
| Publication date | Jan 28, 2020 |
| Grant date | Jan 28, 2020 |
Abstract
This disclosure describes, in part, context-based device arbitration techniques to select a voice-enabled device from multiple voice-enabled devices to provide a response to a command included in a speech utterance of a user. In some examples, the context-driven arbitration techniques may include determining a ranked list of voice-enabled devices that are ranked based on audio signal metric values for audio signals generated by each voice-enabled device, and iteratively moving through the list to determine, based on device states of the voice-enabled devices, whether one of the voice-enabled devices can perform an action responsive to the command. If the voice-enabled devices that detected the speech utterance are unable to perform the action responsive to the command, all other voice-enabled devices associated with an account may be analyzed to determine whether one of the other voice-enabled devices can perform the action responsive to the command in the speech utterance.
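The arbitration flow summarized in the abstract has three steps: rank the devices that heard the utterance by an audio signal metric, walk that list checking device state, and fall back to the other devices on the account. The Python below is a minimal sketch of that flow under stated assumptions, not the patented implementation. The `Device` fields, `can_perform`, and `arbitrate` are hypothetical names. Device state is reduced to an online flag plus a set of supported actions. The fallback visits account devices in the order given, because the abstract does not specify an order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    device_id: str
    metric: float = 0.0  # audio signal metric value, e.g. SNR (assumed scale)
    online: bool = True  # hypothetical device-state field
    capabilities: frozenset = frozenset()  # hypothetical: actions the device supports


def can_perform(device: Device, action: str) -> bool:
    """Hypothetical device-state check: online and supports the action."""
    return device.online and action in device.capabilities


def arbitrate(
    detecting: list[Device], account_devices: list[Device], action: str
) -> Optional[Device]:
    # Rank the devices that detected the utterance, best metric first.
    ranked = sorted(detecting, key=lambda d: d.metric, reverse=True)
    for device in ranked:
        if can_perform(device, action):
            return device
    # Fallback: consider every other device associated with the account.
    heard = {d.device_id for d in detecting}
    for device in account_devices:
        if device.device_id not in heard and can_perform(device, action):
            return device
    return None  # no device can perform the action
```

For example, consider a kitchen speaker with the higher metric that cannot play video and a TV that heard the same utterance and can. The TV is chosen for a video request. A request that neither can serve falls through to the other account devices.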
Claims (preview)
What is claimed is:

1. A system comprising: one or more processors; computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving, from a first voice-enabled device, first audio data representing a speech utterance; receiving, from the first voice-enabled device, a first audio signal metric value indicating a first signal-to-noise ratio associated with the first audio data; receiving, from a second voice-enabled device, second audio data representing the speech utterance; receiving, from the second voice-enabled device, a second audio signal metric value indicating a second signal-to-noise ratio associated with the second audio data; determining that the first signal-to-noise ratio is greater than the second signal-to-noise ratio; identifying device state data associated with the first voice-enabled device; generating, using automatic speech recognition (ASR) on at least one of the first audio data or the second audio data, text data corresponding to the speech utterance; determining, using natural language understanding (NLU) on the text data, intent data associated with the speech utterance, the intent data representing a request for a client device to perform an action; determining, based at least in part on the device state data, that the first voice-enabled device is capable of performing the action responsive to the speech utterance; determining a command to cause the first voice-enabled device to perform the action; and sending, to the first voice-enabled device, data indicating the command.

2. The system of claim 1, the operations further comprising causing the second voice-enabled device to stop transmitting the second audio data, the second voice-enabled device being stopped from transmitting the second audio data prior to the first voice-enabled device stopping transmitting the first audio data, wherein generating the text data is performed using ASR on the first audio data.

3. The system of claim 1, the operations further comprising: determining that the first voice-enabled device is included in a stored grouping of devices that includes the first voice-enabled device and a third voice-enabled device; identifying device state data associated with the stored grouping of devices; and determining that the stored grouping of devices is capable of performing the action responsive to the speech utterance.

4. The system of claim 1, wherein identifying the device state data associated with the first voice-enabled device comprises: sending a request to an event component to provide an indication of the device state data associated with the first voice-enabled device; and receiving, from the event component, the device state data.

5. A system comprising: one or more processors; computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving a first device identifier of a first device; receiving first audio data associated with the first device identifier, the first audio data representing a speech utterance; receiving a second device identifier of a second device; receiving second audio data associated with the second device identifier, the second audio data representing the speech utterance; determining intent data associated with the speech utterance, the intent data representing a machine response for a device to perform responsive to the speech utterance; identifying first device state data associated with the first device; identifying second device state data associated with the second device; based at least in part on the second device state data, determining the second device is to be used for the machine response; determining command data to cause the second device to perform the machine response; and sending, to the second device, the command data to perform the machine response.

6. The system of claim 5, further comprising determining, based on the first device state data, that the first device is offline.

7. The system of claim 5, the operations further comprising: determining that the first device is included in a stored grouping of devices that includes the first device and a third device; identifying device state data associated with the stored grouping of devices; and determining, based on the device state data associated with the stored grouping of devices, that the stored grouping of devices is offline.

8. The system of claim 5, the operations further comprising: determining that the first device is associated with a secondary device; identifying third device state data associated with the secondary device; and determining, based on the third device state data, that the secondary device is offline.

9. The system of claim 5, the operations further comprising: determining, based on the first device state data, that the first device is offline; and storing an indication that the second device is to perform the machine response.

10. The system of claim 5, the operations further comprising receiving an indication that the first device is ranked higher than the second device based at least in part on a first audio signal metric associated with the first audio data and a second audio signal metric associated with the second audio data.

11. The system of claim 10, wherein: the first audio signal metric associated with the first audio data comprises at least one of: a first signal-to-noise value of the first audio data; a first amplitude of the first audio data; or a first level of voice activity in the first audio data; and the second audio signal metric associated with the second audio data comprises at least one of: a second signal-to-noise value of the second audio data; a second amplitude of the second audio data; or a second level of voice activity in the second audio data.

12. The system of claim 5, the operations further comprising receiving an indication that the first device is ranked higher than the second device, wherein the first device and the second device are ranked based on one or more of: input received via an input control of the first device; a distance of a user to the first device; or image data indicating that the user is at least partially facing the first device.

13. A method comprising: receiving first audio data associated with a first device, the first audio data representing a speech utterance; receiving second audio data associated with a second device, the second audio data representing the speech utterance; identifying first device state data associated with the first device; identifying second device state data associated with the second device; determining intent data associated with the speech utterance, the intent data representing a machine response for a device to perform responsive to the speech utterance; based at least in part on the second device state data, determining the second device is to be used for the machine response; determining command data to cause the second device to perform the machine response; and sending, to the second device, the command data to perform the machine response.

14. The method of claim 13, further comprising determining, based on the first device state data, that the first device is offline.

15. The method of claim 13, further comprising: determining that the first device is included in a stored grouping of devices that includes the first device and a third device; id
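Claims 10 to 12 list the signals that may rank one device above another: signal-to-noise value, amplitude, and voice activity level, plus input via an input control, user distance, and image data showing the user facing the device. The claims do not say how these signals are combined. The sketch below shows one hypothetical combination, a lexicographic ordering: a button press wins first, then a user facing the device, then the shorter distance, then higher SNR, amplitude, and voice activity. The `Detection` fields, `rank_key`, `rank_devices`, and this priority order are all assumptions for illustration. Where only SNR differs, the ordering reduces to the comparison in claim 1.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    device_id: str
    snr_db: float  # claim 11: signal-to-noise value
    amplitude: float  # claim 11: amplitude of the audio data
    voice_activity: float  # claim 11: level of voice activity (assumed 0..1)
    button_pressed: bool = False  # claim 12: input via an input control
    user_facing: bool = False  # claim 12: image data shows user facing device
    distance_m: float = float("inf")  # claim 12: user distance (hypothetical units)


def rank_key(d: Detection) -> tuple:
    # Hypothetical lexicographic priority. sorted() is ascending, so booleans
    # are negated (False sorts first) and "higher is better" metrics are negated.
    return (
        not d.button_pressed,
        not d.user_facing,
        d.distance_m,
        -d.snr_db,
        -d.amplitude,
        -d.voice_activity,
    )


def rank_devices(detections: list[Detection]) -> list[str]:
    """Return device IDs ordered from highest to lowest rank."""
    return [d.device_id for d in sorted(detections, key=rank_key)]
```

A lexicographic key keeps each signal's role easy to audit. A weighted score would trade that transparency for smoother trade-offs between signals, and the claims leave either choice open.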
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Execution procedure of a spoken command · CPC title
for discriminating voice from noise · CPC title
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title
Constructional details of speech recognition systems · CPC title