Arbitration between voice-enabled devices
US-2017076720-A1 · Mar 16, 2017 · US
US9734822B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9734822-B1 |
| Application number | US-201514727504-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 1, 2015 |
| Priority date | Jun 1, 2015 |
| Publication date | Aug 15, 2017 |
| Grant date | Aug 15, 2017 |
Official abstract text for this publication.
Features are disclosed for improving the accuracy and stability of beamformed signal selection. The selection may consider processing feedback information to identify when the current beam selection may need to be re-evaluated. The feedback information may further be used to select a beamformed signal for processing. For example, beams which detect wake-words or yield high confidence speech recognition may be favored over beams which fail to detect or recognize at a lower confidence level.
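As a rough illustration of the selection logic the abstract describes, the sketch below flags the current beam for re-evaluation when its feedback reports a missed wake-word or a low recognition confidence, and then favors beams whose feedback shows a detected wake-word and a higher confidence. The names `Feedback`, `needs_reselection`, and `pick_beam`, and the default `min_confidence` of 0.6, are assumptions made for illustration only; the publication does not specify them.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Feedback:
    """Hypothetical feedback record returned by the speech processing component."""
    wake_word_detected: bool
    confidence: float  # speech recognition confidence, assumed to lie in [0.0, 1.0]


def needs_reselection(fb: Feedback, min_confidence: float = 0.6) -> bool:
    """Return True when the current beam should be re-evaluated.

    Mirrors the two triggers named in the claims: a failure to detect the
    wake-word, or a confidence below a minimum threshold (0.6 is an assumed value).
    """
    return (not fb.wake_word_detected) or fb.confidence < min_confidence


def pick_beam(feedback_by_beam: Dict[int, Feedback]) -> int:
    """Pick the beam index whose feedback is most favorable.

    Beams with a detected wake-word rank above beams without one; ties are
    broken by higher recognition confidence. This ordering is an assumption.
    """
    return max(
        feedback_by_beam,
        key=lambda b: (
            feedback_by_beam[b].wake_word_detected,
            feedback_by_beam[b].confidence,
        ),
    )


# Example: beam 0 is currently selected but missed the wake-word.
feedback = {
    0: Feedback(wake_word_detected=False, confidence=0.9),
    1: Feedback(wake_word_detected=True, confidence=0.7),
    2: Feedback(wake_word_detected=True, confidence=0.4),
}
current = 0
if needs_reselection(feedback[current]):
    current = pick_beam(feedback)
```

In this example the selection moves from beam 0 to beam 1, because beam 1 both detected the wake-word and has the highest confidence among the beams that did.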
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: a microphone array including a plurality of microphones, the microphone array configured to generate a plurality of input audio signals representing sound detected by the microphone array; one or more processors in communication with the microphone array, the one or more processors configured to: generate a first beamformed audio signal using at least two of the plurality of input audio signals by adjusting at least one of a phase or an amplitude of at least one of the at least two of the plurality of input audio signals, the first beamformed audio signal associated with a first beam having a first direction; generate a second beamformed audio signal using at least two of the plurality of input audio signals by adjusting at least one of a phase or an amplitude of at least one of the at least two of the plurality of input audio signals, the second beamformed audio signal associated with a second beam having a second direction; select the first beamformed audio signal for speech processing by a speech processing component; send the first beamformed audio signal to the speech processing component, wherein the speech processing component includes at least one of a wake-word engine or a speech recognition engine; receive feedback information from the speech processing component for the first beamformed audio signal, wherein the feedback information includes at least one of: a detection result for a wake word in the first beamformed audio signal from the wake-word engine; or a speech recognition confidence for speech content recognized in the first beamformed audio signal from the speech recognition engine; select the second beamformed audio signal for speech processing by the speech processing component in response to: the detection result from the wake-word engine indicating a failure to detect the wake word in the first beamformed audio signal; or the speech recognition confidence from the speech recognition engine being less than a minimum beam recognition confidence for the speech content recognized in the first beamformed audio signal; and send the second beamformed audio signal to the speech processing component.
2. The apparatus of claim 1, wherein the one or more processors are configured to receive second feedback information for the second beamformed audio signal, wherein the second beamformed audio signal is selected using a comparison of the feedback information for the first beamformed audio signal and the second feedback information for the second beamformed audio signal, wherein the second feedback information indicates detection of the wake-word and the second feedback information indicates a second speech recognition confidence which exceeds the speech recognition confidence for the speech content recognized in the first beamformed audio signal.
3. The apparatus of claim 1, further comprising a memory storing beam selection information for a plurality of beams, the plurality of beams including the first beam and the second beam, the beam selection information indicating a number of times each respective beam was associated with feedback information indicating at least one of: detection of the wake-word in a signal from the respective beam by the wake-word engine; recognition of the wake-word in a signal from the respective beam by the speech recognition engine; or initiation of a system action in response to a signal from the respective beam, wherein the one or more processors are further configured to select the second beamformed audio signal using a comparison of the beam selection information for the first beam and the second beam.
4. A computer-implemented method comprising: under control of one or more computing devices configured with specific computer-executable instructions, receiving a plurality of beamformed audio signals for a sound, each of the plurality of beamformed audio signals corresponding to a direction, and each of the plurality of beamformed audio signals formed from at least two different audio signals; selecting a first beamformed audio signal from the plurality of beamformed audio signals for speech processing, the first beamformed audio signal associated with a first beam having a first direction; sending the first beamformed audio signal to a speech processing component configured to identify speech content represented in the first beamformed audio signal; receiving feedback information from the speech processing component, the feedback information associated with the speech content represented in the first beamformed signal, wherein the feedback information indicates at least one of: whether a wake-word is detected in the first beamformed audio signal, or a speech recognition confidence for speech content represented in the first beamformed audio signal; determining that beamformed audio signal selection is to be performed based at least in part on the feedback information indicating at least one of: a failure to detect the wake-word, or the speech recognition confidence being less than a minimum confidence threshold; selecting, in response to determining that the beamformed audio signal selection is to be performed, a second beamformed audio signal from the plurality of beamformed audio signals for the speech processing, wherein the second beamformed audio signal is associated with a second beam having a second direction; and sending the second beamformed audio signal to the speech processing component to identify speech content represented in the second beamformed audio signal.
5. The computer-implemented method of claim 4, wherein the speech processing component comprises a wake-word detector, the wake-word detector configured to: detect a word included in the first beamformed audio signal; and send the feedback information for the first beamformed audio signal, the feedback information including a detection result indicating whether the word was detected in the first beamformed audio signal.
6. The computer-implemented method of claim 5, wherein the wake-word detector is further configured to: transmit the first beamformed audio signal to an automatic speech recognition engine, the automatic speech recognition engine configured to: generate a recognition response including a word recognized from the first beamformed audio signal and a recognition confidence for the word; and send recognition processing feedback for the first beamformed audio signal, the recognition processing feedback including a speech recognition confidence for speech recognized in the first beamformed audio signal; and receive the recognition processing feedback, wherein the feedback information includes the recognition processing feedback.
7. The computer-implemented method of claim 4, wherein the speech processing component comprises an automatic speech recognition engine, the automatic speech recognition engine configured to: recognize a word represented in the first beamformed audio signal; and send the feedback information for the first beamformed audio signal, the feedback information including a speech recognition confidence for the word recognized in the first beamformed audio signal.
8. The computer-implemented method of claim 4, further comprising: providing the second beamformed audio signal to the speech processing component; and receiving second feedback information for the second beamformed audio signal, wherein selecting the second beamformed audio signal includes comparing the feedback information and the second feedback information.
9. The computer-implemented method of claim 8, further comprising storing the feedback information for respective signals from the first beam and the second beam, said respec
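Claim 3 describes stored beam selection information that counts how often each beam was associated with positive feedback: wake-word detection, wake-word recognition, or initiation of a system action. A minimal sketch of such a tally follows. The class `BeamHistory`, its method names, and the event labels are hypothetical, and summing the three counts with equal weight is an assumption, since the claim only requires a comparison of the stored information.

```python
from collections import Counter, defaultdict

# Hypothetical labels for the three kinds of positive feedback listed in claim 3.
EVENTS = ("wake_word_detected", "wake_word_recognized", "system_action")


class BeamHistory:
    """Per-beam tally of positive feedback events (illustrative only)."""

    def __init__(self) -> None:
        # Maps beam index -> Counter of event label -> number of occurrences.
        self._counts = defaultdict(Counter)

    def record(self, beam: int, event: str) -> None:
        """Increment the count for one feedback event on one beam."""
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event!r}")
        self._counts[beam][event] += 1

    def score(self, beam: int) -> int:
        """Total positive events for a beam (equal weighting is an assumption)."""
        return sum(self._counts[beam].values())

    def prefer(self, first: int, second: int) -> int:
        """Return `second` only if its history is strictly better than `first`."""
        return second if self.score(second) > self.score(first) else first


history = BeamHistory()
history.record(0, "wake_word_detected")
history.record(1, "wake_word_detected")
history.record(1, "system_action")
chosen = history.prefer(0, 1)  # beam 1 has two recorded events, beam 0 has one
```

Keeping the first beam on a tie is a design choice in this sketch that avoids switching beams without evidence that the alternative is better.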
Interactive procedures; Man-machine interfaces · CPC title
Mouthpieces; {Microphones;} Attachments therefor · CPC title
Signal processing covered by H04R, not provided for in its groups · CPC title
Speech classification or search · CPC title
Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic (H04R2203/12 takes precedence) · CPC title