Dynamic adjustment of expression detection criteria

US9940949B1 · US · B1

Patent metadata
Field: Value
Publication number: US-9940949-B1
Application number: US-201414578097-A
Country: US
Kind code: B1
Filing date: Dec 19, 2014
Priority date: Dec 19, 2014
Publication date: Apr 10, 2018
Grant date: Apr 10, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In a speech-based system, a wake word or other trigger expression is used to preface user speech that is intended as a command. The system receives multiple directional audio signals, each of which emphasizes sound from a different direction. The trigger expression is detected in an individual directional audio signal by comparing a confidence score with a confidence threshold. An individual confidence threshold is specified for each directional audio signal. The confidence thresholds are adjusted during operation of the system based on performance information that is generated during operation of the system. As an example, performance information may include the number of times that the trigger expression has been detected in each of the directional audio signals.
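
The mechanism in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `DirectionalDetector`, the step size, the clamping bounds, and the "fair share" adjustment rule are all assumptions chosen for illustration. The abstract only states that each directional signal has its own threshold and that thresholds are adjusted from performance information such as per-direction detection counts.

```python
from dataclasses import dataclass, field


@dataclass
class DirectionalDetector:
    """Hypothetical per-direction trigger detector with adjustable thresholds."""

    thresholds: list[float]                # one confidence threshold per beam
    counts: list[int] = field(init=False)  # detections observed per beam

    def __post_init__(self) -> None:
        self.counts = [0] * len(self.thresholds)

    def detect(self, scores: list[float]) -> list[int]:
        """Return indices of beams whose score meets that beam's threshold."""
        hits = [
            i
            for i, (score, threshold) in enumerate(zip(scores, self.thresholds))
            if score >= threshold
        ]
        for i in hits:
            self.counts[i] += 1
        return hits

    def adapt(self, step: float = 0.02, lo: float = 0.3, hi: float = 0.9) -> None:
        """Lower thresholds of frequently triggering beams and raise the rest.

        Assumption: a beam is "frequent" when its share of all detections
        exceeds an equal split across beams. step, lo and hi are illustrative.
        """
        total = sum(self.counts)
        if total == 0:
            return
        fair_share = 1.0 / len(self.counts)
        for i, count in enumerate(self.counts):
            share = count / total
            if share > fair_share:
                self.thresholds[i] = max(lo, self.thresholds[i] - step)
            elif share < fair_share:
                self.thresholds[i] = min(hi, self.thresholds[i] + step)


detector = DirectionalDetector(thresholds=[0.6, 0.6, 0.6])
for scores in ([0.70, 0.20, 0.10], [0.65, 0.30, 0.61], [0.80, 0.10, 0.20]):
    detector.detect(scores)
detector.adapt()
print(detector.counts, detector.thresholds)
```

In this sketch the beam that triggers most often becomes slightly more sensitive, while beams that rarely trigger become stricter, which is one plausible reading of "adjusted based on performance information".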

First claim

Opening claim text (preview).

The invention claimed is:

1. A system comprising: a microphone array including a first microphone and a second microphone, the first microphone configured to generate a first audio signal and the second microphone configured to generate a second audio signal; an audio beamformer configured to process the first audio signal and the second audio signal to produce a first directional audio signal and a second directional audio signal, wherein the first directional audio signal emphasizes sound from a first direction and the second directional audio signal emphasizes sound from a second direction; non-transitory computer-readable medium having computer-executable instructions stored thereupon which, when executed by a computer, perform operations comprising: analyzing the first directional audio signal to produce a first score indicating a likelihood that a trigger expression is represented in the first directional audio signal; comparing the first score to a first threshold to detect that first audio from the first direction includes the trigger expression; analyzing the second directional audio signal to produce a second score indicating a likelihood that the trigger expression is represented in the second directional audio signal; and comparing the second score to a second threshold to detect that second audio from the second direction includes the trigger expression; and control logic configured to perform operations comprising: decreasing the first threshold or increasing the second threshold based at least in part on an increase in a detection ratio, the detection ratio comprising a ratio between a first number of times that the trigger expression is detected from the first direction and a second number of times that the trigger expression is detected from the second direction, or increasing the first threshold or decreasing the second threshold based at least in part on a decrease in the detection ratio; and causing, based at least in part on at least one of the first directional audio signal or the second directional audio signal including the trigger expression, the system to transition from a first mode to a second mode, wherein the first mode has a lower power consumption than the second mode.

2. The system of claim 1, further comprising: a speech activity detector configured to detect a representation of human speech in the first directional audio signal, the speech activity detector having a sensitivity that is adjusted by changing an activity threshold; wherein the computer-executable instructions, when executed by the computer, further perform an operation comprising analyzing the first directional audio signal based at least in part on detection of the representation of human speech in the first directional audio signal; and wherein the control logic is further configured to perform operations comprising: increasing, based at least in part on a decrease in the detection ratio, the activity threshold, or decreasing, based at least in part on an increase in the detection ratio, the activity threshold.

3. The system of claim 1, further comprising: a speech recognition component configured to identify speech represented by the first directional audio signal; and wherein the control logic is further configured to perform operations comprising: determining that the trigger expression is absent from the recognized speech, and decreasing the first threshold or increasing the second threshold based at least in part on determining that the trigger expression is absent from the recognized speech.

4. The system of claim 1, further comprising: a speech recognition component configured to identify speech that is represented by the first directional audio signal; a natural language understanding component configured to determine that a meaning is not associated with the speech; and wherein the control logic is further configured to perform operations comprising decreasing the first threshold or increasing the second threshold.

5. A device, comprising: an audio beamformer configured to produce a first directional audio signal and a second directional audio signal, the first directional audio signal emphasizing sound from a first direction and the second directional audio signal emphasizing sound from a second direction; non-transitory computer-readable medium having computer-executable instructions stored thereupon which, when executed by a computer, perform operations comprising: analyzing, based at least in part on a first threshold, the first directional audio signal to determine whether first audio from the first direction includes a trigger expression; and analyzing, based at least in part on a second threshold, the second directional audio signal to determine whether second audio from the second direction includes the trigger expression; and control logic configured to: adjust at least one of the first threshold or the second threshold based at least in part on a first number of times that the trigger expression is detected from the first audio and a second number of times that the trigger expression is detected from the second audio, and cause, based at least in part on at least one of the first directional audio signal or the second direction audio signal including the trigger expression, the device to transition from a first mode to a second mode, wherein the first mode has a lower power consumption than the second mode.

6. The device of claim 5, wherein the first number of times is greater than the second number of times, and wherein the control logic is further configured to adjust at least one of the first threshold or the second threshold such that the first threshold is less than the second threshold.

7. The device of claim 5, further comprising: a machine vision component configured to detect a presence of a person in the first direction, and wherein the control logic is further configured to decrease the first threshold or increase the second threshold.

8. The device of claim 5, further comprising: a machine vision component configured to detect a presence of a non-human source of sound in the first direction, and wherein the control logic is further configured to decrease the first threshold or increase the second threshold.

9. The device of claim 5, wherein the control logic is further configured to: send, based at least in part on detecting a presence of the trigger expression in the first directional audio signal, the first directional audio signal to a speech service; receive, from the speech service, an indication that the trigger expression is not represented in the first directional audio signal; and increase the first threshold or decrease the second threshold.

10. The device of claim 5, wherein the first directional audio signal contains a representation of speech, and wherein the control logic is further configured to: send the first directional audio signal to a speech service to determine a meaning of the of the speech; receive an indication from the speech service that the speech does not correspond to an understood meaning; and increase the first threshold or decrease the second threshold.

11. The device of claim 5, further comprising: a speech activity detector configured to detect a representation of human speech in the first directional audio signal, the speech activity detector having an adjustable sensitivity; and wherein the control logic is further configured to adjust the sensitivity of the speech activity detector based at least in part on the first number of times that the trigger expression is detected from the first audio and the second number of times that the trigger expression is detecte
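
The control logic recited in claims 1 and 9 can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `RatioControl`, its method names, the step size, the clamping range, and the guard against a zero denominator are hypothetical and do not come from the patent. Where a claim offers alternatives ("decreasing the first threshold or increasing the second threshold"), the sketch arbitrarily picks the first-threshold option.

```python
def clamp(value: float, lo: float = 0.3, hi: float = 0.9) -> float:
    """Keep a threshold inside an assumed working range."""
    return max(lo, min(hi, value))


class RatioControl:
    """Hypothetical control logic for two directional signals."""

    def __init__(self, first_threshold: float, second_threshold: float,
                 step: float = 0.05) -> None:
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.step = step          # assumed adjustment increment
        self.prev_ratio = None    # detection ratio seen at the last update

    def update(self, first_count: int, second_count: int) -> None:
        """Adjust the first threshold when the detection ratio changes.

        Detection ratio = detections from the first direction divided by
        detections from the second direction (claim 1). The max(..., 1)
        guard against division by zero is an assumption.
        """
        ratio = first_count / max(second_count, 1)
        if self.prev_ratio is not None:
            if ratio > self.prev_ratio:
                # Ratio increased: make the first direction more sensitive.
                self.first_threshold = clamp(self.first_threshold - self.step)
            elif ratio < self.prev_ratio:
                # Ratio decreased: make the first direction stricter.
                self.first_threshold = clamp(self.first_threshold + self.step)
        self.prev_ratio = ratio

    def report_false_positive(self) -> None:
        """Speech service found no trigger in the first signal (claim 9)."""
        self.first_threshold = clamp(self.first_threshold + self.step)


control = RatioControl(first_threshold=0.6, second_threshold=0.6)
control.update(first_count=2, second_count=2)  # baseline ratio, no change
control.update(first_count=6, second_count=2)  # ratio rose, threshold drops
control.report_false_positive()                # rejection raises it again
print(round(control.first_threshold, 2))
```

Adjusting only one threshold keeps the sketch short; a device could equally move the second threshold in the opposite direction, which the claims also allow.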

Assignees

Amazon Tech Inc

Inventors

Classifications

  • for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title

  • G10L25/78 · Primary

    Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title

  • Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic (H04R2203/12 takes precedence) · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Adaptive threshold · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9940949B1 cover?
In a speech-based system, a wake word or other trigger expression is used to preface user speech that is intended as a command. The system receives multiple directional audio signals, each of which emphasizes sound from a different direction. The trigger expression is detected in an individual directional audio signal by comparing a confidence score with a confidence threshold. See the Abstract section for the full text.
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L25/78. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 10, 2018 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).