What technology area does this patent fall under?

Primary CPC classification G10L15/22. Mapped technology areas include Physics.

When was this patent published?

Published Aug 8, 2017 (kind code B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Methods and devices for ignoring similar audio being received by a system

US9728188B1 · US · B1

Patent metadata
Field	Value
Publication number	US-9728188-B1
Application number	US-201615195587-A
Country	US
Kind code	B1
Filing date	Jun 28, 2016
Priority date	Jun 28, 2016
Publication date	Aug 8, 2017
Grant date	Aug 8, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for detecting similar audio being received by separate voice activated electronic devices, and ignoring those commands, is described herein. In some embodiments, a voice activated electronic device may be activated by a wakeword that is output by the additional electronic device, such as a television or radio, may capture audio of sound subsequently following the wakeword, and may send audio data representing the sound to a backend system. Upon receipt, the backend system may, in parallel to performing automated speech recognition processing to the audio data, generate a sound profile of the audio data, and may compare that sound profile to sound profiles of recently received audio data and/or flagged sound profiles. If the generated sound profile is determined to match another sound profiles, then the automated speech recognition processing may be stopped, and the voice activated electronic device may be instructed to return to a keyword spotting mode. If the matching sound profile is not already stored in a database of known sound profiles, it can be stored for future comparisons.
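
The flow in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name `SimilarAudioFilter`, the `matches` callback, the bounded `recent` buffer, and the `"ignore"`/`"continue"` return values are all assumptions made for the example. Here `"ignore"` stands in for stopping speech recognition and telling the device to return to keyword spotting mode.

```python
from collections import deque


class SimilarAudioFilter:
    """Hypothetical backend helper that decides whether incoming audio is a repeat."""

    def __init__(self, matches, max_recent=100):
        self.matches = matches                   # assumed callable(profile_a, profile_b) -> bool
        self.recent = deque(maxlen=max_recent)   # recently received sound profiles
        self.flagged = []                        # known profiles to ignore in future

    def check(self, profile):
        # A match against an already flagged profile is ignored straight away.
        if any(self.matches(profile, known) for known in self.flagged):
            return "ignore"
        # A match against recent audio is ignored and stored for future comparisons.
        for other in self.recent:
            if self.matches(profile, other):
                self.flagged.append(other)
                return "ignore"
        # No match: remember the profile and let speech recognition continue.
        self.recent.append(profile)
        return "continue"


# Toy usage: profiles are plain strings and "matching" is exact equality.
audio_filter = SimilarAudioFilter(matches=lambda a, b: a == b)
results = [audio_filter.check(p) for p in ["tv-ad", "tv-ad", "user-request", "tv-ad"]]
print(results)
```

In this sketch only the later-arriving audio is marked `"ignore"`. The abstract describes stopping processing when a match is found, so a fuller version would presumably also cancel work already started for the earlier matching audio.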

First claim

Opening claim text (preview).

What is claimed is:
1. A method, comprising: receiving, at a backend system, first audio data; receiving a first timestamp indicating a first time that the first audio data was sent to the backend system by a first user device; receiving, at the backend system, second audio data; receiving a second timestamp indicating a second time that the second audio data was sent to the backend system by a second user device; determining that an amount of time between the first time and the second time is less than a predetermined period of time, which indicates that the first audio data and the second audio data were sent at a substantially same time; generating a first audio fingerprint of the first audio data by performing a first fast Fourier transform (“FFT”) on the first audio data, the first audio fingerprint comprising first data representing a first time-frequency profile of the first audio data; generating a second audio fingerprint of the second audio data by performing a second FFT on the second audio data, the second audio fingerprint comprising second data representing a second time-frequency profile of the second audio data; determining a bit error rate between the first audio fingerprint and the second audio fingerprint by determining a number of different bits between the first audio fingerprint and the second audio fingerprint, and then dividing the number by a total number of bits; determining that the bit error rate is less than a predefined bit error rate threshold value indicating that the first audio data and the second audio data both represent a same sound; and storing the first audio fingerprint as a flagged audio fingerprint in memory on the backend system such that receipt of additional audio data that has a matching audio fingerprint is ignored by the backend system.
2. The method of claim 1, further comprising: receiving, at the backend system, third audio data; generating a third audio fingerprint of the third audio data by performing a third FFT on the third audio data, the third audio fingerprint comprising third data representing a third time-frequency profile of the third audio data; determining an additional bit error rate between the third audio fingerprint and the flagged audio fingerprint; determining that the additional bit error rate is less than the predefined bit error rate threshold value indicating that the third audio data also represents the same sound; and causing the backend system to ignore the third audio data such that a response is not generated to respond to the third audio data.
3. The method of claim 1, further comprising: receiving, at the backend system, third audio data; generating a third audio fingerprint of the third audio data by performing a third FFT on the third audio data, the third audio fingerprint comprising third data representing a third time-frequency profile of the third audio data; determining a new bit error rate between the third audio fingerprint and the flagged audio fingerprint; determining that the new bit error rate is greater than the predefined bit error rate threshold value indicating that third audio data does not represent the same sound; and generating text data representing the third audio data by executing speech-to-text functionality on the third audio data.
4. The method of claim 1, further comprising: determining a first user identifier associated with the first user device; determining a second user identifier associated with the second user device; determining that the first user identifier is different than the second user identifier; generating a first instruction for the first user device that causes the first user device to return to a keyword spotting mode where the first user device will monitor sound signals received by a microphone for a subsequent utterance of a wakeword by continuously running the sound signals through a wakeword engine; generating a second instruction for the second user device that causes the second user device to return to the keyword spotting mode; sending the first instruction to the first user device; and sending the second instruction to the second user device.
5. The method of claim 1, further comprising: causing automated speech recognition processing to stop being performed to the first audio data; and causing the automated speech recognition processing to stop being performed to the second audio data.
6. The method of claim 1, further comprising: receiving, at the backend system, third audio data; receiving a third timestamp indicating a third time that the third audio data was sent to the backend system by a third user device; determining that an additional amount of time between the first time and the third time is greater than the predetermined period of time, which indicates that the first audio data and the third audio data were sent at a different time; generating a third audio fingerprint of the third audio data by performing a third FFT on the third audio data, the third audio fingerprint comprising third data representing a third time-frequency profile of the third audio data; determining a new bit error rate between the flagged audio fingerprint and the third audio fingerprint; determining that the new bit error rate is greater than the predefined bit error rate threshold value indicating that third audio data does not represent the same sound; receiving a first plurality of audio fingerprints corresponding to a second plurality of audio data that were received during the additional amount of time; determining a third plurality of bit error rates between the third audio fingerprint and each of the first plurality of audio fingerprints; determining that each of the third plurality of bit error rates are greater than the predefined bit error rate threshold value, indicating that each of the second plurality of audio data represent a different sound than the third audio data; and causing automated speech recognition processing to continue to be performed to the third audio data.
7. The method of claim 6, further comprising: determining a new amount of time between the third time and a fourth time, the fourth time corresponding to a fourth audio fingerprint of fourth audio data received prior to the first audio data, the second audio data, and the third audio data; determining that the new amount of time is greater than the amount of time; determining that the new amount of time is greater than the additional amount of time; determining that the fourth audio fingerprint correspond to an oldest audio fingerprint of the plurality of audio fingerprints; causing the fourth audio fingerprint to be deleted; determining an updated first plurality of audio fingerprints comprising the first plurality of audio fingerprints minus the fourth audio fingerprint; and generating a fourth plurality of audio fingerprints comprising the updated first plurality of audio fingerprints and the third audio fingerprint.
8. The method of claim 1, further comprising: receiving a third audio fingerprint of third audio data, wherein the first audio fingerprint is generated at a first speech processing component, and the third audio fingerprint is generated at a second speech processing component; causing the third audio fingerprint to be stored in the memory; determining an additional bit error rate between first audio fingerprint and the third audio fingerprint; determining that the additional bit error rate is less than the predefined bit error rate threshold value; and causing automated speech recognition processing to stop being performed to the third audio data.
9. The method of claim 1, further comprising: receiving, at the backend system, third audio…
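
As a rough illustration of the comparison steps in claim 1, the sketch below computes a bit error rate as the number of differing bits divided by the total number of bits, and applies the timestamp-window and threshold checks. It assumes fingerprints have already been generated and are available as equal-length `bytes` objects. The function names, the 2-second window, the 0.2 threshold, and the sample values are hypothetical; the claims do not specify them.

```python
from datetime import datetime, timedelta


def bit_error_rate(fp_a: bytes, fp_b: bytes) -> float:
    """Fraction of bit positions at which two equal-length fingerprints differ."""
    if len(fp_a) != len(fp_b) or not fp_a:
        raise ValueError("fingerprints must be non-empty and of equal length")
    # XOR leaves a 1 wherever the two bytes disagree; count those bits.
    differing = sum(bin(a ^ b).count("1") for a, b in zip(fp_a, fp_b))
    return differing / (8 * len(fp_a))


def sent_at_substantially_same_time(t1: datetime, t2: datetime, window: timedelta) -> bool:
    """True when the two send times are closer together than the window."""
    return abs(t1 - t2) < window


# Hypothetical parameters; the patent only calls them "predetermined"/"predefined".
WINDOW = timedelta(seconds=2)
BER_THRESHOLD = 0.2

# Made-up 16-bit fingerprints that differ in 2 bit positions.
first_fp = bytes([0b10110010, 0b01101100])
second_fp = bytes([0b10110011, 0b01101000])
first_sent = datetime(2016, 6, 28, 12, 0, 0)
second_sent = first_sent + timedelta(milliseconds=300)

flagged = []  # stand-in for the backend's store of flagged fingerprints
if (sent_at_substantially_same_time(first_sent, second_sent, WINDOW)
        and bit_error_rate(first_fp, second_fp) < BER_THRESHOLD):
    flagged.append(first_fp)

print(bit_error_rate(first_fp, second_fp), len(flagged))
```

With these sample values, 2 of 16 bits differ, giving a bit error rate of 0.125, which is below the assumed threshold, so the first fingerprint is flagged.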

Assignees

Amazon Tech Inc

Inventors

Classifications

Code	CPC title
G10L2015/227	of the speaker; Human-factor methodology
G10L15/285	Memory allocation or algorithm optimisation to reduce hardware requirements
G10L15/08	Speech classification or search
G10L2015/088	Word spotting
G10L25/51	for comparison or discrimination

Patent family

Related publications grouped by family.

View patent family 59410838

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9728188B1 cover?: Systems and methods for detecting similar audio being received by separate voice activated electronic devices, and ignoring those commands, is described herein. In some embodiments, a voice activated electronic device may be activated by a wakeword that is output by the additional electronic device, such as a television or radio, may capture audio of sound subsequently following the wakeword, a…
Who is the assignee on this patent?: Amazon Tech Inc

Related patents

Hold back and real time ranking of results in a streaming matching system

Positioning using audio recognition

Methods and apparatus for identifying media

Finding Differences in Nearly-Identical Audio Recordings

Proximity discovery using audio signals

System and method for matching a query against a broadcast stream
