Sound profile generation based on speech recognition results exceeding a threshold

US10074364B1 · US · B1

Patent metadata
Publication number: US-10074364-B1
Application number: US-201615085772-A
Country: US
Kind code: B1
Filing date: Mar 30, 2016
Priority date: Feb 2, 2016
Publication date: Sep 11, 2018
Grant date: Sep 11, 2018

Abstract

Official abstract text for this publication.

Systems and methods for generating sound profiles of artificial commands detected by multiple voice activated electronic devices are described herein. In some embodiments, numerous voice activated electronic devices may send audio data representing a phrase to a backend system at substantially the same time. Text data representing the phrase, and counts for instances of that text data, may be generated. If the number of counts exceeds a predefined threshold, the backend system may cause any remaining response generation functionality for that particular command that is in excess of the predefined threshold to be stopped, and those devices to be returned to a sleep state. In some embodiments, a sound profile unique to the phrase that caused the excess of the predefined threshold may be generated such that future instances of the same phrase may be recognized prior to text data being generated, conserving the backend system's resources.
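
The counting step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name `PhraseCounter`, the sliding-window bookkeeping, the whitespace and case normalization, and the example window and threshold values are all assumptions made for the example. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class PhraseCounter:
    """Counts instances of the same transcribed phrase inside a sliding time window.

    Hypothetical helper: the source does not specify how counts are stored.
    """

    def __init__(self, window_seconds: float, threshold: int):
        self.window_seconds = window_seconds
        self.threshold = threshold
        # Normalized text -> timestamps of recent arrivals (oldest first).
        self._arrivals = defaultdict(deque)

    def record(self, text: str, timestamp: float) -> bool:
        """Record one instance; return True if the count now exceeds the threshold."""
        # Assumed normalization: lowercase and collapse runs of whitespace.
        key = " ".join(text.lower().split())
        arrivals = self._arrivals[key]
        arrivals.append(timestamp)
        # Discard arrivals that have fallen out of the temporal window.
        while arrivals and timestamp - arrivals[0] > self.window_seconds:
            arrivals.popleft()
        return len(arrivals) > self.threshold


counter = PhraseCounter(window_seconds=2.0, threshold=3)
hits = [counter.record("turn on the lights", t) for t in (0.0, 0.1, 0.2, 0.3)]
# hits == [False, False, False, True]: the fourth instance exceeds the threshold of 3.
```

When `record` returns `True`, a backend following the abstract would stop the remaining response generation for that phrase and could go on to build a sound profile for it.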

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: receiving, at an electronic device, audio data representing a phrase; generating text data representing the phrase by executing speech-to-text functionality; identifying a category that has been generated for the phrase, the category signifying that the text data represents the phrase; adding a count to the category to indicate that another instance of the category has been identified; determining a total number of counts for the category; determining, based on the total number of counts for the category, that multiple requesting devices have sent audio data representing the phrase to the electronic device during a same temporal window; based at least in part on a determination that multiple requesting devices have sent audio data representing the phrase to the electronic device during the same temporal window, generating an audio fingerprint corresponding to the audio data; storing the audio fingerprint on the electronic device; receiving additional audio data also representing the phrase; generating an additional audio fingerprint corresponding to the additional audio data; determining that a bit error rate of the additional audio fingerprint as compared to the audio fingerprint; determining that the bit error rate is less than a bit error rate threshold value indicating that the audio data and the additional audio data both represent the phrase; and based at least in part on a determination that the bit error rate is less than the bit error rate threshold value, refraining from performing at least some automatic speech recognition processing for the additional audio data.

2. The method of claim 1, further comprising: receiving new audio data representing a different phrase; generating a new audio fingerprint corresponding to the new additional audio data; determining a new bit error rate of the new audio fingerprint as compared to the audio fingerprint; determining that new bit error rate is greater than the bit error rate threshold value that indicates that the new audio fingerprint and the audio fingerprint represent different phrases; and enabling speech recognition processing to proceed for the new audio data such that text data representing the different phrase is generated.

3. The method of claim 1, further comprising: receiving new audio data representing the phrase; generating a candidate audio fingerprint corresponding to a beginning portion of the phrase; determining a new bit error rate of the candidate audio fingerprint as compared to an initial portion of the audio fingerprint; determining that the new bit error rate is greater than the error threshold value indicating that the new audio data and the audio data differ; generating a full audio fingerprint corresponding to the new audio data such that the full audio fingerprint represents an entirety of the phrase; determining a supplemental bit error rate of the full audio fingerprint as compared to the audio fingerprint; determining that the supplemental bit error rate is less than the error threshold value indicating that the new audio data and the audio data both represent the phrase; and based at least in part on a determination that the supplemental bit error rate is less than the bit error rate threshold value, refraining from performing at least some automatic speech recognition processing for the new audio data.

4. The method of claim 1, further comprising: receiving new audio data representing the phrase; generating a candidate audio fingerprint corresponding to a beginning portion of the phrase; determining a new bit error rate of the candidate audio fingerprint as compared to an initial portion of the audio fingerprint; determining that the new bit error rate is less than the bit error threshold value indicating that the new audio data represents the phrase; and based at least in part on a determination that the new bit error rate is less than the bit error rate threshold value, prior to a full audio fingerprint corresponding to an entirety of the phrase being generated, refraining from performing at least some automatic speech recognition processing for the new audio data.

5. A method, comprising: receiving a first instance of audio data representing a first sound; determining that, within a temporal window, a plurality of additional instances of audio data representing the first sound are also received; determining a number of the instances of audio data representing the first sound that are received within the temporal window; determining that the number of the instances is greater than a threshold value; based at least in part on a determination that the number of the instances is greater than the threshold value, generating a first sound profile of the first sound; storing the first sound profile; receiving second audio data representing a second sound; generating a second sound profile of the second sound; determining that a similarity value of the second sound profile and the first sound profile is greater than a similarity threshold value; and based at least in part on a determination that the similarity value is greater than the similarity threshold value, refraining from performing at least some automated speech recognition processing for the second audio data.

6. The method of claim 5, wherein the method is performed by at least one electronic device that is separate from one or more user devices that generate the first audio data and the second audio data.

7. The method of claim 5, wherein determining the similarity value comprises: determining a bit error rate of the second sound profile as compared to the first sound profile; determining that the bit error rate is less than a bit rate threshold signifying a bit difference between the first sound profile and the second sound profile; and determining, based at least in part on the bit error rate value being less than the bit rate threshold, that the first sound profile and the second sound profile are substantially similar to one another.

8. The method of claim 5, further comprising: receiving third audio data representing a third sound; generating a third sound profile of the third sound; determining that a second similarity value of the third sound profile and the first sound profile is less than the similarity threshold value; and enabling automated speech recognition processing to continue for the third audio data.

9. The method of claim 5, further comprising: generating, prior to determining that the number of instances of audio data representing the first sound is greater than the threshold value, a first instance of text data representing the first sound; generating an additional plurality of instances of text data corresponding to the plurality of additional instances of audio data; determining a total number of counts corresponding to the first instance of the text data and the additional plurality of instances of the text data; and determining that the total number of counts occurring within the temporal window is greater than the threshold value.

10. The method of claim 5, further comprising: determining that the first sound was produced by a media event; obtaining a total audio output of the media event; and generating a media event sound profile based on the total audio output.

11. The method of claim 10, further comprising: receiving third audio data representing a third sound; generating a third sound profile of the third audio data; determining that a second similarity value of the third sound profile as compared to a first portion of the media event sound profile is greater than a media event similari
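
The bit error rate comparison recited in the claims can be illustrated with a short sketch. The claims do not say how a fingerprint is computed or encoded, so this example assumes fingerprints are byte strings of equal length per unit of audio; the function names `bit_error_rate` and `matches_stored` and the threshold of 0.1 are hypothetical. Comparing a shorter candidate against the initial portion of the stored fingerprint loosely mirrors the beginning-portion check of claims 3 and 4.

```python
def bit_error_rate(a: bytes, b: bytes) -> float:
    """Fraction of bit positions that differ between two equal-length fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    if not a:
        raise ValueError("fingerprints must not be empty")
    # XOR leaves a 1 wherever the two bytes disagree; count those bits.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (8 * len(a))


def matches_stored(stored: bytes, candidate: bytes, ber_threshold: float) -> bool:
    """True if the candidate is within the threshold of the stored fingerprint.

    A candidate shorter than the stored fingerprint is compared against the
    stored fingerprint's initial portion of the same length (assumed layout).
    """
    if len(candidate) > len(stored):
        return False
    return bit_error_rate(stored[: len(candidate)], candidate) < ber_threshold


stored = bytes([0b11110000, 0b10101010, 0b00001111, 0b01010101])
near = bytes([0b11110001, 0b10101010, 0b00001111, 0b01010101])   # one bit flipped
other = bytes([0b00001111, 0b01010101, 0b11110000, 0b10101010])  # every bit flipped

skip_full = matches_stored(stored, near, ber_threshold=0.1)        # BER 1/32, match
skip_prefix = matches_stored(stored, near[:2], ber_threshold=0.1)  # BER 1/16, match
skip_other = matches_stored(stored, other, ber_threshold=0.1)      # BER 1.0, no match
```

In the terms of the claims, a match would lead to refraining from at least some automatic speech recognition processing, while a non-match would let speech recognition proceed.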

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Speech to text systems (G10L15/08 takes precedence)

  • Phrasal analysis, e.g. finite state techniques or chunking

  • Indexing structures

  • Memory allocation or algorithm optimisation to reduce hardware requirements

  • Training, enrolment or model building

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10074364B1 cover?
Systems and methods for generating sound profiles of artificial commands detected by multiple voice activated electronic devices are described herein. In some embodiments, numerous voice activated electronic devices may send audio data representing a phrase to a backend system at substantially the same time. Text data representing the phrase, and counts for instances of that text data, may be gene…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 11, 2018 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).