Altering audio to improve automatic speech recognition

US11488591B1 · US · B1

Patent metadata
Publication number: US-11488591-B1
Application number: US-201916510060-A
Country: US
Kind code: B1
Filing date: Jul 12, 2019
Priority date: Sep 26, 2012
Publication date: Nov 1, 2022
Grant date: Nov 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for altering audio being output by a voice-controlled device, or another device, to enable more accurate automatic speech recognition (ASR) by the voice-controlled device. For instance, a voice-controlled device may output audio within an environment using a speaker of the device. While outputting the audio, a microphone of the device may capture sound within the environment and may generate an audio signal based on the captured sound. The device may then analyze the audio signal to identify speech of a user within the signal, with the speech indicating that the user is going to provide a subsequent command to the device. Thereafter, the device may alter the output of the audio (e.g., attenuate the audio, pause the audio, switch from stereo to mono, etc.) to facilitate speech recognition of the user's subsequent command.
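The sequence in the abstract (and in claim 1) can be pictured as a small state machine: play content, detect the predefined audio, alter the output while listening, take the command, then respond. The sketch below is a minimal illustration under stated assumptions. Microphone audio is simulated as already-recognized text. The wake word `computer` stands in for the claimed "predefined audio". The names `VoiceDevice`, `on_transcript`, and `finish_response` and the 0.2 gain are hypothetical and do not come from the patent. A real device would run keyword spotting on the audio signal itself.

```python
import re
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    PLAYING = auto()     # speaker outputs the first content normally
    LISTENING = auto()   # first content is altered while a command is awaited
    RESPONDING = auto()  # speaker outputs second content (the response)


@dataclass
class VoiceDevice:
    # Hypothetical wake word standing in for the claimed "predefined audio".
    wake_words: tuple = ("computer",)
    normal_gain: float = 1.0
    ducked_gain: float = 0.2  # hypothetical attenuation level while listening
    state: State = State.PLAYING
    gain: float = 1.0
    events: list = field(default_factory=list)

    def on_transcript(self, text: str) -> None:
        """Handle one chunk of microphone audio, simulated here as text."""
        tokens = re.findall(r"[a-z']+", text.lower())
        if self.state is State.PLAYING:
            if any(w in tokens for w in self.wake_words):
                # Predefined audio detected: alter (attenuate) the first content.
                self.gain = self.ducked_gain
                self.state = State.LISTENING
                self.events.append("altered")
        elif self.state is State.LISTENING:
            # Audio captured during the altered period is treated as the command.
            self.state = State.RESPONDING
            self.events.append(f"command:{' '.join(tokens)}")
        # In RESPONDING, further audio is ignored in this simplified sketch.

    def finish_response(self) -> None:
        """After the second content ends, restore the first content's gain."""
        self.gain = self.normal_gain
        self.state = State.PLAYING
        self.events.append("restored")
```

Attenuation is only one of the alterations the abstract lists; pausing playback or switching from stereo to mono would replace the `gain` change in the `PLAYING` branch.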

First claim

Opening claim text (preview).

What is claimed is:

1. A device comprising: at least one speaker; at least one microphone; one or more processors; and computer-readable media storing computer-executable instructions that, when executed on the one or more processors, cause the device to perform operations, the operations comprising: causing the at least one speaker to output first content; receiving a first input audio signal generated by the at least one microphone based at least in part on first sound from a user, the first sound captured by the at least one microphone; determining predefined audio within the first input audio signal, the predefined audio comprising one or more words indicating that the user is going to provide a subsequent command to the device; altering output of the first content by the at least one speaker for a first period of time based at least in part on determining the predefined audio within the first input audio signal; receiving a second input audio signal generated by the at least one microphone based at least in part on second sound captured by the at least one microphone during at least a portion of the first period of time; determining a voice command in the second input audio signal; and causing, based at least in part on the voice command, the at least one speaker to output second content different from the first content for a second period of time that is after the first period of time.

2. The device of claim 1, the operations further comprising: determining an identify of the user; and determining a user profile associated with the user.

3. The device of claim 2, wherein the altering the output of the first audio content is based at least in part on the user profile.

4. The device of claim 2, further comprising: a camera, wherein the determining the identity of the user is based at least in part on image data captured by the camera.

5. The device of claim 1, wherein altering the output of the first content comprises lowering a volume at which the at least one speaker outputs the first content during the first period of time.

6. The device of claim 1, wherein altering the output of the first content comprises stopping output of the first content for the first period of time.

7. The device of claim 1, wherein altering the output of the first content comprises switching from outputting the first content in stereo to outputting the first content in mono for the first period of time.

8. The device of claim 1, further comprising: a switch configurable in a first position that couples the at least one speaker to a power source and a second position that decouples the at least one speaker from the power source, wherein the operations further comprise configuring, based at least in part on determining the predefined audio, the switch in the second position.

9. The device of claim 1, the operations further comprising: determining a type of the first content; wherein the altering the output of the first content comprises altering the output in a first manner based at least in part on the first content being a first type, and wherein the altering the output of the first content comprises altering the output in a second manner based at least in part on the first content being a second type.

10. The device of claim 1, the operations further comprising: determining an audible response to the verbal command, wherein the second content includes the audible response to the verbal command.

11. A device comprising: at least one speaker; at least one microphone; one or more processors; and computer-readable media storing computer-executable instructions that, when executed on the one or more processors, cause the device to perform operations, the operations comprising: causing the at least one speaker to output first content; receiving a first input audio signal generated by the at least one microphone based at least in part on first sound from a user, the first sound captured by the at least one microphone; determining predefined audio within the first input audio signal, the predefined audio comprising one or more words indicating that the user is going to provide a subsequent command to the device; determining a user profile associated with the user; altering, based at least in part on determining the predefined audio, output of the first content by the at least one speaker for a first period of time; receiving a second input audio signal generated by the at least one microphone based at least in part on second sound captured by the at least one microphone during at least a portion of the first period of time; determining a voice command in the second input audio signal; and causing, based at least in part on the voice command, the at least one speaker to output second content different from the first content for a second period of time that is after the first period of time.

12. The device of claim 11, wherein the determining the user profile comprises comparing at least one of at least a portion of the first input audio signal or at least a portion of the voice command to a voice print associated with the user profile.

13. The device of claim 11, wherein the altering the output is based at least in part on the user profile.

14. The device of claim 11, further comprising: a camera; wherein the determining the user profile is based at least at least in part on image data captured by the camera.

15. The device of claim 14, the operations further comprising: performing facial recognition techniques on the image data to identify the user, wherein the user profile is associated with the user.

16. The device of claim 11, the operations further comprising: determining the second content based at least in part on the user profile.

17. The device of claim 11, wherein altering the output of the first content comprises at least one of lowering a volume at which the at least one speaker outputs the first content during the first period of time or stopping output of the first content for the first time.

18. The device of claim 11, wherein altering the output of the first content comprises switching from outputting the first content in stereo to outputting the first content in mono for the first period of time.

19. The device of claim 11, the operations further comprising: determining a type of the first content; wherein the altering the output of the first content comprises altering the output in a first manner based at least in part on the first content being a first type, and wherein the altering the output of the first content comprises altering the output in a second manner based at least in part on the first content being a second type.

20. The device of claim 19, the operations further comprising: determining an audible response to the verbal command, wherein the second content includes the audible response to the verbal command.
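Claims 5 through 7 name three ways of altering the first content (lowering the volume, stopping output, switching from stereo to mono), and claims 9 and 19 let the manner of alteration depend on the content type. The sketch below applies each alteration to a list of stereo frames. The function names, the 20 dB cut, and the mapping of `"spoken"` content to pausing are hypothetical choices for illustration; the claims do not specify any of them.

```python
from typing import List, Tuple

Frame = Tuple[float, float]  # one stereo sample: (left, right)


def attenuate(frames: List[Frame], gain_db: float) -> List[Frame]:
    """Lower the output volume (cf. claim 5) by applying a gain in decibels."""
    g = 10 ** (gain_db / 20)  # convert dB to a linear amplitude factor
    return [(left * g, right * g) for left, right in frames]


def stop(frames: List[Frame]) -> List[Frame]:
    """Stop output (cf. claim 6): emit silence for the altered period."""
    return [(0.0, 0.0) for _ in frames]


def to_mono(frames: List[Frame]) -> List[Frame]:
    """Switch stereo to mono (cf. claim 7) by averaging the two channels."""
    return [((left + right) / 2, (left + right) / 2) for left, right in frames]


def alter(frames: List[Frame], content_type: str) -> List[Frame]:
    """Hypothetical type-dependent policy illustrating claims 9 and 19."""
    if content_type == "spoken":
        # First manner (assumed): pause, e.g. for an audiobook.
        return stop(frames)
    # Second manner (assumed) for any other type, e.g. music:
    # mono downmix followed by a 20 dB cut.
    return attenuate(to_mono(frames), -20.0)
```

Averaging left and right is one common way to form a mono downmix; the claims do not say how the mono signal is produced.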

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Execution procedure of a spoken command · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Reproducing at a different information rate from the information rate of recording (for television signals H04N5/783) · CPC title

  • G10L15/22 (primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title

Frequently asked questions

What does patent US11488591B1 cover?
Techniques for altering audio being output by a voice-controlled device, or another device, to enable more accurate automatic speech recognition (ASR) by the voice-controlled device. For instance, a voice-controlled device may output audio within an environment using a speaker of the device. While outputting the audio, a microphone of the device may capture sound within the environment and may …
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 1, 2022 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).