Network microphone device with command keyword conditioning

US11501773B2 · US · B2

Patent metadata
Publication number: US-11501773-B2
Application number: US-202016812758-A
Country: US
Kind code: B2
Filing date: Mar 9, 2020
Priority date: Jun 12, 2019
Publication date: Nov 15, 2022
Grant date: Nov 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one aspect, a playback device includes a voice assistant service (VAS) wake-word engine and a command keyword engine. The playback device detects, via the command keyword engine, a first command keyword, and determines whether one or more playback conditions corresponding to the first command keyword are satisfied. Based on (a) detecting the first command keyword and (b) determining that the one or more playback conditions corresponding to the first command keyword are satisfied, the playback device performs a first playback command corresponding to the first command keyword. When the playback device detects, via the wake-word engine, a wake-word in voice input, the playback device streams sound data corresponding to at least a portion of the voice input to one or more remote servers associated with the VAS.
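The conditioning step in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the keywords (`play`, `pause`, `skip`), their conditions, and the names `PlayerState`, `COMMANDS`, and `handle_detection` are hypothetical, and detector outputs are passed in as strings rather than produced by real wake-word or keyword engines. It shows only the decision: a command keyword runs its playback command when that keyword's conditions hold, and a wake word hands the voice input off for streaming to the VAS.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class PlayerState:
    # Hypothetical playback state used to evaluate keyword conditions.
    is_playing: bool = False
    queue: List[str] = field(default_factory=list)
    position: int = 0


Condition = Callable[[PlayerState], bool]
Action = Callable[[PlayerState], None]


def _play(s: PlayerState) -> None:
    s.is_playing = True


def _pause(s: PlayerState) -> None:
    s.is_playing = False


def _skip(s: PlayerState) -> None:
    s.position += 1


# Hypothetical command keywords, each paired with its playback condition.
COMMANDS: Dict[str, Tuple[Condition, Action]] = {
    "play": (lambda s: not s.is_playing and bool(s.queue), _play),
    "pause": (lambda s: s.is_playing, _pause),
    "skip": (lambda s: s.position + 1 < len(s.queue), _skip),
}


def handle_detection(kind: str, value: str, state: PlayerState,
                     vas_uploads: List[str]) -> str:
    """Route one detector output; `value` is a keyword or a voice-input stand-in."""
    if kind == "wake_word":
        # Wake word: hand the voice input off for streaming to the remote VAS.
        vas_uploads.append(value)
        return "streamed"
    if kind == "command_keyword":
        entry = COMMANDS.get(value)
        if entry is None:
            return "unsupported"
        condition, action = entry
        # Run the playback command only if its conditions are satisfied.
        if condition(state):
            action(state)
            return "performed"
        return "ignored"
    raise ValueError(f"unknown detection kind: {kind}")
```

Keeping each keyword's condition next to its action makes the conditioning explicit: in this sketch, a "pause" detected while nothing is playing is ignored instead of changing state.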

First claim

Opening claim text (preview).

The invention claimed is:

1. A playback device comprising: a network interface; one or more microphones configured to detect sound; at least one speaker; one or more processors; data storage having instructions stored thereon that are executable by the one or more processors to cause the playback device to perform functions comprising: monitoring an input sound-data stream representing the sound detected by the one or more microphones for (i) a wake-word event and (ii) a media playback system keyword event; detecting a first media playback system keyword event, wherein detecting the first media playback system keyword event comprises after detecting a first sound via the one or more microphones, determining, with at least a threshold confidence, that the detected first sound includes a first media playback system keyword, wherein the first media playback system keyword is one of a plurality of command keywords supported by the playback device; in response to detecting the first media playback system keyword event, processing, via a local voice input engine of a media playback system voice assistant, the first sound as a first voice input, wherein processing the first sound comprises: (i) determining that one or more media playback system keyword conditions corresponding to the first media playback system keyword are satisfied; and (ii) determining that the local voice input engine is unable to determine an intent of the first voice input, wherein determining that the local voice input engine is unable to determine the intent of the first voice input comprises determining that one or more parameter slots associated with the first media playback system keyword are not matched with keywords in the first voice input; based on (a) detecting the first media playback system keyword event and (b) determining that local voice input engine is unable to determine the intent of the first voice input, sending, via the network interface, sound data corresponding to at least a portion of the first voice input to one or more servers of the media playback system voice assistant for processing of the first voice input; after receiving data indicating one or more first playback operations according to an intent of the first voice input as determined by the one or more servers of the media playback system voice assistant, performing the one or more first playback operations; detecting a first wake-word event, wherein detecting the first wake-word event comprises after detecting a second sound via the one or more microphones, determining that the detected second sound includes a second voice input comprising a first wake word; and in response to detecting the first wake-word event, streaming, via the network interface, sound data corresponding to at least a portion of the second voice input to one or more remote servers of a first voice assistant service.

2. The playback device of claim 1, wherein the functions further comprise: detecting a second media playback system keyword event, wherein detecting the second media playback system keyword event comprises after detecting a third sound via the one or more microphones, determining, with at least a threshold confidence, that the detected third sound includes a second media playback system keyword, wherein the second media playback system keyword is one of the plurality of command keywords supported by the playback device; in response to detecting the first media playback system keyword event, processing, via the local voice input engine of the media playback system voice assistant, the third sound as a third voice input, wherein processing the third sound comprises: (i) determining that one or more media playback system keyword conditions corresponding to the second media playback system keyword are satisfied; and (ii) determining an intent of the third voice input, wherein determining the intent of the third voice input comprises matching parameter slots associated with the second media playback system keyword to keywords in the third voice input; and based on (a) detecting the second media playback system keyword event and (b) determining the intent of the third voice input, performing one or more second playback operations according to the determined intent of the third voice input.

3. The playback device of claim 1, wherein the functions further comprise: in response to (a) detecting the first media playback system keyword event and (b) determining that one or more media playback system keyword conditions corresponding to the first media playback system keyword are satisfied, outputting audible feedback that the first media playback system keyword event was detected, wherein the playback device forgoes outputting of the audible feedback when at least one of the one or more media playback system keyword conditions corresponding to the first media playback system keyword are not satisfied.

4. The playback device of claim 1, wherein the functions further comprise: detecting a second wake-word event, wherein detecting the second wake-word event comprises after detecting a fourth sound via the one or more microphones, determining that the detected second sound includes a fourth voice input comprising a second wake word that is different than the first wake word; and in response to detecting the second wake-word event, streaming, via the network interface, sound data corresponding to at least a portion of the fourth voice input to one or more remote servers of a second voice assistant service.

5. The playback device of claim 1, wherein the functions further comprise: detecting a third media playback system keyword event, wherein detecting the third media playback system keyword event comprises after detecting the second sound via the one or more microphones, determining, with at least a threshold confidence, that the detected second sound includes a third media playback system keyword, wherein the third media playback system keyword is one of the plurality of command keywords supported by the playback device; and in response to detecting the first wake-word event, foregoing processing, via the local voice input engine of the media playback system voice assistant, the second sound.

6. The playback device of claim 1, wherein the functions further comprise: detecting a fourth media playback system keyword event, wherein detecting the fourth media playback system keyword event comprises after detecting a fifth sound via the one or more microphones, determining, with at least a threshold confidence, that the detected first sound includes a fourth media playback system keyword, wherein the fourth media playback system keyword is one of the plurality of command keywords supported by the playback device; and in response to detecting the fourth media playback system keyword event, processing, via the local voice input engine of the media playback system voice assistant, the fifth sound as a fifth voice input, wherein processing the fifth sound comprises: (i) determining that at least one or more media playback system keyword conditions corresponding to the fourth media playback system keyword are not satisfied; and (ii) in response to determining that at least one or more media playback system keyword conditions corresponding to the fourth media playback system keyword are not satisfied, foregoing further processing of the fifth sound.

7. The playback device of claim 6, wherein determining that at least one or more media playback system keyword conditions corresponding to the fourth media playback system keyword are not satisfied comprises: determining that voice activity is present in an environment comprising the playback device, wherein the absence of voice activity in the environment is one of the one or more media play
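The processing path recited in claims 1 through 7 can be sketched the same way. This is a hedged illustration, not the patented implementation: the threshold value, the slot vocabularies in `SLOTS`, the `Detection` record, and the callbacks passed to `process` are all hypothetical, and text strings stand in for sound data. Claim 7 is cut off on this page; the sketch assumes, as its visible part suggests, that absence of voice activity is one of the keyword conditions, and treats it as the only one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical confidence threshold for declaring a keyword event.
KEYWORD_THRESHOLD = 0.8

# Hypothetical parameter slots per command keyword. The local engine resolves
# an intent only if every slot is matched by a word in the voice input.
SLOTS: Dict[str, Dict[str, List[str]]] = {
    "play": {"room": ["kitchen", "bedroom", "office"]},
    "volume": {"direction": ["up", "down"]},
}


@dataclass
class Detection:
    text: str                  # stand-in for the captured voice input
    keyword: Optional[str]     # command keyword spotted on-device, if any
    keyword_confidence: float  # keyword engine score in [0, 1]
    wake_word: bool            # True if the VAS wake-word engine also fired


def match_slots(keyword: str, text: str) -> Optional[Dict[str, str]]:
    """Fill the keyword's slots from the input; None if any slot is unmatched."""
    words = text.lower().split()
    filled: Dict[str, str] = {}
    for slot, vocab in SLOTS.get(keyword, {}).items():
        hit = next((w for w in words if w in vocab), None)
        if hit is None:
            return None
        filled[slot] = hit
    return filled


def process(
    det: Detection,
    voice_activity_present: bool,
    stream_to_vas: Callable[[str], None],
    ask_cloud: Callable[[str], List[str]],
    perform: Callable[[str], None],
    feedback: Callable[[str], None],
) -> str:
    # Wake-word event: stream to the VAS and skip local processing (claims 1, 5).
    if det.wake_word:
        stream_to_vas(det.text)
        return "vas"
    # Keyword event only at or above the confidence threshold (claim 1).
    if det.keyword is None or det.keyword_confidence < KEYWORD_THRESHOLD:
        return "no_event"
    # Assumed sole keyword condition: no voice activity (claims 6-7).
    if voice_activity_present:
        return "dropped"
    # Conditions satisfied: audible feedback (claim 3).
    feedback(det.keyword)
    slots = match_slots(det.keyword, det.text)
    if slots is None:
        # Unmatched slots: defer intent to the assistant's servers (claim 1).
        for op in ask_cloud(det.text):
            perform(op)
        return "cloud"
    # All slots matched: perform the locally determined intent (claim 2).
    op = det.keyword
    if slots:
        op += ":" + ",".join(f"{k}={v}" for k, v in slots.items())
    perform(op)
    return "local"
```

Routing by slot coverage keeps simple commands such as "volume up" on the device in this sketch; only inputs the local engine cannot fully parse (for example, "volume to twenty percent") reach the assistant's servers.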

Assignees

Sonos Inc

Inventors

Classifications

  • Word spotting

  • G06F3/165 (primary): Management of the audio stream, e.g. setting of volume, audio stream path

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback

  • G10L15/22 (primary): Procedures used during a speech recognition process, e.g. man-machine dialogue

  • Speech classification or search

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11501773B2 cover?
A playback device that runs both a voice assistant service (VAS) wake-word engine and a local command keyword engine. A detected command keyword triggers the corresponding playback command only when the playback conditions for that keyword are satisfied, while a detected wake word causes the device to stream the voice input to remote servers associated with the VAS.
Who is the assignee on this patent?
Sonos Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/165. Mapped technology areas include Physics.
When was this patent published?
Published Nov 15, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).