Network microphone device with command keyword eventing

US11854547B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11854547-B2
Application number: US-202117549034-A
Country: US
Kind code: B2
Filing date: Dec 13, 2021
Priority date: Jun 12, 2019
Publication date: Dec 26, 2023
Grant date: Dec 26, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one aspect, a playback device includes a voice assistant service (VAS) wake-word engine and a command keyword engine. The playback device detects, via the command keyword engine, a first command keyword in voice input of sound detected by one or more microphones of the playback device. The playback device determines an intent based on at least one keyword in the voice input via a local natural language unit (NLU). After detecting the first command keyword event and determining the intent, the playback device performs a first playback command corresponding to the first command keyword and according to the determined intent. When the playback device detects, via the wake-word engine, a wake-word in voice input, the playback device streams sound data corresponding to at least a portion of the voice input to one or more remote servers associated with the VAS.
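
The two detection paths described in the abstract can be pictured with a small routing sketch. This is an illustration only, not the patented implementation: it works on text transcripts rather than audio, and the wake word, the command keywords, and every name in the code (`WAKE_WORDS`, `COMMAND_KEYWORDS`, `Route`, `route_voice_input`) are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical vocabulary; the patent page does not list specific words.
WAKE_WORDS = {"hey assistant"}
COMMAND_KEYWORDS = {"play", "pause", "skip"}


@dataclass
class Route:
    path: str     # "remote_vas", "local", or "ignored"
    payload: str  # text that would be streamed, or the detected command keyword


def route_voice_input(transcript: str) -> Route:
    """Decide which engine handles a voice input (given here as text)."""
    text = transcript.lower().strip()

    # Wake-word path: on a wake word, the input is sent to the remote VAS.
    for wake_word in WAKE_WORDS:
        if text.startswith(wake_word):
            return Route("remote_vas", text)

    # Command keyword path: the first command keyword found is handled locally.
    keyword = next((w for w in text.split() if w in COMMAND_KEYWORDS), None)
    if keyword is not None:
        return Route("local", keyword)

    # Neither engine fired, so nothing is processed.
    return Route("ignored", "")
```

In this sketch, `route_voice_input("pause")` takes the local path, while an input that begins with the assumed wake word is routed to the remote service.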

First claim

Opening claim text (preview).

The invention claimed is:

1. A playback device comprising: a network interface; at least one microphone configured to detect sound; at least one speaker; at least one processor; and a housing carrying the network interface, the at least one microphone, the at least one speaker; the at least one processor, and data storage including instructions that are executable by the at least one processor such that the playback device is configured to: capture, via the at least one microphone, at least one input data stream; detect a wake word in a first portion of the at least one input data stream; based on detection of the wake word, trigger a wake-word event based on a first voice input captured via the at least one microphone, wherein the first voice input comprises the wake word and an utterance, and wherein the wake word does not correspond to a command; stream, via the network interface, sound data representing at least a portion of the first voice input to one or more remote servers of a voice assistant service for remote processing via a voice assistant of the one or more remote servers; after the first voice input is processed, detect a first command keyword in a second portion of the at least one input data stream, wherein the first command keyword is preceded in the at least one input data stream by a period of inactivity that excludes the wake word; based on detection of the first command keyword, trigger a first command keyword event to locally process a second voice input represented in the second portion of the at least one input data stream, wherein the second voice input comprises a first command keyword and at least one keyword from a set of keywords supported by a local voice assistant, wherein the first command keyword is one of a plurality of command keywords supported by the local voice assistant of the playback device, and wherein the second voice input excludes the wake word; determine, via the local voice assistant, (i) a particular command corresponding to the first command keyword and (ii) one or more parameters corresponding to the at least one keyword, the one or more parameters modifying the particular command; and cause at least one local network device to carry out the particular command according to the one or more parameters.

2. The playback device of claim 1, wherein the instructions are executable by the at least one processor such that the playback device is further configured to: detect a second command keyword in a third portion of the at least one input data stream; based on detection of the second command keyword, trigger a second command keyword event to locally process a third voice input represented in the second portion of the at least one input data stream, wherein the third voice input comprises the second command keyword, and wherein the second command keyword is one of the plurality of command keywords supported by the local voice assistant of the playback device; determine that the local voice assistant is unable to process a particular command corresponding to the second command keyword; and after the determination that the local voice assistant is unable to process the particular command corresponding to the second command keyword, stream, via the network interface, sound data representing at least a portion of the third voice input to the one or more remote servers of the voice assistant service for remote processing of the third voice input via the voice assistant of the one or more remote servers.

3. The playback device of claim 2, wherein the instructions that are executable by the at least one processor such that the playback device is configured to determine that the local voice assistant is unable to process the particular command corresponding to the second command keyword comprise instructions that are executable by the at least one processor such that the playback device is configured to: determine that a confidence score produced by the local voice assistant in processing the third voice input is below a threshold.

4. The playback device of claim 1, wherein the instructions that are executable by the at least one processor such that the playback device is configured to determine the one or more parameters corresponding to the at least one keyword comprise instructions that are executable by the at least one processor such that the playback device is configured to: determine that the at least one keyword of the first voice input includes one or more particular keywords representing a room name; and determine that the room name corresponds to a particular room including the at least one local network device; and assign the particular room to a target parameter for the particular command.

5. The playback device of claim 4, wherein the instructions are executable by the at least one processor such that the playback device is further configured to: populate the set of keywords supported by the local voice assistant with keywords corresponding to respective room names of rooms configured according to one or more smart home protocols.

6. The playback device of claim 1, wherein the playback device is connected to a local area network, and wherein the instructions are executable by the at least one processor such that the playback device is further configured to: discover, via the network interface, local network devices connected to the local area network; and populate the set of keywords supported by the local voice assistant with keywords corresponding to respective names of the discovered local network devices.

7. The playback device of claim 1, wherein the instructions that are executable by the at least one processor such that the playback device is configured to detect the first command keyword event comprise instructions that are executable by the at least one processor such that the playback device is configured to: determine that one or more conditions corresponding to the first command keyword are satisfied.

8. The playback device of claim 7, wherein the one or more conditions corresponding to the first command keyword comprise a particular condition representing an absence of background speech, and wherein the instructions that are executable by the at least one processor such that the playback device is configured to determine that the one or more conditions corresponding to the first command keyword are satisfied comprise instructions that are executable by the at least one processor such that the playback device is configured to: determine an absence of background speech in sound detected by the at least one microphone during capture of the second voice input.

9. The playback device of claim 1, wherein the at least one local network device comprises an additional playback device, and wherein the instructions that are executable by the at least one processor such that the playback device is configured to cause the at least one local network device to carry out the particular command according to the one or more parameters comprise instructions that are executable by the at least one processor such that the playback device is configured to: cause, via the network interface, the additional playback device to play back audio content according to the particular command.

10. The playback device of claim 1, wherein the at least one local network device comprises a smart illumination device, and wherein the instructions that are executable by the at least one processor such that the playback device is configured to cause the at least one local network device to carry out the particular command according to the one or more parameters comprise instructions that are executable by the at least one processor such that the playback device is configured to: cause, via the network interface, the additional playback device to play back audio content according to the particular command. 10. The playback device of claim 1 , wherein the at least one local network device comprises a smart illumination device, and wherein the instructions that are executable by the at least one processor such that the playback device is configured to cause the at least one local network device to carry out the particular command according to the one or more parameters comprise instructions that are executable by the at least one processor such that the playback device is configur…
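
As a rough illustration of claims 1 through 6, the sketch below shows a local assistant whose keyword set is populated from room names and discovered device names, which extracts a command plus modifying parameters, and which defers to the remote service when its confidence falls below a threshold. Everything here is hypothetical: the class and function names, the command list, the threshold value of 0.5, and the word-overlap confidence score are assumptions chosen for the example, and text transcripts stand in for audio. The claims do not specify any of these details.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.5  # hypothetical value; the claims name no number


@dataclass
class LocalAssistant:
    commands: set = field(default_factory=lambda: {"play", "pause", "dim"})
    keywords: dict = field(default_factory=dict)  # keyword -> (parameter, value)

    def add_rooms(self, room_names):
        # Claim 5 idea: populate keywords with configured room names.
        for name in room_names:
            self.keywords[name.lower()] = ("target", name)

    def add_devices(self, device_names):
        # Claim 6 idea: populate keywords with names of discovered devices.
        for name in device_names:
            self.keywords[name.lower()] = ("device", name)

    def parse(self, transcript):
        """Return (command, parameters, confidence) for a text voice input."""
        words = transcript.lower().split()
        padded = " " + " ".join(words) + " "  # enables whole-word matching
        command = next((w for w in words if w in self.commands), None)
        params, known = {}, set()
        if command:
            known.add(command)
        for kw, (param, value) in self.keywords.items():
            if f" {kw} " in padded:
                params[param] = value
                known.update(kw.split())
        # Toy confidence: share of words explained by the command or a keyword.
        confidence = sum(w in known for w in words) / len(words) if words else 0.0
        return command, params, confidence


def handle(assistant, transcript):
    command, params, confidence = assistant.parse(transcript)
    if command is None or confidence < CONFIDENCE_THRESHOLD:
        # Claims 2 and 3 idea: low confidence, so defer to the remote service.
        return ("remote", transcript)
    return ("local", command, params)
```

With rooms `["Kitchen", "Living Room"]` and device `["Desk Lamp"]` registered, an input such as "play in kitchen" is resolved locally with the room assigned to the `target` parameter, while a longer free-form request falls below the assumed threshold and is handed to the remote path.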

Assignees

Sonos Inc

Inventors

Classifications

  • G10L15/22 (Primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Management of the audio stream, e.g. setting of volume, audio stream path · CPC title

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11854547B2 cover?
In one aspect, a playback device includes a voice assistant service (VAS) wake-word engine and a command keyword engine. The playback device detects, via the command keyword engine, a first command keyword in voice input of sound detected by one or more microphones of the playback device. The playback device determines an intent based on at least one keyword in the voice input via a local na…
Who is the assignee on this patent?
Sonos Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 26, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).