Offline voice control

US11869503B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11869503-B2
Application number  US-202117548921-A
Country             US
Kind code           B2
Filing date         Dec 13, 2021
Priority date       Dec 20, 2019
Publication date    Jan 9, 2024
Grant date          Jan 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection. Read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

As noted above, example techniques relate to offline voice control. A local voice input engine may process voice inputs locally when processing voice inputs via a cloud-based voice assistant service is not possible. Some techniques involve local (on-device) voice-assisted set-up of a cloud-based voice assistant service. Further example techniques involve local voice-assisted troubleshooting the cloud-based voice assistant service. Other techniques relate to interactions between local and cloud-based processing of voice inputs on a device that supports both local and cloud-based processing.
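The routing idea in the abstract can be illustrated with a minimal sketch. It assumes two boolean facts about the device (whether a cloud voice assistant service has been set up, and whether it is currently reachable). The names `DeviceState`, `Route`, `choose_route`, and `local_help_topic` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    CLOUD = "cloud"
    LOCAL = "local"


@dataclass
class DeviceState:
    vas_configured: bool   # assumption: a cloud voice assistant service has been set up
    cloud_reachable: bool  # assumption: the device can currently reach that service


def choose_route(state: DeviceState) -> Route:
    """Use the cloud service only when it is both configured and reachable."""
    if state.vas_configured and state.cloud_reachable:
        return Route.CLOUD
    # Otherwise the voice input is processed locally (on-device).
    return Route.LOCAL


def local_help_topic(state: DeviceState) -> Optional[str]:
    """Pick which local voice-assisted flow applies, if any."""
    if not state.vas_configured:
        return "setup"            # local voice-assisted set-up of the cloud service
    if not state.cloud_reachable:
        return "troubleshooting"  # local voice-assisted troubleshooting
    return None                   # cloud service is usable; no local help needed
```

For example, `choose_route(DeviceState(vas_configured=True, cloud_reachable=False))` returns `Route.LOCAL`, and `local_help_topic` for the same state returns `"troubleshooting"`.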

First claim

Opening claim text (preview).

The invention claimed is:

1. A playback device comprising: at least one audio transducer; one or more microphones; a network interface; at least one processor; a housing carrying the one or more microphones, the network interface, the at least one processor, and data storage including instructions that are executable by the at least one processor such that the playback device is configured to: while the playback device is in an offline mode: monitor, via a local voice assistant, a sound data stream from the one or more microphones for local keywords from a local natural language unit library of the local voice assistant, wherein in the offline mode, a voice assistant service (VAS) wake-word engine is inactive; generate a local wake-word event corresponding to a first voice input when the local voice assistant detects sound data matching one or more local keywords in a first portion of the sound data stream, wherein the one or more local keywords comprise a local wake word; determine, via the local voice assistant, an intent of the first voice input; select, from among a plurality of pre-determined responses, a pre-determined response corresponding to the determined intent of the first voice input; carry out the selected pre-determined response; and while the playback device is in an online mode: monitor, via the VAS wake-word engine, the sound data stream from the one or more microphones for one or more VAS wake words of a cloud-based voice assistant service, wherein in the online mode, the VAS wake-word engine is active; and generate a VAS wake-word event corresponding to a second voice input when the VAS wake-word engine detects sound data matching a particular VAS wake word in a second portion of the sound data stream, wherein, when the VAS wake word event is generated, the playback device streams sound data representing the second voice input to one or more servers of the cloud-based voice assistant service.

2. The playback device of claim 1, wherein the instructions that are executable by the at least one processor such that the playback device is configured to select the pre-determined response corresponding to the determined intent of the first voice input comprise instructions that are executable by the at least one processor such that the playback device is configured to select one or more pre-determined audible responses from a plurality of pre-determined audible responses corresponding to respective intents, and wherein the instructions that are executable by the at least one processor such that the playback device is configured to carry out the selected pre-determined response comprise instructions that are executable by the at least one processor such that the playback device is configured to play back the selected one or more pre-determined audible responses via the at least one audio transducer.

3. The playback device of claim 2, wherein the determined intent represents a command to configure a voice assistant service on the playback device, and wherein the instructions that are executable by the at least one processor such that the playback device is configured to play back the selected one or more pre-determined audible responses via the at least one audio transducer comprise instructions that are executable by the at least one processor such that the playback device is configured to play back one or more audible prompts to configure the VAS wake-word engine for the cloud-based VAS.

4. The playback device of claim 1, wherein the instructions that are executable by the at least one processor such that the playback device is configured to select the pre-determined response corresponding to the determined intent of the first voice input comprise instructions that are executable by the at least one processor such that the playback device is configured to select a particular Internet-Of-Things (IoT) device from among a plurality of IoT devices that are connected to a local area network, and wherein the instructions that are executable by the at least one processor such that the playback device is configured to carry out the selected pre-determined response comprise instructions that are executable by the at least one processor such that the playback device is configured to send, via the network interface, instructions to toggle a state of the particular IoT device.

5. The playback device of claim 4, wherein the IoT device comprises a smart illumination device, and wherein the instructions that are executable by the at least one processor such that the playback device is configured to send the instructions to toggle the state of the particular IoT device comprise instructions that are executable by the at least one processor such that the playback device is configured to send, via the network interface, instructions to toggle an illumination state of the smart illumination device.

6. The playback device of claim 1, wherein the instructions that are executable by the at least one processor such that the playback device is configured to select the pre-determined response corresponding to the determined intent of the first voice input comprise instructions that are executable by the at least one processor such that the playback device is configured to select a particular playlist from among a plurality of playlists, and wherein the instructions that are executable by the at least one processor such that the playback device is configured to carry out the selected pre-determined response comprise instructions that are executable by the at least one processor such that the playback device is configured to play back the playlist via the at least one audio transducer.

7. The playback device of claim 6, wherein the instructions that are executable by the at least one processor such that the playback device is configured to select the particular playlist from among the plurality of playlists comprise instructions that are executable by the at least one processor such that the playback device is configured to: select the particular playlist based on the one or more local keywords corresponding to metadata associated with the particular playlist.

8. The playback device of claim 1, wherein the instructions are executable by the at least one processor such that the playback device is further configured to: while in the online mode, monitor, via the local voice assistant, the sound data stream from the one or more microphones for local keywords from the local natural language unit library of the local voice assistant concurrently with monitoring, via the VAS wake-word engine, the sound data stream for one or more VAS wake words; generate an additional local wake-word event corresponding to a third voice input when the local voice assistant detects sound data matching one or more additional local keywords in a third portion of the sound data stream; determine, via the local voice assistant, an intent of the third voice input; select, from among a plurality of pre-determined responses, an additional pre-determined response, the additional pre-determined response corresponding to the determined intent of the third voice input; and carry out the selected additional pre-determined response.

9. The playback device of claim 8, wherein the instructions that are executable by the at least one processor such that the playback device is configured to determine the intent of the third voice input comprise instructions that are executable by the at least one processor such that the playback device is configured to determine that the intent of the third voice input corresponds to a command to disable the VAS wake-word engine, and wherein the instructions that are executable by the at least one processor such that the playback device i
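
The offline and online branches recited in claim 1 can be sketched as a small dispatcher. This is only an illustration under loud assumptions: plain text strings stand in for the sound data stream, and the keyword library, wake word, response table, and class name (`LOCAL_KEYWORDS`, `VAS_WAKE_WORDS`, `RESPONSES`, `PlaybackDevice`) are all hypothetical. It is not the patented implementation.

```python
from typing import List, Optional

# Hypothetical local keyword library: keyword phrase -> intent name.
LOCAL_KEYWORDS = {
    "play": "start_playback",
    "pause": "pause_playback",
    "set up assistant": "configure_vas",
}

# Hypothetical table of pre-determined responses, one per intent.
RESPONSES = {
    "start_playback": "starting playback",
    "pause_playback": "pausing playback",
    "configure_vas": "playing set-up prompts",
}

# Placeholder wake word for the cloud-based voice assistant service (VAS).
VAS_WAKE_WORDS = {"hey assistant"}


class PlaybackDevice:
    def __init__(self, online: bool) -> None:
        self.online = online
        # Stand-in for streaming sound data to the cloud service's servers.
        self.streamed_to_cloud: List[str] = []

    def detect_local_intent(self, utterance: str) -> Optional[str]:
        """Match whole keyword phrases from the local library against the input."""
        padded = f" {utterance.lower()} "
        for keyword, intent in LOCAL_KEYWORDS.items():
            if f" {keyword} " in padded:
                return intent
        return None

    def handle(self, utterance: str) -> str:
        text = utterance.lower()
        # Online mode: the VAS wake-word engine is active. In offline mode
        # this check is skipped, mirroring the inactive engine in claim 1.
        if self.online and any(text.startswith(w) for w in VAS_WAKE_WORDS):
            self.streamed_to_cloud.append(utterance)
            return "streamed to cloud VAS"
        # Local path: always used offline; also used online here, which
        # corresponds to the concurrent monitoring described in claim 8.
        intent = self.detect_local_intent(utterance)
        if intent is None:
            return "ignored"
        return RESPONSES[intent]
```

With `PlaybackDevice(online=False)`, the input "play some jazz" selects the `start_playback` response, while "hey assistant what is the weather" is ignored because the VAS wake-word check is skipped. With `online=True`, the same wake-word input is recorded in `streamed_to_cloud` instead.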

Assignees

Sonos Inc

Inventors

Classifications

  • G10L15/22 (Primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • to the speaker · CPC title

  • Speech classification or search · CPC title

  • by checking connectivity · CPC title

  • Word spotting · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11869503B2 cover?
As noted above, example techniques relate to offline voice control. A local voice input engine may process voice inputs locally when processing voice inputs via a cloud-based voice assistant service is not possible. Some techniques involve local (on-device) voice-assisted set-up of a cloud-based voice assistant service. Further example techniques involve local voice-assisted troubleshooting the…
Who is the assignee on this patent?
Sonos Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 9, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).