System and Method for Issuing Commands in a Media Playback System
US-2015091709-A1 · Apr 2, 2015 · US
US11200900B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11200900-B2 |
| Application number | US-201916723909-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 20, 2019 |
| Priority date | Dec 20, 2019 |
| Publication date | Dec 14, 2021 |
| Grant date | Dec 14, 2021 |
Official abstract text for this publication.
As noted above, example techniques relate to offline voice control. A local voice input engine may process voice inputs locally when processing voice inputs via a cloud-based voice assistant service is not possible. Some techniques involve local (on-device) voice-assisted set-up of a cloud-based voice assistant service. Further example techniques involve local voice-assisted troubleshooting of the cloud-based voice assistant service. Other techniques relate to interactions between local and cloud-based processing of voice inputs on a device that supports both local and cloud-based processing.
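The fallback behavior described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: `handle_voice_input`, `offline_cloud`, `simple_local_engine`, and the small command table are hypothetical names invented for illustration, and the cloud service is modeled as a plain callable that either returns a reply, returns `None`, or raises `ConnectionError`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceResult:
    source: str    # "cloud" or "local"
    response: str


def handle_voice_input(
    utterance: str,
    cloud_vas: Callable[[str], Optional[str]],
    local_engine: Callable[[str], str],
) -> VoiceResult:
    """Try the cloud service first; fall back to local processing on failure."""
    try:
        reply = cloud_vas(utterance)
    except ConnectionError:
        reply = None  # an unreachable service is treated as "no response"
    if reply is not None:
        return VoiceResult("cloud", reply)
    return VoiceResult("local", local_engine(utterance))


def offline_cloud(_utterance: str) -> Optional[str]:
    """Hypothetical stand-in for a cloud service that cannot be reached."""
    raise ConnectionError("voice assistant service unreachable")


def simple_local_engine(utterance: str) -> str:
    """Hypothetical local engine: matches whole words against a tiny table."""
    commands = {"pause": "pausing playback", "play": "starting playback"}
    words = utterance.lower().split()
    for keyword, action in commands.items():
        if keyword in words:
            return action
    return "command not recognized"
```

For example, `handle_voice_input("Please pause the music", offline_cloud, simple_local_engine)` is answered by the local engine because the simulated cloud service raises `ConnectionError`. A real device would work on audio rather than text; the sketch only shows the routing decision.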
Opening claim text (preview).
I claim:
1. A playback device comprising: a network interface; one or more microphones; at least one speaker; one or more processors; data storage having instructions stored thereon that are executable by the one or more processors to cause the playback device to perform functions comprising: while a local voice input pipeline is in a set-up mode, monitoring, via the local voice input pipeline, a sound data stream from the one or more microphones for local keywords from a local natural language unit library of the local voice input pipeline; generating a local wake-word event corresponding to a first voice input when the local voice input pipeline detects sound data matching one or more particular local keywords in a first portion of the sound data stream; determining, via a local natural language unit of the local voice input pipeline, an intent based on the one or more particular local keywords of the first voice input, the determined intent representing a command to configure a voice assistant service on the playback device; based on the determined intent, outputting, via the at least one speaker, one or more audible prompts to configure a VAS wake-word engine for one or more voice assistant services; after the VAS wake-word engine is configured for a particular voice assistant service, monitoring, via the VAS wake-word engine, the sound data stream from the one or more microphones for one or more VAS wake words of the particular voice assistant service; generating a VAS wake-word event corresponding to a second voice input when the VAS wake-word engine detects sound data matching a particular VAS wake word in a second portion of the sound data stream, wherein, when the VAS wake word event is generated, the playback device streams sound data representing the second voice input to one or more servers of the particular voice assistant service; detecting a failure by the particular voice assistant service to provide a response to the second voice input; based on detecting the failure, outputting, via the at least one speaker, an audible troubleshooting prompt indicating at least one of: (a) one or more issues causing the failure or (b) one or more troubleshooting actions to correct the one or more issues causing the failure; after playing back the audible troubleshooting prompt, monitoring, via the local voice input pipeline, the sound data stream from the one or more microphones for a voice input response to the audible troubleshooting prompt; determining, via the local natural language unit, an intent of the voice input response to the audible troubleshooting prompt; and performing one or more operations according to the determined intent of the voice input response to the audible troubleshooting prompt.
2. The playback device of claim 1, wherein the one or more issues causing the failure comprise an Internet connection issue, and wherein the functions further comprise: performing one or more Internet connection tests; while performing the one or more Internet connection tests, detecting an Internet connection failure, wherein detecting the Internet connection failure comprises (a) determining that the playback device is disconnected from the Internet or (b) determining (i) that playback device is connected to the Internet and (ii) the one or more servers of the particular VAS are inaccessible over the Internet from the playback device; and based on detecting an Internet connection failure, playing back (i) an audible prompt indicating the detected Internet connection failure and (ii) a series of audible prompts to perform one or more Internet connection troubleshooting actions corresponding to the detected Internet connection failure.
3. The playback device of claim 1, wherein outputting the one or more audible prompts to configure a VAS wake-word engine for one or more voice assistant services comprises outputting an audible prompt to configure a VAS wake-word engine for one or more voice assistant services via a control application on a mobile device.
4. The playback device of claim 1, wherein outputting the one or more audible prompts to configure a VAS wake-word engine for one or more voice assistant services comprises outputting a series of audible prompts to (i) select the particular voice assistant service from among a plurality of voice assistant services supported by the playback device and (ii) provide user account information to register the playback device with the particular voice assistant service.
5. The playback device of claim 1, wherein monitoring the first sound data stream for local keywords from the local natural language unit library comprises monitoring the first sound data stream for a first set of keywords from the local natural language unit library, and wherein the functions further comprise: receiving data representing instructions to configure the local voice input pipeline into an operating mode; and based on receiving the data representing instructions to configure the local voice input pipeline into the operating mode, switching the local voice input pipeline from the set-up mode to an operating mode, wherein in the operating mode, the local voice input pipeline monitors the sound data stream for a second set of keywords from the local natural language unit library, wherein the second set comprises additional keywords relative to the first set.
6. The playback device of claim 5, wherein the functions further comprise: while the local voice input pipeline is in the operating mode, monitoring, via the VAS wake-word engine, the sound data stream from the one or more microphones for one or more VAS wake words of the particular voice assistant service; generating a VAS wake-word event corresponding to a third voice input when the VAS wake-word engine detects sound data matching a particular VAS wake word in a third portion of the sound data stream, wherein, when the VAS wake word event is generated, the playback device streams sound data representing the third voice input to one or more servers of the particular voice assistant service; detecting a failure by the particular voice assistant service to provide a response to the third voice input; based on detecting the failure by the particular voice assistant service to provide a response to the third voice input, determining, via the local voice input pipeline, an intent of the third voice input; and outputting, via the at least one speaker, a response to the third voice input based on the determined intent.
7. The playback device of claim 1, wherein the functions further comprise: receiving input data representing a command to disable the VAS wake-word engine; disabling the VAS wake-word engine in response to receiving the input data representing the command to disable the VAS wake-word engine wherein disabling the VAS wake word engine comprises physically disconnecting the VAS wake word engine from one or more of: (a) the one or more microphones, (b) the network interface, or (c) power; while the VAS wake-word engine is disabled, monitoring, via the local voice input pipeline, the sound data stream from the one or more microphones for (a) the one or more VAS wake words and (b) local keywords; and when the local voice input pipeline detects sound data matching a given VAS wake word in a given portion of the sound data stream, outputting, via the at least one speaker, an audible prompt indicating that the VAS wake-word engine is disabled.
8. The playback device of claim 7, wherein the functions further comprise: generating a local wake-word event corresponding to a fourth voice input when the local voice input pipeline detects sound data matching the given VAS wake word in a fourth portion of the sound data stream; determining, via the
Word spotting · CPC title
to the speaker · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
Management of the audio stream, e.g. setting of volume, audio stream path · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title