US12518756B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12518756-B2 |
| Application number | US-202318461430-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 5, 2023 |
| Priority date | May 3, 2019 |
| Publication date | Jan 6, 2026 |
| Grant date | Jan 6, 2026 |
Official abstract text for this publication.
Systems and methods for maintaining voice assistant persistence across multiple network microphone devices are described. In one example, first and second NMDs each identify a wake word based on detected sound, and are each transitioned from an inactive state to an active state in which the NMD captures and transmits sound data over a network interface. The first NMD is selected over the second NMD to output a first response, and both NMDs remain in the active state to further capture and transmit sound data. After further capturing and transmitting of sound data, the second NMD is selected over the first NMD to output a second response. After a predetermined time, one or both of the NMDs are transitioned back to the inactive state. The selection of one NMD over another for outputting a response can be based at least in part on user location information.
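The behavior summarized in the abstract can be read as a small state machine: each NMD becomes active on a wake word, stays active across several responses, and returns to inactive after a timeout, while the responder for each turn is chosen from user location. The following is a minimal sketch under stated assumptions, not the patented implementation: the `NMD` class, the one-dimensional `position`, the `select_responder` helper, and the nearest-device rule are all hypothetical names and simplifications introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    INACTIVE = "inactive"
    ACTIVE = "active"


@dataclass
class NMD:
    """Hypothetical model of a network microphone device."""
    name: str
    position: float              # assumed 1-D location in metres
    state: State = State.INACTIVE
    last_activity: float = 0.0   # time of the last wake word or capture

    def on_wake_word(self, now: float) -> None:
        # Identifying the wake word moves the device to the active state.
        self.state = State.ACTIVE
        self.last_activity = now

    def capture(self, now: float) -> bool:
        # Sound data is captured and transmitted only while active;
        # no further wake word is needed.
        if self.state is not State.ACTIVE:
            return False
        self.last_activity = now
        return True

    def expire(self, now: float, timeout: float) -> None:
        # After a predetermined time without activity, go back to inactive.
        if self.state is State.ACTIVE and now - self.last_activity >= timeout:
            self.state = State.INACTIVE


def select_responder(nmds, user_position):
    """Pick the active NMD nearest the user (assumed selection rule)."""
    active = [n for n in nmds if n.state is State.ACTIVE]
    return min(active, key=lambda n: abs(n.position - user_position), default=None)
```

With two devices at positions 0 and 10, a user at position 2 would get the first response from the first device; if the user then moves to position 8, the second response goes to the second device, and both devices stay active until `expire` is called after the timeout has elapsed.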
Opening claim text (preview).
The invention claimed is:

1. A media playback system comprising:
a first network microphone device (NMD) comprising a first one or more microphones;
a second NMD comprising a second one or more microphones;
one or more processors; and
at least one tangible, non-transitory, computer-readable medium storing instructions executable by the one or more processors to cause the media playback system to perform operations comprising:
based on a voice input, capturing (i) first sound data via the first one or more microphones and (ii) second sound data via the second one or more microphones;
identifying (i) a wake word based on the first sound data as captured via the first one or more microphones and (ii) the wake word based on the second sound data as captured via the second one or more microphones, wherein the wake word is associated with a voice assistant service (VAS);
coordinating state transitions of the first NMD and the second NMD, wherein coordinating state transition comprises:
after identifying the wake word via the first one or more microphones, and while the first NMD is in an active state, capturing additional first sound data via the first NMD and transmitting the first sound data over a network interface, wherein the first NMD transitions from an inactive state to the active state upon identifying the wake word;
after identifying the wake word via the second one or more microphones, and while the second NMD is in an active state, capturing additional second sound data via the second NMD and transmitting the second sound data over the network interface, wherein the second NMD transitions from an inactive state to the active state upon identifying the wake word;
after transmitting the first sound data, (i) receiving, via the media playback system, a first response of a multi-turn conversation with the VAS and (ii) receiving, via the media playback system, a first selection of the first NMD over the second NMD to output the first response, wherein the first selection is based at least in part on first user location information;
maintaining persistence of a VAS interaction across multiple NMDs by:
outputting the first response via only the first NMD, after outputting the first response via the first NMD, and without detecting another instance of a wake word, (i) receiving, via the media playback system, a second response of the multi-turn conversation with the VAS, and (ii) receiving, via the media playback system, a second selection of the second NMD over the first NMD to output the second response, wherein the second selection is based at least in part on second user location information; and
outputting the second response via the second NMD, wherein, after selecting the first NMD and after selecting the second NMD, each of the first NMD and the second NMD remains in the active state for further capturing and transmitting of sound data without detecting another instance of a wake word.

2. The media playback system of claim 1, the operations further comprising, after identifying the wake word based on the first sound data as captured via the first one or more microphones and after identifying the wake word based on the second sound data as captured via the second one or more microphones, transitioning a third NMD from an inactive state to an active state in which the third NMD captures and transmits over a network interface third sound data corresponding to the voice input as detected by the third NMD, wherein the third NMD did not identify the wake word based on the voice input.

3. The media playback system of claim 2, wherein the third NMD is in the vicinity of a user of the media playback system.

4. The media playback system of claim 2, wherein the third NMD is in the vicinity of at least one of the first NMD or the second NMD.

5. The media playback system of claim 1, the operations further comprising forwarding the second response from the first NMD to the second NMD over a local area network.

6. The media playback system of claim 1, the operations further comprising: after outputting the second response: receiving, via the first NMD, a third response; and forwarding the third response from the first NMD to the second NMD over a local area network.

7. The media playback system of claim 1, wherein the first user location information is based on a signal strength from a wireless proximity beacon.

8. The media playback system of claim 1, wherein the first user location information is based on reflected acoustic signal received via at least one of the first NMD or the second NMD.

9. A method performed by a media playback system comprising a first network microphone device (NMD) and a second NMD, the method comprising:
based on a voice input, capturing (a) first sound data via one or more microphones of the first NMD and (b) second sound data via one or more microphones of the second NMD;
identifying, via the first NMD, a wake word based on the first sound data as captured via the first NMD, wherein the wake word is associated with a voice assistant service (VAS);
identifying, via the second NMD, the wake word based on the second sound data as captured via the second NMD;
coordinating state transitions of the first NMD and the second NMD, wherein coordinating state transitions comprises:
after identifying the wake word via the first NMD, and while the first NMD is in an active state, capturing additional first sound data via the first NMD and transmitting the first sound data over a network interface, wherein the first NMD transitions from an inactive state to the active state upon identifying the wake word;
after identifying the wake word via the second NMD, and while the second NMD is in an active state, capturing additional second sound data via the second NMD and transmitting the second sound data over a network interface, wherein the second NMD transitions from an inactive state to the active state upon identifying the wake word;
after transmitting the first and second sound data captured by the respective first and second NMDs, (i) receiving, via the media playback system, a first response of a multi-turn conversation with the VAS and (ii) receiving, via the media playback system, a first selection of the first NMD over the second NMD to output the first response, wherein the first selection is based at least in part on first user location information;
maintaining persistence of a VAS interaction across multiple NMDs by:
outputting the first response via the first NMD;
after outputting the first response via the first NMD, and without detecting another instance of a wake word, (i) receiving, via the media playback system, a second response of the multi-turn conversation with the VAS, and (ii) receiving, via the media playback system, a second selection of the second NMD over the first NMD to output the second response, wherein the second selection is based at least in part on second user location information; and
outputting the second response via the second NMD, wherein, after selecting the first NMD and after selecting the second NMD, each of the first NMD and the second NMD remains in the active state for further capturing and transmitting of sound data without detecting another instance of a wake word.

10. The method of claim 9, further comprising, after identifying the wake word via the first NMD and after identifying the wake word via the second NMD, transitioning a third NMD from an inactive state to an active state in which the third NMD captures and transmits over a network interface third sound data corresponding to voice input as detected by the third NMD, wherein the third NMD did not identify the wake word based on the voice in
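Claim 7 bases the user location information on signal strength from a wireless proximity beacon. One way this could feed the selection between NMDs is sketched below. It is an illustration under assumptions rather than the claimed method: the function name `select_by_rssi`, the per-device RSSI readings in dBm, and the `margin_db` hysteresis (added here only to avoid switching devices on small fluctuations) are all hypothetical.

```python
def select_by_rssi(rssi_dbm, current=None, margin_db=3.0):
    """Choose the NMD that hears the user's beacon most strongly.

    rssi_dbm  -- mapping of NMD name to beacon signal strength in dBm
                 (higher, i.e. less negative, is taken to mean closer)
    current   -- name of the NMD that output the previous response, if any
    margin_db -- assumed hysteresis: keep the current NMD unless another
                 one is stronger by at least this many dB
    """
    best = max(rssi_dbm, key=rssi_dbm.get)
    if current in rssi_dbm and rssi_dbm[best] - rssi_dbm[current] < margin_db:
        return current
    return best


# First turn: the user is near the first NMD.
first_turn = select_by_rssi({"first": -48.0, "second": -70.0})
# Second turn: the user has moved toward the second NMD.
second_turn = select_by_rssi({"first": -72.0, "second": -50.0}, current=first_turn)
```

In this sketch the first response would be routed to the first NMD and the second response to the second NMD, mirroring the first and second selections recited in claims 1 and 9.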
for microphones (H04R1/34 and H04R1/40 take precedence) · CPC title
Execution procedure of a spoken command · CPC title
Word spotting · CPC title
of the speaker; Human-factor methodology · CPC title
Speech classification or search · CPC title