Voice assistant persistence across multiple network microphone devices

US12518756B2 · US · B2

Patent metadata
Publication number: US-12518756-B2
Application number: US-202318461430-A
Country: US
Kind code: B2
Filing date: Sep 5, 2023
Priority date: May 3, 2019
Publication date: Jan 6, 2026
Grant date: Jan 6, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for maintaining voice assistant persistence across multiple network microphone devices are described. In one example, first and second NMDs each identify a wake word based on detected sound, and are each transitioned from an inactive state to an active state in which the NMD captures and transmits sound data over a network interface. The first NMD is selected over the second NMD to output a first response, and both NMDs remain in the active state to further capture and transmit sound data. After further capturing and transmitting of sound data, the second NMD is selected over the first NMD to output a second response. After a predetermined time, one or both of the NMDs are transitioned back to the inactive state. The selection of one NMD over another for outputting a response can be based at least in part on user location information.
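The state handling described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `NMD` class, the `select_output_nmd` helper, the 8-second timeout, and the use of user-to-device distance as the "user location information" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class State(Enum):
    INACTIVE = "inactive"
    ACTIVE = "active"


@dataclass
class NMD:
    """Hypothetical model of a network microphone device."""
    name: str
    state: State = State.INACTIVE
    last_activity_s: float = 0.0

    def on_wake_word(self, now_s: float) -> None:
        # Identifying the wake word moves the device to the active state.
        self.state = State.ACTIVE
        self.last_activity_s = now_s

    def on_sound_captured(self, now_s: float) -> None:
        # While active, further capture needs no new wake word.
        if self.state is State.ACTIVE:
            self.last_activity_s = now_s

    def tick(self, now_s: float, timeout_s: float) -> None:
        # After the (assumed) timeout with no activity, return to inactive.
        if self.state is State.ACTIVE and now_s - self.last_activity_s >= timeout_s:
            self.state = State.INACTIVE


def select_output_nmd(nmds: List[NMD], user_distance_m: Dict[str, float]) -> NMD:
    """Pick the active NMD closest to the user (assumed selection rule)."""
    active = [n for n in nmds if n.state is State.ACTIVE]
    if not active:
        raise ValueError("no active NMD to select")
    return min(active, key=lambda n: user_distance_m[n.name])


kitchen, living_room = NMD("kitchen"), NMD("living_room")
nmds = [kitchen, living_room]

# Both devices identify the wake word and become active.
for nmd in nmds:
    nmd.on_wake_word(now_s=0.0)

# First response: the user is nearer the kitchen device.
first = select_output_nmd(nmds, {"kitchen": 1.0, "living_room": 4.0})

# Both devices stay active and keep capturing; the user then moves.
for nmd in nmds:
    nmd.on_sound_captured(now_s=5.0)
second = select_output_nmd(nmds, {"kitchen": 5.0, "living_room": 1.5})

# With no further activity, both devices time out.
for nmd in nmds:
    nmd.tick(now_s=20.0, timeout_s=8.0)

print(first.name, second.name, kitchen.state.value, living_room.state.value)
```

In this sketch the selection is recomputed for every response, which is what lets the second response move to a different device without a second wake word.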

First claim

Opening claim text (preview).

The invention claimed is:

1. A media playback system comprising: a first network microphone device (NMD) comprising a first one or more microphones; a second NMD comprising a second one or more microphones; one or more processors; and at least one tangible, non-transitory, computer-readable medium storing instructions executable by the one or more processors to cause the media playback system to perform operations comprising: based on a voice input, capturing (i) first sound data via the first one or more microphones and (ii) second sound data via the second one or more microphones; identifying (i) a wake word based on the first sound data as captured via the first one or more microphones and (ii) the wake word based on the second sound data as captured via the second one or more microphones, wherein the wake word is associated with a voice assistant service (VAS); coordinating state transitions of the first NMD and the second NMD, wherein coordinating state transition comprises: after identifying the wake word via the first one or more microphones, and while the first NMD is in an active state, capturing additional first sound data via the first NMD and transmitting the first sound data over a network interface, wherein the first NMD transitions from an inactive state to the active state upon identifying the wake word; after identifying the wake word via the second one or more microphones, and while the second NMD is in an active state, capturing additional second sound data via the second NMD and transmitting the second sound data over the network interface, wherein the second NMD transitions from an inactive state to the active state upon identifying the wake word; after transmitting the first sound data, (i) receiving, via the media playback system, a first response of a multi-turn conversation with the VAS and (ii) receiving, via the media playback system, a first selection of the first NMD over the second NMD to output the first response, wherein the first selection is based at least in part on first user location information; maintaining persistence of a VAS interaction across multiple NMDs by: outputting the first response via only the first NMD, after outputting the first response via the first NMD, and without detecting another instance of a wake word, (i) receiving, via the media playback system, a second response of the multi-turn conversation with the VAS, and (ii) receiving, via the media playback system, a second selection of the second NMD over the first NMD to output the second response, wherein the second selection is based at least in part on second user location information; and outputting the second response via the second NMD, wherein, after selecting the first NMD and after selecting the second NMD, each of the first NMD and the second NMD remains in the active state for further capturing and transmitting of sound data without detecting another instance of a wake word.

2. The media playback system of claim 1, the operations further comprising, after identifying the wake word based on the first sound data as captured via the first one or more microphones and after identifying the wake word based on the second sound data as captured via the second one or more microphones, transitioning a third NMD from an inactive state to an active state in which the third NMD captures and transmits over a network interface third sound data corresponding to the voice input as detected by the third NMD, wherein the third NMD did not identify the wake word based on the voice input.

3. The media playback system of claim 2, wherein the third NMD is in the vicinity of a user of the media playback system.

4. The media playback system of claim 2, wherein the third NMD is in the vicinity of at least one of the first NMD or the second NMD.

5. The media playback system of claim 1, the operations further comprising forwarding the second response from the first NMD to the second NMD over a local area network.

6. The media playback system of claim 1, the operations further comprising: after outputting the second response: receiving, via the first NMD, a third response; and forwarding the third response from the first NMD to the second NMD over a local area network.

7. The media playback system of claim 1, wherein the first user location information is based on a signal strength from a wireless proximity beacon.

8. The media playback system of claim 1, wherein the first user location information is based on reflected acoustic signal received via at least one of the first NMD or the second NMD.

9. A method performed by a media playback system comprising a first network microphone device (NMD) and a second NMD, the method comprising: based on a voice input, capturing (a) first sound data via one or more microphones of the first NMD and (b) second sound data via one or more microphones of the second NMD; identifying, via the first NMD, a wake word based on the first sound data as captured via the first NMD, wherein the wake word is associated with a voice assistant service (VAS); identifying, via the second NMD, the wake word based on the second sound data as captured via the second NMD; coordinating state transitions of the first NMD and the second NMD, wherein coordinating state transitions comprises: after identifying the wake word via the first NMD, and while the first NMD is in an active state, capturing additional first sound data via the first NMD and transmitting the first sound data over a network interface, wherein the first NMD transitions from an inactive state to the active state upon identifying the wake word; after identifying the wake word via the second NMD, and while the second NMD is in an active state, capturing additional second sound data via the second NMD and transmitting the second sound data over a network interface, wherein the second NMD transitions from an inactive state to the active state upon identifying the wake word; after transmitting the first and second sound data captured by the respective first and second NMDs, (i) receiving, via the media playback system, a first response of a multi-turn conversation with the VAS and (ii) receiving, via the media playback system, a first selection of the first NMD over the second NMD to output the first response, wherein the first selection is based at least in part on first user location information; maintaining persistence of a VAS interaction across multiple NMDs by: outputting the first response via the first NMD; after outputting the first response via the first NMD, and without detecting another instance of a wake word, (i) receiving, via the media playback system, a second response of the multi-turn conversation with the VAS, and (ii) receiving, via the media playback system, a second selection of the second NMD over the first NMD to output the second response, wherein the second selection is based at least in part on second user location information; and outputting the second response via the second NMD, wherein, after selecting the first NMD and after selecting the second NMD, each of the first NMD and the second NMD remains in the active state for further capturing and transmitting of sound data without detecting another instance of as wake word.

10. The method of claim 9, further comprising, after identifying the wake word via the first NMD and after identifying the wake word via the second NMD, transitioning a third NMD from an inactive state to an active state in which the third NMD captures and transmits over a network interface third sound data corresponding to voice input as detected by the third NMD, wherein the third NMD did not identify the wake word based on the voice in
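Two of the dependent claims lend themselves to a short illustration: claim 7 (user location information based on signal strength from a wireless proximity beacon) and claims 2 to 4 (activating a third NMD that did not itself identify the wake word). The Python sketch below is only one possible reading. The function names, the "strongest signal wins" rule, the neighbor map, and the dBm values are hypothetical and do not come from the claims.

```python
from typing import Dict, Set


def select_by_beacon_rssi(rssi_dbm: Dict[str, float]) -> str:
    """Return the NMD with the strongest beacon signal.

    Assumption: a higher RSSI (closer to 0 dBm) means the user is nearer.
    """
    return max(rssi_dbm, key=lambda name: rssi_dbm[name])


def nmds_to_activate(detected: Set[str], neighbors: Dict[str, Set[str]]) -> Set[str]:
    """NMDs that identified the wake word, plus nearby NMDs that did not.

    `neighbors` is a hypothetical map from an NMD to devices in its vicinity.
    """
    activated = set(detected)
    for name in detected:
        activated |= neighbors.get(name, set())
    return activated


# First and second selections as the user moves between rooms.
first = select_by_beacon_rssi({"kitchen": -48.0, "living_room": -71.0})
second = select_by_beacon_rssi({"kitchen": -75.0, "living_room": -52.0})

# The hallway device missed the wake word but sits next to the kitchen device.
active = nmds_to_activate(
    detected={"kitchen", "living_room"},
    neighbors={"kitchen": {"hallway"}},
)

print(first, second, sorted(active))
```

A real system would likely smooth the signal-strength readings over time before switching devices. That step is omitted here to keep the sketch short.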

Assignees

Sonos Inc

Classifications

CPC titles:

  • for microphones (H04R1/34 and H04R1/40 take precedence)

  • Execution procedure of a spoken command

  • Word spotting

  • of the speaker; Human-factor methodology

  • Speech classification or search

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12518756B2 cover?
It covers systems and methods for maintaining voice assistant persistence across multiple network microphone devices (NMDs). Two NMDs each identify a wake word and move from an inactive to an active state. Both stay active while the system selects which one outputs each response, based at least in part on user location information. After a predetermined time, one or both return to the inactive state.
Who is the assignee on this patent?
Sonos Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 6, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).