Intent inference in audiovisual communication sessions

US12283269B2 · US · B2

Patent metadata
  • Publication number: US-12283269-B2

  • Application number: US-202117450925-A

  • Country: US

  • Kind code: B2

  • Filing date: Oct 14, 2021

  • Priority date: Oct 16, 2020

  • Publication date: Apr 22, 2025

  • Grant date: Apr 22, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one aspect, a user's intent can be inferred based on voice analysis during a communications session, and prompts can be presented, or other actions taken, at least partly in response to the inferred intent. For example, a network microphone device (NMD) having one or more microphones can capture voice input and transmit the voice input to remote computing device(s) for a communication session (e.g., a videoconference). The NMD can analyze the voice input to detect one or more utterances. Based on the utterance(s), the NMD can cause a user prompt to be displayed via a display device communicatively coupled to the NMD. The particular prompt can depend at least in part on one or more context parameters associated with the communication session (e.g., a microphone state of one or more users, a screen share state of one or more users, or a recording status of the session, etc.).
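
The flow the abstract describes (detect an utterance, infer an intent, then pick a prompt that depends on session context) can be illustrated with a minimal sketch. This is not the patented implementation: the keyword table, the intent labels, the `SessionContext` fields, and the prompt wording are all hypothetical, and simple substring matching stands in for whatever voice analysis an actual NMD would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionContext:
    """Hypothetical context parameters for a communication session."""
    mic_muted: bool
    screen_shared: bool
    recording: bool


# Hypothetical phrase-to-intent table; a real system would use a trained model.
KEYWORD_INTENTS = {
    "can you hear me": "check_audio",
    "let me share my screen": "share_screen",
    "can you see my screen": "share_screen",
    "let's record": "start_recording",
}


def infer_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose key phrase occurs in the utterance."""
    text = utterance.lower()
    for phrase, intent in KEYWORD_INTENTS.items():
        if phrase in text:
            return intent
    return None


def choose_prompt(intent: Optional[str], ctx: SessionContext) -> Optional[str]:
    """Pick a prompt only when the context makes the intent actionable."""
    if intent == "check_audio" and ctx.mic_muted:
        return "Your microphone is muted. Unmute?"
    if intent == "share_screen" and not ctx.screen_shared:
        return "Share your screen?"
    if intent == "start_recording" and not ctx.recording:
        return "Start recording this session?"
    return None
```

In this sketch the same utterance can produce a prompt or no prompt depending on the context: "can you hear me" yields an unmute prompt only while the microphone is muted.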

First claim

Opening claim text (preview).

The invention claimed is:

1. A network microphone device comprising: one or more microphones; a network interface; one or more processors; data storage having instructions stored therein that, when executed by the one or more processors, cause the network microphone device to perform operations comprising: capturing voice input from a first user via the one or more microphones during an ongoing communication session involving at least the first user, a second user, and a third user; transmitting the voice input to one or more remote computing devices for the communication session; analyzing the voice input to detect one or more utterances from the first user; monitoring a context parameter of the communication session, based on the one or more utterances detected during the ongoing communication session, inferring an intent of the first user; and based on the inferred intent of the first user and the context parameter, causing a user prompt to be displayed via a first display device communicatively coupled to the network microphone device, the first display device associated with the second user, wherein the user prompt is not displayed via a second display device communicatively coupled to the network microphone device, the second display device associated with the third user.

2. The network microphone device of claim 1, wherein analyzing the voice input to detect one or more utterances comprises analyzing the voice input via a local natural language processing unit configured to detect keywords in the voice input.

3. The network microphone device of claim 1, wherein analyzing the voice input to detect one or more utterances comprises analyzing the voice input locally via the network microphone device, and wherein causing the user prompt to be displayed via the first display device comprises transmitting a control signal based on results of the local analysis to one or more remote computing devices which cause the user prompt to be displayed via the first display device.

4. The network microphone device of claim 1, wherein the user prompt comprises one or more of: a prompt to mute or unmute the second user's microphone; a prompt to share or un-share the second user's screen; or a prompt to enable or disable the second user's camera.

5. The network microphone device of claim 1, wherein the context parameter comprises one or more of: a microphone state of one or more users participating in the communications session; a screen share state of one or more users participating in the communications session; or a recording status of the communications session.

6. The network microphone device of claim 1, wherein the user prompt comprises a visual interface offering the second user an option to perform an action.

7. A method, comprising: capturing voice input from a first user via one or more microphones of a network microphone device during an ongoing communication session involving at least the first user, a second user, and a third user; transmitting the voice input to one or more remote computing devices for the communication session; analyzing the voice input to detect one or more utterances from the first user; monitoring a context parameter of the communication session; based on the one or more utterances detected during the ongoing communication session, inferring an intent of the first user; and based on the inferred intent of the first user and the context parameter, causing a user prompt to be displayed via a first display device communicatively coupled to the network microphone device, the first display device associated with the second user wherein the user prompt is not displayed via a second display device communicatively coupled to the network microphone device, the second display device with the third user.

8. The method of claim 7, wherein analyzing the voice input to detect one or more utterances comprises analyzing the voice input via a local natural language processing unit configured to detect keywords in the voice input.

9. The method of claim 7, wherein analyzing the voice input to detect one or more utterances comprises analyzing the voice input locally via the network microphone device, and wherein causing the user prompt to be displayed via the first display device comprises transmitting a control signal based on results of the local analysis to one or more remote computing devices which cause the user prompt to be displayed via the first display device.

10. The method of claim 7, wherein the user prompt comprises one or more of: a prompt to mute or unmute the second user's microphone; a prompt to share or un-share the second user's screen; or a prompt to enable or disable the second user's camera.

11. The method of claim 7, wherein the context parameter comprises one or more of: a microphone state of one or more users participating in the communications session; a screen share state of one or more users participating in the communications session; or a recording status of the communications session.

12. The method of claim 7, wherein the user prompt comprises a visual interface offering the second user an option to perform an action.

13. A tangible, non-transitory computer-readable medium storing instructions that, when executed by one or more processors of a network microphone device, cause the network microphone device to perform operations comprising: capturing voice input via from a first user one or more microphones of the network microphone device during an ongoing communication session involving at least the first user, a second user, and a third user; transmitting the voice input to one or more remote computing devices for the communication session; analyzing the voice input to detect one or more utterances from the first user; monitoring a context parameter of the communication session; and based on the one or more utterances, inferring an intent of the first user, and based on the inferred intent of the first user and the context parameter, causing a user prompt to be displayed via a first display device communicatively coupled to the network microphone device, the first display device associated with the second user, wherein the user prompt is not displayed via a second display device communicatively coupled to the network microphone device, the second display device associated with the third user.

14. The computer-readable medium of claim 13, wherein analyzing the voice input to detect one or more utterances comprises analyzing the voice input via a local natural language processing unit configured to detect keywords in the voice input.

15. The computer-readable medium of claim 13, wherein analyzing the voice input to detect one or more utterances comprises analyzing the voice input locally via the network microphone device, and wherein causing the user prompt to be displayed via the first display device comprises transmitting a control signal based on results of the local analysis to one or more remote computing devices which cause the user prompt to be displayed via the first display device.

16. The computer-readable medium of claim 13, wherein the user prompt comprises one or more of: a prompt to mute or unmute the second user's microphone; a prompt to share or un-share the second user's screen; or a prompt to enable or disable the second user's camera.

17. The computer-readable medium of claim 13, wherein the user prompt comprises a visual interface offering the second user an option to perform an action.

18. The computer-readable medium of claim 13, wher
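
The distinguishing element of the independent claims is selective delivery: a prompt triggered by the first user's speech appears on the second user's display but not on the third user's. The sketch below illustrates that routing under loud assumptions. The `Participant` and `Display` classes, the trigger phrases, and the name-matching rule are hypothetical and far simpler than any real intent inference; the claims do not specify how the target user is chosen.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Display:
    """Hypothetical display device; records the prompts it has shown."""
    shown: List[str] = field(default_factory=list)

    def show(self, prompt: str) -> None:
        self.shown.append(prompt)


@dataclass
class Participant:
    name: str
    mic_muted: bool  # context parameter: this user's microphone state
    display: Display = field(default_factory=Display)


# Hypothetical phrases taken to mean "someone should unmute".
UNMUTE_TRIGGERS = ("on mute", "can't hear you")


def route_unmute_prompt(utterance: str, speaker: str,
                        participants: List[Participant]) -> List[str]:
    """Prompt only muted participants whom the speaker addressed by name.

    Returns the names of the participants who received the prompt.
    """
    text = utterance.lower()
    if not any(trigger in text for trigger in UNMUTE_TRIGGERS):
        return []
    prompted = []
    for p in participants:
        if p.name == speaker:
            continue  # never prompt the speaker about their own remark
        # Naive substring match on the name, an assumption for illustration.
        if p.name.lower() in text and p.mic_muted:
            p.display.show("Unmute your microphone?")
            prompted.append(p.name)
    return prompted
```

With three participants Alice, Bob, and Carol, where Bob and Carol are both muted, the utterance "Bob, you're on mute" from Alice prompts only Bob in this sketch; Carol's display stays untouched because she was not addressed.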

Assignees

  • Sonos Inc

Inventors

Classifications

  • Tracking arrangements for later retrieval, e.g. recording contents, participants activities or behavior, network status · CPC title

  • to the speaker · CPC title

  • Conference organisation arrangements, e.g. handling schedules, setting up parameters needed by nodes to attend a conference, booking network resources, notifying involved parties · CPC title

  • Execution procedure of a spoken command · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12283269B2 cover?
In one aspect, a user's intent can be inferred based on voice analysis during a communications session, and prompts can be presented, or other actions taken, at least partly in response to the inferred intent. For example, a network microphone device (NMD) having one or more microphones can capture voice input and transmit the voice input to remote computing device(s) for a communication sessio…
Who is the assignee on this patent?
Sonos Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/05. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 22, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).