Intelligent digital assistant system

US10984782B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10984782-B2
Application number  US-201715640251-A
Country             US
Kind code           B2
Filing date         Jun 30, 2017
Priority date       Feb 14, 2017
Publication date    Apr 20, 2021
Grant date          Apr 20, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

To address the issues of handling conversations with multiple users, an intelligent digital assistant system is provided. The system may include at least one microphone configured to receive an audio input, a speaker configured to emit an audio output, and a processor. The processor may be configured to engage in a conversation with a first user, and, concurrent with the first user being engaged in the conversation with the system, recognize speech of one or more additional users in the audio input. The processor may process the recognized speech of the one or more additional users to determine a context for each additional user, and execute a conversation disentanglement module to select and perform one or more predetermined conversation disentanglement actions according to the context of the recognized speech of each additional user.
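
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Context` categories, the keyword heuristic in `classify_context`, the wake word, and the action names are all hypothetical stand-ins chosen for illustration. The sketch only shows the general shape of mapping each additional speaker's context to a predetermined action.

```python
import re
from dataclasses import dataclass
from enum import Enum, auto


class Context(Enum):
    """Hypothetical context categories for an additional speaker."""
    DIRECTED_TO_SYSTEM = auto()
    SIDEBAR = auto()
    UNRELATED = auto()


@dataclass
class Utterance:
    speaker: str
    text: str


def classify_context(utterance: Utterance, wake_word: str = "assistant") -> Context:
    """Toy keyword heuristic (an assumption, not the patent's method)."""
    words = set(re.findall(r"[a-z']+", utterance.text.lower()))
    if wake_word in words:
        return Context.DIRECTED_TO_SYSTEM
    if words & {"you", "we", "let's"}:
        return Context.SIDEBAR
    return Context.UNRELATED


# Predetermined actions keyed by context; the names are illustrative only.
ACTIONS = {
    Context.DIRECTED_TO_SYSTEM: "redirect_to_second_device",
    Context.SIDEBAR: "use_as_context_for_first_conversation",
    Context.UNRELATED: "ignore",
}


def select_action(context: Context) -> str:
    """Look up the predetermined action for a given context."""
    return ACTIONS[context]


second_user = Utterance("user_2", "Assistant, what's the weather?")
action = select_action(classify_context(second_user))
```

A lookup table keeps the set of actions fixed in advance, which mirrors the "predetermined" wording. A real system would replace the keyword heuristic with speaker-aware speech recognition and a trained classifier.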

First claim

Opening claim text (preview).

The invention claimed is:

1. An intelligent digital assistant system comprising: a first device including a first microphone configured to receive an audio input and a first audio speaker configured to emit an audio output; and a processor configured to: recognize speech of a first user in the audio input received via the first microphone; engage in a first conversation with the first user via the first microphone and the first audio speaker, based on the recognized speech of the first user; activate a first intent template with a plurality of slots responsive to the first conversation with the first user, the first intent template selected from a set of available intent templates based on context derived from the recognized speech of the first user; determine that one or more slots of the plurality of slots of the first intent template are missing information; concurrent with the first conversation, recognize speech of a second user in the audio input received via the first microphone; determine whether the recognized speech of the second user is directed to the system or is instead a sidebar conversation with the first user, based on the recognized speech of the second user; if the recognized speech of the second user is determined to be directed to the system, perform a first predetermined conversation disentanglement action that includes: identifying presence of a second device connected to the system positioned remotely from the first device for the second user to continue interacting with the system, outputting an audio instruction via the first audio speaker of the first device that instructs the second user to engage in a second conversation with the system via the second device, engaging in the second conversation with the second user via a second microphone and a second audio speaker of the second device, based on the recognized speech of the second user, activating a second intent template responsive to the speech of the second user, the second intent template selected from the set of available intent templates based on context derived from the recognized speech of the second user, and filling one or more slots of the second intent template according to a context derived from the second conversation; and if the recognized speech of the second user is determined to be a sidebar conversation with the first user, perform a second predetermined conversation disentanglement action that includes filling the one or more slots of the first intent template that are missing information according to a context derived from the sidebar conversation, including context derived from the speech of the second user.

2. The intelligent digital assistant system of claim 1, wherein, in engaging in the first conversation with the first user, the processor: recognizes speech of the first user in the audio input; performs speaker-aware speech-to-text conversion on the speech of the first user in the audio input to thereby output speaker-specific text for the first user; and determines that the first user is talking to the intelligent digital assistant system, based on the speaker-specific text for the user.

3. The intelligent digital assistant system of claim 2, wherein, the first user is one of a plurality of users who are speaking in a vicinity of the first microphone; and the processor determines that the first user is talking to the intelligent digital assistant system by performing speaker diarization on the audio input and determining that it is the first user who activated the system.

4. The intelligent digital assistant system of claim 1, wherein the second disentanglement action includes ignoring the recognized speech of the second user when the recognized speech of the second user is determined to be unrelated to resolving the first conversation with the first user.

5. The intelligent digital assistant system of claim 1, wherein the first disentanglement action includes outputting an audio instruction that instructs the second user to stop speaking and wait until being prompted to resume speaking.

6. The intelligent digital assistant system of claim 1, wherein the second disentanglement action includes: storing at least a portion of the recognized speech of the second user as additional context for the first conversation, processing the context obtained from the first user and the additional context obtained from the second user to determine an intent for the first conversation with the first user, and providing an output based on the intent.

7. The intelligent digital assistant system of claim 6, wherein the processor is further configured to: perform a search based on the stored context and intent of the sidebar conversation, and output a result of the search.

8. A method for an intelligent digital assistant system, the method comprising: receiving an audio input through a first microphone included in a first device; recognizing speech of a first user in the audio input received via the first microphone; engaging in a first conversation with the first user via the first microphone and a first audio speaker included in the first device, based on the recognized speech of the first user; activating a first intent template with a plurality of slots responsive to the first conversation with the first user, the first intent template selected from a set of available intent templates based on context derived from the recognized speech of the first user; determining that one or more slots of the plurality of slots of the first intent template are missing information; concurrent with the first conversation, recognizing speech of a second user in the audio input received via the first microphone; determining whether the recognized speech of the second user is directed to the system or is instead a sidebar conversation with the first user, based on the recognized speech of the second user; if the recognized speech of the second user is determined to be directed to the system, performing a first predetermined conversation disentanglement action that includes: identifying presence of a second device connected to the system positioned remotely from the first device for the second user to continue interacting with the system, outputting an audio instruction via the first audio speaker of the first device that instructs the second user to engage in a second conversation with the system via the second device, engaging in the second conversation with the second user via a second microphone and a second audio speaker of the second device, based on the recognized speech of the second user, activating a second intent template responsive to the speech of the second user, the second intent template selected from the set of available intent templates based on context derived from the recognized speech of the second user, and filling one or more slots of the second intent template according to a context derived from the second conversation; and if the recognized speech of the second user is determined to be a sidebar conversation with the first user, performing a second predetermined conversation disentanglement action that includes filling the one or more slots of the first intent template that are missing information according to a context derived from the sidebar conversation, including context derived from the speech of the second user.

9. The method for an intelligent digital assistant system according to claim 8, the method further comprising: recognizing speech of the first user in the audio input; performing speaker-aware speech-to-text conversion on the speech of the first user in the audio input to thereby output speaker-specific text for the first user; and determining that
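
As a rough illustration of the slot-filling idea recited in claims 1 and 8, the following Python sketch models an intent template whose missing slots are filled from sidebar context. The `IntentTemplate` class, the slot names, and the restaurant example are hypothetical. The claims do not specify any data structure, so this is one possible reading under those assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IntentTemplate:
    """Hypothetical intent template: a name plus named slots (None = missing)."""
    name: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def missing_slots(self) -> List[str]:
        """Return the names of slots that still lack information."""
        return [key for key, value in self.slots.items() if value is None]

    def fill_from(self, context: Dict[str, str]) -> List[str]:
        """Fill missing slots from a context mapping; return the slots filled."""
        filled = []
        for slot in self.missing_slots():
            if slot in context:
                self.slots[slot] = context[slot]
                filled.append(slot)
        return filled


# First conversation: the first user supplied a cuisine but no day or party size.
template = IntentTemplate(
    "book_restaurant",
    {"cuisine": "Italian", "day": None, "party_size": None},
)

# Context assumed to have been extracted from the second user's sidebar remark.
sidebar_context = {"day": "Friday"}
filled = template.fill_from(sidebar_context)
```

Only slots that are still missing are overwritten, so information the first user already gave is preserved. After the call, `template.missing_slots()` reports what the system would still need to ask about.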

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • of input or preprocessed data

  • Interactive pattern learning with a human teacher

  • Graphical models, e.g. Bayesian networks

  • where the recognised objects include parts of the human body

  • Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10984782B2 cover?
To address the issues of handling conversations with multiple users, an intelligent digital assistant system is provided. The system may include at least one microphone configured to receive an audio input, a speaker configured to emit an audio output, and a processor. The processor may be configured to engage in a conversation with a first user, and, concurrent with the first user being engaged i…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/1822. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 20, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).