Multi-microphone speech dialog system for multiple spatial zones

US11367437B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11367437-B2
Application number: US-201916426356-A
Country: US
Kind code: B2
Filing date: May 30, 2019
Priority date: May 30, 2019
Publication date: Jun 21, 2022
Grant date: Jun 21, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

There is provided a speech dialog system that includes a first microphone, a second microphone, a processor and a memory. The first microphone captures first audio from a first spatial zone, and produces a first audio signal. The second microphone captures second audio from a second spatial zone, and produces a second audio signal. The processor receives the first audio signal and the second audio signal, and the memory contains instructions that control the processor to perform operations of a speech enhancement module, an automatic speech recognition module, and a speech dialog module that performs a zone-dedicated speech dialog.

First claim

Opening claim text (preview).

What is claimed is:

1. A speech dialog system comprising: a first microphone that captures first audio from a first spatial zone, and produces a first audio signal; a second microphone that captures second audio from a second spatial zone, and produces a second audio signal; a processor that receives said first audio signal and said second audio signal; and a memory that contains instructions that control said processor to perform operations of: (a) a speech enhancement (SE) module that: detects and enhances, from said first audio signal and said second audio signal, speech activity in at least one of said first spatial zone or said second spatial zone, thus yielding enhanced processed audio; and determines from which of said first zone or said second zone said enhanced processed audio originated, thus yielding zone activity information, wherein the zone activity information includes, for each signal block of a predetermined amount of time, a first flag for the first spatial zone and a second flag for the second spatial zone indicating whether a respective speaker in the first zone or the second zone is active, and wherein the enhanced processed audio is combined with the first flag for the first spatial zone and the second flag for the second spatial zone of the zone activity information; (b) an automatic speech recognition (ASR) module that: recognizes an utterance in said enhanced processed audio, thus yielding a recognized utterance; and based on said zone activity information combined with the enhanced processed audio, produces a zone decision that identifies from which of said first zone or said second zone said recognized utterance originated; and (c) a speech dialog (SD) module that: performs a zone-dedicated speech dialog based on said recognized utterance and said zone decision.

2. The system of claim 1, wherein said SD module, based on said recognized utterance and said zone decision, decides from which of said first zone or said second zone to obtain additional audio, thus yielding a routing decision; wherein said ASR module: based on said routing decision, obtains said additional audio from either of said first zone or said second zone; and recognizes an additional utterance in said additional audio.

3. The system of claim 2, wherein said ASR module is configured of: a first ASR sub-module that during a broad listening mode is enabled to receive and evaluate a mixed audio signal that includes audio from said first spatial zone, and audio from said second spatial zone; a second ASR sub-module that during a selective listening mode for evaluating audio from said first spatial zone, is enabled to receive and evaluate said additional audio from said first spatial zone; and a third ASR sub-module that during a selective listening mode for evaluating audio from said second spatial zone, is enabled to receive and evaluate said additional audio from said second spatial zone.

4. The system of claim 3, wherein said processor switches from said broad listening mode to said selective listening mode in response to said first ASR module recognizing said utterance.

5. The system of claim 3, wherein said SE module provides: to said first ASR sub-module, a data stream comprising audio from said first spatial zone and said second spatial zone; to said second ASR sub-module, a data stream of audio from said first spatial zone; and to said third ASR sub-module, a data stream of audio from said second spatial zone.

6. The system of claim 2, further comprising: a first buffer that stores a most-recent several seconds of said first audio signal, thus yielding buffered first audio; and a second buffer that stores a most-recent several seconds said second audio signal, thus yielding buffered second audio, wherein said ASR module obtains said additional audio by accessing either of said buffered first audio or said buffered second audio, based on said routing decision.

7. The system of claim 1, wherein said zone activity information is buffered in said ASR module, thus yielding buffered zone activity information, wherein said ASR module generates a detection time window, and wherein said buffered zone activity information is processed on a section thereof that is given by said detection time window.
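The data flow in claims 1, 6 and 7 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `SignalBlock`, `zone_decision` and `ZoneBuffers` are hypothetical, and the majority vote over the flags is an assumed rule: the claims only state that the buffered zone activity information is processed on the section given by the detection time window, not how the flags are evaluated. Buffer length is counted in blocks here rather than seconds.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple


@dataclass
class SignalBlock:
    """One block of enhanced audio combined with its two zone flags (claim 1)."""
    samples: List[float]
    zone1_active: bool
    zone2_active: bool


def zone_decision(blocks: List[SignalBlock],
                  window: Tuple[int, int]) -> Optional[int]:
    """Pick the zone whose flag is set most often inside the detection window.

    `window` is a half-open (start, end) range of block indices (claim 7).
    The majority vote is an assumption made for this sketch.
    """
    start, end = window
    section = blocks[start:end]
    zone1_votes = sum(block.zone1_active for block in section)
    zone2_votes = sum(block.zone2_active for block in section)
    if zone1_votes == zone2_votes:
        return None  # no activity, or a tie: leave the decision open
    return 1 if zone1_votes > zone2_votes else 2


class ZoneBuffers:
    """Keeps the most recent audio blocks of each zone (claim 6)."""

    def __init__(self, max_blocks: int) -> None:
        # deque(maxlen=...) silently drops the oldest block once it is full.
        self._buffers: Dict[int, Deque[List[float]]] = {
            1: deque(maxlen=max_blocks),
            2: deque(maxlen=max_blocks),
        }

    def push(self, zone1_samples: List[float],
             zone2_samples: List[float]) -> None:
        """Store one new block of audio for each zone."""
        self._buffers[1].append(zone1_samples)
        self._buffers[2].append(zone2_samples)

    def additional_audio(self, routing_decision: int) -> List[List[float]]:
        """Return the buffered audio of the zone named by the routing decision."""
        return list(self._buffers[routing_decision])
```

In this sketch the zone decision would feed the speech dialog module, whose routing decision (1 or 2) then selects which buffer `additional_audio` reads from.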

Assignees

Nuance Communications Inc

Inventors

Classifications

  • G10L15/22 (Primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic (H04R2203/12 takes precedence) · CPC title

  • for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title

  • Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08) · CPC title

  • using non-speech characteristics · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11367437B2 cover?
A speech dialog system in which a first microphone and a second microphone capture audio from a first and a second spatial zone, and a processor runs a speech enhancement module, an automatic speech recognition module, and a speech dialog module that performs a zone-dedicated speech dialog.
Who is the assignee on this patent?
Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 21, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).