
Voice control in a multi-talker and multimedia environment

US11211061B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11211061-B2
Application number	US-201916241327-A
Country	US
Kind code	B2
Filing date	Jan 7, 2019
Priority date	Jan 7, 2019
Publication date	Dec 28, 2021
Grant date	Dec 28, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Voice control in a multi-talker and multimedia environment is disclosed. In one aspect, there is provided a method comprising: receiving a microphone signal for each zone in a plurality of zones of an acoustic environment; generating a processed microphone signal for each zone in the plurality of zones of the acoustic environment, the generating including removing echo caused by audio transducers in the acoustic environment from each of the microphone signals, and removing interference from each of the microphone signals; and performing speech recognition on the processed microphone signals.
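The abstract, together with claim 9, describes removing loudspeaker echo from each zone's microphone signal by estimating the echo paths from every audio transducer to every microphone. The patent text shown here does not name an adaptation algorithm. The sketch below assumes a standard multichannel normalised least-mean-squares (NLMS) filter, with one adaptive FIR per loudspeaker, operating on a single microphone. The function name, tap count and step size are illustrative assumptions, not values from the patent.

```python
import numpy as np


def nlms_echo_cancel(mic, refs, taps=32, mu=0.5, eps=1e-6):
    """Remove loudspeaker echo from one zone's microphone signal (sketch).

    mic:  1-D array of microphone samples for one zone.
    refs: 2-D array (num_speakers, num_samples) of the signals sent to
          each loudspeaker (the echo references).
    Returns the NLMS error signal, i.e. the echo-cancelled microphone signal.
    """
    num_refs, n = refs.shape
    # One adaptive FIR per loudspeaker-to-microphone echo path.
    w = np.zeros((num_refs, taps))
    # Most recent `taps` samples of each reference, newest in column 0.
    buf = np.zeros((num_refs, taps))
    out = np.empty(n)
    for t in range(n):
        buf = np.roll(buf, 1, axis=1)
        buf[:, 0] = refs[:, t]
        # Sum of the estimated echo contributions over all echo paths.
        echo_estimate = np.sum(w * buf)
        e = mic[t] - echo_estimate
        out[t] = e
        # NLMS update: step normalised by total reference power in the window.
        w += (mu / (eps + np.sum(buf * buf))) * e * buf
    return out
```

Running one instance per microphone yields one echo-cancelled signal per zone, which is the input the claims then pass to zone interference cancellation. Adapting all echo paths jointly, normalised by the combined reference power, is one common choice for multi-loudspeaker cabins; a production system would typically also add double-talk handling, which this sketch omits.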

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of voice control in a multi-talker and multimedia environment, comprising: receiving a microphone signal for each zone in a plurality of zones of an acoustic environment, wherein at least one microphone is located in each zone, wherein the microphone signal from each zone in the plurality of zones is provided by a separate audio channel extending between the at least one microphone located in each zone and an application post processor, wherein one audio channel is provided per zone; removing, by an acoustic echo cancellation module, echo caused by audio transducers in the acoustic environment from the microphone signal of each audio channel to generate an echo cancelled microphone signal for each audio channel; removing, by a zone interference cancellation module, interference from the echo cancelled microphone signal of each audio channel to generate a processed microphone signal for each audio channel; performing speech recognition on the processed microphone signals of each audio channel to generate words from the processed microphone signals; performing keyword spotting on the words generated from the processed microphone signals of each audio channel; and in response to detection of a wake word in the words generated from the processed microphone signals of each audio channel: setting a first zone in the plurality of zones in which the wake word was detected as an active zone; setting an audio channel of the active zone as an active audio channel; initiating an automatic speech recognition session for the active audio channel, wherein during the automatic speech recognition session, speech recognition is only performed on the active audio channel, wherein the speech recognition is performed by the application post processor on the processed microphone signal output from the zone interference cancellation module for the active audio channel; performing natural language processing on results of the speech recognition to determine an action to be performed, wherein both the active zone and the results of speech recognition are used to determine the action to be performed; and performing the determined action.

2. The method of claim 1, wherein during the automatic speech recognition session, the echo caused by the audio transducers in the acoustic environment from each of the microphone signals is removed from the active audio channel and interference from the microphone signals of other audio channels is removed from the active audio channel.

3. The method of claim 1, further comprising: during the automatic speech recognition session, providing an audio indication of the active zone.

4. The method of claim 3, wherein the audio indication comprises decreasing a volume of audio output from one or more speakers in the active zone.

5. The method of claim 3, wherein the audio indication comprises outputting a speech prompt or sound from one or more speakers in the active zone.

6. The method of claim 1, further comprising: in response to detection of a sleep word in the words generated from the processed microphone signal of the first zone in the plurality of zones during the automatic speech recognition session, terminating the automatic speech recognition session for the active audio channel.

7. The method of claim 1, wherein removing interference from each of the microphone signals comprises removing interference speech from speech originating in other zones.

8. The method of claim 7, wherein removing interference speech caused by speech originating in other zones comprises: using measured signal and noise level differences between the plurality of microphone signals to detect speech of an occupant of a respective zone; for each zone in which speech of an occupant is detected, using an adaptive filter to estimate a speech contribution of the occupant on the microphone signals in other zones; for each microphone signal, removing the estimated speech contribution of occupants in other zones.

9. The method of claim 1, wherein removing echo caused by audio transducers in the acoustic environment from each of the microphone signals comprises: estimating a plurality of echo paths from each of the plurality of audio transducers to each of the plurality of microphones in the acoustic environment, each microphone being located in and associated with a zone in the plurality of zones of the acoustic environment; and removing echo contributions from each of the plurality of echo paths from the microphone signals.

10. The method of claim 1, wherein a plurality of microphone signals are received for each zone, wherein generating the processed microphone signal comprises combining the microphone signals of each zone into a composite signal using fixed mixing, dynamic mixing, or beamforming.

11. A system for voice control in a multi-talker and multimedia environment, comprising: a plurality of microphones, each microphone being located in and associated with a zone in a plurality of zones of an acoustic environment; a plurality of speakers, each speaker being located in and associated with a zone in the plurality of zones of the acoustic environment; a processor system comprising one or more processors coupled to the plurality of microphones and the plurality of speakers programmed to: receive a microphone signal for each zone in a plurality of zones of an acoustic environment, wherein at least one microphone is located in each zone, wherein the microphone signal from each zone in the plurality of zones is provided by a separate audio channel extending between the at least one microphone located in each zone and an application post processor, wherein one audio channel is provided per zone; remove, by an acoustic echo cancellation module, echo caused by audio transducers in the acoustic environment from the microphone signal of each audio channel to generate an echo cancelled microphone signal for each audio channel; remove, by a zone interference cancellation module, interference from the echo cancelled microphone signal of each audio channel to generate a processed microphone signal for each audio channel; perform speech recognition on the processed microphone signals of each audio channel to generate words from the processed microphone signals; perform keyword spotting on the words generated from the processed microphone signals of each audio channel; and in response to detection of a wake word in the words generated from the processed microphone signals of each zone in the plurality of zones: set a first zone in the plurality of zones in which the wake word was detected as an active zone; set an audio channel of the active zone as an active audio channel; initiate an automatic speech recognition session for the active audio channel, wherein during the automatic speech recognition session, speech recognition is only performed on the active audio channel, wherein the speech recognition is performed by the application post processor on the processed microphone signal output from the zone interference cancellation module for the active audio channel; perform natural language processing on results of the speech recognition to determine an action to be performed, wherein both the active zone and the results of speech recognition are used to determine the action to be performed; and perform the determined action.

12. A non-transitory machine readable medium having tangibly stored thereon executable instructions for execution by a processor, wherein the executable instructions, when executed by the processor of the electronic device, cause the processor to: receive a microphone signal for each zone in a plurali…
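Claims 1 and 6 describe a per-zone session: keyword spotting runs on every channel, the zone where the wake word is heard becomes the active zone, speech recognition is then limited to that zone's channel, and a sleep word ends the session. The Python sketch below models only that control flow, assuming speech recognition and natural language processing happen elsewhere. The class name, the wake and sleep words, and the `(zone, utterance)` hand-off to NLP are hypothetical illustrations, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ZoneSessionController:
    """Tracks which zone, if any, owns the current ASR session (sketch)."""

    wake_word: str = "computer"  # hypothetical wake word
    sleep_word: str = "goodbye"  # hypothetical sleep word
    active_zone: Optional[str] = None

    def channels_to_recognise(self, zones: List[str]) -> List[str]:
        # Outside a session, every zone's channel is keyword-spotted;
        # during a session, only the active zone's channel is recognised.
        if self.active_zone is None:
            return list(zones)
        return [self.active_zone]

    def on_words(
        self, words_by_zone: Dict[str, List[str]]
    ) -> Optional[Tuple[str, str]]:
        """Consume words recognised per zone.

        Returns (active_zone, utterance) when a command should be handed to
        natural language processing, otherwise None.
        """
        if self.active_zone is None:
            # Keyword spotting on all channels: the first zone (in dict
            # order) whose words contain the wake word becomes active.
            for zone, words in words_by_zone.items():
                if self.wake_word in words:
                    self.active_zone = zone
                    break
            return None
        # During a session, words from other zones are ignored.
        words = words_by_zone.get(self.active_zone, [])
        if self.sleep_word in words:
            self.active_zone = None  # terminate the session
            return None
        if not words:
            return None
        # Both the zone and the recognised words are passed on, so the
        # downstream action can depend on where the talker is sitting.
        return self.active_zone, " ".join(words)
```

For example, after the passenger says the wake word, "open window" spoken in both the driver and passenger zones yields `("passenger", "open window")`, so an NLP stage could open the passenger-side window. As a simplification, words that follow the wake word in the same batch are discarded rather than treated as a command.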

Assignees

2236008 Ontario Inc.

Inventors

Classifications

G10L15/1815 · Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
G10L21/0208 (primary) · Noise filtering
G10L2015/088 · Word spotting
H04R3/005 · for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407)
G10L2015/223 · Execution procedure of a spoken command

Patent family

Related publications grouped by family.

Patent family ID: 69055818


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11211061B2 cover?

Voice control in a multi-talker and multimedia environment is disclosed. In one aspect, there is provided a method comprising: receiving a microphone signal for each zone in a plurality of zones of an acoustic environment; generating a processed microphone signal for each zone in the plurality of zones of the acoustic environment, the generating including removing echo caused by audio transduce…

Who is the assignee on this patent?

2236008 Ontario Inc.

What technology area does this patent fall under?

Primary CPC classification G10L21/0208. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 28, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 9 related publications on this page (citation links within our corpus, or publications sharing the same primary CPC).