VAS toggle based on device orientation
US-2021118439-A1 · Apr 22, 2021 · US
US11211061B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11211061-B2 |
| Application number | US-201916241327-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 7, 2019 |
| Priority date | Jan 7, 2019 |
| Publication date | Dec 28, 2021 |
| Grant date | Dec 28, 2021 |
Official abstract text for this publication.
Voice control in a multi-talker and multimedia environment is disclosed. In one aspect, there is provided a method comprising: receiving a microphone signal for each zone in a plurality of zones of an acoustic environment; generating a processed microphone signal for each zone in the plurality of zones of the acoustic environment, the generating including removing echo caused by audio transducers in the acoustic environment from each of the microphone signals, and removing interference from each of the microphone signals; and performing speech recognition on the processed microphone signals.
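The echo-removal step named in the abstract can be illustrated with a small sketch. The abstract does not say which algorithm is used, so the normalized least-mean-squares (NLMS) adaptive filter below is an assumption chosen for illustration. The function name `nlms_echo_cancel`, the tap count, the step size, and the synthetic echo path are all hypothetical.

```python
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `ref` from `mic` (NLMS).

    mic: samples picked up by a zone microphone
    ref: samples sent to the loudspeaker (the echo source)
    Returns the echo-cancelled signal, one sample per input sample.
    """
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate          # echo-cancelled sample
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power
                   for w, h in zip(weights, history)]
        out.append(error)
    return out


def energy(signal):
    """Sum of squared samples."""
    return sum(s * s for s in signal)


# Synthetic demo: white-noise playback and a hypothetical two-tap echo path.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [
    0.6 * (ref[n - 2] if n >= 2 else 0.0)
    + 0.3 * (ref[n - 3] if n >= 3 else 0.0)
    for n in range(len(ref))
]
residual = nlms_echo_cancel(mic, ref)
```

In this noise-free toy case the filter converges, so the tail of `residual` carries far less energy than the tail of `mic`. A real multi-zone system would need one such estimate per loudspeaker-to-microphone path and would also have to cope with near-end speech.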
Opening claim text (preview).
The invention claimed is:

1. A method of voice control in a multi-talker and multimedia environment, comprising: receiving a microphone signal for each zone in a plurality of zones of an acoustic environment, wherein at least one microphone is located in each zone, wherein the microphone signal from each zone in the plurality of zones is provided by a separate audio channel extending between the at least one microphone located in each zone and an application post processor, wherein one audio channel is provided per zone; removing, by an acoustic echo cancellation module, echo caused by audio transducers in the acoustic environment from the microphone signal of each audio channel to generate an echo cancelled microphone signal for each audio channel; removing, by a zone interference cancellation module, interference from the echo cancelled microphone signal of each audio channel to generate a processed microphone signal for each audio channel; performing speech recognition on the processed microphone signals of each audio channel to generate words from the processed microphone signals; performing keyword spotting on the words generated from the processed microphone signals of each audio channel; and in response to detection of a wake word in the words generated from the processed microphone signals of each audio channel: setting a first zone in the plurality of zones in which the wake word was detected as an active zone; setting an audio channel of the active zone as an active audio channel; initiating an automatic speech recognition session for the active audio channel, wherein during the automatic speech recognition session, speech recognition is only performed on the active audio channel, wherein the speech recognition is performed by the application post processor on the processed microphone signal output from the zone interference cancellation module for the active audio channel; performing natural language processing on results of the speech recognition to determine an action to be performed, wherein both the active zone and the results of speech recognition are used to determine the action to be performed; and performing the determined action.

2. The method of claim 1, wherein during the automatic speech recognition session, the echo caused by the audio transducers in the acoustic environment from each of the microphone signals is removed from the active audio channel and interference from the microphone signals of other audio channels is removed from the active audio channel.

3. The method of claim 1, further comprising: during the automatic speech recognition session, providing an audio indication of the active zone.

4. The method of claim 3, wherein the audio indication comprises decreasing a volume of audio output from one or more speakers in the active zone.

5. The method of claim 3, wherein the audio indication comprises outputting a speech prompt or sound from one or more speakers in the active zone.

6. The method of claim 1, further comprising: in response to detection of a sleep word in the words generated from the processed microphone signal of the first zone in the plurality of zones during the automatic speech recognition session, terminating the automatic speech recognition session for the active audio channel.

7. The method of claim 1, wherein removing interference from each of the microphone signals comprises removing interference speech from speech originating in other zones.

8. The method of claim 7, wherein removing interference speech caused by speech originating in other zones comprises: using measured signal and noise level differences between the plurality of microphone signals to detect speech of an occupant of a respective zone; for each zone in which speech of an occupant is detected, using an adaptive filter to estimate a speech contribution of the occupant on the microphone signals in other zones; for each microphone signal, removing the estimated speech contribution of occupants in other zones.

9. The method of claim 1, wherein removing echo caused by audio transducers in the acoustic environment from each of the microphone signals comprises: estimating a plurality of echo paths from each of the plurality of audio transducers to each of the plurality of microphones in the acoustic environment, each microphone being located in and associated with a zone in the plurality of zones of the acoustic environment; and removing echo contributions from each of the plurality of echo paths from the microphone signals.

10. The method of claim 1, wherein a plurality of microphone signals are received for each zone, wherein generating the processed microphone signal comprises combining the microphone signals of each zone into a composite signal using fixed mixing, dynamic mixing, or beamforming.

11. A system for voice control in a multi-talker and multimedia environment, comprising: a plurality of microphones, each microphone being located in and associated with a zone in a plurality of zones of an acoustic environment; a plurality of speakers, each speaker being located in and associated with a zone in the plurality of zones of the acoustic environment; a processor system comprising one or more processors coupled to the plurality of microphones and the plurality of speakers programmed to: receive a microphone signal for each zone in a plurality of zones of an acoustic environment, wherein at least one microphone is located in each zone, wherein the microphone signal from each zone in the plurality of zones is provided by a separate audio channel extending between the at least one microphone located in each zone and an application post processor, wherein one audio channel is provided per zone; remove, by an acoustic echo cancellation module, echo caused by audio transducers in the acoustic environment from the microphone signal of each audio channel to generate an echo cancelled microphone signal for each audio channel; remove, by a zone interference cancellation module, interference from the echo cancelled microphone signal of each audio channel to generate a processed microphone signal for each audio channel; perform speech recognition on the processed microphone signals of each audio channel to generate words from the processed microphone signals; perform keyword spotting on the words generated from the processed microphone signals of each audio channel; and in response to detection of a wake word in the words generated from the processed microphone signals of each zone in the plurality of zones: set a first zone in the plurality of zones in which the wake word was detected as an active zone; set an audio channel of the active zone as an active audio channel; initiate an automatic speech recognition session for the active audio channel, wherein during the automatic speech recognition session, speech recognition is only performed on the active audio channel, wherein the speech recognition is performed by the application post processor on the processed microphone signal output from the zone interference cancellation module for the active audio channel; perform natural language processing on results of the speech recognition to determine an action to be performed, wherein both the active zone and the results of speech recognition are used to determine the action to be performed; and perform the determined action.

12. A non-transitory machine readable medium having tangibly stored thereon executable instructions for execution by a processor, wherein the executable instructions, when executed by the processor of the electronic device, cause the processor to: receive a microphone signal for each zone in a plurali
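
The session logic recited in claims 1 and 6 (wake word opens a session on one zone, only that zone's channel is recognized during the session, a sleep word ends it) can be read as a small state machine. The sketch below is one possible reading, not the patented implementation. The class name `ZoneSession`, the wake and sleep words, the zone names, and the tuple returned as an "action" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

WAKE_WORD = "hello"     # hypothetical wake word
SLEEP_WORD = "goodbye"  # hypothetical sleep word


@dataclass
class ZoneSession:
    """Tracks which zone, if any, owns the current recognition session."""

    active_zone: Optional[str] = None

    def on_words(self, zone: str, words: List[str]) -> Optional[Tuple[str, str]]:
        """Handle words recognized on one zone's audio channel.

        Returns (active_zone, command_text) when a command is accepted,
        otherwise None.
        """
        if self.active_zone is None:
            # No session yet: every channel is watched for the wake word.
            if WAKE_WORD in words:
                self.active_zone = zone
            return None
        if zone != self.active_zone:
            # During a session only the active channel is recognized.
            return None
        if SLEEP_WORD in words:
            # Sleep word on the active channel ends the session.
            self.active_zone = None
            return None
        # Both the zone and the recognized text feed the action decision.
        return (self.active_zone, " ".join(words))


session = ZoneSession()
session.on_words("rear_left", ["hello"])
action = session.on_words("rear_left", ["window", "down"])
```

Returning the zone together with the text mirrors the claim language that both the active zone and the recognition results are used to determine the action, so a downstream handler could, for example, lower the window nearest the speaker.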
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title
Noise filtering · CPC title
Word spotting · CPC title
for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title
Execution procedure of a spoken command · CPC title