Reliable reverberation estimation for improved automatic speech recognition in multi-device systems

US10529353B2 · US · B2

Patent metadata
Publication number: US-10529353-B2
Application number: US-201715837223-A
Country: US
Kind code: B2
Filing date: Dec 11, 2017
Priority date: Dec 11, 2017
Publication date: Jan 7, 2020
Grant date: Jan 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A mechanism is described for facilitating multi-device reverberation estimation according to one embodiment. An apparatus of embodiments, as described herein, includes detection and capture logic to facilitate a microphone of a first voice-enabled device of multiple voice-enabled devices to detect a command from a user. The apparatus further includes calculation logic to facilitate a second voice-enabled device and a third voice-enabled device to calculate speech to reverberation modulation energy ratio (SRMR) values based on the command, where the calculation logic is further to estimate reverberation times (RTs) based on the SRMR values. The apparatus further includes decision and application logic to perform dereverberation based on the estimated RTs of the reverberations.
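The abstract rests on each listening device computing an SRMR value from the same spoken command. As a rough illustration only, the Python sketch below computes a simplified SRMR-style ratio; it is not the full published SRMR metric and not the patent's own implementation. The function name `simplified_srmr`, the 4 ms RMS envelope frames (`frame_ms`) and the band split at 20 Hz (`split_hz`) are all assumptions: a single broadband envelope stands in for an auditory filterbank, and two fixed modulation bands stand in for a modulation filterbank.

```python
import numpy as np


def simplified_srmr(signal: np.ndarray, fs: int, frame_ms: float = 4.0,
                    split_hz: float = 20.0) -> float:
    """Hypothetical, simplified SRMR-style ratio of envelope modulation energies.

    Assumptions (not the full SRMR metric): one broadband RMS envelope instead
    of an auditory filterbank, and two fixed modulation bands:
    4 Hz up to split_hz ("speech") versus split_hz up to the envelope Nyquist
    frequency ("reverberation").
    """
    hop = max(1, int(round(fs * frame_ms / 1000.0)))  # samples per envelope frame
    n_frames = len(signal) // hop
    if n_frames < 8:
        raise ValueError("signal too short for a modulation spectrum")
    frames = np.asarray(signal, dtype=float)[: n_frames * hop].reshape(n_frames, hop)
    envelope = np.sqrt(np.mean(frames ** 2, axis=1))  # RMS temporal envelope
    envelope -= envelope.mean()                        # remove the DC component
    env_fs = fs / hop                                  # envelope sample rate in Hz
    mod_power = np.abs(np.fft.rfft(envelope)) ** 2     # modulation power spectrum
    mod_freqs = np.fft.rfftfreq(n_frames, d=1.0 / env_fs)
    low = mod_power[(mod_freqs >= 4.0) & (mod_freqs < split_hz)].sum()
    high = mod_power[mod_freqs >= split_hz].sum()
    # Larger values tend to indicate a less reverberant signal.
    return float(low / max(high, 1e-12))
```

For a dry tone amplitude-modulated at a syllable-like 5 Hz, almost all envelope energy sits in the low band, so the ratio is large. Convolving the same tone with a synthetic, exponentially decaying noise impulse response fills in the envelope dips and adds fast envelope fluctuations, so the ratio should drop.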

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: one or more processors to: facilitate a microphone of a first voice-enabled device of multiple voice-enabled devices to detect a command from a user; facilitate a second voice-enabled device and a third voice-enabled device in a multi-device environment to calculate speech to reverberation modulation energy ratio (SRMR) values based on the command; estimate reverberation times (RTs) based on the SRMR values; and perform dereverberation based on the estimated RTs of the reverberations, and recognize the command based on the estimated RTs.

2. The apparatus of claim 1, wherein the RTs relate to reverberations associated with one or more of the first, second, and third voice-enabled devices, wherein the first, second, and third voice-enabled devices are coupled with each other over a communication medium including one or more of a proximity network, a cloud network, and the Internet.

3. The apparatus of claim 1, wherein the first voice-enabled device is further to convert the command into a text-to-speech (TTS) command, wherein one of the first, second, and third voice-enabled devices serves as a centralized unit positioned locally with the first, second, and third voice-enabled devices or remotely in communication over the communication medium.

4. The apparatus of claim 1, wherein the one or more processors are further to update one or more SRMR tables based on the calculated SRMR values.

5. The apparatus of claim 1, wherein the one or more processors are further to select one of the second and third voice-enabled devices to issue a response to the command.

6. The apparatus of claim 1, wherein a relation between the SRMR values and the RTs is fixed, wherein the first, second, and third voice-enabled devices comprise one or more of smart speakers, laptop computers, mobile devices, smart wearable devices, smart household appliances, and smart locks.

7. The apparatus of claim 1, wherein each of the first, second, and third voice-enabled devices comprise one or more processors including a graphics processor co-located with an application processor on a common semiconductor package.

8. A method comprising: facilitating a microphone of a first voice-enabled device of multiple voice-enabled devices to detect a command from a user; facilitating a second voice-enabled device and a third voice-enabled device in a multi-device environment to calculate speech to reverberation modulation energy ratio (SRMR) values based on the command; estimating reverberation times (RTs) based on the SRMR values; and performing dereverberation based on the estimated RTs of the reverberations, and recognize the command based on the estimated RTs.

9. The method of claim 8, wherein the RTs relate to reverberations associated with one or more of the first, second, and third voice-enabled devices, wherein the first, second, and third voice-enabled devices are coupled with each other over a communication medium including one or more of a proximity network, a cloud network, and the Internet.

10. The method of claim 8, wherein the first voice-enabled device is further to convert the command into a text-to-speech (TTS) command, wherein one of the first, second, and third voice-enabled devices serves as a centralized unit positioned locally with the first, second, and third voice-enabled devices or remotely in communication over the communication medium.

11. The method of claim 8, further comprising updating one or more SRMR tables based on the calculated SRMR values.

12. The method of claim 8, further comprising selecting one of the second and third voice-enabled devices to issue a response to the command.

13. The method of claim 8, wherein a relation between the SRMR values and the RTs is fixed, wherein the first, second, and third voice-enabled devices comprise one or more of smart speakers, laptop computers, mobile devices, smart wearable devices, smart household appliances, and smart locks.

14. The method of claim 8, wherein each of the first, second, and third voice-enabled devices comprise one or more processors including a graphics processor co-located with an application processor on a common semiconductor package.

15. At least one non-transitory machine-readable medium comprising instructions which, when executed by a computing device, cause the computing device to perform operations comprising: facilitating a microphone of a first voice-enabled device of multiple voice-enabled devices to detect a command from a user; facilitating a second voice-enabled device and a third voice-enabled device in a multi-device environment to calculate speech to reverberation modulation energy ratio (SRMR) values based on the command; estimating reverberation times (RTs) based on the SRMR values; and performing dereverberation based on the estimated RTs of the reverberations, and recognize the command based on the estimated RTs.

16. The non-transitory machine-readable medium of claim 15, wherein the RTs relate to reverberations associated with one or more of the first, second, and third voice-enabled devices, wherein the first, second, and third voice-enabled devices are coupled with each other over a communication medium including one or more of a proximity network, a cloud network, and the Internet.

17. The non-transitory machine-readable medium of claim 15, wherein the first voice-enabled device is further to convert the command into a text-to-speech (TTS) command, wherein one of the first, second, and third voice-enabled devices serves as a centralized unit positioned locally with the first, second, and third voice-enabled devices or remotely in communication over the communication medium.

18. The non-transitory machine-readable medium of claim 15, further comprising updating one or more SRMR tables based on the calculated SRMR values.

19. The non-transitory machine-readable medium of claim 15, further comprising selecting one of the second and third voice-enabled devices to issue a response to the command.

20. The non-transitory machine-readable medium of claim 15, wherein a relation between the SRMR values and the RTs is fixed, wherein the first, second, and third voice-enabled devices comprise one or more of smart speakers, laptop computers, mobile devices, smart wearable devices, smart household appliances, and smart locks, wherein each of the first, second, and third voice-enabled devices comprise one or more processors including a graphics processor co-located with an application processor on a common semiconductor package.
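The claims chain several steps: a fixed SRMR-to-RT relation (claims 6, 13, 20), updating SRMR tables (claims 4, 11, 18), selecting which device responds (claims 5, 12, 19), and dereverberation based on the estimated RTs. The Python sketch below shows one hypothetical way those pieces could fit together; it is not the patent's implementation. The calibration points in `SRMR_POINTS`/`RT60_POINTS` are made-up placeholders, and the class `MultiDeviceRtEstimator`, its `history` window, median fusion, and the "highest SRMR responds" rule are all assumptions. For the dereverberation step it swaps in a standard late-reverberation spectral-subtraction gain based on Polack's exponential decay model; this page does not say the patent uses that technique.

```python
from statistics import median

import numpy as np

# Hypothetical fixed SRMR -> RT60 calibration points (illustrative placeholders,
# not values from the patent). np.interp requires increasing x-coordinates.
SRMR_POINTS = np.array([1.5, 2.0, 3.0, 5.0, 8.0])
RT60_POINTS = np.array([1.2, 0.9, 0.6, 0.4, 0.2])  # seconds; more reverb -> lower SRMR


def srmr_to_rt60(srmr: float) -> float:
    """Linearly interpolate RT60 from SRMR, clamping outside the table range."""
    return float(np.interp(srmr, SRMR_POINTS, RT60_POINTS))


class MultiDeviceRtEstimator:
    """Hypothetical centralized unit collecting SRMR values reported by devices."""

    def __init__(self, history: int = 10) -> None:
        if history < 1:
            raise ValueError("history must be at least 1")
        self.history = history
        self.srmr_table: dict[str, list[float]] = {}  # device id -> recent SRMRs

    def update(self, device_id: str, srmr: float) -> None:
        """Record a new SRMR value, keeping only the last `history` entries."""
        values = self.srmr_table.setdefault(device_id, [])
        values.append(srmr)
        del values[: -self.history]

    def estimate_rt60(self) -> float:
        """Median of per-device RT60 estimates, so one outlier cannot dominate."""
        per_device = [srmr_to_rt60(median(v)) for v in self.srmr_table.values()]
        if not per_device:
            raise ValueError("no SRMR values reported yet")
        return float(median(per_device))

    def select_responder(self) -> str:
        """Assumed rule: the device with the highest median SRMR responds."""
        if not self.srmr_table:
            raise ValueError("no devices reported yet")
        return max(self.srmr_table, key=lambda d: median(self.srmr_table[d]))


def late_reverb_gains(power_spec: np.ndarray, rt60: float, hop_s: float,
                      delay_frames: int = 4, floor: float = 0.1) -> np.ndarray:
    """Spectral-subtraction gains suppressing late reverberation.

    power_spec: |STFT|^2 with shape (frames, bins). Polack's model gives
    late_power(t) ~= exp(-2 * delta * T_late) * power(t - T_late),
    with delta = 3 * ln(10) / RT60 and T_late = delay_frames * hop_s.
    """
    if rt60 <= 0 or delay_frames < 1:
        raise ValueError("rt60 must be positive and delay_frames >= 1")
    power_spec = np.asarray(power_spec, dtype=float)
    delta = 3.0 * np.log(10.0) / rt60
    decay = np.exp(-2.0 * delta * delay_frames * hop_s)
    late = np.zeros_like(power_spec)
    late[delay_frames:] = decay * power_spec[:-delay_frames]
    gain = 1.0 - late / np.maximum(power_spec, 1e-12)
    return np.clip(gain, floor, 1.0)  # floor limits musical-noise artifacts
```

Median fusion is only one simple way to keep a single badly placed device from skewing the shared RT estimate; a weighted average or explicit outlier rejection would be equally plausible readings of the claims. The gains would multiply the STFT of the captured command before it is passed to the recognizer.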

Assignees

Intel Corp

Inventors

Classifications

  • the noise being echo, reverberation of the speech · CPC title

  • G10L25/03 (primary)

    characterised by the type of extracted parameters · CPC title

  • Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title

  • Noise filtering · CPC title

  • using distance or distortion measures between unknown speech and reference templates · CPC title

Patent family

Related publications grouped by family.


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10529353B2 cover?
A mechanism is described for facilitating multi-device reverberation estimation according to one embodiment. An apparatus of embodiments, as described herein, includes detection and capture logic to facilitate a microphone of a first voice-enabled device of multiple voice-enabled devices to detect a command from a user. The apparatus further includes calculation logic to facilitate a second voi…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G10L25/03. Mapped technology areas include Physics.
When was this patent published?
Published January 7, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).