Voice quality enhancement techniques, speech recognition techniques, and related systems

US9633671B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9633671-B2
Application number  US-201414517700-A
Country             US
Kind code           B2
Filing date         Oct 17, 2014
Priority date       Oct 18, 2013
Publication date    Apr 25, 2017
Grant date          Apr 25, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An echo canceller can be arranged to receive an input signal and to receive a reference signal. The echo canceller can subtract a linear component of the reference signal from the input signal. A noise suppressor can suppress non-linear effects of the reference signal in the input signal in correspondence with a large number of selectable parameters. Such suppression can be provided on a frequency-by-frequency basis, with a unique set of tunable parameters selected for each frequency. A degree of suppression provided by the noise suppressor can correspond to an estimate of residual echo remaining after the one or more linear components of the reference signal have been subtracted from the input signal, to an estimated double-talk probability, and to an estimated signal-to-noise ratio of near-end speech in the input signal for each respective frequency. A speech recognizer can receive a processed input signal from the noise suppressor.
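
The linear echo-cancellation step described in the abstract can be sketched as follows. This is a minimal illustration, assuming a time-domain normalized least-mean-squares (NLMS) adaptive filter, which the abstract does not name. The function name `nlms_echo_cancel`, the parameters `taps`, `mu`, and `eps`, and the three-tap echo path are all hypothetical choices made for the example.

```python
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated linear echo of `ref` from `mic`.

    NLMS is an assumed technique; the abstract only states that the echo
    canceller subtracts a linear component of the reference signal.
    """
    w = [0.0] * taps      # adaptive FIR estimate of the echo path
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        echo_hat = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_hat  # input minus the linear echo estimate
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out, w


def power(sig):
    """Mean power of a list of samples."""
    return sum(s * s for s in sig) / len(sig)


# Synthetic test signal: white-noise reference played through a
# hypothetical three-tap loudspeaker-to-microphone echo path.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
path = [0.6, -0.3, 0.1]
mic = [
    sum(path[k] * ref[n - k] for k in range(len(path)) if n - k >= 0)
    for n in range(len(ref))
]

residual, weights = nlms_echo_cancel(mic, ref)
```

In this idealized case the echo path is purely linear, so the residual decays toward zero once the filter converges. The non-linear effects that the abstract assigns to the noise suppressor are exactly what a linear filter of this kind would leave behind on real hardware.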

First claim

Opening claim text (preview).

We currently claim:

1. A media device having a loudspeaker, a microphone, and a digital signal processor comprising: an echo canceller arranged to receive an input signal from the microphone and to receive a reference signal, wherein the echo canceller is configured to subtract one or more linear components of the reference signal from the input signal; a noise suppressor configured to suppress non-linear effects of the reference signal in each of a plurality of frequency bins in the input signal, wherein a degree of suppression provided by the noise suppressor for each frequency bin corresponds to an estimate of residual echo remaining in the respective frequency bin after the one or more linear components of the reference signal have been subtracted from the input signal, to an estimated double-talk probability, and to an estimated signal-to-noise ratio of near-end speech in the input signal for each respective frequency bin; and a speech recognizer arranged to receive a processed input signal from the noise suppressor and to recognize an utterance from the processed input signal, wherein the media device is configured to be controlled responsively to the recognized utterance.

2. A media device according to claim 1, wherein the noise suppressor applies a spectral gain or a binary mask to the input signal in correspondence with the estimated signal-to-noise ratio of near-end speech in the input signal.

3. A media device according to claim 1, wherein the noise suppressor applies a selected one of a spectral gain and a binary mask to the input signal responsive to the estimated signal-to-noise ratio of near-end speech in the input signal exceeding a predefined signal-to-noise threshold.

4. A media device according to claim 3, wherein the predefined signal-to-noise threshold comprises a first signal-to-noise threshold corresponding to a first frequency bin of the input signal, the spectral gain or the binary mask comprises a first spectral gain or a first binary mask, respectively, and the noise suppressor applies the selected one of the first spectral gain and the first binary mask to the first frequency bin of the input signal responsive to the estimated signal-to-noise ratio of near-end speech in the first frequency bin exceeding the first signal-to-noise threshold, and wherein the noise suppressor applies a selected one of a second spectral gain and a second binary mask to a second frequency bin of the input signal in correspondence with an estimated signal-to-noise ratio of near-end speech in the second frequency bin of the input signal.

5. A media device according to claim 1, wherein the noise suppressor applies a selected one of a spectral gain and a binary mask to the input signal responsive to the estimated signal-to-noise ratio of near-end speech in the input signal falling below a predefined signal-to-noise threshold.

6. A media device according to claim 5, wherein the predefined signal-to-noise threshold comprises a first signal-to-noise threshold corresponding to a first frequency bin of the input signal, the spectral gain or the binary mask comprises a first spectral gain or a first binary mask, respectively, and the noise suppressor applies the selected one of the first spectral gain and the first binary mask to the first frequency bin of the input signal responsive to the estimated signal-to-noise ratio of near-end speech in the first frequency bin falling below the first signal-to-noise threshold, and wherein the noise suppressor applies a selected one of a second spectral gain and a second binary mask to a second frequency bin of the input signal in correspondence with an estimated signal-to-noise ratio of near-end speech in the second frequency bin of the input signal.

7. Tangible, non-transitory computer-readable media including instructions that, when executed, cause a computing environment to implement a method comprising: subtracting linear components of a reference signal from an input signal emitted from a microphone; estimating one or more of a residual echo remaining in the input signal after the act of subtracting linear components, a double-talk probability, and a signal-to-noise ratio of near-end speech in the input signal; suppressing non-linear effects of the reference signal in each of a plurality of frequency bins of the input signal in correspondence with the estimated one or more of the estimated residual echo, the estimated double-talk probability, and the estimated signal-to-noise ratio for each respective frequency bin; recognizing near-end speech in the input signal after suppressing non-linear effects of the reference signal in the plurality of frequency bins of the input signal: controlling a computing environment responsively to the recognized near-end speech.

8. Tangible, non-transitory computer-readable media according to claim 7, wherein the act of estimating the double-talk probability comprises comparing an estimated echo signal to the input signal from the microphone for each of the frequency bins.

9. Tangible, non-transitory computer-readable media according to claim 7, wherein the act of estimating residual echo comprises determining a coherence of the input signal to a signal representative of the reference signal when the double-talk probability exceeds a selected threshold probability for each of the frequency bins.

10. Tangible, non-transitory computer-readable media according to claim 7, wherein the act of estimating residual echo comprises determining a coherence of an error signal from an echo canceller to a signal representative of the input signal when the double-talk probability falls below a selected threshold probability for each of the frequency bins.

11. Tangible, non-transitory computer-readable media according to claim 7, wherein the act of suppressing non-linear effects of the reference signal in the input signal comprises applying a spectral gain or a binary mask to the input signal in correspondence with the estimated signal-to-noise ratio of near-end speech in the input signal for each respective frequency bin.

12. Tangible, non-transitory computer-readable media according to claim 7, wherein the act of suppressing non-linear effects comprises applying a selected one of a spectral gain and a binary mask to the input signal responsive to an estimated signal-to-noise ratio of near-end speech in the input signal exceeding a predefined signal-to-noise threshold.

13. Tangible, non-transitory computer-readable media according to claim 12, wherein the predefined signal-to-noise threshold comprises a first signal-to-noise threshold corresponding to a first frequency bin of the input signal, the spectral gain or the binary mask comprises a corresponding first spectral gain or first binary mask, respectively, and the act of applying a selected one of a spectral gain and a binary mask to the input signal comprises applying the selected one of the first spectral gain and the first binary mask to the first frequency bin of the input signal responsive to the estimated signal-to-noise ratio of near-end speech in the first frequency bin exceeding the first signal-to-noise threshold, and applying a selected one of a second spectral gain and a second binary mask to a second frequency bin of the input signal in correspondence with an estimated signal-to-noise ratio of near-end speech in the second frequency bin of the input signal.

14. Tangible, non-transitory computer-readable media according to claim 7, wherein the act of suppressing non-linear effects comprises applying a selected one of a spectral gain and a binary mask to the input signal responsive to the estimated signal-to-noise rati
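
The per-bin suppression logic in the claims can be illustrated with a short sketch. It is not the patented method: the claims say only that the degree of suppression corresponds to the residual-echo estimate, the double-talk probability, and the near-end signal-to-noise ratio, and that a spectral gain or a binary mask is selected against a per-bin threshold. The Wiener-style gain, the power-ratio double-talk estimate, the way the three estimates are combined, and the names `BinParams`, `double_talk_probability`, and `bin_gain` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BinParams:
    """Hypothetical tunable parameters for one frequency bin."""
    snr_threshold: float  # linear SNR above which a soft gain is used
    gain_floor: float     # minimum soft gain, limits over-suppression


def double_talk_probability(echo_power, mic_power, eps=1e-12):
    """Assumed estimate: share of microphone power in a bin that the
    estimated echo does not explain (compare the comparison in claim 8)."""
    ratio = echo_power / (mic_power + eps)
    return max(0.0, 1.0 - min(1.0, ratio))


def bin_gain(snr, residual_echo, mic_power, dt_prob, params, eps=1e-12):
    """Gain for one bin from the three estimates named in claim 1."""
    if snr <= params.snr_threshold:
        return 0.0                    # binary mask: drop the bin
    gain = snr / (1.0 + snr)          # Wiener-style spectral gain (assumed)
    # Attenuate by the residual-echo share of the bin, but less so when
    # near-end speech is likely present (high double-talk probability).
    echo_share = min(1.0, residual_echo / (mic_power + eps))
    gain *= 1.0 - (1.0 - dt_prob) * echo_share
    return max(params.gain_floor, gain)


# Two bins, each with its own threshold, as in claims 4 and 6.
params = [BinParams(snr_threshold=1.0, gain_floor=0.1),
          BinParams(snr_threshold=2.0, gain_floor=0.1)]
snr = [4.0, 0.5]            # estimated near-end SNR per bin (linear)
residual = [0.0, 0.2]       # estimated residual echo power per bin
mic_power = [1.0, 1.0]      # microphone power per bin
echo_power = [0.0, 1.0]     # estimated echo power per bin

dt = [double_talk_probability(e, m) for e, m in zip(echo_power, mic_power)]
gains = [bin_gain(s, r, m, d, p)
         for s, r, m, d, p in zip(snr, residual, mic_power, dt, params)]
```

Here the first bin clears its threshold and receives a soft gain, while the second falls below its threshold and is masked out. The opposite selection (a mask above the threshold, a gain below it) is equally compatible with the claim language.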

Assignees

  • Apple Inc

Inventors

Classifications

  • Noise filtering

  • using echo cancellers (echo cancellers per se H04B3/23)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9633671B2 cover?
An echo canceller can be arranged to receive an input signal and to receive a reference signal. The echo canceller can subtract a linear component of the reference signal from the input signal. A noise suppressor can suppress non-linear effects of the reference signal in the input signal in correspondence with a large number of selectable parameters. Such suppression can be provided on a freque…
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G10L21/0208. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 25, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).