System and method for generating an audio signal representing the speech of a user

US9812147B2 · US · B2

Patent metadata
Publication number: US-9812147-B2
Application number: US-201113988142-A
Country: US
Kind code: B2
Filing date: Nov 17, 2011
Priority date: Nov 24, 2010
Publication date: Nov 7, 2017
Grant date: Nov 7, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

There is provided a method of generating a signal representing the speech of a user, the method comprising obtaining a first audio signal representing the speech of the user using a sensor in contact with the user; obtaining a second audio signal using an air conduction sensor, the second audio signal representing the speech of the user and including noise from the environment around the user; detecting periods of speech in the first audio signal; applying a speech enhancement algorithm to the second audio signal to reduce the noise in the second audio signal, the speech enhancement algorithm using the detected periods of speech in the first audio signal; equalizing the first audio signal using the noise-reduced second audio signal to produce an output audio signal representing the speech of the user.
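
The first steps of the abstract (detecting speech periods on the contact-sensor signal, then using them to reduce noise in the air-conduction signal) can be illustrated with a small sketch. This is not the patented implementation: the non-overlapping framing, the naive DFT, the spectral-subtraction gain rule, and every name and parameter here (`frames`, `detect_speech`, `noise_floor`, `suppression_gains`, `threshold`, `min_gain`) are assumptions chosen for illustration. The amplitude-threshold detector and the noise-floor estimate driven by the detected speech periods are among the options the claims mention.

```python
import cmath
import math

def frames(signal, size):
    """Split a signal into consecutive non-overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def detect_speech(contact_frames, threshold):
    """Flag a frame as speech when its peak absolute amplitude exceeds `threshold`."""
    return [max(abs(x) for x in f) > threshold for f in contact_frames]

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short illustrative frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def noise_floor(air_frames, speech_flags):
    """Average the spectra of air-conduction frames that the contact sensor marked as non-speech."""
    quiet = [magnitude_spectrum(f) for f, s in zip(air_frames, speech_flags) if not s]
    if not quiet:
        raise ValueError("no non-speech frames available for the noise estimate")
    return [sum(spec[k] for spec in quiet) / len(quiet) for k in range(len(quiet[0]))]

def suppression_gains(frame, floor, min_gain=0.1):
    """Per-bin spectral-subtraction gain 1 - noise/magnitude, clamped below at `min_gain`."""
    mags = magnitude_spectrum(frame)
    return [
        max(min_gain, 1.0 - nf / m) if m > 0 else min_gain
        for m, nf in zip(mags, floor)
    ]
```

In this sketch the speech decision comes only from the contact sensor, which picks up little ambient noise, so the noise estimate for the air-conduction signal is updated only in frames where the user is not speaking.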

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of generating a signal representing the speech of a user, the method comprising: obtaining a first audio signal representing the speech of the user using a sensor in contact with the user; obtaining a second audio signal using an air conduction sensor, the second audio signal representing the speech of the user and including noise from the environment around the user; detecting periods of speech in the first audio signal; applying a speech enhancement algorithm to the second audio signal to reduce the noise in the second audio signal, the speech enhancement algorithm using the detected periods of speech in the first audio signal; equalizing the first audio signal using the noise-reduced second audio signal to produce an output audio signal representing the speech of the user, the equalizing includes performing linear prediction analysis on both the first audio signal and the noise-reduced second audio signal to construct an equalization filter, wherein the performing linear prediction analysis further includes: (i) estimating linear prediction coefficients for both the first audio signal and the noise-reduced second audio signal; (ii) using the linear prediction coefficients for the first audio signal to produce an excitation signal for the first audio signal; (iii) using the linear prediction coefficients for the noise-reduced second audio signal to construct a frequency domain envelope; and (iv) equalizing the excitation signal for the first audio signal using the frequency domain envelope.

2. The method as claimed in claim 1, wherein detecting periods of speech in the first audio signal comprises detecting parts of the first audio signal where the amplitude of the audio signal is above a threshold value.

3. The method as claimed in claim 1, wherein applying a speech enhancement algorithm comprises applying spectral processing to the second audio signal.

4. The method as claimed in claim 1, wherein applying a speech enhancement algorithm to reduce the noise in the second audio signal comprises using the detected periods of speech in the first audio signal to estimate the noise floors in the spectral domain of the second audio signal.

5. The method as claimed in claim 1, wherein equalizing the first audio signal comprises (i) using long-term spectral methods to construct an equalization filter, or (ii) using the first audio signal as an input to an adaptive filter that minimizes the mean-square error between the filter output and the noise-reduced second audio signal.

6. The method as claimed in claim 1, wherein prior to the step of equalizing, the method further comprises the step of applying a speech enhancement algorithm to the first audio signal to reduce the noise in the first audio signal, the speech enhancement algorithm making use of the detected periods of speech in the first audio signal, and wherein the step of equalizing comprises equalizing the noise-reduced first audio signal using the noise-reduced second audio signal to produce the output audio signal representing the speech of the user.

7. The method as claimed in claim 1, further comprising: obtaining a third audio signal using a second air conduction sensor, the third audio signal representing the speech of the user and including noise from the environment around the user; and using a beamforming technique to combine the second audio signal and the third audio signal and produce a combined audio signal; and wherein the step of applying a speech enhancement algorithm comprises applying the speech enhancement algorithm to the combined audio signal to reduce the noise in the combined audio signal, the speech enhancement algorithm using the detected periods of speech in the first audio signal.

8. The method as claimed in claim 1, further comprising: obtaining a fourth audio signal representing the speech of a user using a second sensor in contact with the user; and using a beamforming technique to combine the first audio signal and the fourth audio signal and produce a second combined audio signal; and wherein the step of detecting periods of speech comprises detecting periods of speech in the second combined audio signal.

9. A non-transitory computer readable medium carrying a computer program for controlling one or more processors to perform the method as claimed in claim 1.

10. A device for use in generating an audio signal representing the speech of a user, the device comprising: processing circuitry that is configured to: receive a first audio signal representing the speech of the user from a sensor in contact with the user; receive a second audio signal from an air conduction sensor, the second audio signal representing the speech of the user and including noise from the environment around the user; detect periods of speech in the first audio signal; apply a speech enhancement algorithm to the second audio signal to reduce the noise in the second audio signal, the speech enhancement algorithm using the detected periods of speech in the first audio signal; and equalize the first audio signal using the noise-reduced second audio signal to produce an output audio signal representing the speech of the user; wherein the processing circuitry is configured to equalize the first audio signal by performing linear prediction analysis on both the first audio signal and the noise-reduced second audio signal to construct an equalization filter, performing the linear prediction analysis including: (i) estimating linear prediction coefficients for both the first audio signal and the noise reduced second audio signal; (ii) using the linear prediction coefficients for the first audio signal to produce an excitation signal for the first audio signal; (iii) using the linear prediction coefficients for the noise-reduced audio signal to construct a frequency domain envelope; and (iv) equalizing the excitation signal for the first audio signal using the frequency domain envelope.

11. The device as claimed in claim 10, the device further comprising: a contact sensor that is configured to contact the body of the user when the device is in use and to produce the first audio signal; and an air-conduction sensor that is configured to produce the second audio signal.

12. A device for generating an audio signal representing the speech of a user, the device comprising: a processor configured to: receive a first audio signal representing the speech of the user from a sensor in contact with the user; receive a second audio signal representing the speech of the user including noise from an environment around the user; detect periods of speech in the first audio signal; apply a speech enhancement algorithm to the second audio signal to reduce the noise in the second audio signal; and equalize the first audio signal using the noise-reduced second audio signal to produce and output an audio signal representing the speech of the user, the equalizing including: (i) estimate linear prediction coefficients for both the first audio signal and the noise reduced second audio signal; (ii) use the linear prediction coefficients for the first audio signal to produce an excitation signal for the first audio signal; and (iii) use the linear prediction coefficients for the noise-reduced audio signal to construct a frequency domain envelope; and (iv) equalize the excitation signal for the first audio signal using the frequency domain envelope.

13. The device as claimed in claim 12, wherein the processor is further configured to: perform linear prediction analysis on the first audio signal and the second audio signal to construct an equal
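
Steps (i) to (iv) of the linear prediction equalization in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the method as implemented by the patentee: the autocorrelation method with Levinson-Durbin recursion, the whole-signal (unframed) analysis, the default order, and all function names (`lpc`, `excitation`, `envelope`, `shape`, `equalize`) are choices made here. Step (iv) is realised in the time domain by passing the excitation through the all-pole filter 1/A(z) whose magnitude response is the envelope of step (iii); the claim itself only says the excitation is equalized using the frequency domain envelope.

```python
import cmath
import math

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(max_lag + 1)]

def lpc(x, order):
    """Step (i): Levinson-Durbin recursion; returns [1, a1, ..., ap], the coefficients of A(z)."""
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # degenerate (e.g. all-zero) input: keep the coefficients found so far
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def excitation(x, a):
    """Step (ii): inverse-filter x with A(z) to obtain the prediction residual."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0) for n in range(len(x))]

def envelope(a, bins):
    """Step (iii): spectral envelope |1/A(e^{jw})| at `bins` frequencies from 0 to pi."""
    out = []
    for k in range(bins):
        w = math.pi * k / (bins - 1)
        out.append(1.0 / abs(sum(a[j] * cmath.exp(-1j * w * j) for j in range(len(a)))))
    return out

def shape(e, a):
    """Step (iv), time-domain form: run the excitation through the all-pole filter 1/A(z)."""
    y = []
    for n, v in enumerate(e):
        y.append(v - sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0))
    return y

def equalize(contact, clean_air, order=10):
    """Excitation from the contact signal, envelope from the noise-reduced air signal."""
    return shape(excitation(contact, lpc(contact, order)), lpc(clean_air, order))
```

The intuition behind this split is that the contact sensor supplies a low-noise excitation, while the noise-reduced air-conduction signal supplies the more natural spectral shape. A real system would typically run the analysis frame by frame with windowing and overlap, which this sketch omits.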

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9812147B2 cover?
There is provided a method of generating a signal representing the speech of a user, the method comprising obtaining a first audio signal representing the speech of the user using a sensor in contact with the user; obtaining a second audio signal using an air conduction sensor, the second audio signal representing the speech of the user and including noise from the environment around the user; …
Who is the assignee on this patent?
Kechichian Patrick; Van Den Dungen Wilhelmus Andreas Martinus Arnoldus Maria; Koninklijke Philips Nv
What technology area does this patent fall under?
Primary CPC classification G10L21/0208. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 7, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).