Voice activity detection/silence suppression system
US-9224405-B2 · Dec 29, 2015 · US
US9502046B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9502046-B2 |
| Application number | US-201314421132-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 20, 2013 |
| Priority date | Sep 21, 2012 |
| Publication date | Nov 22, 2016 |
| Grant date | Nov 22, 2016 |
Official abstract text for this publication.
A method for encoding sound field signals includes allocating coding rate by application of a uniform criterion to all subbands of all signals in a joint process. An allocation criterion may be based on a comparison, in a given subband, between a spectral envelope of the signals to be encoded and a coding noise profile, wherein the noise profile may be a sum of a noise shape and a noise offset, which noise offset is computed on the basis of the coding bit budget. The rate allocation process may be combined with an energy-compacting orthogonal transform, for which there is proposed a parameterization susceptible of efficient coding and having adjustable directivity. In a further aspect, the invention provides a corresponding decoding method.
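The allocation idea in the abstract (one noise offset shared by all signals and subbands, tuned until the bit budget is met) can be sketched as a bisection search. This is only an illustrative sketch under stated assumptions: the function names, the dB units, and the rate model of roughly one bit per 6.02 dB of margin are hypothetical and are not taken from the patent, which describes computing the rate expense from the transform coefficients.

```python
DB_PER_BIT = 6.02  # assumed rate model: about one bit per 6.02 dB of margin over the noise profile


def rate_expense(envelopes, noise_shape, offset):
    """Estimated bits needed when the noise profile is noise_shape + offset (all in dB)."""
    bits = 0.0
    for envelope in envelopes:                   # one spectral envelope per signal
        for band, level in enumerate(envelope):  # one level (dB) per subband
            margin = level - (noise_shape[band] + offset)
            if margin > 0:                       # only subbands above the profile cost bits
                bits += margin / DB_PER_BIT
    return bits


def allocate(envelopes, noise_shape, bit_budget, iterations=60):
    """Bisect on the common noise offset until the estimated rate fits the budget."""
    # Upper bound: the profile is at or above every envelope value, so nothing is coded.
    hi = max(level - noise_shape[band]
             for envelope in envelopes
             for band, level in enumerate(envelope))
    # Lower bound: the loudest subband alone already exceeds the budget.
    lo = hi - DB_PER_BIT * bit_budget - 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if rate_expense(envelopes, noise_shape, mid) > bit_budget:
            lo = mid   # too expensive: raise the noise profile
        else:
            hi = mid   # fits: try a lower noise profile
    # Per signal, the subbands whose envelope exceeds the final noise profile.
    collections = [
        [band for band, level in enumerate(envelope) if level > noise_shape[band] + hi]
        for envelope in envelopes
    ]
    return hi, collections


if __name__ == "__main__":
    # Hypothetical envelopes (dB) for three signals and three subbands.
    envelopes = [[60.0, 40.0, 20.0], [30.0, 10.0, 0.0], [12.0, 5.0, 0.0]]
    offset, collections = allocate(envelopes, [0.0, 0.0, 0.0], bit_budget=10.0)
    print(offset, collections)
```

Because the same criterion is applied jointly, a signal with little energy can end up with an empty subband collection while a dominant signal receives most of the budget.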
Opening claim text (preview).
What is claimed is:

1. An adaptive audio encoding device, comprising: a spatial analyzer configured to receive a plurality of audio signals and to determine, based on the plurality of audio signals, frame-wise decomposition parameters; an adaptive rotation stage configured to receive said plurality of audio signals and to output at least a first, second, and third rotated audio signal obtained by an energy-compacting orthogonal transformation, wherein quantitative properties of the transformation are determined by the decomposition parameters; a spectral envelope analyzer configured to receive a frequency-domain representation of the rotated audio signals, which contains transform coefficients, and to output, based thereon, a spectral envelope; and a multichannel encoder configured to receive the frequency-domain representation of the rotated audio signals and to output transform coefficients of the first rotated audio signal only for frequency subbands in a first subband collection, transform coefficients of the second rotated audio signal only for frequency subbands in a second subband collection, and transform coefficients of the third rotated audio signal only for frequency subbands in a third subband collection, wherein any subbands not included in any subband collection are to be synthesized at decoding, wherein the multichannel encoder determines the first subband collection, the second subband collection, and the third subband collection by means of a rate allocation process based on a joint comparison of a noise profile for the rotated audio signals and the spectral envelopes of the rotated audio signals, wherein at least one of the spatial analyser, the adaptive rotation stage, the spectral envelope analyser, and the multichannel encoder, are implemented, at least in part, by one or more hardware elements of the adaptive audio encoding device.

2. The system of claim 1, wherein: the noise profile is based on a sum of a noise shape, which is fixed for a given frame, and a noise offset common to all rotated audio signals; and the rate allocation process includes varying the noise offset to approximate a coding bit budget.

3. The system of claim 2, wherein the rate allocation process includes computing a rate expense for a candidate noise offset on the basis of the transform coefficients of the rotated audio signals.

4. The system of claim 2, further comprising a noise level computing section configured to determine said noise shape on the basis of the spectral envelopes of the rotated audio signals, wherein the noise shape is allowed to vary on a subband basis.

5. The system of claim 1, further comprising a rescaling section configured to rescale the spectral envelope in such manner as to make it perceptually comparable to a noise profile, which is constant with respect to frequency.

6. The system of claim 1, wherein the spatial analyzer comprises a decomposition parameter encoder and is configured to supply the decomposition parameters to the adaptive rotation stage in quantized form.

7. The system of claim 1, wherein the spatial analyzer is configured to determine the decomposition parameters on the basis of an analysis frequency subrange of said plurality of audio signals.

8. The system of claim 1, further comprising a time-to-frequency transform stage arranged upstream of the spatial analyzer, said time-to-frequency transform stage being configured to receive a time-domain representation of at least one signal and to output, based thereon, a frequency-domain representation of the at least one signal.

9. The system of claim 8, wherein the time-to-frequency transform stage is arranged immediately upstream of the spatial analyzer.

10. The system of claim 8, wherein the adaptive rotation stage is configured to apply said orthogonal decomposition to a decomposition frequency subrange of said plurality of audio signals, said system further comprising a combining section configured to form the rotated audio signals by concatenating the result of the orthogonal transformation in the decomposition subrange and said plurality of audio signals outside the decomposition subrange.

11. The system of claim 1, further comprising a time-invariant pre-conditioning stage configured to output said plurality of audio signals based on an equal number of input audio signals.

12. The system of claim 11, wherein the input audio signals are obtainable by three angularly distributed directive transducers and the pre-conditioning stage is configured to output a linear combination of the input audio signals with coefficients proportional to the elements of the following matrix:

$$P(h) = \frac{1}{3}\begin{bmatrix} 2h & 2h & 2h \\ 2 & 2 & -4 \\ 2\sqrt{3} & -2\sqrt{3} & 0 \end{bmatrix},$$

where h is a finite positive constant.

13. An adaptive audio encoding method, comprising: determining frame-wise decomposition parameters on the basis of a plurality of audio signals; rotating the audio signals into at least a first, second, and third rotated audio signal using an energy-compacting orthogonal transformation, wherein quantitative properties of the transformation are determined by the decomposition parameters; computing a spectral envelope based on a frequency-domain representation of the rotated audio signals, which frequency-domain representation contains transform coefficients; determin
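The time-invariant pre-conditioning stage of claims 11 and 12 amounts to multiplying each triple of input samples by a fixed 3×3 matrix. The sketch below is a rough illustration, assuming the third-row entries of P(h) are ±2√3 (which is consistent with three transducers spaced 120° apart); the function names and the list-of-lists signal layout are hypothetical. Since the claim only requires coefficients proportional to the matrix elements, the overall 1/3 scale is one choice among many.

```python
import math


def preconditioning_matrix(h):
    """Return P(h) as a 3x3 nested list; h must be a finite positive constant."""
    if not (math.isfinite(h) and h > 0):
        raise ValueError("h must be a finite positive constant")
    r = 2.0 * math.sqrt(3.0)  # assumed reading of the third-row entries
    return [
        [2.0 * h / 3.0, 2.0 * h / 3.0, 2.0 * h / 3.0],
        [2.0 / 3.0, 2.0 / 3.0, -4.0 / 3.0],
        [r / 3.0, -r / 3.0, 0.0],
    ]


def precondition(inputs, h=1.0):
    """Apply P(h) sample by sample to three equal-length input signals."""
    if len(inputs) != 3 or len({len(signal) for signal in inputs}) != 1:
        raise ValueError("expected three input signals of equal length")
    matrix = preconditioning_matrix(h)
    length = len(inputs[0])
    # Each output signal is a fixed linear combination of the three inputs.
    return [
        [sum(matrix[row][col] * inputs[col][t] for col in range(3)) for t in range(length)]
        for row in range(3)
    ]


if __name__ == "__main__":
    # Hypothetical two-sample frame from three transducers.
    frame = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
    print(precondition(frame, h=0.5))
```

When all three transducers pick up the same sample value, the second and third outputs cancel to zero and only the first output, scaled by 2h, remains.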
Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08) · CPC title
characterised by the method used for estimating noise · CPC title
Comfort noise or silence coding · CPC title
Mode decision, i.e. based on audio signal content versus external parameters · CPC title
Noise filtering · CPC title