Estimation of speech energy based on code excited linear prediction (CELP) parameters extracted from a partially-decoded CELP-encoded bit stream and applications of same

US9208796B2 · US · B2

Patent metadata
Publication number: US-9208796-B2
Application number: US-201113214641-A
Country: US
Kind code: B2
Filing date: Aug 22, 2011
Priority date: Aug 22, 2011
Publication date: Dec 8, 2015
Grant date: Dec 8, 2015

Abstract

Official abstract text for this publication.

Methods, systems, and non-transitory computer readable media for estimating speech energy of an encoded bit stream based on coding parameters extracted from the partially-decoded bit stream are disclosed. In an embodiment, a disclosed method includes receiving a CELP-encoded bit stream, partially decoding the bit stream, and estimating the speech energy of the bit stream based on a set of four or fewer CELP parameters extracted from the partially decoded bit stream. In another embodiment, a disclosed method includes receiving a CELP-encoded bit stream, partially decoding the bit stream, extracting at least one CELP parameter from the partially decoded bit stream, and estimating the speech energy of the bit stream based on the extracted at least one CELP parameter without calculating a linear prediction coding (LPC) filter response energy.

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a plurality of encoded bit streams including at least one CELP-encoded bit stream and at least one non-CELP-encoded bit stream; partially decoding the at least one CELP-encoded bit stream in a data processor to provide a partially decoded bit stream; estimating speech energy of the at least one CELP-encoded bit stream based on a set of four or fewer CELP parameters extracted from the partially decoded bit stream in the data processor; estimating speech signal energy of the at least one non-CELP-encoded bit stream by fully decoding the non-CELP-encoded bit stream and determining the speech signal energy of the fully-decoded non-CELP bit stream; using the estimated speech energies to identify bit streams that contain active speech data; and selecting, using the estimated speech energies, at least one bit stream from among bit streams identified as containing active speech data.

2. The method of claim 1 wherein the set of four or fewer CELP parameter comprises four or fewer parameters selected from the set of: a fixed codebook gain; an adaptive codebook gain; a set of linear predictive coding coefficients; a set of reflections coefficients; a fixed codebook index; an adaptive codebook index; and an energy of the excitation signal.

3. The method of claim 1 wherein the set of four or fewer parameters consists of a fixed codebook gain parameter (G_F).

4. The method of claim 3 wherein estimating the speech energy comprises calculating estimated speech energy (E) using the equation E = G_F.

5. The method of claim 3 wherein the set of parameters consists of a fixed codebook gain parameter (G_F) and an adaptive codebook gain parameter (G_A).

6. The method of claim 5 wherein estimating the speech energy comprises, for each frame m, calculating estimated speech energy for the frame (E_F) using the equation E_F(m) = G_F(m) + G_A(m)*E_F(m−1), wherein E_F(m) is the estimated speech energy for the frame, G_F(m) is the fixed codebook gain for the frame, G_A(m) is the adaptive codebook gain for the frame, and E_F(m−1) is the estimated speech energy for the previous frame.

7. The method of claim 1 wherein using the estimated speech energies to identify bit streams that contain active speech data further comprises using the estimated speech energy of the at least one CELP-encoded bit stream to determine, without fully decoding the at least one CELP-encoded bit stream, whether the bit stream contains active speech data.

8. The method of claim 1 further comprising estimating a speech energy of each of a plurality of CELP-encoded bit streams using the set of four or fewer CELP parameters to determine an estimated speech energy of each of the CELP-encoded bit streams without fully decoding the CELP-encoded bit streams, and using the estimated speech energies to identify CELP-encoded bit streams that contain active speech data.

9. The method of claim 1 wherein selecting at least one bit stream from among bit streams identified as containing active speech data comprises selecting bit streams having estimated speech energy higher than a threshold value.

10. The method of claim 1 wherein selecting at least one bit stream from among bit streams identified as containing active speech data comprises selecting bit streams having the highest values of estimated speech energy.

11. The method of claim 1 wherein the at least one selected bit stream is used as an input into a mixer.

12. The method of claim 11 wherein the plurality of encoded bit streams are received at a conference bridge from a plurality of conference participants and wherein the output of the mixer is provided to the plurality of conference participants.

13. The method of claim 1 wherein the fully decoded at least one non-CELP-encoded-bit stream comprises a pulse code modulated (PCM) bit stream and wherein short-term energy values comprise mean square energy values of the PCM bit stream.

14. The method of claim 1 further comprising: calculating in the data processor a moving average energy of an audio level (STA) for a frame of data in the at least one CELP-encoded bit stream; calculating a dynamic noise floor (NF) for the frame; calculating a compensated moving average energy of the audio level (cSTA) for the frame; and calculating a speech energy for the frame based on the cSTA.

15. The method of claim 1, wherein partially decoding the at least one CELP-encoded bit stream is performed on either a frame-by-frame basis or a sub-frame-by-sub-frame basis and does not require post-processing.

16. A method comprising: receiving a plurality of encoded bit streams including at least one CELP-encoded bit stream and at least one non-CELP-encoded bit stream; partially decoding the at least one CELP-encoded bit stream in a data processor; extracting at least one CELP parameter from the partially decoded at least one CELP-encoded bit stream with the data processor; estimating speech energy of the at least one CELP-encoded bit stream based on the extracted at least one CELP parameter, using the data processor, without calculating a linear prediction coding (LPC) filter response energy; estimating speech signal energy of the at least one non-CELP-encoded bit stream by fully decoding the non-CELP-encoded bit stream and determining the speech signal energy of the fully-decoded non-CELP bit stream; using the estimated speech energies to identify bit streams that contain active speech data; and selecting, using the estimated speech energies, at least one bit stream from among bit streams identified as containing active speech data.

17. The method of claim 16 wherein estimating the speech energy of the at least one CELP-encoded bit stream based on the at least one CELP parameter without calculating a linear prediction coding (LPC) filter response energy comprises extracting LPC coefficients and using the extracted LPC coefficients to reconstruct a frame energy calculation performed during encoding of the at least one CELP-encoded bit stream.

18. The method of claim 16 wherein estimating the speech energy of the at least one CELP-encoded bit stream based on the at least one CELP parameter without calculating a linear prediction coding (LPC) filter response energy comprises extracting LPC coefficients for a frame, using the extracted LPC coefficients to reconstruct a set of autocorrelation coefficients for the frame, and estimating the speech energy of the frame based on at least one autocorrelation coefficient from the set of autocorrelation coefficients for the frame.

19. The method of claim 16 wherein estimating the speech energy of the at least one CELP-encoded bit stream based on the at least one CELP parameter without calculating a linear prediction coding (LPC) filter response energy consists of, for each frame m: extracting a fixed codebook gain parameter G_F(m) for the frame, an adaptive codebook gain parameter G_A(m) for the frame, and a set of LPC coefficients {LPC}(m) for the frame; for each sub-frame n, using G_A(m) and an extracted pitch delay for the sub-frame to calculate an adaptive excitation for the sub-frame v(n), using G_F(m) and an extracted fixed codebook index for the sub-frame to calculate a fixed codebook excitation for the sub-frame c(n), and calculating a sub-frame excitation energy exc(n) using the equation exc(n) = G_F(m)*c(n) + G_A(m)*v(n); calculating frame error power E_ERR(m) as the square root of the sum of the squares o
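
The recursion in claim 6 and the selection rules in claims 9 and 10 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the zero initial energy, the stream identifiers, and the assumption that CELP-derived estimates and PCM mean-square energies (claim 13) are comparable on one scale are all hypothetical choices made for illustration; the claim text above does not specify them.

```python
from typing import Dict, List, Sequence


def estimate_celp_energy(fixed_gains: Sequence[float],
                         adaptive_gains: Sequence[float],
                         initial_energy: float = 0.0) -> List[float]:
    """Apply E_F(m) = G_F(m) + G_A(m) * E_F(m-1) frame by frame (claim 6).

    initial_energy stands in for E_F(-1); starting at 0.0 is an assumption.
    """
    if len(fixed_gains) != len(adaptive_gains):
        raise ValueError("need one fixed and one adaptive gain per frame")
    energies: List[float] = []
    previous = initial_energy
    for g_f, g_a in zip(fixed_gains, adaptive_gains):
        previous = g_f + g_a * previous
        energies.append(previous)
    return energies


def pcm_mean_square_energy(samples: Sequence[float]) -> float:
    """Mean-square energy of a fully decoded PCM frame (claim 13)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_streams(energies: Dict[str, float],
                   threshold: float,
                   max_streams: int) -> List[str]:
    """Keep streams above a threshold (claim 9), loudest first (claim 10).

    Treating every energy value as directly comparable is an assumption;
    a real bridge would likely need per-codec normalization.
    """
    active = [(energy, stream_id)
              for stream_id, energy in energies.items()
              if energy > threshold]
    active.sort(key=lambda pair: pair[0], reverse=True)
    return [stream_id for _, stream_id in active[:max_streams]]


if __name__ == "__main__":
    # Hypothetical gains for three frames of one CELP stream.
    celp = estimate_celp_energy([1.0, 2.0, 0.5], [0.5, 0.5, 0.0])
    # Hypothetical decoded PCM frame from a non-CELP stream.
    pcm = pcm_mean_square_energy([1.0, -1.0, 2.0, -2.0])
    latest = {"celp-1": celp[-1], "pcm-1": pcm, "idle-1": 0.01}
    print(select_streams(latest, threshold=0.1, max_streams=2))
```

Only the gain parameters are read from the CELP stream here, which mirrors the idea in claims 7 and 8 that activity can be judged without fully decoding the stream; the selected identifiers would then be the candidates passed to a mixer (claim 11).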

Assignees

Inventors

Classifications

  • the extracted parameters being power information · CPC title

  • G10L25/06 (Primary): the extracted parameters being correlation coefficients · CPC title

  • Multipoint control units therefor · CPC title

  • Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities (video conference systems H04N7/15) · CPC title

  • using the instant speaker's algorithm (speech detection per se G10L25/78) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9208796B2 cover?
Methods, systems, and non-transitory computer readable media for estimating speech energy of an encoded bit stream based on coding parameters extracted from the partially-decoded bit stream are disclosed. In an embodiment, a disclosed method includes receiving a CELP-encoded bit stream, partially decoding the bit stream, and estimating the speech energy of the bit stream based on a set of four or …
Who is the assignee on this patent?
Thepie Fapi Emmanuel Rossignol, Poulin Eric, Doyon Jean Pierre, and 1 more
What technology area does this patent fall under?
Primary CPC classification G10L25/06. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 8, 2015 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).