US2016155456A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016155456-A1 |
| Application number | US-201615017075-A |
| Country | US |
| Kind code | A1 |
| Filing date | Feb 5, 2016 |
| Priority date | Aug 6, 2013 |
| Publication date | Jun 2, 2016 |
| Grant date | — |
Official abstract text for this publication.
An audio signal classification method and apparatus, where the method includes determining, according to voice activity of a current audio frame, whether to obtain a frequency spectrum fluctuation of the current audio frame and store the frequency spectrum fluctuation in a frequency spectrum fluctuation memory, and updating, according to whether the audio frame is percussive music or activity of a historical audio frame, frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory, and classifying the current audio frame as a speech frame or a music frame according to statistics of a part or all of effective data of the frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory.
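The three steps in the abstract (conditional storage, percussive update, and classification from buffer statistics) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the class name `FluxClassifier`, the buffer length, both numeric values, the capping rule used for percussive frames, and the default label for an empty buffer are all hypothetical. The only detail taken from the source is that a low average fluctuation points to music.

```python
from collections import deque
from statistics import mean


class FluxClassifier:
    """Hypothetical sketch of a buffer-based speech/music decision."""

    def __init__(self, buffer_len=60, music_threshold=5.0, percussive_cap=3.0):
        # All three defaults are illustrative assumptions, not patent values.
        self.flux_buffer = deque(maxlen=buffer_len)
        self.music_threshold = music_threshold
        self.percussive_cap = percussive_cap

    def process(self, flux, is_active, is_percussive):
        # Step 1: store the fluctuation only when the frame is active.
        if is_active:
            self.flux_buffer.append(flux)

        # Step 2: for percussive music, modify the stored values.
        # Capping them is an assumed rule; the source only says "modify".
        if is_percussive:
            capped = [min(v, self.percussive_cap) for v in self.flux_buffer]
            self.flux_buffer = deque(capped, maxlen=self.flux_buffer.maxlen)

        # Step 3: classify from a statistic (here the mean) of stored data.
        if not self.flux_buffer:
            return "speech"  # assumed default when nothing is stored yet
        if mean(self.flux_buffer) < self.music_threshold:
            return "music"
        return "speech"
```

In this sketch, inactive frames leave the buffer untouched, so silence or background noise does not dilute the statistics that drive the decision.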
Opening claim text (preview).
What is claimed is:
1. An audio signal classification method, comprising: determining, according to voice activity of a current audio frame, whether to obtain a current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter, wherein a frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of an audio signal; updating, according to whether the audio frame is percussive music, stored one or more frequency spectrum fluctuation parameters; and classifying the current audio frame as a speech frame or a music frame according to statistics of a part or all of effective data of the stored frequency spectrum fluctuation parameters.
2. The method according to claim 1, wherein determining, according to voice activity of the current audio frame, whether to obtain the current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter comprises storing the current frequency spectrum fluctuation parameter of the current audio frame when the current audio frame is an active frame.
3. The method according to claim 1, wherein determining, according to voice activity of the current audio frame, whether to obtain the current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter comprises storing the current frequency spectrum fluctuation parameter of the current audio frame when the current audio frame is an active frame, and the current audio frame does not belong to an energy attack.
4. The method according to claim 1, wherein determining, according to voice activity of the current audio frame, whether to obtain the current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter comprises storing the current frequency spectrum fluctuation parameter of the current audio frame when the current audio frame is an active frame, and none of multiple consecutive frames comprising the current audio frame and a historical frame of the current audio frame belongs to an energy attack.
5. The method according to claim 1, wherein updating, according to whether the current audio frame is percussive music, stored one or more frequency spectrum fluctuation parameters comprises modifying values of the stored frequency spectrum fluctuation parameters when the current audio frame belongs to percussive music.
6. The method according to claim 1, wherein classifying the current audio frame as the speech frame or the music frame according to statistics of the part or all of effective data of the stored frequency spectrum fluctuation parameters comprises: obtaining an average value of the part or all of the effective data of the stored frequency spectrum fluctuation parameters; and classifying the current audio frame as the music frame when the obtained average value satisfies a music classification condition.
7. The method according to claim 1, further comprising: obtaining a frequency spectrum high-frequency-band peakiness parameter, a frequency spectrum correlation degree parameter, and a linear prediction residual energy tilt parameter of the current audio frame, wherein the frequency spectrum high-frequency-band peakiness parameter denotes a peakiness or an energy acutance, on a high frequency band, of a frequency spectrum of the current audio frame, wherein the frequency spectrum correlation degree parameter denotes stability, between adjacent frames, of a signal harmonic structure of the current audio frame, and wherein the linear prediction residual energy tilt parameter denotes an extent to which linear prediction residual energy of the audio signal changes as a linear prediction order increases; determining, according to the voice activity of the current audio frame, whether to store the frequency spectrum high-frequency-band peakiness parameter, the frequency spectrum correlation degree parameter, and the linear prediction residual energy tilt parameter, and wherein classifying the current audio frame as the speech frame or the music frame according to statistics of the part or all of effective data of the stored frequency spectrum fluctuation parameters comprises: obtaining an average value of the part or all of effective data of the stored frequency spectrum fluctuation parameters, an average value of a part or all of effective data of stored frequency spectrum high-frequency-band peakiness parameters, an average value of a part or all of effective data of stored frequency spectrum correlation degrees parameters, and a variance of a part or all of effective data of stored linear prediction residual energy tilt parameters separately; and classifying the current audio frame as the music frame when a music classifying condition comprising one of the following conditions is satisfied: the average value of the effective data of the stored frequency spectrum fluctuation parameters is less than a first threshold; the average value of the effective data of the stored frequency spectrum high-frequency-band peakiness parameters is greater than a second threshold; the average value of the effective data of the stored frequency spectrum correlation degree parameters is greater than a third threshold; and the variance of the effective data of the stored linear prediction residual energy tilt parameters is less than a fourth threshold.
8. The method according to claim 7, wherein the music classifying condition further comprises a voicing_cnt, wherein the voicing_cnt is less than a fifth threshold, and wherein the voicing_cnt denotes a quantity of voicing parameters whose values are greater than a sixth threshold in a voicing historical buffer which is used to store a voicing parameter of the current audio frame when the voicing parameter of the current audio frame is needed to be obtained and stored.
9. The method according to claim 1, wherein the stored one or more frequency spectrum fluctuation parameters are stored in a frequency spectrum fluctuation buffer when the current frequency spectrum fluctuation parameter is determined to be obtained and stored, and wherein the current frequency spectrum fluctuation parameter is stored to the frequency spectrum fluctuation buffer.
10. An audio signal classification method, comprising: determining, according to voice activity of a current audio frame, whether to obtain a current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter, wherein a frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of an audio signal; updating, according to activity of a historical audio frame, stored one or more frequency spectrum fluctuation parameters; and classifying the current audio frame as a speech frame or a music frame according to statistics of a part or all of effective data of the stored frequency spectrum fluctuation parameters.
11. The method according to claim 10, wherein determining, according to voice activity of the current audio frame, whether to obtain the current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter comprises storing the current frequency spectrum fluctuation parameter of the current audio frame when the current audio frame is an active frame.
12. The method according to claim 10, wherein determining, according to voice activity of the current audio frame, whether to obtain the c
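
The four-statistic music condition of claim 7 and the `voicing_cnt` counter of claim 8 can be sketched as follows. This is one possible reading, not the patent's implementation: the names `Thresholds` and `music_condition` are hypothetical, treating "one of the following conditions" as a logical OR is an interpretation, the use of population variance is an assumption, and no threshold values are given in the text shown here. How `voicing_cnt` combines with the other conditions is not specified in this excerpt, so it is kept as a separate helper.

```python
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class Thresholds:
    """Hypothetical container; the excerpt gives no numeric values."""
    flux_mean_max: float   # "first threshold"
    peak_mean_min: float   # "second threshold"
    corr_mean_min: float   # "third threshold"
    tilt_var_max: float    # "fourth threshold"


def music_condition(flux, peak, corr, tilt, th):
    """Return True when any one of the four claim 7 conditions holds.

    Each argument is the effective data taken from the matching buffer.
    """
    return (
        mean(flux) < th.flux_mean_max        # low spectral fluctuation
        or mean(peak) > th.peak_mean_min     # strong high-band peakiness
        or mean(corr) > th.corr_mean_min     # stable harmonic structure
        or pvariance(tilt) < th.tilt_var_max  # steady residual energy tilt
    )


def voicing_cnt(voicing_buffer, sixth_threshold):
    """Count stored voicing parameters above the sixth threshold (claim 8)."""
    return sum(1 for v in voicing_buffer if v > sixth_threshold)
```

A caller would pass the effective portion of each history buffer and compare the result of `voicing_cnt` against the fifth threshold separately.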
the extracted parameters being prediction coefficients · CPC title
Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients · CPC title
the extracted parameters being spectral information of each sub-band · CPC title
for discriminating voice from music · CPC title
Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title