Linear prediction residual energy tilt-based audio signal classification method and apparatus

US11289113B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11289113-B2
Application number  US-201916723584-A
Country             US
Kind code           B2
Filing date         Dec 20, 2019
Priority date       Aug 6, 2013
Publication date    Mar 29, 2022
Grant date          Mar 29, 2022

Abstract

Official abstract text for this publication.

A linear prediction residual energy tilt-based audio signal classification method and apparatus, where the method includes: determining, according to voice activity of a current audio frame, whether to obtain a linear prediction residual energy tilt of the current audio frame and store a frequency spectrum fluctuation of the current frame in a frequency spectrum fluctuation memory, where the linear prediction residual energy tilt denotes an extent to which an audio signal's linear prediction residual energy changes as a linear prediction order increases; updating, according to whether the audio frame is percussive music or activity of a historical audio frame, frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory; and classifying the current audio frame as a speech frame or a music frame according to statistics of some or all of effective data of the frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory.
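The abstract defines the tilt only in words, as the extent to which residual energy changes as the prediction order increases. The following is a minimal sketch of one way to compute such a quantity, not the patent's confirmed formula. It assumes the residual energies come from a Levinson-Durbin recursion on the frame's autocorrelation, and that the tilt is the normalized correlation between residual energies at adjacent orders. The function names `autocorrelation`, `residual_energies`, and `energy_tilt` are hypothetical.

```python
import math


def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def residual_energies(frame, order):
    """Levinson-Durbin recursion.

    Returns the prediction error energy for orders 0..order.
    """
    r = autocorrelation(frame, order)
    if r[0] <= 0.0:
        return [0.0] * (order + 1)          # silent frame
    energies = [r[0]]
    a = [1.0] + [0.0] * order               # predictor polynomial, a[0] = 1
    err = r[0]
    for m in range(1, order + 1):
        if err <= 1e-12 * r[0]:
            energies.append(err)            # signal already fully predicted
            continue
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err                    # reflection coefficient
        new_a = a[:]
        for k in range(1, m):
            new_a[k] = a[k] + k_m * a[m - k]
        new_a[m] = k_m
        a = new_a
        err = max(err * (1.0 - k_m * k_m), 0.0)
        energies.append(err)
    return energies


def energy_tilt(energies):
    """Assumed tilt measure: correlation of energies at adjacent orders.

    Close to 1 when the energy barely changes with the order,
    smaller when it drops quickly.
    """
    idx = range(1, len(energies) - 1)
    num = sum(energies[i] * energies[i + 1] for i in idx)
    den = sum(energies[i] ** 2 for i in idx)
    return num / den if den > 0.0 else 0.0
```

Because the residual energies never increase with the order, this assumed measure always falls between 0 and 1, which makes it convenient to store and compare across frames.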

First claim

Opening claim text (preview).

What is claimed is:

1. An audio signal classification method, comprising: performing frame division processing on an input audio signal; obtaining a linear prediction residual energy tilt of a current audio frame of the input audio signal, wherein the linear prediction residual energy tilt denotes an extent to which linear prediction residual energy of the input audio signal changes as a linear prediction order increases; determining whether to store the linear prediction residual energy tilt in a memory according to voice activity of the current audio frame; storing the linear prediction residual energy tilt in the memory in response to determining that the linear prediction residual energy tilt needs to be stored according to the voice activity of the current audio frame; and classifying the current audio frame according to statistics of prediction residual energy tilts in the memory.

2. The audio signal classification method according to claim 1, wherein the statistics of the prediction residual energy tilts is a variance of the prediction residual energy tilts, and wherein classifying the current audio frame according to the statistics of the prediction residual energy tilts in the memory comprises: comparing the variance of the prediction residual energy tilts with a music classification threshold; and classifying the current audio frame as a music frame when the variance of the prediction residual energy tilts is less than the music classification threshold.

3. The audio signal classification method according to claim 1, wherein the statistics of the prediction residual energy tilts is a variance of the prediction residual energy tilts, and wherein classifying the current audio frame according to the statistics of the prediction residual energy tilts in the memory comprises: comparing the variance of the prediction residual energy tilts with a music classification threshold; and classifying the current audio frame as a speech frame when the variance of the prediction residual energy tilts is greater than or equal to the music classification threshold.

4. The audio signal classification method according to claim 1, further comprising: obtaining a frequency spectrum fluctuation, a frequency spectrum high-frequency-band peakiness, and a frequency spectrum correlation degree of the current audio frame; and storing the frequency spectrum fluctuation, the frequency spectrum high-frequency-band peakiness, and the frequency spectrum correlation degree in corresponding memories, wherein classifying the current audio frame according to the statistics of the prediction residual energy tilts in the memory comprises: obtaining statistics of effective data of the frequency spectrum fluctuation, statistics of effective data of the frequency spectrum high-frequency-band peakiness, statistics of effective data of the frequency spectrum correlation degree, and statistics of effective data of the linear prediction residual energy tilt; and classifying the current audio frame as a speech frame or a music frame according to statistics of effective data, wherein each statistics of the effective data is a data value.

5. The audio signal classification method according to claim 4, wherein the obtaining the statistics of the effective data of the frequency spectrum fluctuation, the statistics of the effective data of the frequency spectrum high-frequency-band peakiness, the statistics of the effective data of the frequency spectrum correlation degree, and the statistics of the effective data of the linear prediction residual energy tilt, and classifying the audio current frame as a speech frame or a music frame according to the statistics of the effective data comprises: obtaining an average value of the effective data of the frequency spectrum fluctuation, an average value of the effective data of the frequency spectrum high-frequency-band peakiness, an average value of the effective data of the frequency spectrum correlation degree, and a variance of the effective data of the linear prediction residual energy tilt separately; and classifying the current audio frame as the music frame when one of the following conditions is satisfied: the average value of the effective data of the frequency spectrum fluctuation is less than a first threshold, the average value of the effective data of the frequency spectrum high-frequency-band peakiness is greater than a second threshold, the average value of the effective data of the frequency spectrum correlation degree is greater than a third threshold, and the variance of the effective data of the linear prediction residual energy tilt is less than a fourth threshold.

6. The audio signal classification method according to claim 4, wherein the obtaining the statistics of the effective data of the frequency spectrum fluctuation, the statistics of the effective data of the frequency spectrum high-frequency-band peakiness, the statistics of the effective data of the frequency spectrum correlation degree, and the statistics of the effective data of the linear prediction residual energy tilt, and classifying the audio current frame as a speech frame or a music frame according to the statistics of the effective data comprises: obtaining an average value of the effective data of the frequency spectrum fluctuation, an average value of the effective data of the frequency spectrum high-frequency-band peakiness, an average value of the effective data of the frequency spectrum correlation degree, and a variance of the effective data of the linear prediction residual energy tilt separately; and classifying the current audio frame as the speech frame when none of the following conditions are satisfied: the average value of the effective data of the frequency spectrum fluctuation is less than a first threshold, the average value of the effective data of the frequency spectrum high-frequency-band peakiness is greater than a second threshold, the average value of the effective data of the frequency spectrum correlation degree is greater than a third threshold, and the variance of the effective data of the linear prediction residual energy tilt is less than a fourth threshold.

7. The audio signal classification method according to claim 1, further comprising: obtaining a frequency spectrum tone quantity of the current audio frame and a ratio of the frequency spectrum tone quantity on a low frequency band; and storing the frequency spectrum tone quantity and the ratio of the frequency spectrum tone quantity on the low frequency band in corresponding memories, wherein the classifying the current audio frame according to the statistics of the prediction residual energy tilts in the memory comprises: obtaining statistics of the linear prediction residual energy tilt and statistics of the frequency spectrum tone quantity separately; and classifying the current audio frame as a speech frame or a music frame according to the statistics of the linear prediction residual energy tilt, the statistics of the frequency spectrum tone quantity, and the ratio of the frequency spectrum tone quantity on the low frequency band, wherein each of the statistics refers to a data value obtained after a calculation operation is performed on data stored in the memories.

8. The audio signal classification method according to claim 7, wherein obtaining the statistics of the linear prediction residual energy tilt and the statistics of the frequency spectrum tone quantity separately comprises: obtaining a variance of the linear prediction residual energy tilt; and obtaining an average value of the frequency spectrum tone quantity, and wherein classifying the current audio frame as the speech frame or music frame according to the data val
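The decision logic in claims 1 to 6 can be illustrated with a short sketch. It is a hedged reading of the claim text, not the patented implementation. The class `TiltClassifier`, the function `classify_multi`, the memory size, and every threshold value are hypothetical. Storing the tilt only for active frames is also an assumption, since the claims state only that the storage decision depends on voice activity.

```python
from collections import deque
from statistics import mean, pvariance


class TiltClassifier:
    """VAD-gated tilt memory with a variance threshold (claims 1 to 3)."""

    def __init__(self, memory_size=60, music_threshold=0.001):
        # Both defaults are illustrative assumptions, not patent values.
        self.tilts = deque(maxlen=memory_size)
        self.music_threshold = music_threshold

    def process(self, tilt, is_active):
        """Store the tilt if the frame is active, then classify the frame."""
        if is_active:                      # assumed storage rule
            self.tilts.append(tilt)
        if len(self.tilts) < 2:
            return None                    # not enough history yet
        variance = pvariance(self.tilts)
        # Claim 2: music when variance < threshold.
        # Claim 3: speech when variance >= threshold.
        return "music" if variance < self.music_threshold else "speech"


def classify_multi(fluct, peakiness, corr, tilts, th):
    """Claims 4 to 6: music if any one condition holds, otherwise speech.

    Each argument is a sequence of stored effective data; th maps the
    hypothetical keys 'fluct', 'peak', 'corr', 'tilt_var' to thresholds.
    """
    is_music = (
        mean(fluct) < th["fluct"]           # first threshold
        or mean(peakiness) > th["peak"]     # second threshold
        or mean(corr) > th["corr"]          # third threshold
        or pvariance(tilts) < th["tilt_var"]  # fourth threshold
    )
    return "music" if is_music else "speech"
```

A bounded `deque` is used here so that the oldest tilt drops out automatically once the memory is full; the claims shown on this page do not say how the memory is sized or aged.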

Assignees

Inventors

Classifications

  • G10L25/81 (Primary)

    for discriminating voice from music · CPC title

  • Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title

  • the extracted parameters being prediction coefficients · CPC title

  • Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients · CPC title

  • G10L25/18 (Primary)

    the extracted parameters being spectral information of each sub-band · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11289113B2 cover?
A linear prediction residual energy tilt-based audio signal classification method and apparatus, where the method includes: determining, according to voice activity of a current audio frame, whether to obtain a linear prediction residual energy tilt of the current audio frame and store a frequency spectrum fluctuation of the current frame in a frequency spectrum fluctua…
Who is the assignee on this patent?
Huawei Tech Co Ltd, Huawei Technologies Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L25/81. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 29, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).