Generative neural network model for processing audio samples in a filter-bank domain

US12579991B2 · US · B2

Patent metadata
Publication number: US-12579991-B2
Application number: US-202118248808-A
Country: US
Kind code: B2
Filing date: Oct 15, 2021
Priority date: Oct 16, 2020
Publication date: Mar 17, 2026
Grant date: Mar 17, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A neural network system is provided, implementing a generative model for autoregressively generating a distribution for a plurality of current filter-bank samples of an audio signal, wherein the current samples correspond to a current time slot, and each current sample corresponds to a channel of the filter-bank. The system includes a hierarchy of a plurality of neural network processing tiers ordered from a top to a bottom tier, each tier trained to generate conditioning information based on previous filter-bank samples and, for at least each tier but the top tier, also on the conditioning information from a tier higher up in the hierarchy, and an output stage trained to generate the probability distribution based on previous samples for one or more previous time slots and the conditioning information from the lowest processing tier.
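
The data flow described in the abstract can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the trained tiers and the trained output stage are replaced by hand-written arithmetic, and the names `tier`, `output_stage`, `generate`, `NUM_CHANNELS`, the context lengths and the Gaussian output are all hypothetical choices that do not come from the patent. What the sketch does preserve is the ordering: the top tier sees only previous samples, each lower tier also receives the conditioning from the tier above, and the output stage turns the lowest tier's conditioning into a distribution that is sampled to obtain the current time slot.

```python
import random

NUM_CHANNELS = 4  # hypothetical filter-bank size, chosen only for illustration


def tier(prev_slots, cond_from_above, weight=0.1):
    """Toy stand-in for a trained tier (assumption, not the patented network).

    Combines a summary of previous filter-bank samples with the conditioning
    information handed down from the tier above.
    """
    # Per-channel average over the previous time slots this tier looks at.
    summary = [sum(slot[c] for slot in prev_slots) / len(prev_slots)
               for c in range(NUM_CHANNELS)]
    if cond_from_above is None:  # the top tier has no tier above it
        cond_from_above = [0.0] * NUM_CHANNELS
    return [weight * s + a for s, a in zip(summary, cond_from_above)]


def output_stage(prev_slots, cond, rng, scale=0.1):
    """Toy output stage: one Gaussian per channel, then a sample from it."""
    last = prev_slots[-1]
    means = [0.5 * x + 0.5 * c for x, c in zip(last, cond)]
    return [rng.gauss(m, scale) for m in means]


def generate(history, num_new_slots, context_lengths=(8, 4, 2), seed=0):
    """Autoregressively append num_new_slots time slots to the history."""
    rng = random.Random(seed)
    slots = [list(s) for s in history]
    for _ in range(num_new_slots):
        cond = None
        # Walk the hierarchy from the top tier (longest context) downwards.
        for n in context_lengths:
            cond = tier(slots[-n:], cond)
        # The freshly sampled slot becomes a "previous" slot for the next step.
        slots.append(output_stage(slots[-1:], cond, rng))
    return slots
```

Calling `generate(history, 3)` with eight past time slots returns eleven slots, the last three of which were sampled one after another, each depending on everything generated before it.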

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer implemented neural network system for autoregressively generating a plurality of current filter-bank samples of a filter-bank representation of an audio signal, wherein the current filter-bank samples correspond to a current time slot, and wherein each current filter-bank sample corresponds to a respective channel of the filter-bank, including: a hierarchy of a plurality of neural network processing tiers ordered from a top processing tier to a bottom processing tier, wherein each processing tier has been trained to generate conditioning information based on previous filter-bank samples of the filter-bank representation and, for at least each processing tier but the top tier, also on the conditioning information generated by a processing tier higher up in the hierarchy, and an output stage that has been trained to generate a probability distribution for said plurality of current filter-bank samples based on previous filter-bank samples corresponding to one or more previous time slots of the filter-bank representation and the conditioning information generated from the lowest processing tier, said output stage being configured to sample the probability distribution to obtain said plurality of current filter bank samples, wherein the output stage includes the bottom processing tier, and wherein the bottom processing tier is subdivided into a plurality of sequentially executed sub-layers, wherein each sub-layer has been trained to generate the probability distribution for one or more current filter-bank samples corresponding to a true subset of the channels of the filter-bank and, at least for all but a first executed sub-layer, each sub-layer has been trained to generate the probability distribution also based on current filter-bank samples generated by one or more previously executed sub-layers.

2. The system of claim 1, where each processing tier has been trained to generate the conditioning information also based on additional side information provided for the current time slot.

3. The system of claim 1, further including means configured for generating the plurality of current filter-bank samples of the filter-bank representation by sampling from the probability distribution.

4. The system of claim 3, wherein the probability distribution for the current filter-bank samples is obtained using a mixture model.

5. The system of claim 4, wherein generating the probability distribution includes providing an update of a linear transformation for a mixture coefficient of the mixture model, wherein the linear transformation is defined by a triangular matrix with ones on its main diagonal, and wherein the triangular matrix has a number of non-zero diagonals greater than one and smaller than the number of channels of the filter-bank.

6. The system of claim 1, wherein each processing tier includes convolutional modules configured for receiving the previous filter-bank samples of the filter-bank representation, wherein each convolutional module has a same number of input channels as a number of channels of the filter-bank, and wherein kernel sizes of the convolutional modules decrease from the top processing tier to the bottom processing tier in the hierarchy.

7. The system of claim 6, wherein each processing tier includes at least one recurrent unit configured for receiving as its input a sum of the outputs from the convolutional modules, and, for at least each processing tier but the lowest processing tier, at least one learned upsampling module configured to receive as its input an output from the at least one recurrent unit and to generate as its output the conditioning information.

8. The system of claim 7, further including an additional recurrent unit common to all sub-layers of the bottom processing tier and configured for receiving as its input a mix of i) the sum of the outputs from the convolutional modules and ii) the output of the at least one recurrent unit, and to based thereon generate additional side information to a respective sub-output stage of each sub-layer.

9. The system of claim 1, wherein the first executed sub-layer generates one or more current filter-bank samples corresponding to at least the lowest channel of the filter-bank, and wherein the last executed sub-layer generates one or more current filter-bank samples corresponding to at least the highest channel of the filter-bank.

10. The system of claim 1, wherein the probability distribution for the current filter-bank samples is obtained using a mixture model.

11. The system of claim 10, wherein generating the probability distribution includes providing an update of a linear transformation for a mixture coefficient of the mixture model, wherein the linear transformation is defined by a triangular matrix with ones on its main diagonal, and wherein the triangular matrix has a number of non-zero diagonals greater than one and smaller than the number of channels of the filter-bank.

12. The system of claim 5, wherein the sampling includes a transformation with the linear transformation.

13. A non-transitory computer readable medium storing instructions operable, when executed by at least one computer processor belonging to a computer hardware, to implement the system according to claim 1 using said computer hardware.

14. A computer implemented neural network system for autoregressively generating a plurality of current filter-bank samples of a filter-bank representation of an audio signal, wherein the current filter-bank samples correspond to a current time slot, and wherein each current filter-bank sample corresponds to a respective channel of the filter-bank, including: a hierarchy of a plurality of neural network processing tiers ordered from a top processing tier to a bottom processing tier, wherein each processing tier has been trained to generate conditioning information based on previous filter-bank samples of the filter-bank representation and, for at least each processing tier but the top tier, also on the conditioning information generated by a processing tier higher up in the hierarchy, and an output stage that has been trained to generate a probability distribution for said plurality of current filter-bank samples based on previous filter-bank samples corresponding to one or more previous time slots for the filter-bank representation and the conditioning information generated from the lowest processing tier, said output stage being configured to sample said probability distribution to obtain said plurality of current filter bank samples, wherein each processing tier includes convolutional modules configured for receiving the previous filter-bank samples of the filter-bank representation, wherein each convolutional module has a same number of input channels as a number of channels of the filter-bank, and wherein kernel sizes of the convolutional modules decrease from the top processing tier to the bottom processing tier in the hierarchy.

15. A method for autoregressively generating a plurality of current filter-bank samples of a filter-bank representation of an audio signal, wherein the current filter-bank samples correspond to a current time slot, and wherein each current filter-bank sample corresponds to a respective channel of the filter-bank, including generating and sampling a probability distribution by using the system of any one of the preceding claims.

16. The method of claim 15, comprising the steps of: using the plurality of neural network processing tiers to generate conditioning information, wherein the conditioning informa
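
Two structural ideas in the claims lend themselves to a small worked example: the bottom tier's sequentially executed sub-layers (claims 1 and 9) and the banded triangular matrix with ones on its main diagonal (claims 5 and 11). The Python sketch below is a hedged illustration, not the claimed implementation. The function names, the contiguous low-to-high channel grouping, the choice of a lower (rather than upper) triangular matrix, the constant `fill` value and the toy Gaussian sampling are all assumptions introduced here.

```python
import random


def split_channels(num_channels, num_sublayers):
    """Partition channels 0..num_channels-1 into contiguous proper subsets."""
    if not 1 < num_sublayers <= num_channels:
        raise ValueError("need 2..num_channels sub-layers so each subset is proper")
    size, extra = divmod(num_channels, num_sublayers)
    groups, start = [], 0
    for i in range(num_sublayers):
        stop = start + size + (1 if i < extra else 0)
        groups.append(list(range(start, stop)))
        start = stop
    return groups


def generate_slot(cond, num_sublayers, rng, scale=0.1):
    """Fill one time slot group by group, lowest channels first.

    Each group after the first also depends on the samples already produced
    by earlier groups (here, crudely, through their mean).
    """
    current = []
    for group in split_channels(len(cond), num_sublayers):
        # Context is fixed before the group runs, so channels inside one
        # group depend only on previously executed groups.
        context = sum(current) / len(current) if current else 0.0
        current.extend(rng.gauss(cond[ch] + 0.5 * context, scale) for ch in group)
    return current


def banded_unit_lower_triangular(n, num_diagonals, fill=0.25):
    """n x n lower-triangular matrix, ones on the main diagonal, limited band."""
    if not 1 < num_diagonals < n:
        raise ValueError("number of non-zero diagonals must be in 2..n-1")
    return [[1.0 if r == c else (fill if 0 < r - c < num_diagonals else 0.0)
             for c in range(n)] for r in range(n)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]
```

For example, `split_channels(8, 3)` yields the groups `[0, 1, 2]`, `[3, 4, 5]` and `[6, 7]`, and `banded_unit_lower_triangular(4, 2)` has ones on the diagonal and `0.25` on the first sub-diagonal only, so applying it to a vector couples each channel with just its lower neighbour.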

Assignees

Dolby Int Ab

Inventors

Classifications

  • using neural networks · CPC title

  • Supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Generative networks · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12579991B2 cover?
A neural network system is provided, implementing a generative model for autoregressively generating a distribution for a plurality of current filter-bank samples of an audio signal, wherein the current samples correspond to a current time slot, and each current sample corresponds to a channel of the filter-bank. The system includes a hierarchy of a plurality of neural network processing tiers …
Who is the assignee on this patent?
Dolby Int Ab
What technology area does this patent fall under?
Primary CPC classification G10L21/0208. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 17, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).