Deep encoder for performing audio processing

US11900902B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11900902-B2
Application number: US-202117228357-A
Country: US
Kind code: B2
Filing date: Apr 12, 2021
Priority date: Apr 12, 2021
Publication date: Feb 13, 2024
Grant date: Feb 13, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are disclosed for determining an answer to a query associated with a graphical representation of data. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including an unprocessed audio sequence and a request to perform an audio signal processing effect on the unprocessed audio sequence. The one or more embodiments further include analyzing, by a deep encoder, the unprocessed audio sequence to determine parameters for processing the unprocessed audio sequence. The one or more embodiments further include sending the unprocessed audio sequence and the parameters to one or more audio signal processing effects plugins to perform the requested audio signal processing effect using the parameters and outputting a processed audio sequence after processing of the unprocessed audio sequence using the parameters of the one or more audio signal processing effects plugins.
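The flow described in the abstract (encoder predicts parameters, plugins apply them, processed audio is returned) can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in and not taken from the patent: `toy_encoder` replaces the trained deep encoder with a simple peak rule, `gain_plugin` replaces a real effects plugin, and the serial plugin chain is an assumption.

```python
from typing import Callable, Dict, List

Audio = List[float]
Params = Dict[str, float]


def toy_encoder(audio: Audio) -> Params:
    """Hypothetical stand-in for the deep encoder.

    A real encoder would be a trained neural network; this one just picks
    a gain that brings the peak to 0.5, purely for illustration.
    """
    peak = max((abs(x) for x in audio), default=0.0)
    return {"gain": 0.5 / peak if peak > 0 else 1.0}


def gain_plugin(audio: Audio, params: Params) -> Audio:
    """Hypothetical effects plugin: applies a linear gain."""
    return [x * params["gain"] for x in audio]


def process(
    audio: Audio,
    encoder: Callable[[Audio], Params],
    plugins: List[Callable[[Audio, Params], Audio]],
) -> Audio:
    """Analyze the unprocessed audio, then run the plugins with the result."""
    params = encoder(audio)      # parameters come from the unprocessed input
    out = audio
    for plugin in plugins:       # assumption: plugins are applied in series
        out = plugin(out, params)
    return out


unprocessed = [0.0, 0.25, -1.0, 0.5]
processed = process(unprocessed, toy_encoder, [gain_plugin])
```

The point of the design is that the network never produces audio itself: it only emits parameters, and conventional signal processing plugins do the actual processing.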

First claim

Opening claim text (preview).

We claim:

1. A computer-implemented method comprising: receiving an input including an unprocessed audio sequence and a request to perform an audio signal processing effect on the unprocessed audio sequence; analyzing, by a deep encoder, the unprocessed audio sequence to determine parameters for processing the unprocessed audio sequence, the parameters associated with the requested audio signal processing effect; sending the unprocessed audio sequence and the parameters to one or more audio signal processing effects plugins to perform the audio signal processing effect using the parameters; and outputting a processed audio sequence after processing of the unprocessed audio sequence using the parameters of the one or more audio signal processing effects plugins.

2. The computer-implemented method of claim 1, wherein the deep encoder is trained using a training system configured to: obtain training audio data, the training audio data including at least one training audio file and an associated ground truth audio file corresponding to the audio signal processing effect; configure the deep encoder for the audio signal processing effect by serially adding each audio signal processing effects plugin of the one or more audio signal processing effects plugins to the deep encoder; analyze, by the deep encoder, the training audio data to estimate values of training parameters for processing the training audio file; provide the estimated values of the training parameters and the training audio file to the one or more audio signal processing effects plugins which generate a training output; calculate, using a loss function, a loss of the training output and the ground truth audio file; and train the deep encoder based on the loss.

3. The computer-implemented method of claim 2, wherein calculating, using the loss function, the loss of the training output of the one or more audio signal processing effects plugins and the ground truth audio file further comprises: determining a number of audio frames representing a delay between the ground truth audio file and the training output; removing the number of audio frames representing the delay; aligning the ground truth audio file and the training output; calculating a polarity; calculating a first loss value with a non-inverted polarity and a second loss value with an inverted polarity; and selecting a minimum of the first loss value and the second loss value as the loss.

4. The computer-implemented method of claim 2, further comprising: for each frame of each training audio file in the training audio data: analyzing, by the deep encoder, a first frame of the training audio file having a first frame length; and generating a second frame of the training audio file from the first frame, the second frame having a second frame length, wherein the second frame length is shorter than the first frame length.

5. The computer-implemented method of claim 4, further comprising: for each training audio file in the training audio data: instantiating a first audio signal processing effects plugin to generate output audio frames based on the second frame and the estimated values of the training parameters; and instantiating a second audio signal processing effects plugin and a third audio signal processing effects plugin to determine gradients of the estimated values of the training parameters based on the second frame and the estimated values of the training parameters, wherein a same state is maintained for each of the first audio signal processing effects plugin, the second audio signal processing effects plugin, and the third audio signal processing effects plugin.

6. The computer-implemented method of claim 5, further comprising: backpropagating the determined gradients of the estimated values of the training parameters to the deep encoder and the one or more audio signal processing effects plugins.

7. The computer-implemented method of claim 6, wherein determining the gradients of the estimated values of the training parameters comprises: approximating the gradients of the estimated values of the training parameters using a simultaneous perturbation stochastic approximation method.

8. The computer-implemented method of claim 1, wherein the audio signal processing effect is a tube amplifier emulation audio processing effect, wherein the one or more audio signal processing effects plugins include a multiband dynamic range compressor, and wherein the parameters include a threshold, makeup gain, ratio, and knee for each frequency band, frequency splits, an input gain, and an output gain.

9. The computer-implemented method of claim 1, wherein the audio signal processing effect is an automatic non-speech vocal sounds removal audio processing effect, wherein the one or more audio signal processing effects plugins include multiband noise gate, and wherein the parameters include a threshold, reduction gain, and ratio for each frequency band, frequency splits, an input gain, and an output gain.

10. The computer-implemented method of claim 1, wherein the audio signal processing effect is a music mastering audio processing effect, and wherein the one or more audio signal processing effects plugins include a multiband dynamic range compressor, a graphic equalizer, and a mono limiter, and wherein the parameters include a threshold, makeup gain and ratio for each frequency band, frequency splits, and an input gain for the multiband dynamic range compressor, a gain for each frequency band and an output gain for the graphic equalizer, and a threshold for the mono limiter.

11. A non-transitory computer-readable storage medium including instructions stored thereon which, when executed by at least one processor, cause the at least one processor to: receive an input including an unprocessed audio sequence and a request to perform an audio signal processing effect on the unprocessed audio sequence; analyze, by a deep encoder, the unprocessed audio sequence to determine parameters for processing the unprocessed audio sequence, the parameters associated with the requested audio signal processing effect; send the unprocessed audio sequence and the parameters to one or more audio signal processing effects plugins to perform the audio signal processing effect using the parameters; and output a processed audio sequence after processing of the unprocessed audio sequence using the parameters of the one or more audio signal processing effects plugins.

12. The non-transitory computer-readable storage medium of claim 11, further comprising instructions to train the deep encoder by a training system, the training system configured to: obtain training audio data, the training audio data including at least one training audio file and an associated ground truth audio file corresponding to the audio signal processing effect; configure the deep encoder for the audio signal processing effect by serially adding each audio signal processing effects plugin of the one or more audio signal processing effects plugins to the deep encoder; analyze, by the deep encoder, the training audio data to estimate values of training parameters for processing the training audio data; provide the estimated values of the training parameters and the training audio data to the one or more audio signal processing effects plugins which generate a training output; calculate, using a loss function, a loss of the training output and the ground truth audio file; and train the deep encoder based on the loss.

13. The non-transitory computer-readable storage medium of claim 12, wherein to calculate, using the loss function, the loss of the training
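Two training steps named in the claims lend themselves to a short illustration: the delay-aligned, polarity-invariant loss of claim 3 and the simultaneous perturbation stochastic approximation (SPSA) gradient estimate of claim 7. The Python sketch below is one possible reading, not the patented implementation. Its assumptions: delay is measured in samples rather than audio frames and is found by brute-force cross-correlation, the base loss is mean absolute error, and the plugin is a hypothetical gain with a fixed 2-sample latency. All function names are illustrative.

```python
import random
from typing import Callable, List, Optional, Sequence


def estimate_delay(target: Sequence[float], output: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) at which |cross-correlation| peaks (assumed method)."""
    def corr(lag: int) -> float:
        n = min(len(target), len(output) - lag)
        # abs() so an inverted-polarity output still yields a clear peak
        return abs(sum(target[i] * output[i + lag] for i in range(n)))
    return max(range(max_lag + 1), key=corr)


def mae(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error over the overlapping part of two signals."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("signals do not overlap")
    return sum(abs(a[i] - b[i]) for i in range(n)) / n


def aligned_polarity_loss(target: Sequence[float], output: Sequence[float], max_lag: int = 8) -> float:
    """Drop the delayed samples, then keep the smaller of the two polarity losses."""
    lag = estimate_delay(target, output, max_lag)
    trimmed = list(output[lag:])                       # remove delay to align
    normal = mae(target, trimmed)                      # non-inverted polarity
    inverted = mae(target, [-x for x in trimmed])      # inverted polarity
    return min(normal, inverted)


def spsa_gradient(
    loss_fn: Callable[[List[float]], float],
    theta: Sequence[float],
    c: float = 0.01,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """One SPSA estimate: two loss evaluations, whatever the parameter count."""
    rng = rng or random.Random()
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]   # random +/-1 perturbation
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = loss_fn(plus) - loss_fn(minus)
    return [diff / (2.0 * c * d) for d in delta]


dry = [0.0, 1.0, -0.5, 0.25, 0.0, 0.0]
target = [-0.5 * x for x in dry]   # ground truth: half gain, inverted polarity


def plugin_loss(theta: List[float]) -> float:
    # Hypothetical black-box plugin: a gain followed by 2 samples of latency.
    output = [0.0, 0.0] + [theta[0] * x for x in dry]
    return aligned_polarity_loss(target, output, max_lag=4)


grad = spsa_gradient(plugin_loss, [1.0], c=0.01, rng=random.Random(0))
```

SPSA suits this setting because an effects plugin is typically a black box with no analytic gradient, and the estimate needs only two plugin evaluations per step. The two perturbed evaluations may correspond to the second and third plugin instances of claim 5, though the claim text shown here does not say so explicitly.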

Assignees

Adobe Inc

Inventors

Classifications

  • Supervised learning · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G10H1/0008 · Primary

    Associated control or indicating means · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11900902B2 cover?
Embodiments are disclosed for determining an answer to a query associated with a graphical representation of data. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including an unprocessed audio sequence and a request to perform an audio signal processing effect on the unprocessed audio sequence. The one or more embodiments further include…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G10H1/0008. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 13, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).