Method and apparatus for controlling enhancement of low-bitrate coded audio
US-2021327445-A1 · Oct 21, 2021 · US
US11900902B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11900902-B2 |
| Application number | US-202117228357-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 12, 2021 |
| Priority date | Apr 12, 2021 |
| Publication date | Feb 13, 2024 |
| Grant date | Feb 13, 2024 |
Official abstract text for this publication.
Embodiments are disclosed for determining an answer to a query associated with a graphical representation of data. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including an unprocessed audio sequence and a request to perform an audio signal processing effect on the unprocessed audio sequence. The one or more embodiments further include analyzing, by a deep encoder, the unprocessed audio sequence to determine parameters for processing the unprocessed audio sequence. The one or more embodiments further include sending the unprocessed audio sequence and the parameters to one or more audio signal processing effects plugins to perform the requested audio signal processing effect using the parameters and outputting a processed audio sequence after processing of the unprocessed audio sequence using the parameters of the one or more audio signal processing effects plugins.
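The inference flow in the abstract (an encoder estimates parameters, then effects plugins apply them to the audio) can be sketched in a few lines. This is a minimal illustration, not the patented implementation: `stub_encoder`, `gain_plugin`, and `process` are hypothetical names, and the "encoder" here is a hand-written peak-normalization rule standing in for a trained deep encoder.

```python
from typing import Callable, Dict, List

Audio = List[float]
Params = Dict[str, float]
Encoder = Callable[[Audio, str], Params]
Plugin = Callable[[Audio, Params], Audio]


def stub_encoder(audio: Audio, effect: str) -> Params:
    """Hypothetical stand-in for a trained deep encoder.

    It ignores `effect` and simply picks a gain that brings the peak to 1.0.
    """
    peak = max((abs(x) for x in audio), default=0.0)
    return {"gain": 1.0 / peak if peak > 0 else 1.0}


def gain_plugin(audio: Audio, params: Params) -> Audio:
    """Hypothetical effects plugin: applies the estimated gain."""
    return [x * params["gain"] for x in audio]


def process(audio: Audio, effect: str, encoder: Encoder,
            plugins: List[Plugin]) -> Audio:
    """Estimate parameters once, then run the plugins in series."""
    params = encoder(audio, effect)
    out = list(audio)
    for plugin in plugins:
        out = plugin(out, params)
    return out


processed = process([0.1, -0.5, 0.25], "normalize", stub_encoder, [gain_plugin])
```

The point of the structure is that the plugins stay ordinary signal processors; only the parameter estimation is learned.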
Opening claim text (preview).
We claim:
1. A computer-implemented method comprising: receiving an input including an unprocessed audio sequence and a request to perform an audio signal processing effect on the unprocessed audio sequence; analyzing, by a deep encoder, the unprocessed audio sequence to determine parameters for processing the unprocessed audio sequence, the parameters associated with the requested audio signal processing effect; sending the unprocessed audio sequence and the parameters to one or more audio signal processing effects plugins to perform the audio signal processing effect using the parameters; and outputting a processed audio sequence after processing of the unprocessed audio sequence using the parameters of the one or more audio signal processing effects plugins.
2. The computer-implemented method of claim 1, wherein the deep encoder is trained using a training system configured to: obtain training audio data, the training audio data including at least one training audio file and an associated ground truth audio file corresponding to the audio signal processing effect; configure the deep encoder for the audio signal processing effect by serially adding each audio signal processing effects plugin of the one or more audio signal processing effects plugins to the deep encoder; analyze, by the deep encoder, the training audio data to estimate values of training parameters for processing the training audio file; provide the estimated values of the training parameters and the training audio file to the one or more audio signal processing effects plugins which generate a training output; calculate, using a loss function, a loss of the training output and the ground truth audio file; and train the deep encoder based on the loss.
3. The computer-implemented method of claim 2, wherein calculating, using the loss function, the loss of the training output of the one or more audio signal processing effects plugins and the ground truth audio file further comprises: determining a number of audio frames representing a delay between the ground truth audio file and the training output; removing the number of audio frames representing the delay; aligning the ground truth audio file and the training output; calculating a polarity; calculating a first loss value with a non-inverted polarity and a second loss value with an inverted polarity; and selecting a minimum of the first loss value and the second loss value as the loss.
4. The computer-implemented method of claim 2, further comprising: for each frame of each training audio file in the training audio data: analyzing, by the deep encoder, a first frame of the training audio file having a first frame length; and generating a second frame of the training audio file from the first frame, the second frame having a second frame length, wherein the second frame length is shorter than the first frame length.
5. The computer-implemented method of claim 4, further comprising: for each training audio file in the training audio data: instantiating a first audio signal processing effects plugin to generate output audio frames based on the second frame and the estimated values of the training parameters; and instantiating a second audio signal processing effects plugin and a third audio signal processing effects plugin to determine gradients of the estimated values of the training parameters based on the second frame and the estimated values of the training parameters, wherein a same state is maintained for each of the first audio signal processing effects plugin, the second audio signal processing effects plugin, and the third audio signal processing effects plugin.
6. The computer-implemented method of claim 5, further comprising: backpropagating the determined gradients of the estimated values of the training parameters to the deep encoder and the one or more audio signal processing effects plugins.
7. The computer-implemented method of claim 6, wherein determining the gradients of the estimated values of the training parameters comprises: approximating the gradients of the estimated values of the training parameters using a simultaneous perturbation stochastic approximation method.
8. The computer-implemented method of claim 1, wherein the audio signal processing effect is a tube amplifier emulation audio processing effect, wherein the one or more audio signal processing effects plugins include a multiband dynamic range compressor, and wherein the parameters include a threshold, makeup gain, ratio, and knee for each frequency band, frequency splits, an input gain, and an output gain.
9. The computer-implemented method of claim 1, wherein the audio signal processing effect is an automatic non-speech vocal sounds removal audio processing effect, wherein the one or more audio signal processing effects plugins include multiband noise gate, and wherein the parameters include a threshold, reduction gain, and ratio for each frequency band, frequency splits, an input gain, and an output gain.
10. The computer-implemented method of claim 1, wherein the audio signal processing effect is a music mastering audio processing effect, and wherein the one or more audio signal processing effects plugins include a multiband dynamic range compressor, a graphic equalizer, and a mono limiter, and wherein the parameters include a threshold, makeup gain and ratio for each frequency band, frequency splits, and an input gain for the multiband dynamic range compressor, a gain for each frequency band and an output gain for the graphic equalizer, and a threshold for the mono limiter.
11. A non-transitory computer-readable storage medium including instructions stored thereon which, when executed by at least one processor, cause the at least one processor to: receive an input including an unprocessed audio sequence and a request to perform an audio signal processing effect on the unprocessed audio sequence; analyze, by a deep encoder, the unprocessed audio sequence to determine parameters for processing the unprocessed audio sequence, the parameters associated with the requested audio signal processing effect; send the unprocessed audio sequence and the parameters to one or more audio signal processing effects plugins to perform the audio signal processing effect using the parameters; and output a processed audio sequence after processing of the unprocessed audio sequence using the parameters of the one or more audio signal processing effects plugins.
12. The non-transitory computer-readable storage medium of claim 11, further comprising instructions to train the deep encoder by a training system, the training system configured to: obtain training audio data, the training audio data including at least one training audio file and an associated ground truth audio file corresponding to the audio signal processing effect; configure the deep encoder for the audio signal processing effect by serially adding each audio signal processing effects plugin of the one or more audio signal processing effects plugins to the deep encoder; analyze, by the deep encoder, the training audio data to estimate values of training parameters for processing the training audio data; provide the estimated values of the training parameters and the training audio data to the one or more audio signal processing effects plugins which generate a training output; calculate, using a loss function, a loss of the training output and the ground truth audio file; and train the deep encoder based on the loss.
13. The non-transitory computer-readable storage medium of claim 12, wherein to calculate, using the loss function, the loss of the training
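The loss steps recited in claim 3 (and begun again in claim 13) can be sketched as follows. This is an illustrative reading under stated assumptions, not the method as the applicant implemented it: the delay is estimated by a brute-force cross-correlation search, the plugin output is assumed to lag the ground truth by a non-negative number of samples, and mean absolute error stands in for the unspecified loss function. All function names are hypothetical.

```python
from typing import List


def estimate_delay(target: List[float], output: List[float],
                   max_delay: int) -> int:
    """Return the lag (0..max_delay) with the largest |cross-correlation|.

    The absolute value makes the search insensitive to polarity inversion.
    """
    best_lag, best_score = 0, -1.0
    for lag in range(max_delay + 1):
        n = min(len(target), len(output) - lag)
        if n <= 0:
            break
        score = abs(sum(target[i] * output[i + lag] for i in range(n)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def mean_abs_error(a: List[float], b: List[float]) -> float:
    if not a:
        raise ValueError("cannot compare empty signals")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def aligned_polarity_loss(target: List[float], output: List[float],
                          max_delay: int = 64) -> float:
    """Drop the delay samples, align lengths, and keep the lower of the
    non-inverted and inverted-polarity losses."""
    lag = estimate_delay(target, output, max_delay)
    shifted = output[lag:]                      # remove the delay samples
    n = min(len(target), len(shifted))
    tgt, out = target[:n], shifted[:n]          # align the two signals
    loss_same = mean_abs_error(tgt, out)
    loss_inverted = mean_abs_error(tgt, [-x for x in out])
    return min(loss_same, loss_inverted)


# A plugin output that is delayed by two samples and polarity-inverted
# should still score as a perfect match.
truth = [0.0, 1.0, -0.5, 0.25, 0.0, 0.0]
delayed_inverted = [0.0, 0.0] + [-x for x in truth]
loss = aligned_polarity_loss(truth, delayed_inverted, max_delay=4)
```

Taking the minimum over both polarities keeps the encoder from being penalized for a plugin that flips the waveform's sign, which is typically inaudible.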
Supervised learning · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Associated control or indicating means · CPC title
Backpropagation, e.g. using gradient descent · CPC title
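Claim 7 names simultaneous perturbation stochastic approximation (SPSA) as the way gradients are obtained for plugin parameters. A minimal sketch of the standard SPSA estimator is shown here; `spsa_gradient` and its arguments are hypothetical names, and `f` stands for any black-box scalar function of the parameters (for example, a loss computed from a plugin's output). Reading the two perturbed evaluations as the second and third plugin instances of claim 5 is an assumption, not something the preview text states.

```python
import random
from typing import Callable, List, Optional


def spsa_gradient(f: Callable[[List[float]], float], theta: List[float],
                  c: float = 1e-3,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Estimate the gradient of f at theta from two evaluations.

    Every parameter is perturbed at once by +/- c along a random sign
    vector, so the cost does not grow with the number of parameters.
    """
    rng = rng or random.Random()
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = f(plus) - f(minus)
    return [diff / (2.0 * c * d) for d in delta]


# For a one-parameter linear function the estimate equals the true slope.
slope = spsa_gradient(lambda p: 3.0 * p[0], [1.0], rng=random.Random(0))
```

With more than one parameter a single estimate is noisy, because each component also picks up the other parameters' contributions; it is unbiased only on average over many random sign vectors.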