Machine-learned differentiable digital signal processing

US11735197B2 · US · B2

Patent metadata
Publication number: US-11735197-B2
Application number: US-202016922543-A
Country: US
Kind code: B2
Filing date: Jul 7, 2020
Priority date: Jul 7, 2020
Publication date: Aug 22, 2023
Grant date: Aug 22, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods of the present disclosure are directed toward digital signal processing using machine-learned differentiable digital signal processors. For example, embodiments of the present disclosure may include differentiable digital signal processors within the training loop of a machine-learned model (e.g., for gradient-based training). Advantageously, systems and methods of the present disclosure provide high quality signal processing using smaller models than prior systems, thereby reducing energy costs (e.g., storage and/or processing costs) associated with performing digital signal processing.
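The idea of placing a differentiable signal processor inside a gradient-based training loop can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: a single sinusoidal oscillator stands in for the signal processor, its amplitude is the only control input, the loss is a plain mean-squared error, and the gradient is written out by hand with NumPy. The names `oscillator` and `fit_amplitude` are hypothetical. A real system would more likely have a neural network produce the control inputs and use an automatic-differentiation framework.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz; illustrative value, not from the patent


def oscillator(amplitude, freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """A sinusoidal oscillator whose output is differentiable in `amplitude`."""
    t = np.arange(n_samples) / sample_rate
    return amplitude * np.sin(2.0 * np.pi * freq_hz * t)


def fit_amplitude(target, freq_hz, steps=200, lr=0.5):
    """Recover the amplitude control by gradient descent on a mean-squared error.

    The gradient is derived by hand: for loss = mean((a * basis - target)**2),
    d(loss)/da = 2 * mean((a * basis - target) * basis).
    """
    basis = oscillator(1.0, freq_hz, len(target))
    amplitude = 0.0  # deliberately wrong starting guess
    for _ in range(steps):
        residual = amplitude * basis - target
        grad = 2.0 * np.mean(residual * basis)
        amplitude -= lr * grad
    return amplitude


# Synthesize a "recording" with a known amplitude, then recover that amplitude
# purely from the loss gradient that flows through the oscillator.
target = oscillator(0.7, 440.0, 1600)
learned = fit_amplitude(target, 440.0)
print(f"learned amplitude: {learned:.4f}")
```

Because the oscillator itself is fixed and interpretable, the only quantity learned here is its control input, which is one way to read the abstract's point about achieving quality with smaller models.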

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system for the synthesis of an output audio waveform based on an input audio waveform, comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store: one or more digital signal processors for processing the input audio waveform; and instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising: obtaining one or more control inputs for controlling the one or more digital signal processors, wherein the one or more control inputs are generated from one or more latent representations of acoustic features of a reference audio source, the one or more latent representations having been generated by a machine-learned model trained by backpropagation of a loss determined by comparing a recording of the reference audio source and a synthesized recording thereof; inputting the one or more control inputs and the input audio waveform into the one or more digital signal processors; and synthesizing the output audio waveform with the one or more digital signal processors.

2. The computing system of claim 1, wherein the recording of the reference audio source is different from the input audio waveform.

3. The computing system of claim 1, wherein the one or more digital signal processors comprises one or more of a linear time-varying filter, a linear time-invariant filter, a finite impulse response filter, an infinite impulse response filter, an oscillator, a short-time Fourier transform, a parametric equalization processor, an effects processor, an additive synthesizer, a subtractive synthesizer, or a wavetable synthesizer.

4. The computing system of claim 1, wherein the one or more digital signal processors comprises an additive synthesizer and a subtractive synthesizer for generating the output audio waveform.

5. The computing system of claim 4, wherein the additive synthesizer comprises an oscillator and the subtractive synthesizer comprises a linear time-varying filter applied to a noise source.

6. The computing system of claim 4, wherein the control inputs comprise reverberation control inputs obtained by recreating a reverberation effect of the reference audio source using a reverberation digital signal processor.

7. The computing system of claim 1, wherein the output audio waveform comprises a speech waveform.

8. The computing system of claim 1, wherein the machine-learned model comprises an encoder for processing the model input and a decoder for outputting the one or more control inputs.

9. The computing system of claim 1, wherein the loss comprises a spectral loss.

10. The computing system of claim 9, wherein the spectral loss is a multi-scale spectral loss.

11. One or more non-transitory computer-readable media that collectively store: one or more digital signal processors for processing an input audio waveform; and instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising: obtaining one or more control inputs for controlling the one or more digital signal processors, wherein the one or more control inputs are generated from one or more latent representations of acoustic features of a reference audio source, the one or more latent representations having been generated by a machine-learned model trained by backpropagation of a loss determined by comparing a recording of the reference audio source and a synthesized recording thereof; inputting the one or more control inputs and the input audio waveform into the one or more digital signal processors; and synthesizing an output audio waveform with the one or more digital signal processors.

12. The one or more non-transitory computer-readable media of claim 11, wherein the recording of the reference audio source is different from the input audio waveform.

13. The one or more non-transitory computer-readable media of claim 11, wherein the one or more digital signal processors comprises one or more of a linear time-varying filter, a linear time-invariant filter, a finite impulse response filter, an infinite impulse response filter, an oscillator, a short-time Fourier transform, a parametric equalization processor, an effects processor, an additive synthesizer, a subtractive synthesizer, or a wavetable synthesizer.

14. The one or more non-transitory computer-readable media of claim 11, wherein the one or more digital signal processors comprises an additive synthesizer and a subtractive synthesizer for generating the output audio waveform.

15. The one or more non-transitory computer-readable media of claim 14, wherein the additive synthesizer comprises an oscillator and the subtractive synthesizer comprises a linear time-varying filter applied to a noise source.

16. The one or more non-transitory computer-readable media of claim 14, wherein the control inputs comprise reverberation control inputs obtained by recreating a reverberation effect of the reference audio source using a reverberation digital signal processor.

17. The one or more non-transitory computer-readable media of claim 11, wherein the output audio waveform comprises a speech waveform.

18. The one or more non-transitory computer-readable media of claim 11, wherein the machine-learned model comprises an encoder for processing the model input and a decoder for outputting the one or more control inputs.

19. The one or more non-transitory computer-readable media of claim 11, wherein the loss is a multi-scale spectral loss.

20. A method for the synthesis of an output audio waveform based on an input audio waveform, comprising: obtaining, by a computing system comprising one or more processors, one or more control inputs for controlling one or more digital signal processors, wherein the one or more control inputs are generated from one or more latent representations of acoustic features of a reference audio source, the one or more latent representations having been generated by a machine-learned model trained by backpropagation of a loss determined by comparing a recording of the reference audio source and a synthesized recording thereof; inputting, by the computing system, the one or more control inputs and the input audio waveform into the one or more digital signal processors; and synthesizing, by the computing system, the output audio waveform with the one or more digital signal processors.

21. A computing system that combines machine learning with digital signal processors, the computing system comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store: one or more differentiable digital signal processors configured to receive one or more control inputs and to process the one or more control inputs to generate a digital signal output, wherein each of the one or more differentiable digital signal processors is differentiable from the digital signal output to the one or more control inputs; a machine-learned model configured to receive a model input and to process the model input to generate the one or more control inputs for the one or more differentiable digital signal processors, wherein the machine-learned model has been trained by backpropagating a loss through the one or more differentiable digital signal processors; and instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising: rec
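Several of the dependent claims name concrete building blocks: an additive synthesizer built from an oscillator, a subtractive synthesizer that applies a time-varying filter to a noise source, and a multi-scale spectral loss. The sketch below shows one plausible shape for each in NumPy. It is an illustration under stated assumptions and does not reproduce the patent's implementation: the function names, sample rate, FFT sizes, non-overlapping frame-wise noise filter, and the linear-plus-log L1 form of the loss are all choices made here, since the claim preview does not specify them.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz; illustrative assumption


def harmonic_oscillator(f0_hz, harmonic_amps, n_samples, sample_rate=SAMPLE_RATE):
    """Additive synthesis: a sum of sinusoids at integer multiples of f0."""
    t = np.arange(n_samples) / sample_rate
    k = np.arange(1, len(harmonic_amps) + 1)[:, None]  # harmonic numbers
    freqs = k * f0_hz
    # Silence any harmonic at or above the Nyquist frequency to avoid aliasing.
    amps = np.where(freqs < sample_rate / 2,
                    np.asarray(harmonic_amps, dtype=float)[:, None], 0.0)
    return np.sum(amps * np.sin(2.0 * np.pi * freqs * t), axis=0)


def filtered_noise(frame_magnitudes, frame_size, rng):
    """Subtractive synthesis: white noise shaped by a per-frame magnitude response.

    frame_magnitudes has shape (n_frames, frame_size // 2 + 1). Each frame is
    filtered independently in the frequency domain, a crude stand-in for a
    linear time-varying filter (no overlap-add, circular convolution).
    """
    n_frames = frame_magnitudes.shape[0]
    noise = rng.uniform(-1.0, 1.0, size=(n_frames, frame_size))
    spectrum = np.fft.rfft(noise, axis=1) * frame_magnitudes
    return np.fft.irfft(spectrum, n=frame_size, axis=1).reshape(-1)


def magnitude_spectrogram(signal, fft_size):
    """Hann-windowed magnitude STFT with 75% overlap between frames."""
    hop = fft_size // 4
    window = np.hanning(fft_size)
    n_frames = 1 + (len(signal) - fft_size) // hop
    frames = np.stack([signal[i * hop:i * hop + fft_size] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def multi_scale_spectral_loss(target, estimate,
                              fft_sizes=(512, 256, 128, 64), eps=1e-7):
    """Sum of L1 distances between spectrograms at several FFT resolutions."""
    loss = 0.0
    for size in fft_sizes:
        s_t = magnitude_spectrogram(target, size)
        s_e = magnitude_spectrogram(estimate, size)
        loss += np.mean(np.abs(s_t - s_e))                              # linear term
        loss += np.mean(np.abs(np.log(s_t + eps) - np.log(s_e + eps)))  # log term
    return float(loss)


# Usage: combine harmonic and noise components, then score two candidates.
rng = np.random.default_rng(0)
n_samples, frame_size = 4000, 400
amps = [0.5, 0.25, 0.125]
magnitudes = np.full((n_samples // frame_size, frame_size // 2 + 1), 0.05)
noise = filtered_noise(magnitudes, frame_size, rng)
target = harmonic_oscillator(220.0, amps, n_samples) + noise
detuned = harmonic_oscillator(330.0, amps, n_samples) + noise

loss_same = multi_scale_spectral_loss(target, target)
loss_detuned = multi_scale_spectral_loss(target, detuned)
print(loss_same, loss_detuned)
```

Comparing spectrograms at several resolutions trades off time and frequency detail, so a candidate with the wrong pitch scores worse than an identical signal even though both share the same noise component.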

Assignees

Google LLC

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Generative networks · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11735197B2 cover?
Systems and methods of the present disclosure are directed toward digital signal processing using machine-learned differentiable digital signal processors. For example, embodiments of the present disclosure may include differentiable digital signal processors within the training loop of a machine-learned model (e.g., for gradient-based training). Advantageously, systems and methods of the prese…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L19/26. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 22, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).