Differentiable wavetable synthesizer using plurality of machine learning models to reduce computational complexity of audio synthesis

US12198673B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12198673-B2
Application number  US-202117525814-A
Country             US
Kind code           B2
Filing date         Nov 12, 2021
Priority date       Nov 12, 2021
Publication date    Jan 14, 2025
Grant date          Jan 14, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure describes techniques for differentiable wavetable synthesizer. The techniques comprise extracting features from a dataset of sounds, wherein the features comprise at least timbre embedding; input the features to the first machine learning model, wherein the first machine learning model is configured to extract a set of N×L learnable parameters, N represents a number of wavetables, and L represents a wavetable length; outputting a plurality of wavetables, wherein each of plurality of wavetables comprises a waveform associated with a unique timbre, the plurality of wavetables form a dictionary, and the plurality of wavetables are portable to perform audio-related tasks. Finally, the said wavetables are used to initialize another machine learning model so as to help reduce computational complexity of an audio synthesis obtained as output of the another machine learning model.
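To make the abstract concrete, here is a minimal sketch of how a dictionary of N wavetables of length L could be read and mixed into audio. It is an illustration only, not the patented implementation. The function names, the sine-harmonic placeholder tables, the 16 kHz sample rate, and the linear interpolation are all assumptions; in the disclosure the N×L values are parameters learned by the first machine learning model.

```python
import numpy as np

def make_wavetables(n_tables, table_len):
    """Hypothetical stand-in for the learned N x L dictionary (one sine harmonic per table)."""
    phase = 2 * np.pi * np.arange(table_len) / table_len
    return np.stack([np.sin((k + 1) * phase) for k in range(n_tables)])

def synthesize(wavetables, f0, attention, amplitude, sample_rate=16000):
    """Read every table at a pitch-driven phase and mix the reads per sample.

    wavetables: (N, L) dictionary, f0: (T,) pitch in Hz,
    attention: (T, N) mixing weights, amplitude: (T,) or scalar gain.
    """
    n_tables, table_len = wavetables.shape
    # Accumulate phase in cycles and wrap it into [0, 1).
    phase = np.cumsum(f0 / sample_rate) % 1.0
    pos = phase * table_len
    i0 = np.floor(pos).astype(int) % table_len
    i1 = (i0 + 1) % table_len
    frac = pos - np.floor(pos)
    # Linear interpolation between neighbouring table samples, shape (N, T).
    reads = (1.0 - frac) * wavetables[:, i0] + frac * wavetables[:, i1]
    # Weighted sum over the N tables at each time step.
    mixed = np.sum(attention.T * reads, axis=0)
    return amplitude * mixed

# Usage: one second at 220 Hz, with all weight on the first table.
n_samples = 16000
tables = make_wavetables(n_tables=4, table_len=512)
f0 = np.full(n_samples, 220.0)
attention = np.zeros((n_samples, 4))
attention[:, 0] = 1.0
audio = synthesize(tables, f0, attention, amplitude=1.0)
```

Changing the rows of `attention` over time moves the output between timbres without touching the tables themselves.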

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: extracting features from a dataset of sounds, wherein the features comprise at least timbre embedding; inputting the features to a first machine learning model; generating a plurality of wavetables by the first machine learning model based on the features, wherein each of plurality of wavetables comprises a waveform associated with a unique timbre; initializing another machine learning model with at least one subset of the plurality of wavetables, wherein the another machine learning model is configured to reduce a computational complexity of audio synthesis; and generating an audio item based on data output from the another machine learning model.

2. The method of claim 1, further comprising: producing an audio item based at least in part on at least one subset of the plurality of wavetables.

3. The method of claim 2, wherein the another machine learning model comprises a third machine learning model, and wherein the method further comprises: training the third machine learning model on a short piece of new audio item, wherein the third machine learning model is initialized with the plurality of wavetables.

4. The method of claim 3, further comprising: producing the audio item using the third machine learning model, wherein the third machine learning model outputs only time-varying attention weights associated with the at least one subset of the plurality of wavetables.

5. The method of claim 2, further comprising: specifying a time-varying timbre vector; and producing the audio item based on the specified time-varying timbre vector and the at least one subset of the plurality of wavetables.

6. The method of claim 2, wherein the another machine learning model comprises a second machine learning model, and wherein the method further comprises: producing the audio item using the second machine learning model, wherein the second machine learning model is initialized with the at least one subset of the plurality of wavetables, and wherein the second machine learning model outputs only data indicative of a linear combination of the at least one subset of the plurality of wavetables.

7. The method of claim 1, wherein the first machine learning model outputs the plurality of wavetables, linear attentions and amplitudes of the plurality of wavetables.

8. The method of claim 1, wherein the plurality of wavetables enable to reduce a number of control dimensions of audio synthesis.

9. A method, comprising: obtaining at least one subset of a plurality of wavetables, wherein each of the plurality of wavetables comprises a waveform associated with a unique timbre, wherein the plurality of wavetables are generated by a first machine learning model based on input features, and wherein the input features comprise at least timbre embedding extracted from a dataset of sounds; initializing another machine learning model with the at least one subset of the plurality of wavetables, wherein the another machine learning model is configured to reduce a computational complexity of audio synthesis; and producing an audio item based on data output from the another machine learning model.

10. A system, comprising: at least one processor; and at least one memory communicatively coupled to the at least one processor and storing instructions that upon execution by the at least one processor cause the system to perform operations, the operations comprising: extracting features from a dataset of sounds, wherein the features comprise at least timbre embedding; input the features to a first machine learning model; generating a plurality of wavetables by the first machine learning model based on the features, wherein each of plurality of wavetables comprises a waveform associated with a unique timbre; initializing another machine learning model with at least one subset of the plurality of wavetables, wherein the another machine learning model is configured to reduce a computational complexity of audio synthesis; and generating an audio item based on data output from the another machine learning model.

11. The system of claim 10, the operations further comprising: producing an audio item based at least in part on at least one subset of the plurality of wavetables.

12. The system of claim 11, wherein the another machine learning model comprises a third machine learning model, and wherein the operations further comprise: training the third machine learning model on a short piece of new audio item, wherein the third machine learning model is initialized with the plurality of wavetables.

13. The system of claim 12, the operations further comprising: producing the audio item using the third machine learning model, wherein the third machine learning model outputs only time-varying attention weights associated with the at least one subset of the plurality of wavetables.

14. The system of claim 11, the operations further comprising: specifying a time-varying timbre vector; and producing the audio item based on the specified a time-varying timbre vector and the at least one subset of the plurality of wavetables.

15. The system of claim 11, wherein the another machine learning model comprises a second machine learning model, and wherein the operations further comprise: producing the audio item using the second machine learning model, wherein the second machine learning model is initialized with the at least one subset of the plurality of wavetables, and wherein outputs only data indicative of a linear combination of the at least one subset of the plurality of wavetables.

16. The system of claim 11, wherein the plurality of wavetables enable to reduce a number of control dimensions of audio synthesis.

17. A non-transitory computer-readable storage medium, storing computer-readable instructions that upon execution by a processor cause the processor to implement operations, the operation comprising: extracting features from a dataset of sounds, wherein the features comprise at least timbre embedding; input the features to a first machine learning model; generating a plurality of wavetables by the first machine learning model based on the features, wherein each of plurality of wavetables comprises a waveform associated with a unique timbre; initializing another machine learning model with at least one subset of the plurality of wavetables, wherein the another machine learning model is configured to reduce a computational complexity of audio synthesis; and generating an audio item based on data output from the another machine learning model.

18. The non-transitory computer-readable storage medium of claim 17, the operations further comprising: producing an audio item based at least in part on at least one subset of the plurality of wavetables.

19. The non-transitory computer-readable storage medium of claim 18, wherein the another machine learning model comprises a second machine learning model, wherein the operations further comprise: producing the audio item using the second machine learning model, wherein the second machine learning model is initialized with the at least one subset of the plurality of wavetables, and wherein the second machine learning model outputs only data indicative of a linear combination of the at least one subset of the plurality of wavetables.

20. The non-transitory computer-readable storage medium of claim 18, wherein the another machine learning model comprises a third machine learning model, wherein the operations further co
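Several dependent claims (4, 6, 13, 15, 19) describe a later model that outputs only weights over an already-generated set of wavetables. The sketch below illustrates why that reduces the number of control dimensions: with the tables held fixed, a target waveform of L samples is described by just N weights. Ordinary least squares stands in here for the trained model, which is an assumption made purely for illustration; the helper names and the sine-harmonic dictionary are hypothetical as well.

```python
import numpy as np

def harmonic_tables(n_tables, table_len):
    """Hypothetical fixed dictionary: table k holds harmonic k+1 of a sine."""
    phase = 2 * np.pi * np.arange(table_len) / table_len
    return np.stack([np.sin((k + 1) * phase) for k in range(n_tables)])

def fit_mixing_weights(wavetables, target_cycle):
    """Least-squares weights w such that w @ wavetables approximates target_cycle."""
    weights, *_ = np.linalg.lstsq(wavetables.T, target_cycle, rcond=None)
    return weights

# Usage: a 512-sample target cycle is recovered from only 4 numbers.
tables = harmonic_tables(n_tables=4, table_len=512)
target = 0.5 * tables[0] + 0.25 * tables[2]
weights = fit_mixing_weights(tables, target)
reconstruction = weights @ tables
```

Only `weights` changes from sound to sound in this sketch; the dictionary itself is reused as-is.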

Assignees

Lemon Inc

Inventors

Classifications

  • Segmentation; Word boundary detection · CPC title

  • Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title

  • using Fourier coefficients · CPC title

  • Pre-filtering or post-filtering · CPC title

  • Architecture of speech synthesisers · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12198673B2 cover?
The present disclosure describes techniques for differentiable wavetable synthesizer. The techniques comprise extracting features from a dataset of sounds, wherein the features comprise at least timbre embedding; input the features to the first machine learning model, wherein the first machine learning model is configured to extract a set of N×L learnable parameters, N represents a number of wa…
Who is the assignee on this patent?
Lemon Inc
What technology area does this patent fall under?
Primary CPC classification G10L13/04. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 14, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).