Approach to automatic music remix based on style templates
US-2023360619-A1 · Nov 9, 2023 · US
| Field | Value |
|---|---|
| Publication number | US-12437735-B2 |
| Application number | US-202217688382-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 7, 2022 |
| Priority date | Mar 7, 2022 |
| Publication date | Oct 7, 2025 |
| Grant date | Oct 7, 2025 |
Official abstract text for this publication.
Methods, systems, and storage media for generating a beatbox transcript are disclosed. Some examples may include: receiving an audio signal having a plurality of beatbox sounds; generating a spectrogram of the audio signal; processing the spectrogram of the audio signal with a neural network model trained on training samples including beatbox sounds; generating, by the neural network model, a beatbox sound activation map including a plurality of activation times for a plurality of beatbox sounds; decoding the beatbox sound activation map into a beatbox transcript; and providing the beatbox transcript as an output.
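The decoding step named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the activation map is decoded, so the threshold-and-peak-picking rule below is an assumption, as are the function name `decode_activations`, the dictionary layout of the activation map, the frame rate, and the example values. The note numbers are the standard General MIDI percussion keys for bass drum, acoustic snare, and closed hi-hat.

```python
from typing import Dict, List, Tuple

# Standard General MIDI percussion key numbers.
GM_DRUM_NOTES = {"kick": 36, "snare": 38, "hihat": 42}


def decode_activations(
    activation_map: Dict[str, List[float]],
    frame_rate: float,
    threshold: float = 0.5,
) -> List[Tuple[float, str, int]]:
    """Turn per-frame activations into (time_seconds, label, midi_note) events.

    Assumed rule (not taken from the patent): a frame is an onset when its
    activation reaches the threshold and is a local peak, i.e. greater than
    the previous frame and at least as large as the next one.
    """
    events = []
    for label, frames in activation_map.items():
        for i, value in enumerate(frames):
            prev = frames[i - 1] if i > 0 else 0.0
            nxt = frames[i + 1] if i + 1 < len(frames) else 0.0
            if value >= threshold and value > prev and value >= nxt:
                events.append((i / frame_rate, label, GM_DRUM_NOTES[label]))
    # Sort by time, then by label, so the transcript reads chronologically.
    return sorted(events)


# Hypothetical activation map: one list of per-frame scores per sound class,
# at an assumed 100 frames per second.
example = {
    "kick":  [0.1, 0.9, 0.2, 0.0, 0.0, 0.8, 0.1, 0.0],
    "snare": [0.0, 0.0, 0.1, 0.7, 0.2, 0.0, 0.0, 0.0],
    "hihat": [0.0, 0.6, 0.0, 0.6, 0.0, 0.6, 0.0, 0.6],
}
transcript = decode_activations(example, frame_rate=100.0)
for time_s, label, note in transcript:
    print(f"{time_s:.2f}s  {label:<5}  note {note}")
```

Each event carries a time, a label, and a note number, which is the information a MIDI-style note list needs. Writing an actual MIDI file would require a further serialization step that this sketch leaves out.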
Opening claim text (preview).
What is claimed is:
1. A method for generating a beatbox transcript, the method comprising: receiving an audio signal having a plurality of beatbox sounds, wherein the plurality of beatbox sounds include beatbox vocals; generating a spectrogram of the audio signal; generating a beatbox sound activation map including a plurality of activation times for the plurality of beatbox sounds based on the spectrogram of the audio signal, further comprising processing the spectrogram of the audio signal with a neural network model trained on training samples that include sample beatbox sounds to generate the beatbox sound activation map including the plurality of activation times; decoding the beatbox sound activation map into a beatbox transcript; and providing the beatbox transcript as an output, wherein the beatbox transcript includes instrumental music matching the beatbox sound activation map.
2. The method of claim 1, wherein the neural network model includes a convolutional neural network and a recurrent neural network.
3. The method of claim 1, wherein the beatbox transcript is provided in a Musical Instrument Digital Interface (MIDI) format.
4. The method of claim 1, further comprising receiving the audio signal directly from a microphone.
5. The method of claim 1, wherein the beatbox transcript includes one or more of a hi-hat, snare, or kick.
6. The method of claim 1, wherein the training samples include a plurality of training samples synthesized from other non-synthesized training samples.
7. The method of claim 6 further comprising generating the plurality of training samples synthesized from other non-synthesized training samples by at least one of reversing at least one of the sample beatbox sounds, combining at least one of the sample beatbox sounds with another sample beatbox sound, separating one or more of the sample beatbox sounds from a same sound clip.
8. A system, comprising: one or more hardware processors configured by machine-readable instructions to: receive an audio signal having a plurality of beatbox sounds, wherein the plurality of beatbox sounds include beatbox vocals; generate a spectrogram of the audio signal; generate a beatbox sound activation map including a plurality of activation times for the plurality of beatbox sounds based on the spectrogram of the audio signal by processing the spectrogram of the audio signal with a neural network model trained on training samples including sample beatbox sounds; decode the beatbox sound activation map into a beatbox transcript; and provide the beatbox transcript as an output, wherein the beatbox transcript includes instrumental music matching the beatbox sound activation map.
9. The system of claim 8, wherein the neural network model includes a convolutional neural network and a recurrent neural network.
10. The system of claim 8, wherein the beatbox transcript is provided in a Musical Instrument Digital Interface (MIDI) format.
11. The system of claim 8, wherein the one or more hardware processors are further configured by machine-readable instructions to receive the audio signal directly from a microphone.
12. The system of claim 8, wherein the beatbox transcript includes one or more of a hi-hat, snare, or kick.
13. The system of claim 8, wherein the training samples include a plurality of training samples synthesized from other non-synthesized training samples.
14. The system of claim 13, wherein the one or more hardware processors are further configured by machine-readable instructions to: generate the plurality of training samples synthesized from other non-synthesized training samples by at least one of reversing at least one of the sample beatbox sounds, combining at least one of the sample beatbox sounds with another sample beatbox sound, separating one or more of the sample beatbox sounds from a same sound clip.
15. A non-transient computer-readable storage medium comprising instructions being executable by one or more processors, that when executed by the one or more processors, cause the one or more processors to: receive an audio signal having a plurality of beatbox sounds, wherein the plurality of beatbox sounds include beatbox vocals; generate a spectrogram of the audio signal; generate a beatbox sound activation map including a plurality of activation times for the plurality of beatbox sounds based on the spectrogram of the audio signal by processing the spectrogram of the audio signal with a neural network model trained on training samples including sample beatbox sounds; decode the beatbox sound activation map into a beatbox transcript; and provide the beatbox transcript as an output, wherein the beatbox transcript includes instrumental music matching the beatbox sound activation map.
16. The computer-readable storage medium of claim 15, wherein the neural network model includes a convolutional neural network and a recurrent neural network.
17. The computer-readable storage medium of claim 15, wherein the beatbox transcript is provided in a Musical Instrument Digital Interface (MIDI) format.
18. The computer-readable storage medium of claim 15, wherein the at least one memory and the computer program code are further configured to, with the processor, cause the computer-readable storage medium to receive the audio signal directly from a microphone.
19. The computer-readable storage medium of claim 15, wherein the beatbox transcript sounds includes one or more of a hi-hat, snare, or kick.
20. The computer-readable storage medium of claim 15, wherein the training samples include a plurality of training samples synthesized from other non-synthesized training samples; and wherein the computer-readable storage medium includes instructions configured to cause the one or more processors to generate the plurality of training samples synthesized from other non-synthesized training samples by at least one of reversing at least one of the sample beatbox sounds, combining at least one of the sample beatbox sounds with another sample beatbox sound, separating one or more of the sample beatbox sounds from a same sound clip.
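Claims 7, 14, and 20 name three ways of synthesizing training samples from recorded ones: reversing a sound, combining two sounds, and separating sounds from the same clip. The sketch below shows one possible reading of each operation on plain lists of samples. The claims do not specify how the operations are carried out, so everything here is an assumption: the `Clip` type, the function names, the equal-weight mixing, and in particular the silence-based splitting used for "separating".

```python
from typing import List

Clip = List[float]  # mono audio samples; hypothetical in-memory representation


def reverse_clip(clip: Clip) -> Clip:
    """Return the clip played backwards."""
    return clip[::-1]


def combine_clips(a: Clip, b: Clip) -> Clip:
    """Mix two clips sample by sample, zero-padding the shorter one.

    Equal 0.5 weights are an assumption chosen to keep the mix in range.
    """
    n = max(len(a), len(b))
    pad_a = a + [0.0] * (n - len(a))
    pad_b = b + [0.0] * (n - len(b))
    return [0.5 * (x + y) for x, y in zip(pad_a, pad_b)]


def separate_clip(clip: Clip, silence: float = 1e-3, min_gap: int = 2) -> List[Clip]:
    """Split one recording into sounds separated by runs of near-silence.

    A run of at least `min_gap` samples with magnitude <= `silence` ends the
    current sound; shorter pauses are kept inside the sound.
    """
    segments: List[Clip] = []
    current: Clip = []
    pending: Clip = []  # quiet samples seen since the last loud sample
    for s in clip:
        if abs(s) <= silence:
            pending.append(s)
            if current and len(pending) >= min_gap:
                segments.append(current)
                current = []
        else:
            if current:
                current.extend(pending)  # short pause inside a sound: keep it
            pending = []
            current.append(s)
    if current:
        segments.append(current)
    return segments


# Hypothetical recording containing two sounds separated by silence.
recording = [0.5, -0.4, 0.0, 0.0, 0.0, 0.3, 0.0, 0.2]
sounds = separate_clip(recording)
augmented = [reverse_clip(s) for s in sounds] + [combine_clips(sounds[0], sounds[1])]
```

Here `sounds` holds the two separated segments, and `augmented` holds their reversed versions plus one mix of the pair. A practical pipeline would operate on real audio buffers and would likely use an energy envelope instead of per-sample thresholds.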
using neural networks · CPC title
for transcription of raw audio or music data to a displayed or printed staff representation or to displayable MIDI-like note-oriented data, e.g. in pianoroll format · CPC title
Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody, e.g. as output by drum machines · CPC title
Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation · CPC title
for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; Identification or separation of instrumental parts by their characteristic voices or timbres · CPC title