Beatboxing transcription

US12437735B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12437735-B2
Application number  US-202217688382-A
Country             US
Kind code           B2
Filing date         Mar 7, 2022
Priority date       Mar 7, 2022
Publication date    Oct 7, 2025
Grant date          Oct 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and storage media for generating a beatbox transcript are disclosed. Some examples may include: receiving an audio signal having a plurality of beatbox sounds, generating a spectrogram of the audio signal, processing the spectrogram of the audio signal with a neural network model trained on training samples including beatbox sounds, generating, by the neural network model, a beatbox sound activation map including a plurality of activation times for a plurality of beatbox sounds, decoding the beatbox sound activation map into a beatbox transcript and providing the beatbox transcript as an output.
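The decoding step named in the abstract can be illustrated with a minimal sketch. It assumes the activation map is a set of per-frame scores in the range 0 to 1 for each sound class, and it decodes them by thresholded peak picking. The threshold, the frame rate, the function name `decode_activation_map`, and the General MIDI percussion note numbers are all assumptions made for illustration; the publication does not specify how its decoder works.

```python
from typing import Dict, List, Tuple

# Assumed General MIDI percussion key numbers (not taken from the patent text).
GM_DRUM_NOTES = {"kick": 36, "snare": 38, "hihat": 42}


def decode_activation_map(
    activations: Dict[str, List[float]],
    frame_rate: float,
    threshold: float = 0.5,
) -> List[Tuple[float, str, int]]:
    """Turn per-frame activations into (time_seconds, label, midi_note) events.

    A frame counts as an onset when its activation is at or above `threshold`
    and is a local maximum: strictly greater than the previous frame and at
    least as large as the next one.
    """
    events = []
    for label, curve in activations.items():
        for i, value in enumerate(curve):
            prev_value = curve[i - 1] if i > 0 else 0.0
            next_value = curve[i + 1] if i + 1 < len(curve) else 0.0
            if value >= threshold and value > prev_value and value >= next_value:
                events.append((i / frame_rate, label, GM_DRUM_NOTES[label]))
    # Sort by time (then label) so the transcript reads chronologically.
    events.sort()
    return events


# Hypothetical activation map: 3 sound classes, 8 frames at 100 frames per second.
example = {
    "kick":  [0.1, 0.9, 0.3, 0.0, 0.0, 0.8, 0.2, 0.0],
    "snare": [0.0, 0.0, 0.1, 0.7, 0.2, 0.0, 0.0, 0.0],
    "hihat": [0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.6, 0.4],
}
transcript = decode_activation_map(example, frame_rate=100.0)
for time_s, label, note in transcript:
    print(f"{time_s:.2f}s  {label:<5}  MIDI note {note}")
```

Each resulting event pairs an activation time with a drum label, which is the kind of note-oriented data that could then be written out in a format such as MIDI.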

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating a beatbox transcript, the method comprising: receiving an audio signal having a plurality of beatbox sounds, wherein the plurality of beatbox sounds include beatbox vocals; generating a spectrogram of the audio signal; generating a beatbox sound activation map including a plurality of activation times for the plurality of beatbox sounds based on the spectrogram of the audio signal, further comprising processing the spectrogram of the audio signal with a neural network model trained on training samples that include sample beatbox sounds to generate the beatbox sound activation map including the plurality of activation times; decoding the beatbox sound activation map into a beatbox transcript; and providing the beatbox transcript as an output, wherein the beatbox transcript includes instrumental music matching the beatbox sound activation map.

2. The method of claim 1, wherein the neural network model includes a convolutional neural network and a recurrent neural network.

3. The method of claim 1, wherein the beatbox transcript is provided in a Musical Instrument Digital Interface (MIDI) format.

4. The method of claim 1, further comprising receiving the audio signal directly from a microphone.

5. The method of claim 1, wherein the beatbox transcript includes one or more of a hi-hat, snare, or kick.

6. The method of claim 1, wherein the training samples include a plurality of training samples synthesized from other non-synthesized training samples.

7. The method of claim 6 further comprising generating the plurality of training samples synthesized from other non-synthesized training samples by at least one of reversing at least one of the sample beatbox sounds, combining at least one of the sample beatbox sounds with another sample beatbox sound, separating one or more of the sample beatbox sounds from a same sound clip.

8. A system, comprising: one or more hardware processors configured by machine-readable instructions to: receive an audio signal having a plurality of beatbox sounds, wherein the plurality of beatbox sounds include beatbox vocals; generate a spectrogram of the audio signal; generate a beatbox sound activation map including a plurality of activation times for the plurality of beatbox sounds based on the spectrogram of the audio signal by processing the spectrogram of the audio signal with a neural network model trained on training samples including sample beatbox sounds; decode the beatbox sound activation map into a beatbox transcript; and provide the beatbox transcript as an output, wherein the beatbox transcript includes instrumental music matching the beatbox sound activation map.

9. The system of claim 8, wherein the neural network model includes a convolutional neural network and a recurrent neural network.

10. The system of claim 8, wherein the beatbox transcript is provided in a Musical Instrument Digital Interface (MIDI) format.

11. The system of claim 8, wherein the one or more hardware processors are further configured by machine-readable instructions to receive the audio signal directly from a microphone.

12. The system of claim 8, wherein the beatbox transcript includes one or more of a hi-hat, snare, or kick.

13. The system of claim 8, wherein the training samples include a plurality of training samples synthesized from other non-synthesized training samples.

14. The system of claim 13, wherein the one or more hardware processors are further configured by machine-readable instructions to: generate the plurality of training samples synthesized from other non-synthesized training samples by at least one of reversing at least one of the sample beatbox sounds, combining at least one of the sample beatbox sounds with another sample beatbox sound, separating one or more of the sample beatbox sounds from a same sound clip.

15. A non-transient computer-readable storage medium comprising instructions being executable by one or more processors, that when executed by the one or more processors, cause the one or more processors to: receive an audio signal having a plurality of beatbox sounds, wherein the plurality of beatbox sounds include beatbox vocals; generate a spectrogram of the audio signal; generate a beatbox sound activation map including a plurality of activation times for the plurality of beatbox sounds based on the spectrogram of the audio signal by processing the spectrogram of the audio signal with a neural network model trained on training samples including sample beatbox sounds; decode the beatbox sound activation map into a beatbox transcript; and provide the beatbox transcript as an output, wherein the beatbox transcript includes instrumental music matching the beatbox sound activation map.

16. The computer-readable storage medium of claim 15, wherein the neural network model includes a convolutional neural network and a recurrent neural network.

17. The computer-readable storage medium of claim 15, wherein the beatbox transcript is provided in a Musical Instrument Digital Interface (MIDI) format.

18. The computer-readable storage medium of claim 15, wherein the at least one memory and the computer program code are further configured to, with the processor, cause the computer-readable storage medium to receive the audio signal directly from a microphone.

19. The computer-readable storage medium of claim 15, wherein the beatbox transcript sounds includes one or more of a hi-hat, snare, or kick.

20. The computer-readable storage medium of claim 15, wherein the training samples include a plurality of training samples synthesized from other non-synthesized training samples; and wherein the computer-readable storage medium includes instructions configured to cause the one or more processors to generate the plurality of training samples synthesized from other non-synthesized training samples by at least one of reversing at least one of the sample beatbox sounds, combining at least one of the sample beatbox sounds with another sample beatbox sound, separating one or more of the sample beatbox sounds from a same sound clip.
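The three ways of synthesizing training samples named in claims 7, 14, and 20 (reversing, combining, and separating) might look roughly like the sketch below. It treats a clip as a plain list of float samples in the range -1.0 to 1.0. The helper names, the mix-by-summation rule, and the silence-based splitting heuristic are assumptions made for illustration; the claims name the operations but do not say how they are implemented.

```python
from typing import List

Clip = List[float]


def reverse_clip(clip: Clip) -> Clip:
    """Return the sample played backwards."""
    return clip[::-1]


def combine_clips(a: Clip, b: Clip) -> Clip:
    """Overlay two samples by summing them (an assumed mixing rule).

    The shorter clip is zero-padded and the sum is clamped to [-1.0, 1.0].
    """
    mixed = []
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else 0.0
        y = b[i] if i < len(b) else 0.0
        mixed.append(max(-1.0, min(1.0, x + y)))
    return mixed


def separate_clip(clip: Clip, silence: float = 0.05, min_gap: int = 3) -> List[Clip]:
    """Split one clip into individual sounds (an assumed heuristic).

    Samples with |x| < `silence` are treated as quiet. A run of at least
    `min_gap` quiet samples between two loud samples ends one segment and
    starts the next. Leading and trailing quiet samples are dropped.
    """
    segments = []
    start = None      # index of the first loud sample in the current segment
    last_loud = None  # index of the most recent loud sample
    for i, x in enumerate(clip):
        if abs(x) < silence:
            continue
        if start is None:
            start = i
        elif i - last_loud > min_gap:
            segments.append(clip[start:last_loud + 1])
            start = i
        last_loud = i
    if start is not None:
        segments.append(clip[start:last_loud + 1])
    return segments


# Hypothetical recorded samples (values are illustrative only).
kick = [0.8, -0.6, 0.3]
snare = [0.5, -0.5]
session = kick + [0.0] * 4 + snare  # one clip holding two sounds

reversed_kick = reverse_clip(kick)    # [0.3, -0.6, 0.8]
layered = combine_clips(kick, snare)  # [1.0, -1.0, 0.3]
pieces = separate_clip(session)       # [kick, snare]
print(reversed_kick, layered, pieces)
```

Each function takes non-synthesized clips and yields new clips that could be added to a training set alongside the originals.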

Assignees

Lemon Inc

Inventors

Classifications

  • using neural networks · CPC title

  • for transcription of raw audio or music data to a displayed or printed staff representation or to displayable MIDI-like note-oriented data, e.g. in pianoroll format · CPC title

  • Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody, e.g. as output by drum machines · CPC title

  • Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation · CPC title

  • for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; Identification or separation of instrumental parts by their characteristic voices or timbres · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12437735B2 cover?
Methods, systems, and storage media for generating a beatbox transcript are disclosed. Some examples may include: receiving an audio signal having a plurality of beatbox sounds, generating a spectrogram of the audio signal, processing the spectrogram of the audio signal with a neural network model trained on training samples including beatbox sounds, generating, by the neural network model a be…
Who is the assignee on this patent?
Lemon Inc
What technology area does this patent fall under?
Primary CPC classification G10H1/0008. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 7, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).