Automatic and interactive mashup system

US12451106B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12451106-B2
Application number: US-202217737258-A
Country: US
Kind code: B2
Filing date: May 5, 2022
Priority date: May 5, 2022
Publication date: Oct 21, 2025
Grant date: Oct 21, 2025

Abstract

Official abstract text for this publication.

Systems and methods directed to combining audio tracks are provided. More specifically, a first audio track and a second audio track are received. The first audio track is separated into a vocal component and one or more accompaniment components. The second audio track is separated into a vocal component and one or more accompaniment components. A structure of the first audio track and a structure of the second audio track are determined. The first audio track and the second audio track are aligned based on the determined structures of the tracks. The vocal component of the first audio track is stretched to match a tempo of the second audio track. The stretched vocal component of the first audio track is added to the one or more accompaniment components of the second audio track.
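The tempo-matching step can be illustrated with a small sketch. It assumes beat timestamps are already available for both tracks (claim 7 mentions beat and downbeat detection but the page does not say how it is done), and it uses the median inter-beat interval as a stand-in tempo estimator. The function names `estimate_tempo_bpm` and `stretch_rate` are hypothetical and do not come from the patent.

```python
from statistics import median


def estimate_tempo_bpm(beat_times):
    """Estimate tempo in BPM from beat timestamps given in seconds.

    Assumption: the median inter-beat interval is a reasonable tempo proxy.
    """
    if len(beat_times) < 2:
        raise ValueError("need at least two beat timestamps")
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)


def stretch_rate(source_bpm, target_bpm):
    """Playback-rate factor that brings source_bpm to target_bpm.

    A value above 1 speeds the source up; a value below 1 slows it down.
    """
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("tempos must be positive")
    return target_bpm / source_bpm


# Illustrative data only: beats every 0.5 s (first track) and every 0.6 s (second track).
first_beats = [0.0, 0.5, 1.0, 1.5, 2.0]
second_beats = [0.0, 0.6, 1.2, 1.8, 2.4]

first_bpm = estimate_tempo_bpm(first_beats)
second_bpm = estimate_tempo_bpm(second_beats)

# Rate below 1 here: the first track's vocal would be slowed to fit the second track.
rate = stretch_rate(first_bpm, second_bpm)
```

The resulting rate would then be passed to a pitch-preserving time-stretch routine of the implementer's choice; the stretch algorithm itself is outside this sketch.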

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for combining audio tracks, the method comprising: receiving a first audio track and a second audio track; separating the first audio track into a vocal component and one or more accompaniment components; separating the second audio track into a vocal component and one or more accompaniment components; determining a structure of the first audio track and a structure of the second audio track; aligning the vocal component of the first audio track and one of the one or more accompaniment components of the second audio track based on the determined structures of the tracks; displaying, on a user interface, a visualization of the vocal component of the first audio track and the one or more accompaniment components of the second audio track at a first alignment; adjusting the first alignment upon receiving, via the user interface, a user input corresponding to a change in an alignment between sections of the vocal component of the first audio track and sections of the one or more accompaniment components of the second audio track; stretching the vocal component of the first audio track to match a tempo of the second audio track; and generating a mixed audio by adding the stretched vocal component of the first audio track to the one or more accompaniment components of the second audio track.

2. The computer-implemented method of claim 1, wherein determining a structure of the first audio track and of the second audio track comprises: identifying segments within the first audio track and segments within the second audio track; and identifying music theory labels for the segments within the first audio track and for the segments within the second audio track.

3. The computer-implemented method of claim 2, wherein the first audio track and the second audio track are aligned based on the identified segments and music theory labels for the first audio track and the second audio track.

4. The computer-implemented method of claim 1, wherein the visualization shows the alignment between the sections of the vocal component of the first audio track and the sections of the one or more accompaniment components of the second audio track.

5. The computer-implemented method of claim 4, further comprising: displaying, on the user interface, an updated visualization showing the changed alignment between the sections of the vocal component of the first audio track and the sections of the one or more accompaniment components of the second audio track.

6. The computer-implemented method of claim 1, wherein the one or more accompaniment components of the first audio track and the second audio track are one or more instrumental components.

7. The computer-implemented method of claim 1, wherein stretching the vocal component of the first audio track to match a tempo of the second audio track comprises: detecting beat and downbeat timestamps for the first audio track and for the second audio track; estimating a tempo of the second audio track based on the detected beat and downbeat timestamps for the second audio track; and applying a stretch to the vocal component of the first audio track to match the estimated tempo of the second audio track.

8. A system for combining audio tracks, the system comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, causes the system to perform a set of operations, the set of operations including: receiving a first audio track and a second audio track; separating the first audio track into a vocal component and one or more accompaniment components; separating the second audio track into a vocal component and one or more accompaniment components; determining a structure of the first audio track and a structure of the second audio track; aligning the vocal component of the first audio track and one of the one or more accompaniment components of the second audio track based on the determined structures of the tracks; displaying, on a user interface, a visualization of the vocal component of the first audio track and the one or more accompaniment components of the second audio track at a first alignment; adjusting the first alignment upon receiving, via the user interface, a user input corresponding to a change in an alignment between sections of the vocal component of the first audio track and sections of the one or more accompaniment components of the second audio track; stretching the vocal component of the first audio track to match a tempo of the second audio track; and generating a mixed audio by adding the stretched vocal component of the first audio track to the one or more accompaniment components of the second audio track.

9. The system of claim 8, wherein the set of operations includes: identifying segments within the first audio track and segments within the second audio track; and identifying music theory labels for the segments within the first audio track and for the segments within the second audio track.

10. The system of claim 9, wherein the first audio track and the second audio track are aligned based on the identified segments and music theory labels for the first audio track and the second audio track.

11. The system of claim 8, wherein the visualization shows the alignment between the sections of the vocal component of the first audio track and the sections of the one or more accompaniment components of the second audio track.

12. The system of claim 11, wherein the set of operations includes: displaying, on the user interface, an updated visualization showing the changed alignment between the sections of the vocal component of the first audio track and the sections of the one or more accompaniment components of the second audio track.

13. The system of claim 8, wherein the one or more accompaniment components of the first audio track and the second audio track are one or more instrumental components.

14. The system of claim 8, wherein the set of operations includes: detecting beat and downbeat timestamps for the first audio track and for the second audio track; estimating a tempo of the second audio track based on the detected beat and downbeat timestamps for the second audio track; and applying a stretch to the vocal component of the first audio track to match the estimated tempo of the second audio track.

15. A non-transient computer-readable storage medium comprising instructions being executable by one or more processors, that when executed by the one or more processors, cause the one or more processors to: receive a first audio track and a second audio track; separate the first audio track into a vocal component and one or more accompaniment components; separate the second audio track into a vocal component and one or more accompaniment components; determine a structure of the first audio track and a structure of the second audio track; align the vocal component of the first audio track and one of the one or more accompaniment components of the second audio track based on the determined structures of the tracks; display, on a user interface, a visualization of the vocal component of the first audio track and the one or more accompaniment components of the second audio track at a first alignment; adjust the first alignment upon receiving, via the user interface, a user input corresponding to a change in an alignment between the sections of the vocal component of the first audio track and the sections of the one or more accompaniment components of the second audio track; stretch the vocal component of t
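Claims 2 and 3 describe alignment based on identified segments and their music theory labels. As a rough illustration only, the sketch below pairs each vocal section of the first track with the earliest unused accompaniment section of the second track that carries the same label. The `Section` type, the label strings, and the greedy matching rule are all assumptions made for this example; the claims do not specify a matching procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    label: str    # hypothetical label such as "verse" or "chorus"
    start: float  # section start, in seconds
    end: float    # section end, in seconds


def align_sections(vocal_sections, backing_sections):
    """Greedily pair each vocal section with the earliest unused
    backing section that has the same label.

    Returns a list of (vocal_section, backing_section) pairs.
    Vocal sections with no matching label are left unpaired.
    """
    used = set()
    pairs = []
    for vocal in vocal_sections:
        for index, backing in enumerate(backing_sections):
            if index not in used and backing.label == vocal.label:
                used.add(index)
                pairs.append((vocal, backing))
                break
    return pairs


# Illustrative data only.
vocal = [Section("verse", 0.0, 20.0), Section("chorus", 20.0, 40.0)]
backing = [
    Section("intro", 0.0, 10.0),
    Section("verse", 10.0, 30.0),
    Section("chorus", 30.0, 50.0),
]

pairs = align_sections(vocal, backing)
# Time offset to apply to each vocal section so it starts with its partner.
offsets = [b.start - v.start for v, b in pairs]
```

In an interactive setting such as the one the claims describe, a first alignment like `pairs` could be shown to the user, and a user edit would simply replace one of the pairings before the offsets are recomputed.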

Assignees

Lemon Inc

Inventors

Classifications

  • for extraction of timing, tempo; Beat detection

  • Synchronizing two or more audio tracks or files according to musical features or musical timings

  • Acoustic effect simulation, i.e. volume, spatial, resonance or reverberation effects added to a musical sound, usually by appropriate filtering or delays

  • for graphical editing of sound parameters or waveforms, e.g. by graphical interactive control of timbre, partials or envelope

  • Automatic tempo adjustment, correction or control

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Lemon Inc
What technology area does this patent fall under?
Primary CPC classification G10H1/0008. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 21, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).