Automatic and interactive mashup system
US-2023360618-A1 · Nov 9, 2023 · US
US12451106B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12451106-B2 |
| Application number | US-202217737258-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 5, 2022 |
| Priority date | May 5, 2022 |
| Publication date | Oct 21, 2025 |
| Grant date | Oct 21, 2025 |
Official abstract text for this publication.
Systems and methods directed to combining audio tracks are provided. More specifically, a first audio track and a second audio track are received. The first audio track is separated into a vocal component and one or more accompaniment components. The second audio track is separated into a vocal component and one or more accompaniment components. A structure of the first audio track and a structure of the second audio track are determined. The first audio track and the second audio track are aligned based on the determined structures of the tracks. The vocal component of the first audio track is stretched to match a tempo of the second audio track. The stretched vocal component of the first audio track is added to the one or more accompaniment components of the second audio track.
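The tempo-matching step in the abstract can be illustrated with a small sketch. It assumes beat timestamps (in seconds) are already available for both tracks, as the dependent claims describe, and estimates tempo from the median inter-beat interval. The function names `estimate_tempo_bpm`, `stretch_rate`, and `stretched_duration` are hypothetical, and the median-interval estimator is an assumption chosen for illustration; the patent text shown here does not specify how tempo is computed or how the stretch is applied to the audio.

```python
from statistics import median


def estimate_tempo_bpm(beat_times):
    """Estimate tempo in beats per minute from beat timestamps (seconds).

    Assumption: the median inter-beat interval is used, which tolerates
    a few missed or spurious beats. This is an illustrative choice only.
    """
    if len(beat_times) < 2:
        raise ValueError("need at least two beat timestamps")
    intervals = [later - earlier
                 for earlier, later in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)


def stretch_rate(source_bpm, target_bpm):
    """Playback-rate factor that brings source_bpm to target_bpm.

    A value above 1 speeds the vocal up; below 1 slows it down.
    """
    return target_bpm / source_bpm


def stretched_duration(duration_s, rate):
    """Duration of a clip after time-stretching by the given rate."""
    return duration_s / rate


# Hypothetical beat grids: first track at 0.5 s per beat, second at 0.6 s.
vocal_track_beats = [0.0, 0.5, 1.0, 1.5]
backing_track_beats = [0.0, 0.6, 1.2, 1.8]

rate = stretch_rate(estimate_tempo_bpm(vocal_track_beats),
                    estimate_tempo_bpm(backing_track_beats))
```

The resulting rate would then be handed to whatever time-stretching routine is in use, which is outside the scope of this sketch.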
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for combining audio tracks, the method comprising: receiving a first audio track and a second audio track; separating the first audio track into a vocal component and one or more accompaniment components; separating the second audio track into a vocal component and one or more accompaniment components; determining a structure of the first audio track and a structure of the second audio track; aligning the vocal component of the first audio track and one of the one or more accompaniment components of the second audio track based on the determined structures of the tracks; displaying, on a user interface, a visualization of the vocal component of the first audio track and the one or more accompaniment components of the second audio track at a first alignment; adjusting the first alignment upon receiving, via the user interface, a user input corresponding to a change in an alignment between sections of the vocal component of the first audio track and sections of the one or more accompaniment components of the second audio track; stretching the vocal component of the first audio track to match a tempo of the second audio track; and generating a mixed audio by adding the stretched vocal component of the first audio track to the one or more accompaniment components of the second audio track.

2. The computer-implemented method of claim 1, wherein determining a structure of the first audio track and of the second audio track comprises: identifying segments within the first audio track and segments within the second audio track; and identifying music theory labels for the segments within the first audio track and for the segments within the second audio track.

3. The computer-implemented method of claim 2, wherein the first audio track and the second audio track are aligned based on the identified segments and music theory labels for the first audio track and the second audio track.

4. The computer-implemented method of claim 1, wherein the visualization shows the alignment between the sections of the vocal component of the first audio track and the sections of the one or more accompaniment components of the second audio track.

5. The computer-implemented method of claim 4, further comprising: displaying, on the user interface, an updated visualization showing the changed alignment between the sections of the vocal component of the first audio track and the sections of the one or more accompaniment components of the second audio track.

6. The computer-implemented method of claim 1, wherein the one or more accompaniment components of the first audio track and the second audio track are one or more instrumental components.

7. The computer-implemented method of claim 1, wherein stretching the vocal component of the first audio track to match a tempo of the second audio track comprises: detecting beat and downbeat timestamps for the first audio track and for the second audio track; estimating a tempo of the second audio track based on the detected beat and downbeat timestamps for the second audio track; and applying a stretch to the vocal component of the first audio track to match the estimated tempo of the second audio track.

8. A system for combining audio tracks, the system comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, causes the system to perform a set of operations, the set of operations including: receiving a first audio track and a second audio track; separating the first audio track into a vocal component and one or more accompaniment components; separating the second audio track into a vocal component and one or more accompaniment components; determining a structure of the first audio track and a structure of the second audio track; aligning the vocal component of the first audio track and one of the one or more accompaniment components of the second audio track based on the determined structures of the tracks; displaying, on a user interface, a visualization of the vocal component of the first audio track and the one or more accompaniment components of the second audio track at a first alignment; adjusting the first alignment upon receiving, via the user interface, a user input corresponding to a change in an alignment between sections of the vocal component of the first audio track and sections of the one or more accompaniment components of the second audio track; stretching the vocal component of the first audio track to match a tempo of the second audio track; and generating a mixed audio by adding the stretched vocal component of the first audio track to the one or more accompaniment components of the second audio track.

9. The system of claim 8, wherein the set of operations includes: identifying segments within the first audio track and segments within the second audio track; and identifying music theory labels for the segments within the first audio track and for the segments within the second audio track.

10. The system of claim 9, wherein the first audio track and the second audio track are aligned based on the identified segments and music theory labels for the first audio track and the second audio track.

11. The system of claim 8, wherein the visualization shows the alignment between the sections of the vocal component of the first audio track and the sections of the one or more accompaniment components of the second audio track.

12. The system of claim 11, wherein the set of operations includes: displaying, on the user interface, an updated visualization showing the changed alignment between the sections of the vocal component of the first audio track and the sections of the one or more accompaniment components of the second audio track.

13. The system of claim 8, wherein the one or more accompaniment components of the first audio track and the second audio track are one or more instrumental components.

14. The system of claim 8, wherein the set of operations includes: detecting beat and downbeat timestamps for the first audio track and for the second audio track; estimating a tempo of the second audio track based on the detected beat and downbeat timestamps for the second audio track; and applying a stretch to the vocal component of the first audio track to match the estimated tempo of the second audio track.

15. A non-transient computer-readable storage medium comprising instructions being executable by one or more processors, that when executed by the one or more processors, cause the one or more processors to: receive a first audio track and a second audio track; separate the first audio track into a vocal component and one or more accompaniment components; separate the second audio track into a vocal component and one or more accompaniment components; determine a structure of the first audio track and a structure of the second audio track; align the vocal component of the first audio track and one of the one or more accompaniment components of the second audio track based on the determined structures of the tracks; display, on a user interface, a visualization of the vocal component of the first audio track and the one or more accompaniment components of the second audio track at a first alignment; adjust the first alignment upon receiving, via the user interface, a user input corresponding to a change in an alignment between the sections of the vocal component of the first audio track and the sections of the one or more accompaniment components of the second audio track; stretch the vocal component of t
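
The structure-based alignment and the user adjustment recited in the claims can be pictured with a minimal sketch. It assumes each track has already been divided into segments carrying a music theory label such as "verse" or "chorus". The `Segment` type, the greedy label-matching rule in `align_by_label`, and `adjust_alignment` are all hypothetical illustrations; the claim text shown here does not state how segments are matched or how a user edit is represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    label: str    # music theory label, e.g. "verse" or "chorus"
    start: float  # seconds
    end: float    # seconds


def align_by_label(vocal_segments, accomp_segments):
    """Pair each vocal segment with the first unused accompaniment
    segment that has the same label.

    Returns a list of (vocal_index, accomp_index) pairs. Vocal segments
    with no matching label are left unpaired. Greedy matching is an
    assumption made for illustration.
    """
    used = set()
    pairs = []
    for vi, vocal in enumerate(vocal_segments):
        for ai, accomp in enumerate(accomp_segments):
            if ai not in used and accomp.label == vocal.label:
                used.add(ai)
                pairs.append((vi, ai))
                break
    return pairs


def adjust_alignment(pairs, vocal_index, new_accomp_index):
    """Return a new alignment in which one vocal section is re-pointed
    at a different accompaniment section, mimicking a user edit."""
    return [(vi, new_accomp_index if vi == vocal_index else ai)
            for vi, ai in pairs]


# Hypothetical section layouts for the two tracks.
vocal = [Segment("verse", 0, 20), Segment("chorus", 20, 40),
         Segment("verse", 40, 60)]
accomp = [Segment("intro", 0, 10), Segment("verse", 10, 30),
          Segment("chorus", 30, 50), Segment("verse", 50, 70),
          Segment("outro", 70, 80)]

first_alignment = align_by_label(vocal, accomp)
# A user drags the last vocal verse onto the first accompaniment verse.
adjusted = adjust_alignment(first_alignment, vocal_index=2, new_accomp_index=1)
```

Returning a new list from `adjust_alignment` rather than mutating the first alignment keeps the original available, which is convenient if an interface needs to show both the initial and the updated visualization.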
for extraction of timing, tempo; Beat detection · CPC title
Synchronizing two or more audio tracks or files according to musical features or musical timings · CPC title
Acoustic effect simulation, i.e. volume, spatial, resonance or reverberation effects added to a musical sound, usually by appropriate filtering or delays · CPC title
for graphical editing of sound parameters or waveforms, e.g. by graphical interactive control of timbre, partials or envelope · CPC title
Automatic tempo adjustment, correction or control · CPC title