Method for separating target sound source from mixed sound source and electronic device thereof

US12424241B2 · US · B2

Patent metadata
Publication number: US-12424241-B2
Application number: US-202318235664-A
Country: US
Kind code: B2
Filing date: Aug 18, 2023
Priority date: Aug 18, 2022
Publication date: Sep 23, 2025
Grant date: Sep 23, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    Defines the legal scope of protection: read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for separating a target sound source, includes: obtaining a mixed sound source including at least one sound source; obtaining, based on the mixed sound source, scene information related to the mixed sound source; converting, based on the scene information, a first embedding vector corresponding to a designated sound source group into a second embedding vector; and separating, based on the mixed sound source and the second embedding vector, the target sound source from the mixed sound source.
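The four steps in the abstract can be traced end to end in a toy pipeline. This is an illustrative sketch, not the patented implementation. It rests on several assumptions that this page does not specify. The mixture is taken to be a magnitude spectrogram (frames x frequency bins). Fixed random matrices (`W_scene`, `W_film`, `W_mask`, all hypothetical) stand in for trained networks. The embedding conversion is modeled as a FiLM-style scale-and-shift, and separation is a sigmoid mask multiplied into the mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the patent text on this page gives no dimensions.
N_FREQ, EMB_DIM, SCENE_DIM = 64, 16, 8

# Fixed random matrices stand in for trained networks (assumption).
W_scene = rng.standard_normal((N_FREQ, SCENE_DIM)) / np.sqrt(N_FREQ)
W_film = rng.standard_normal((SCENE_DIM, 2 * EMB_DIM)) / np.sqrt(SCENE_DIM)
W_mask = rng.standard_normal((EMB_DIM, N_FREQ)) / np.sqrt(EMB_DIM)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def scene_information(mixture):
    """Step 2: pool the mixture over time into one scene vector."""
    return np.tanh(mixture.mean(axis=0) @ W_scene)  # shape (SCENE_DIM,)


def convert_embedding(first_emb, scene):
    """Step 3: scale and shift the group embedding using the scene (FiLM-style, assumed)."""
    gamma, beta = np.split(scene @ W_film, 2)
    return first_emb * (1.0 + gamma) + beta  # shape (EMB_DIM,)


def separate_target(mixture, first_emb):
    """Steps 2-4: scene vector -> converted embedding -> masked mixture."""
    scene = scene_information(mixture)
    second_emb = convert_embedding(first_emb, scene)
    # Project the embedding to one weight per frequency bin, then squash
    # to a [0, 1] mask for every time-frequency cell of the mixture.
    mask = sigmoid(mixture * (second_emb @ W_mask))
    return mask * mixture  # same shape as the mixture


# Step 1: a toy "mixture" of 100 frames x 64 bins (non-negative magnitudes).
mixture = np.abs(rng.standard_normal((100, N_FREQ)))
first_emb = rng.standard_normal(EMB_DIM)  # hypothetical group embedding
target = separate_target(mixture, first_emb)
print(target.shape)  # (100, 64)
```

Because the output is the mixture scaled by a mask, it keeps the mixture's shape. This loosely mirrors the same-size output recited in claim 10. In a trained system, every matrix here would be learned rather than random.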

Claims

Full text of claims 1 to 20.

What is claimed:

1. A method for separating a target sound source, the method comprising: obtaining a mixed sound source including at least one sound source; obtaining, based on the mixed sound source, scene information related to the mixed sound source; converting, based on the scene information, a first embedding vector corresponding to a designated sound source group into a second embedding vector; and separating, based on the mixed sound source and the second embedding vector, the target sound source from the mixed sound source.

2. The method of claim 1, further comprising: performing a first pre-training process to learn a scene information vector based on obtaining an input sound source, wherein the obtaining the scene information comprises outputting the scene information vector based on the mixed sound source.

3. The method of claim 2, wherein the performing the first pre-training process comprises outputting the scene information vector based on the obtaining of the input sound source and learning to classify a designated scene based on the output scene information vector.

4. The method of claim 1, wherein the obtaining the scene information comprises generating an instance vector corresponding to the scene information based on the mixed sound source.

5. The method of claim 1, wherein the first embedding vector corresponds to an entirety of the designated sound source group, and wherein the converting the first embedding vector into the second embedding vector comprises converting the first embedding vector into the second embedding vector corresponding to at least a portion of the designated sound source group based on the obtained scene information.

6. The method of claim 1, further comprising: performing a second pre-training process to learn partial embedding vectors, each of the partial embedding vectors corresponding to a respective sound source included in the designated sound source group, wherein the first embedding vector is a sum vector of the partial embedding vectors.

7. The method of claim 6, wherein the converting the first embedding vector into the second embedding vector comprises: identifying the target sound source included in the designated sound source group corresponding to the scene information; and converting the first embedding vector into the second embedding vector to correspond to the target sound source.

8. The method of claim 1, wherein the designated sound source group includes at least one target sound source designated to correspond to the scene information.

9. The method of claim 1, further comprising performing a third pre-training process, based on an embedding vector between specific scene information and a specific target sound source corresponding to the specific scene information, to learn the conversion from the first embedding vector into the second embedding vector.

10. The method of claim 1, wherein the separating the target sound source comprises generating the target sound source having a vector form with the same size as the mixed sound source by applying the second embedding vector to the mixed sound source.

11. An electronic device comprising: an input interface; a memory storing at least one instruction; and at least one processor operatively connected with the input interface and the memory, wherein the at least one processor is configured to execute the at least one instruction to: obtain, from the input interface, a mixed sound source including at least one sound source, obtain, based on the mixed sound source, scene information related to the mixed sound source convert, based on the scene information, a first embedding vector corresponding to a designated sound source group into a second embedding vector, and separate, based on the mixed sound source and the second embedding vector, the target sound source from the mixed sound source.

12. The electronic device of claim 11, wherein the at least one processor is further configured to execute the at least one instruction to: perform a first pre-training process to learn a scene information vector based on obtaining an input sound source, and as at least part of the obtaining the scene information output the scene information vector based on the mixed sound source.

13. The electronic device of claim 12, wherein the at least one processor is further configured to: as at least part of the performing the first pre-training process, output the scene information vector based on the obtaining of the input sound source and learn to classify a designated scene based on the output scene information vector.

14. The electronic device of claim 11, wherein the at least one processor is further configured to execute the at least one instruction to: as at least part of the obtaining the scene information, generate an instance vector corresponding to the scene information based on the mixed sound source.

15. The electronic device of claim 11, wherein the first embedding vector corresponds to an entirety of the designated sound source group, and wherein the at least one processor is further configured to: as at least part of the converting the first embedding vector into the second embedding vector, convert the first embedding vector into the second embedding vector corresponding to at least a portion of the designated sound source group based on the scene information.

16. The electronic device of claim 11, wherein the at least one processor is further configured to execute the at least one instruction to perform a second pre-training process to learn partial embedding vectors, each of the partial embedding vectors corresponding to a respective sound source included in the designated sound source group, and wherein the first embedding vector is a sum vector of the partial embedding vectors.

17. The electronic device of claim 16, wherein the at least one processor is further configured to execute the at least one instruction to: as at least part of the converting the first embedding vector into the second embedding vector, identify the target sound source included in the designated sound source group corresponding to the scene information and convert the first embedding vector into the second embedding vector to correspond to the target sound source.

18. The electronic device of claim 11, wherein the designated sound source group includes at least one target sound source designated to correspond to the scene information.

19. The electronic device of claim 11, wherein the at least one processor is further configured to execute the at least one instruction to perform a third pre-training process, based on an embedding vector between specific scene information and a specific target sound source corresponding to the specific scene information, to learn the conversion from the first embedding vector into the second embedding vector.

20. The electronic device of claim 11, wherein the at least one processor is further configured to execute the at least one instruction to: as at least part of the separating the target sound source, generate the target sound source having a vector form with the same size as the mixed sound source by applying the second embedding vector to the mixed sound source.
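Claims 6 to 8 (and device claims 16 to 18) describe the first embedding vector as the sum of per-source partial embeddings. They describe the conversion as narrowing that sum to the target sources tied to a scene. The following is a minimal sketch under several assumptions. The 4-dimensional partial embeddings are hand-picked, and `SCENE_TARGETS` is a hypothetical lookup table. In the claims, the conversion is learned in a third pre-training process (claim 9), so the table is only a stand-in for that learned mapping.

```python
# Hypothetical partial embeddings, one per sound source in the designated group
# (claim 6). Real ones would come from a second pre-training process.
PARTIAL_EMBEDDINGS = {
    "speech": [1.0, 0.0, 0.0, 0.5],
    "music": [0.0, 1.0, 0.0, 0.5],
    "siren": [0.0, 0.0, 1.0, 0.0],
}

# Hypothetical scene -> designated target sources table (claim 8). The claims
# learn this conversion (claim 9); a fixed lookup is only a stand-in.
SCENE_TARGETS = {
    "street": ["speech", "siren"],
    "concert": ["music"],
}


def vector_sum(vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]


def first_embedding():
    """Claim 6: the group embedding is the sum of every partial embedding."""
    return vector_sum(PARTIAL_EMBEDDINGS.values())


def second_embedding(scene):
    """Claim 7: keep only the target sources identified for this scene."""
    return vector_sum(PARTIAL_EMBEDDINGS[name] for name in SCENE_TARGETS[scene])


print(first_embedding())           # [1.0, 1.0, 1.0, 1.0]
print(second_embedding("street"))  # [1.0, 0.0, 1.0, 0.5]
```

Summing keeps the embedding the same length however many sources the group holds. In this sketch, therefore, a separator could accept either the whole-group vector or the scene-narrowed one without changing its input size.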

Assignees

Samsung Electronics Co Ltd; Univ Hanyang Ind Univ Coop Found

Classifications

Primary CPC: G10L25/81 (mapped technology area: Physics)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12424241B2 cover?
A method for separating a target sound source, includes: obtaining a mixed sound source including at least one sound source; obtaining, based on the mixed sound source, scene information related to the mixed sound source; converting, based on the scene information, a first embedding vector corresponding to a designated sound source group into a second embedding vector; and separating, based on …
Who is the assignee on this patent?
Samsung Electronics Co Ltd, Univ Hanyang Ind Univ Coop Found
What technology area does this patent fall under?
The primary CPC classification is G10L25/81; mapped technology areas include Physics.
When was this patent published?
Published on September 23, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page: either citations found in our corpus or other publications sharing the same primary CPC.