Audio processing using sound source representations

US11869478B2 · US · B2

Patent metadata
Publication number: US-11869478-B2
Application number: US-202217655511-A
Country: US
Kind code: B2
Filing date: Mar 18, 2022
Priority date: Mar 18, 2022
Publication date: Jan 9, 2024
Grant date: Jan 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device includes one or more processors configured to receive an input audio signal. The one or more processors are also configured to process the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal. The combined representation is used to selectively retain or remove sounds of the multiple sound sources from the input audio signal. The one or more processors are further configured to provide the output audio signal to a second device.
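
The retain-or-remove behavior described in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that some upstream model has already produced a per-bin mask estimating how much of each value in the mixture belongs to the represented sound sources; the abstract does not say a mask is used, and the names `apply_retain_flag`, `target_mask`, and `retain` are hypothetical.

```python
from typing import List


def apply_retain_flag(mixture: List[float],
                      target_mask: List[float],
                      retain: bool) -> List[float]:
    """Keep or suppress the represented sources in a mixture.

    mixture:     magnitudes of the input signal (e.g. one spectrogram frame).
    target_mask: assumed estimate, in [0, 1], of the share of each value
                 that comes from the represented sound sources.
    retain:      True keeps the represented sources, False removes them.
    """
    if len(mixture) != len(target_mask):
        raise ValueError("mixture and target_mask must have the same length")
    output = []
    for value, share in zip(mixture, target_mask):
        share = min(1.0, max(0.0, share))  # clamp the mask to [0, 1]
        gain = share if retain else 1.0 - share
        output.append(value * gain)
    return output


mixture = [4.0, 2.0, 8.0]
mask = [1.0, 0.5, 0.0]
kept = apply_retain_flag(mixture, mask, retain=True)      # [4.0, 1.0, 0.0]
removed = apply_retain_flag(mixture, mask, retain=False)  # [0.0, 1.0, 8.0]
```

In this sketch the two outputs are complementary: adding `kept` and `removed` element by element gives back the mixture, so one representation serves both modes.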

First claim

Opening claim text (preview).

What is claimed is:

1. A device comprising: one or more processors configured to: receive an input audio signal including a plurality of sound sources; process the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal, wherein at least one of the plurality of sound sources of the input audio signal is included in the multiple sound sources, and wherein: based on a retain flag having a first value, the combined representation is used to retain the at least one of the plurality of sound sources; and based on the retain flag having a second value, the combined representation is used to remove the at least one of the plurality of sound sources; and provide the output audio signal to a second device.

2. The device of claim 1, wherein the one or more processors are configured to, based on the retain flag having the first value, use the combined representation to retain the at least one of the plurality of sound sources and to remove one or more additional sound sources from the input audio signal.

3. The device of claim 2, wherein the one or more processors are configured to, responsive to a detected condition indicating that processing of the input audio signal is to be initiated, set the retain flag to have the first value indicating that the multiple sound sources are to be retained, wherein the first value of the retain flag is based on a user input, a default configuration, a configuration input from an application, a configuration request from another device, or a combination thereof.

4. The device of claim 1, wherein the multiple sound sources include one or more authorized users.

5. The device of claim 1, wherein the multiple sound sources include an emergency vehicle.

6. The device of claim 1, wherein the one or more processors are configured to, based on the retain flag having the second value, use the combined representation to remove the at least one of the plurality of sound sources and to retain one or more additional sound sources from the input audio signal.

7. The device of claim 6, wherein the one or more processors are configured to, responsive to a detected condition indicating that processing of the input audio signal is to be initiated, set the retain flag to have the second value indicating that the multiple sound sources are to be removed, wherein the second value of the retain flag is based on a user input, a default configuration, a configuration input from an application, a configuration request from another device, or a combination thereof.

8. The device of claim 1, wherein the multiple sound sources include traffic, wind, reverberation, channel distortion, another non-speech sound source, a person, or a combination thereof.

9. The device of claim 1, wherein the multiple sound sources are associated with background noise in a particular environment.

10. The device of claim 9, wherein the particular environment corresponds to an interior of a particular type of vehicle.

11. The device of claim 1, wherein the combined representation is based on particular sounds from particular sound sources, and wherein a particular sound source is a same sound source type as one of the multiple sound sources.

12. The device of claim 1, wherein the one or more processors are further configured to update the combined representation based on the sounds of any of the multiple sound sources.

13. The device of claim 1, wherein the one or more processors are further configured to, based on a combination setting, generate the combined representation based on individual representations of the multiple sound sources.

14. The device of claim 13, wherein the one or more processors are further configured to update the combination setting based on a user input, a detected condition, or both.

15. The device of claim 1, wherein the multiple sound sources include at least a first sound source and a second sound source, wherein a first representation of the first sound source indicates a first value of a particular feature, wherein a second representation of the second sound source indicates a second value of the particular feature, and wherein a value of the particular feature indicated by the combined representation is based on the first value and the second value.

16. The device of claim 15, wherein the first representation includes one or more spectrograms that are based on sounds from a particular sound source that is of the same type as the first sound source.

17. The device of claim 15, wherein the combined representation corresponds to a concatenation of a first representation of the first sound source with a second representation of the second sound source.

18. The device of claim 1, wherein the one or more processors are configured to process the input audio signal using a neural network to generate the output audio signal.

19. The device of claim 18, wherein the neural network includes a convolutional neural network (CNN), an autoregressive (AR) generative network, an audio generative network (AGN), an attention network (AN), a long short-term memory (LSTM) network, or a combination thereof.

20. The device of claim 18, further comprising a sound source encoder configured to process sounds from one or more sound sources to generate a representation of the one or more sound sources, wherein the sound source encoder and the neural network are jointly trained.

21. The device of claim 1, further comprising a receiver configured to receive audio data representing the input audio signal.

22. The device of claim 1, further comprising a transmitter configured to transmit audio data to the second device, the audio data based on the output audio signal.

23. A method comprising: receiving an input audio signal at a first device, the input audio signal including a plurality of sound sources; processing the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal, wherein at least one of the plurality of sound sources of the input audio signal is included in the multiple sound sources, and wherein: based on a retain flag having a first value, the combined representation is used to retain the at least one of the plurality of sound sources; and based on the retain flag having a second value, the combined representation is used to remove the at least one of the plurality of sound sources; and providing the output audio signal to a second device.

24. The method of claim 23, wherein, based on the retain flag having the first value, the combined representation is used to retain the at least one of the plurality of sound sources and to remove one or more additional sound sources from the input audio signal.

25. The method of claim 23, wherein the multiple sound sources are associated with background noise in a particular environment.

26. The method of claim 25, wherein the particular environment corresponds to an interior of a particular type of vehicle.

27. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to: receive an input audio signal at a first device, the input audio signal including a plurality of sound sources; process the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal, wherein at least
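
Claims 13, 15, and 17 describe building the combined representation from individual representations under a combination setting, with concatenation as one option. The sketch below shows one way this might look. The function name, the setting strings, and the "average" mode are assumptions made for illustration; the claims name concatenation but do not specify averaging or any particular data layout.

```python
from typing import List, Sequence


def combine_representations(reps: Sequence[Sequence[float]],
                            setting: str = "concatenate") -> List[float]:
    """Merge per-source feature vectors into one combined representation.

    setting == "concatenate": join the vectors end to end (as in claim 17).
    setting == "average":     element-wise mean, an assumed example of a
                              combined feature value that depends on each
                              source's value (in the spirit of claim 15).
    """
    if not reps:
        raise ValueError("at least one representation is required")
    if setting == "concatenate":
        return [value for rep in reps for value in rep]
    if setting == "average":
        length = len(reps[0])
        if any(len(rep) != length for rep in reps):
            raise ValueError("averaging needs equal-length representations")
        return [sum(column) / len(reps) for column in zip(*reps)]
    raise ValueError(f"unknown combination setting: {setting}")


# Hypothetical two-feature representations for two sound sources.
siren = [0.5, 1.0]
speech = [1.5, 3.0]
concatenated = combine_representations([siren, speech])          # [0.5, 1.0, 1.5, 3.0]
averaged = combine_representations([siren, speech], "average")   # [1.0, 2.0]
```

Passing the setting as an argument mirrors claim 14, where the combination setting can change in response to a user input or a detected condition.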

Assignees

Qualcomm Inc

Inventors

Classifications

  • for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407)

  • Reduction of ambient noise (active noise reduction per se G10K11/175; protective devices for the ear, e.g. providing acoustic protection A61F11/06)

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Convolutional networks [CNN, ConvNet]

  • the extracted parameters being spectral information of each sub-band

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11869478B2 cover?
A device includes one or more processors configured to receive an input audio signal. The one or more processors are also configured to process the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal. The combined representation is used to selectively retain or remove sounds of the multiple sound sources from the input audio signal.…
Who is the assignee on this patent?
Qualcomm Inc
What technology area does this patent fall under?
Primary CPC classification G10L21/0272. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 9, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).