Audio processing using sound source representations

US11869478B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11869478-B2
Application number	US-202217655511-A
Country	US
Kind code	B2
Filing date	Mar 18, 2022
Priority date	Mar 18, 2022
Publication date	Jan 9, 2024
Grant date	Jan 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device includes one or more processors configured to receive an input audio signal. The one or more processors are also configured to process the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal. The combined representation is used to selectively retain or remove sounds of the multiple sound sources from the input audio signal. The one or more processors are further configured to provide the output audio signal to a second device.
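As a rough illustration of what "selectively retain or remove" could mean in practice, the sketch below applies a per-sample gain mask to an input signal. The mask, the function name `apply_source_mask`, and the `retain` parameter are assumptions made for this example. The abstract does not say that the patent uses masking, and the step that would derive a mask from the combined representation (for instance a neural network) is not shown.

```python
from typing import List


def apply_source_mask(
    input_signal: List[float],
    source_mask: List[float],
    retain: bool,
) -> List[float]:
    """Keep or suppress the sounds that a mask attributes to known sources.

    Hypothetical helper, not taken from the patent. `source_mask` holds one
    value in [0, 1] per input sample: 1.0 means the sample belongs entirely
    to the represented sound sources, 0.0 means it does not.
    """
    if len(input_signal) != len(source_mask):
        raise ValueError("signal and mask must have the same length")

    output = []
    for sample, mask_value in zip(input_signal, source_mask):
        # retain=True keeps the masked sources; retain=False keeps the rest.
        gain = mask_value if retain else 1.0 - mask_value
        output.append(sample * gain)
    return output


signal = [1.0, 2.0, 4.0]
mask = [1.0, 0.5, 0.0]
kept = apply_source_mask(signal, mask, retain=True)
removed = apply_source_mask(signal, mask, retain=False)
```

With the same mask, the two calls split the input into complementary parts, which is one simple way to picture a single representation serving both the retain and the remove case.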

First claim

Opening claim text (preview).

What is claimed is:
1. A device comprising: one or more processors configured to: receive an input audio signal including a plurality of sound sources; process the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal, wherein at least one of the plurality of sound sources of the input audio signal is included in the multiple sound sources, and wherein: based on a retain flag having a first value, the combined representation is used to retain the at least one of the plurality of sound sources; and based on the retain flag having a second value, the combined representation is used to remove the at least one of the plurality of sound sources; and provide the output audio signal to a second device.
2. The device of claim 1, wherein the one or more processors are configured to, based on the retain flag having the first value, use the combined representation to retain the at least one of the plurality of sound sources and to remove one or more additional sound sources from the input audio signal.
3. The device of claim 2, wherein the one or more processors are configured to, responsive to a detected condition indicating that processing of the input audio signal is to be initiated, set the retain flag to have the first value indicating that the multiple sound sources are to be retained, wherein the first value of the retain flag is based on a user input, a default configuration, a configuration input from an application, a configuration request from another device, or a combination thereof.
4. The device of claim 1, wherein the multiple sound sources include one or more authorized users.
5. The device of claim 1, wherein the multiple sound sources include an emergency vehicle.
6. The device of claim 1, wherein the one or more processors are configured to, based on the retain flag having the second value, use the combined representation to remove the at least one of the plurality of sound sources and to retain one or more additional sound sources from the input audio signal.
7. The device of claim 6, wherein the one or more processors are configured to, responsive to a detected condition indicating that processing of the input audio signal is to be initiated, set the retain flag to have the second value indicating that the multiple sound sources are to be removed, wherein the second value of the retain flag is based on a user input, a default configuration, a configuration input from an application, a configuration request from another device, or a combination thereof.
8. The device of claim 1, wherein the multiple sound sources include traffic, wind, reverberation, channel distortion, another non-speech sound source, a person, or a combination thereof.
9. The device of claim 1, wherein the multiple sound sources are associated with background noise in a particular environment.
10. The device of claim 9, wherein the particular environment corresponds to an interior of a particular type of vehicle.
11. The device of claim 1, wherein the combined representation is based on particular sounds from particular sound sources, and wherein a particular sound source is a same sound source type as one of the multiple sound sources.
12. The device of claim 1, wherein the one or more processors are further configured to update the combined representation based on the sounds of any of the multiple sound sources.
13. The device of claim 1, wherein the one or more processors are further configured to, based on a combination setting, generate the combined representation based on individual representations of the multiple sound sources.
14. The device of claim 13, wherein the one or more processors are further configured to update the combination setting based on a user input, a detected condition, or both.
15. The device of claim 1, wherein the multiple sound sources include at least a first sound source and a second sound source, wherein a first representation of the first sound source indicates a first value of a particular feature, wherein a second representation of the second sound source indicates a second value of the particular feature, and wherein a value of the particular feature indicated by the combined representation is based on the first value and the second value.
16. The device of claim 15, wherein the first representation includes one or more spectrograms that are based on sounds from a particular sound source that is of the same type as the first sound source.
17. The device of claim 15, wherein the combined representation corresponds to a concatenation of a first representation of the first sound source with a second representation of the second sound source.
18. The device of claim 1, wherein the one or more processors are configured to process the input audio signal using a neural network to generate the output audio signal.
19. The device of claim 18, wherein the neural network includes a convolutional neural network (CNN), an autoregressive (AR) generative network, an audio generative network (AGN), an attention network (AN), a long short-term memory (LSTM) network, or a combination thereof.
20. The device of claim 18, further comprising a sound source encoder configured to process sounds from one or more sound sources to generate a representation of the one or more sound sources, wherein the sound source encoder and the neural network are jointly trained.
21. The device of claim 1, further comprising a receiver configured to receive audio data representing the input audio signal.
22. The device of claim 1, further comprising a transmitter configured to transmit audio data to the second device, the audio data based on the output audio signal.
23. A method comprising: receiving an input audio signal at a first device, the input audio signal including a plurality of sound sources; processing the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal, wherein at least one of the plurality of sound sources of the input audio signal is included in the multiple sound sources, and wherein: based on a retain flag having a first value, the combined representation is used to retain the at least one of the plurality of sound sources; and based on the retain flag having a second value, the combined representation is used to remove the at least one of the plurality of sound sources; and providing the output audio signal to a second device.
24. The method of claim 23, wherein, based on the retain flag having the first value, the combined representation is used to retain the at least one of the plurality of sound sources and to remove one or more additional sound sources from the input audio signal.
25. The method of claim 23, wherein the multiple sound sources are associated with background noise in a particular environment.
26. The method of claim 25, wherein the particular environment corresponds to an interior of a particular type of vehicle.
27. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to: receive an input audio signal at a first device, the input audio signal including a plurality of sound sources; process the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal, wherein at least…
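Claims 13, 15, and 17 describe building the combined representation from individual per-source representations under a combination setting. The sketch below shows two ways that might look. The `combine_representations` function, the setting names `"concatenate"` and `"average"`, and the use of plain float lists are all hypothetical. Concatenation follows the wording of claim 17; averaging is only one possible reading of a combined feature value that is "based on the first value and the second value", and the claims do not specify it.

```python
from typing import List, Sequence


def combine_representations(
    representations: Sequence[Sequence[float]],
    setting: str = "concatenate",
) -> List[float]:
    """Merge per-source feature vectors into one combined vector.

    Hypothetical helper for illustration only; `setting` stands in for the
    "combination setting" mentioned in claim 13.
    """
    if not representations:
        raise ValueError("at least one representation is required")

    if setting == "concatenate":
        # Place the vectors end to end, as in claim 17.
        return [value for rep in representations for value in rep]

    if setting == "average":
        # Assumed alternative: each combined feature is the mean of the
        # corresponding feature across sources.
        length = len(representations[0])
        if any(len(rep) != length for rep in representations):
            raise ValueError("averaging needs equal-length representations")
        count = len(representations)
        return [sum(values) / count for values in zip(*representations)]

    raise ValueError(f"unknown combination setting: {setting}")


first_source = [1.0, 2.0]
second_source = [3.0, 4.0]
concatenated = combine_representations([first_source, second_source])
averaged = combine_representations([first_source, second_source], "average")
```

Concatenation keeps every source's features separate at the cost of a longer vector, while averaging keeps the size fixed but blends the sources together.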

Assignees

Qualcomm Inc

Inventors

Classifications

Code	CPC title
H04R3/005	for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407)
H04R1/1083	Reduction of ambient noise (active noise reduction per se G10K11/175; protective devices for the ear, e.g. providing acoustic protection A61F11/06)
G06N3/0442	characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
G06N3/0464	Convolutional networks [CNN, ConvNet]
G10L25/18	the extracted parameters being spectral information of each sub-band

Patent family

Related publications grouped by family.

View patent family 85979734

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11869478B2 cover?: A device includes one or more processors configured to receive an input audio signal. The one or more processors are also configured to process the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal. The combined representation is used to selectively retain or remove sounds of the multiple sound sources from the input audio signal.…
Who is the assignee on this patent?: Qualcomm Inc
What technology area does this patent fall under?: Primary CPC classification G10L21/0272. Mapped technology areas include Physics.
When was this patent published?: Publication date Jan 9, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Source-based sound quality adjustment tool

Methods of encoding and decoding speech signal using neural network model recognizing sound sources, and encoding and decoding apparatuses for performing the same

Participant-Tuned Filtering Using Deep Neural Network Dynamic Spectral Masking for Conversation Isolation and Security in Noisy Environments

Smart-safe masking and alerting system

Methods and Systems for End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction

Filtering sounds for conferencing applications
