What technology area does this patent fall under?

Primary CPC classification G10L21/0216. Mapped technology areas include Physics.

When was this patent published?

Publication date May 20, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

This page lists 7 related publications. Each is either linked to this patent by a citation in our corpus or shares its primary CPC classification.

Artificial intelligence-based audio processing method, apparatus, electronic device, computer-readable storage medium, and computer program product

US12308041B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12308041-B2
Application number	US-202217969977-A
Country	US
Kind code	B2
Filing date	Oct 20, 2022
Priority date	Dec 3, 2020
Publication date	May 20, 2025
Grant date	May 20, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An artificial intelligence-based audio processing method includes: obtaining an audio clip of an audio scene, the audio clip including noise; performing audio scene classification processing based on the audio clip to obtain an audio scene type corresponding to the noise in the audio clip; and determining a target audio processing mode corresponding to the audio scene type, and applying the target audio processing mode to the audio clip of the audio scene according to a degree of interference caused by the noise in the audio clip.
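The abstract describes three steps: classify the audio scene from its noise, look up a processing mode for that scene type, and apply the mode only when the noise interferes enough (claims 1 and 3 spell this out). The Python sketch below shows that flow under stated assumptions. It is not Tencent's implementation. The scene labels, the mode names in the hypothetical `SCENE_TO_NOISE_REDUCTION` table, the energy-ratio definition of `interference_degree`, and the 0.3 threshold are all illustrative choices. The neural scene classifier is replaced by a caller-supplied function.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical correspondence between candidate audio scene types and
# candidate noise reduction modes (the lookup recited in claim 1).
SCENE_TO_NOISE_REDUCTION: Dict[str, str] = {
    "subway": "suppress_low_frequency_rumble",
    "cafe": "suppress_babble",
    "street": "suppress_traffic",
}


def interference_degree(clip: List[float], noise: List[float]) -> float:
    """Assumed measure: share of the clip's energy attributed to noise, in [0, 1]."""
    clip_energy = sum(x * x for x in clip)
    noise_energy = sum(n * n for n in noise)
    if clip_energy == 0.0:
        return 0.0
    return min(1.0, noise_energy / clip_energy)


def choose_processing(
    clip: List[float],
    noise_estimate: List[float],
    classify_scene: Callable[[List[float]], str],
    threshold: float = 0.3,
) -> Tuple[str, Optional[str]]:
    """Return (scene type, noise reduction mode to apply, or None)."""
    scene = classify_scene(clip)  # stands in for the neural scene classifier
    mode = SCENE_TO_NOISE_REDUCTION.get(scene)  # query the correspondence table
    # Apply the mode only when the noise interferes more than the threshold allows.
    if mode is not None and interference_degree(clip, noise_estimate) > threshold:
        return scene, mode
    return scene, None


if __name__ == "__main__":
    clip = [1.0, -1.0, 1.0, -1.0]
    loud_noise = [0.8, -0.8, 0.8, -0.8]
    quiet_noise = [0.1, -0.1, 0.1, -0.1]
    print(choose_processing(clip, loud_noise, lambda _: "cafe"))   # ('cafe', 'suppress_babble')
    print(choose_processing(clip, quiet_noise, lambda _: "cafe"))  # ('cafe', None)
```

The correspondence is kept as plain data to mirror the claim's "querying a correspondence" step. Supporting a new scene type then means adding a table row, with no change to the control flow.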

First claim

Opening claim text (preview).

What is claimed is:

1. An artificial intelligence-based audio processing method, comprising: obtaining an audio clip of an audio scene, the audio clip including noise; performing audio scene classification processing based on the audio clip to obtain an audio scene type corresponding to the noise in the audio clip; and determining a target audio processing mode corresponding to the audio scene type, and applying the target audio processing mode to the audio clip according to a degree of interference caused by the noise in the audio clip, wherein the target audio processing mode includes a noise reduction processing mode, and determining the target audio processing mode comprises: querying a correspondence between different candidate audio scene types and candidate noise reduction processing modes based on the audio scene type corresponding to the audio scene, to obtain a noise reduction processing mode corresponding to the audio scene type.

2. The method according to claim 1, wherein determining the target audio processing mode further comprises: determining a noise type matching the audio scene type based on the audio scene type corresponding to the audio scene; and querying a correspondence between different candidate noise types and the candidate noise reduction processing modes based on the noise type matching the audio scene type, to obtain the noise reduction processing mode corresponding to the audio scene type, noise types matching different audio scene types being not exactly the same.

3. The method according to claim 1, wherein applying the target audio processing mode comprises: determining the degree of interference caused by the noise in the audio clip; and applying the noise reduction processing mode corresponding to the audio scene type to the audio clip in response to the degree of interference being greater than an interference degree threshold.

4. The method according to claim 1, wherein applying the target audio processing mode comprises: performing matching processing on the noise type matching the audio scene type and the noise in the audio clip; and suppressing the noise successfully matched with the noise type to obtain a suppressed audio clip, a ratio of a speech signal strength to a noise signal strength in the suppressed audio clip being lower than a signal-to-noise ratio threshold.

5. The method according to claim 1, wherein the target audio processing mode further includes a bitrate switching processing mode, and determining the target audio processing mode further comprises: querying a correspondence between different candidate audio scene types and candidate bitrate switching processing modes based on the audio scene type corresponding to the audio scene, to obtain a bitrate switching processing mode corresponding to the audio scene type.

6. The method according to claim 1, wherein the target audio processing mode further includes a bitrate switching processing mode, and determining the target audio processing mode further comprises: comparing the audio scene type corresponding to the audio scene with a preset audio scene type; and determining a bitrate switching processing mode associated with the preset audio scene type as a bitrate switching processing mode corresponding to the audio scene type in response to determining through the comparison that the audio scene type is the preset audio scene type.

7. The method according to claim 5, wherein applying the target audio processing mode comprises: obtaining a communication signal strength of the audio scene; reducing an audio bitrate of the audio clip according to a first set ratio or a first set value in response to the communication signal strength of the audio scene being less than a communication signal strength threshold; and increasing the audio bitrate of the audio clip according to a second set ratio or a second set value in response to the communication signal strength of the audio scene being greater than or equal to the communication signal strength threshold.

8. The method according to claim 5, wherein applying the target audio processing mode comprises: determining jitter information of a strength of a communication signal in the audio scene based on strengths of the communication signal that are obtained by multiple times of sampling in the audio scene; and reducing an audio bitrate of the audio clip according to a third set ratio or a third set value in response to the jitter information indicating that the communication signal is in an unstable state.

9. The method according to claim 5, wherein applying the target audio processing mode to the audio clip comprises: reducing an audio bitrate of the audio clip according to a fourth set ratio or set value in response to a type of a communication network for transmitting the audio clip being a set type.

10. The method according to claim 1, wherein the audio scene classification processing is implemented by a neural network model, and the neural network model learns an association between the noise included in the audio clip and the audio scene type, and performing audio scene classification processing comprises: calling the neural network model based on the audio clip to perform the audio scene classification processing to obtain an audio scene type having an association with the noise comprised in the audio clip.

11. The method according to claim 10, wherein the neural network model includes a mapping network, a residual network, and a pooling network, and calling the neural network model comprises: performing feature extraction processing on the audio clip through the mapping network to obtain a first feature vector of the noise in the audio clip; performing mapping processing on the first feature vector through the residual network to obtain a mapping vector of the audio clip; performing feature extraction processing on the mapping vector of the audio clip through the mapping network to obtain a second feature vector of the noise in the audio clip; performing pooling processing on the second feature vector through the pooling network to obtain a pooled vector of the audio clip; and performing non-linear mapping processing on the pooled vector of the audio clip to obtain the audio scene type having an association with the noise comprised in the audio clip.

12. The method according to claim 11, wherein the mapping network includes a plurality of cascaded mapping layers, and performing the feature extraction processing comprises: performing feature mapping processing on the audio clip through a first mapping layer in the plurality of cascaded mapping layers; outputting a mapping result of the first mapping layer to a subsequent mapping layer in the plurality of cascaded mapping layers, and continuing to perform feature mapping and mapping result outputting through the subsequent mapping layer, until an output is provided to a last mapping layer; and determining a mapping result outputted by the last mapping layer as the first feature vector of the noise in the audio clip.

13. The method according to claim 11, wherein the residual network includes a first mapping network and a second mapping network, and performing the mapping processing comprises: performing mapping processing on the first feature vector through the first mapping network to obtain a first mapping vector of the audio clip; performing nonlinear mapping processing on the first mapping vector to obtain a non-mapping vector of the audio clip; performing mapping processing on the non-mapping vector of the audio clip through the first mapping network to obtain a second mapping vector of the audio
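Claims 7 to 9 describe the bitrate switching mode in terms of thresholds and set ratios. The Python sketch below illustrates them. For brevity it combines the three conditions in one function, although the claims recite them as separate dependent claims. Every number here is a hypothetical choice, not a value from the patent: the signal-strength threshold, the four ratios, the bitrate bounds, measuring jitter as the population standard deviation of the samples, and treating `"2g"` as the "set type" of network.

```python
import statistics
from typing import Sequence

# Hypothetical network types treated as the "set type" of claim 9.
SLOW_NETWORK_TYPES = {"2g"}


def adjust_bitrate(
    bitrate_bps: int,
    strength_samples: Sequence[float],  # sampled signal strength, e.g. in dBm
    network_type: str = "wifi",
    strength_threshold: float = -100.0,
    first_ratio: float = 0.75,   # claim 7: reduce when the signal is weak
    second_ratio: float = 1.25,  # claim 7: increase otherwise
    third_ratio: float = 0.8,    # claim 8: reduce when the signal is unstable
    fourth_ratio: float = 0.5,   # claim 9: reduce on a slow network type
    jitter_limit: float = 6.0,   # std dev above which the signal counts as unstable
    min_bps: int = 6_000,
    max_bps: int = 64_000,
) -> int:
    if not strength_samples:
        raise ValueError("need at least one signal strength sample")
    current = strength_samples[-1]
    # Claim 7: compare the latest strength against the threshold.
    if current < strength_threshold:
        bitrate = bitrate_bps * first_ratio
    else:
        bitrate = bitrate_bps * second_ratio
    # Claim 8: large spread across repeated samples marks an unstable link.
    if len(strength_samples) >= 2 and statistics.pstdev(strength_samples) > jitter_limit:
        bitrate *= third_ratio
    # Claim 9: the network type alone can also trigger a reduction.
    if network_type in SLOW_NETWORK_TYPES:
        bitrate *= fourth_ratio
    # Clamp to a hypothetical codec range and return whole bits per second.
    return round(max(min_bps, min(max_bps, bitrate)))


if __name__ == "__main__":
    print(adjust_bitrate(24_000, [-90, -91, -89]))          # 30000: strong and stable
    print(adjust_bitrate(24_000, [-80, -110, -80, -110]))   # 14400: weak and unstable
    print(adjust_bitrate(24_000, [-90], network_type="2g")) # 15000: slow network type
```

Applying the ratios one after another means the penalties compound. A weak, jittery link therefore drops further than a link with only one problem. The claims themselves do not say how these conditions interact.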

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

G06F16/65
Clustering; Classification · CPC title
G10L25/51
for comparison or discrimination · CPC title
G10L25/30
using neural networks · CPC title
G10L25/03
characterised by the type of extracted parameters · CPC title
G10L21/0216 · Primary
characterised by the method used for estimating noise · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 78094296

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12308041B2 cover?: An artificial intelligence-based audio processing method includes: obtaining an audio clip of an audio scene, the audio clip including noise; performing audio scene classification processing based on the audio clip to obtain an audio scene type corresponding to the noise in the audio clip; and determining a target audio processing mode corresponding to the audio scene type, and applying the tar…
Who is the assignee on this patent?: Tencent Tech Shenzhen Co Ltd

Related patents

Content and environmentally aware environmental noise compensation

Electronic device for recognizing speech

Artificial intelligence device and operating method thereof

Optimization of network microphone devices using noise classification

Speech recognition apparatus, speech recognition method, and a recording medium

Estimation of background noise in audio signals

Apparatuses and Methods for Audio Classifying and Processing
