Content and environmentally aware environmental noise compensation
US-2022406326-A1 · Dec 22, 2022 · US
US12308041B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12308041-B2 |
| Application number | US-202217969977-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 20, 2022 |
| Priority date | Dec 3, 2020 |
| Publication date | May 20, 2025 |
| Grant date | May 20, 2025 |
Official abstract text for this publication.
An artificial intelligence-based audio processing method includes: obtaining an audio clip of an audio scene, the audio clip including noise; performing audio scene classification processing based on the audio clip to obtain an audio scene type corresponding to the noise in the audio clip; and determining a target audio processing mode corresponding to the audio scene type, and applying the target audio processing mode to the audio clip of the audio scene according to a degree of interference caused by the noise in the audio clip.
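The flow in the abstract (classify the scene, look up a processing mode for that scene, then apply it depending on how strongly the noise interferes) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the scene labels, the mode names, the `process_clip` function, the injected `classify_scene` and `estimate_interference` callables, and the threshold value do not come from the patent. The "greater than a threshold" gate follows the wording of claim 3.

```python
from typing import Callable, Dict, List, Optional, Tuple

Clip = List[float]

# Hypothetical scene types and mode names; the patent text shown here
# does not enumerate them.
NOISE_REDUCTION_BY_SCENE: Dict[str, str] = {
    "office": "keyboard_noise_suppression",
    "street": "traffic_noise_suppression",
    "restaurant": "babble_noise_suppression",
}


def process_clip(
    clip: Clip,
    classify_scene: Callable[[Clip], str],
    estimate_interference: Callable[[Clip], float],
    reducers: Dict[str, Callable[[Clip], Clip]],
    interference_threshold: float = 0.5,  # assumed value
) -> Tuple[Clip, str, Optional[str]]:
    """Return (processed clip, scene type, applied mode or None)."""
    # Step 1: audio scene classification (stand-in for the AI model).
    scene = classify_scene(clip)
    # Step 2: query the scene-to-mode correspondence.
    mode = NOISE_REDUCTION_BY_SCENE.get(scene)
    if mode is None or mode not in reducers:
        return clip, scene, None
    # Step 3: apply the mode only if the noise interferes strongly enough.
    if estimate_interference(clip) > interference_threshold:
        return reducers[mode](clip), scene, mode
    return clip, scene, None
```

Passing the classifier and the interference estimator in as callables keeps the sketch independent of any particular model; a real system would supply a trained scene classifier and a measured interference metric in their place.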
Opening claim text (preview).
What is claimed is:
1. An artificial intelligence-based audio processing method, comprising: obtaining an audio clip of an audio scene, the audio clip including noise; performing audio scene classification processing based on the audio clip to obtain an audio scene type corresponding to the noise in the audio clip; and determining a target audio processing mode corresponding to the audio scene type, and applying the target audio processing mode to the audio clip according to a degree of interference caused by the noise in the audio clip, wherein the target audio processing mode includes a noise reduction processing mode, and determining the target audio processing mode comprises: querying a correspondence between different candidate audio scene types and candidate noise reduction processing modes based on the audio scene type corresponding to the audio scene, to obtain a noise reduction processing mode corresponding to the audio scene type.
2. The method according to claim 1, wherein determining the target audio processing mode further comprises: determining a noise type matching the audio scene type based on the audio scene type corresponding to the audio scene; and querying a correspondence between different candidate noise types and the candidate noise reduction processing modes based on the noise type matching the audio scene type, to obtain the noise reduction processing mode corresponding to the audio scene type, noise types matching different audio scene types being not exactly the same.
3. The method according to claim 1, wherein applying the target audio processing mode comprises: determining the degree of interference caused by the noise in the audio clip; and applying the noise reduction processing mode corresponding to the audio scene type to the audio clip in response to the degree of interference being greater than an interference degree threshold.
4. The method according to claim 1, wherein applying the target audio processing mode comprises: performing matching processing on the noise type matching the audio scene type and the noise in the audio clip; and suppressing the noise successfully matched with the noise type to obtain a suppressed audio clip, a ratio of a speech signal strength to a noise signal strength in the suppressed audio clip being lower than a signal-to-noise ratio threshold.
5. The method according to claim 1, wherein the target audio processing mode further includes a bitrate switching processing mode, and determining the target audio processing mode further comprises: querying a correspondence between different candidate audio scene types and candidate bitrate switching processing modes based on the audio scene type corresponding to the audio scene, to obtain a bitrate switching processing mode corresponding to the audio scene type.
6. The method according to claim 1, wherein the target audio processing mode further includes a bitrate switching processing mode, and determining the target audio processing mode further comprises: comparing the audio scene type corresponding to the audio scene with a preset audio scene type; and determining a bitrate switching processing mode associated with the preset audio scene type as a bitrate switching processing mode corresponding to the audio scene type in response to determining through the comparison that the audio scene type is the preset audio scene type.
7. The method according to claim 5, wherein applying the target audio processing mode comprises: obtaining a communication signal strength of the audio scene; reducing an audio bitrate of the audio clip according to a first set ratio or a first set value in response to the communication signal strength of the audio scene being less than a communication signal strength threshold; and increasing the audio bitrate of the audio clip according to a second set ratio or a second set value in response to the communication signal strength of the audio scene being greater than or equal to the communication signal strength threshold.
8. The method according to claim 5, wherein applying the target audio processing mode comprises: determining jitter information of a strength of a communication signal in the audio scene based on strengths of the communication signal that are obtained by multiple times of sampling in the audio scene; and reducing an audio bitrate of the audio clip according to a third set ratio or a third set value in response to the jitter information indicating that the communication signal is in an unstable state.
9. The method according to claim 5, wherein applying the target audio processing mode to the audio clip comprises: reducing an audio bitrate of the audio clip according to a fourth set ratio or set value in response to a type of a communication network for transmitting the audio clip being a set type.
10. The method according to claim 1, wherein the audio scene classification processing is implemented by a neural network model, and the neural network model learns an association between the noise included in the audio clip and the audio scene type, and performing audio scene classification processing comprises: calling the neural network model based on the audio clip to perform the audio scene classification processing to obtain an audio scene type having an association with the noise comprised in the audio clip.
11. The method according to claim 10, wherein the neural network model includes a mapping network, a residual network, and a pooling network, and calling the neural network model comprises: performing feature extraction processing on the audio clip through the mapping network to obtain a first feature vector of the noise in the audio clip; performing mapping processing on the first feature vector through the residual network to obtain a mapping vector of the audio clip; performing feature extraction processing on the mapping vector of the audio clip through the mapping network to obtain a second feature vector of the noise in the audio clip; performing pooling processing on the second feature vector through the pooling network to obtain a pooled vector of the audio clip; and performing non-linear mapping processing on the pooled vector of the audio clip to obtain the audio scene type having an association with the noise comprised in the audio clip.
12. The method according to claim 11, wherein the mapping network includes a plurality of cascaded mapping layers, and performing the feature extraction processing comprises: performing feature mapping processing on the audio clip through a first mapping layer in the plurality of cascaded mapping layers; outputting a mapping result of the first mapping layer to a subsequent mapping layer in the plurality of cascaded mapping layers, and continuing to perform feature mapping and mapping result outputting through the subsequent mapping layer, until an output is provided to a last mapping layer; and determining a mapping result outputted by the last mapping layer as the first feature vector of the noise in the audio clip.
13. The method according to claim 11, wherein the residual network includes a first mapping network and a second mapping network, and performing the mapping processing comprises: performing mapping processing on the first feature vector through the first mapping network to obtain a first mapping vector of the audio clip; performing nonlinear mapping processing on the first mapping vector to obtain a non-mapping vector of the audio clip; performing mapping processing on the non-mapping vector of the audio clip through the first mapping network to obtain a second mapping vector of the audio
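The bitrate switching behaviour in claims 7 and 8 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `adjust_bitrate`, every threshold and ratio value, the use of the population standard deviation as the "jitter information", the bitrate clamp, and the decision to check jitter before signal strength are all choices made for the example. The claims themselves treat the two conditions as separate dependent claims and leave the set ratios and thresholds open.

```python
from statistics import pstdev
from typing import Sequence


def adjust_bitrate(
    bitrate_kbps: float,
    signal_strengths_dbm: Sequence[float],
    strength_threshold_dbm: float = -90.0,  # assumed threshold
    jitter_threshold_db: float = 6.0,       # assumed threshold
    weak_ratio: float = 0.8,      # stands in for the "first set ratio"
    strong_ratio: float = 1.1,    # stands in for the "second set ratio"
    unstable_ratio: float = 0.5,  # stands in for the "third set ratio"
    min_kbps: float = 8.0,        # assumed clamp, not in the claims
    max_kbps: float = 64.0,       # assumed clamp, not in the claims
) -> float:
    """Return a new audio bitrate from repeated signal-strength samples."""
    if not signal_strengths_dbm:
        raise ValueError("at least one signal strength sample is required")

    # Claim 8: jitter derived from multiple samples; the standard
    # deviation is one possible measure (assumption).
    jitter = pstdev(signal_strengths_dbm)
    latest = signal_strengths_dbm[-1]

    if jitter > jitter_threshold_db:
        # Unstable signal: reduce the bitrate.
        new_rate = bitrate_kbps * unstable_ratio
    elif latest < strength_threshold_dbm:
        # Claim 7: weak signal, reduce the bitrate.
        new_rate = bitrate_kbps * weak_ratio
    else:
        # Claim 7: signal at or above the threshold, increase the bitrate.
        new_rate = bitrate_kbps * strong_ratio

    return max(min_kbps, min(max_kbps, new_rate))
```

For example, `adjust_bitrate(32.0, [-95, -96, -95])` lowers the rate because the latest sample falls below the assumed threshold, while a stable, strong series such as `[-70, -71, -70]` raises it.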
Clustering; Classification · CPC title
for comparison or discrimination · CPC title
using neural networks · CPC title
characterised by the type of extracted parameters · CPC title
characterised by the method used for estimating noise · CPC title