Method for generating action according to audio signal and electronic device
US-2021343058-A1 · Nov 4, 2021 · US
US11562761B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11562761-B2 |
| Application number | US-202016945364-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 31, 2020 |
| Priority date | Jul 31, 2020 |
| Publication date | Jan 24, 2023 |
| Grant date | Jan 24, 2023 |
Official abstract text for this publication.
Dynamic adjustment of audio characteristics for enhancing musical sound during a networked conference is disclosed. In an embodiment, a method is provided for sound enhancement performed by a device coupled to a network. The method includes receiving an audio signal to be transmitted over the network, detecting when musical content is present in the audio signal, processing the audio signal to enhance voice characteristics to generate an enhanced audio signal when the musical content is not detected, processing the audio signal to enhance music characteristic to generate the enhanced audio signal when the musical content is detected, and transmitting the enhanced audio signal over the network.
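The flow in the abstract (detect music, then pick voice or music enhancement before transmitting) can be sketched in Python as below. This is a minimal illustration, not the patented implementation: the `Profile` type, the gain values, and the `is_music` callable are hypothetical stand-ins, and only two of the processing steps named in the dependent claims (DC removal and gain control) are shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Profile:
    """Hypothetical parameter set selecting how a frame is processed."""
    name: str
    remove_dc: bool
    gain: float  # linear gain; the values used below are illustrative only


# Assumed profiles: the source does not give concrete parameter values.
VOICE = Profile("voice", remove_dc=True, gain=2.0)
MUSIC = Profile("music", remove_dc=True, gain=1.0)


def enhance(frame: Sequence[float], profile: Profile) -> List[float]:
    """Apply DC removal and gain control according to the profile."""
    samples = list(frame)
    if profile.remove_dc and samples:
        mean = sum(samples) / len(samples)
        samples = [s - mean for s in samples]
    # Clip to the nominal [-1.0, 1.0] sample range after applying gain.
    return [max(-1.0, min(1.0, s * profile.gain)) for s in samples]


def process_frame(frame: Sequence[float],
                  is_music: Callable[[Sequence[float]], bool]):
    """Choose the music profile when music is detected, else the voice profile."""
    profile = MUSIC if is_music(frame) else VOICE
    return profile.name, enhance(frame, profile)
```

A caller would pass each captured frame through `process_frame` and hand the returned samples to whatever encoder and transport the conferencing client uses; those parts are outside this sketch.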
Opening claim text (preview).
What is claimed is:

1. A method for sound enhancement performed by a device coupled to a network, the method comprising:
receiving an audio signal associated with a current virtual online meeting to be transmitted over the network;
detecting whether voice content is present in a first portion of the audio signal;
in response to detecting voice content present in the first portion of the audio signal, setting a state flag as representing a first state, the state flag corresponding to a hysteresis wait time interval;
initiating an instance of the hysteresis wait time interval responsive to setting the state flag to the first state;
upon expiration of the instance of the hysteresis wait time interval that corresponds to the set first state, processing the first portion of the audio signal to enhance voice characteristics of the first portion of audio signal by generating a voice enhanced audio signal;
detecting whether musical content is present in a second portion of the audio signal, by:
(i) processing the second portion of the audio signal and one or more historical audio segments;
(ii) extracting input audio features from the second portion of the audio signal and the one or more historical audio segments, the input audio features corresponding to a neural network;
(iii) generating a probability indicator, via feeding the input audio features (“audio features”) into the neural network, that indicates a probability that the second portion of the audio signal includes presence of musical content;
in response to detecting musical content present in the second portion of the audio signal, setting the state flag as representing a second state;
initiating an instance of the hysteresis wait time interval responsive to setting the state flag to the second state;
upon expiration of the instance of the hysteresis wait time interval that corresponds to the set second state, enhancing one or more music characteristics of the second portion of the audio signal by generating a music enhanced audio signal; and
transmitting the voice enhanced audio signal and the music enhanced audio signal to the current virtual online meeting over the network at respective different moments during the current virtual online meeting.

2. The method of claim 1, wherein the operation of receiving comprises receiving the audio signal from a microphone.

3. The method of claim 1, wherein the operation of processing the audio signal to enhance the music characteristics comprises retrieving music parameters that identify processing for the audio signal.

4. The method of claim 3, wherein the operation of processing the audio signal to enhance the music characteristics comprises performing at least one of DC removal, noise suppression, echo cancellation, gain control, and encoding on the audio signal based on the music parameters.

5. The method of claim 1, wherein the operation of processing the audio signal to enhance the voice characteristics comprises: retrieving voice parameters; and performing at least one of DC removal, noise suppression, echo cancellation, gain control, and encoding on the audio signal based on the voice parameters.

6. Apparatus for sound enhancement, the apparatus comprising:
a detector that:
(i) receives an audio signal associated with a current virtual online meeting to be transmitted over the network;
(ii) detects whether voice content is present in a first portion of the audio signal;
(iii) sets a state flag as representing a first state upon detection of the voice content, the state flag corresponding to a hysteresis wait time interval;
(iv) initiates an instance of the hysteresis wait time interval responsive to setting the state flag to the first state;
(v) detects whether musical content is present in a second portion of the audio signal by:
(a) processing the second portion of the audio signal and one or more historical audio segments captured prior to initiation of the current virtual online meeting;
(b) extracting input audio features from the second portion of the audio signal and the one or more historical audio segments, the input audio features corresponding to a neural network;
(c) generating a probability indicator, via feeding the input audio features (“audio features”) into the neural network, that indicates a probability that the second portion of the audio signal includes presence of musical content;
(vi) sets the state flag as representing a second state upon detection of the music content, the state flag corresponding to a hysteresis wait time interval; and
(vii) initiates an instance of the hysteresis wait time interval responsive to setting the state flag to the second state;
a processor that:
(i) upon expiration of the instance of the hysteresis wait time interval that corresponds to the set first state, processes the first portion of the audio signal, in response to the detector detecting voice content present in the first portion of the audio signal, to enhance voice characteristics of the first portion of audio signal by generating a voice enhanced audio signal; and
(ii) upon expiration of the instance of the hysteresis wait time interval that corresponds to the set first state, enhances one or more music characteristics of the second portion of the audio signal, in response to the detector detecting musical content present in the second portion of the audio signal, by generating a music enhanced audio signal; and
a transmitter that transmits the voice enhanced audio signal and the music enhanced audio signal to the current virtual online meeting over the network at respective different moments during the current virtual online meeting.

7. The apparatus of claim 6, wherein the detector receives the audio signal from a microphone.

8. The apparatus of claim 6, wherein the processor processes the audio signal to enhance the music characteristics by: performing at least one of DC removal, noise suppression, echo cancellation, and gain control on the audio signal based on music parameters; and performing audio encoding based on the music parameters.

9. The apparatus of claim 6, wherein the processor processes the audio signal to enhance the voice characteristics by: performing at least one of DC removal, noise suppression, echo cancellation, and gain control on the audio signal based on voice parameters; and performing audio encoding based on the voice parameters.

10. A non-transitory computer readable medium on which are stored program instructions that, when executed by a processor, cause the processor to perform operations of:
receiving an audio signal associated with a current virtual online meeting to be transmitted over the network;
detecting whether voice content is present in a first portion of the audio signal;
in response to detecting voice content present in the first portion of the audio signal, setting a state flag as representing a first state, the state flag corresponding to a hysteresis wait time interval;
initiating an instance of the hysteresis wait time interval responsive to setting the state flag to the first state;
upon expiration of the instance of the hysteresis wait time interval that corresponds to the set first state, processing the first portion of the audio signal to enhance voice characteristics of the first portion of audio signal by generating a voice enhanced audio signal;
detecting whether musical content is present in a second portion of the audio signal, by:
(i) processing the second portion of the audio signal and one or more historical audio segments;
(ii) extracting input audio features from the second portion of the audio signal and the one or more historical audio segments, the input audio features correspo
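
The state flag and hysteresis wait time interval recited in the claims can be illustrated with a small Python state machine. This is one possible reading, offered as a sketch under stated assumptions: the class name `HysteresisSwitch`, the 0.5 probability threshold, the choice to restart the wait whenever the flag changes, and the use of caller-supplied timestamps are all hypothetical and not taken from the patent text.

```python
class HysteresisSwitch:
    """Delay a voice/music mode change until a wait interval has expired.

    Assumed behaviour: each time the detected class differs from the current
    state flag, the flag is updated and a new wait interval starts. The
    active enhancement mode follows the flag only once that wait expires.
    """

    def __init__(self, wait_s: float, threshold: float = 0.5,
                 initial: str = "voice") -> None:
        self.wait_s = wait_s        # hysteresis wait time interval, in seconds
        self.threshold = threshold  # assumed cut-off on the music probability
        self.flag = initial         # state flag most recently set by detection
        self.active = initial       # mode currently used for enhancement
        self.deadline = None        # expiry time of the pending wait, if any

    def update(self, p_music: float, now: float) -> str:
        """Feed one probability indicator; return the mode to apply now."""
        detected = "music" if p_music >= self.threshold else "voice"
        if detected != self.flag:
            # Set the state flag and initiate a new wait interval.
            self.flag = detected
            self.deadline = now + self.wait_s
        if self.deadline is not None and now >= self.deadline:
            # The wait expired: enhancement follows the flagged state.
            self.active = self.flag
            self.deadline = None
        return self.active
```

With `wait_s=1.0`, a detection that flips to music at `t=0.0` changes the returned mode only from `t=1.0` onward, and a brief flip back to voice that reverts before its own wait expires never changes the active mode.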
for discriminating voice from music · CPC title
for comparison or discrimination · CPC title
Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08) · CPC title
using neural networks · CPC title
audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants (echo suppression in two-way loud-speaking telephone systems H04M9/02; sound field processing per se H04S7/30) · CPC title
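
The CPC titles above classify this publication under voice/music discrimination using neural networks, and the claims describe extracting features from the current audio portion plus historical segments and feeding them to a network that outputs a probability of music. The Python sketch below illustrates only the data flow. Everything specific in it is an assumption: the two features (mean energy and zero-crossing rate), the history length, and the single logistic unit with placeholder weights standing in for a trained neural network.

```python
import math
from collections import deque
from typing import List, Optional, Sequence


def segment_features(segment: Sequence[float]) -> List[float]:
    """Return [mean energy, zero-crossing rate] for one audio segment."""
    n = len(segment)
    if n == 0:
        return [0.0, 0.0]
    energy = sum(s * s for s in segment) / n
    crossings = sum(1 for a, b in zip(segment, segment[1:])
                    if (a < 0) != (b < 0))
    return [energy, crossings / max(n - 1, 1)]


class MusicProbability:
    """Combine current and historical features into one probability value."""

    def __init__(self, history_len: int = 3,
                 weights: Optional[List[float]] = None,
                 bias: float = 0.0) -> None:
        self.history_len = history_len
        self.history = deque(maxlen=history_len)  # features of past segments
        n_inputs = 2 * (history_len + 1)
        # Placeholder weights; a real detector would load trained parameters.
        self.weights = [0.0] * n_inputs if weights is None else weights
        self.bias = bias

    def input_vector(self, current: List[float]) -> List[float]:
        """Concatenate zero-padded history features with current features."""
        past = list(self.history)
        padding = [[0.0, 0.0]] * (self.history_len - len(past))
        return [x for feats in padding + past + [current] for x in feats]

    def __call__(self, segment: Sequence[float]) -> float:
        current = segment_features(segment)
        vector = self.input_vector(current)
        self.history.append(current)
        z = self.bias + sum(w * x for w, x in zip(self.weights, vector))
        return 1.0 / (1.0 + math.exp(-z))  # logistic output in (0, 1)
```

With the default all-zero weights the output is always 0.5, which makes the placeholder nature of the model explicit; the returned value plays the role of the probability indicator that a downstream threshold or state flag would consume.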