System and method for performing speech enhancement using a neural network-based combined symbol
US-2018033449-A1 · Feb 1, 2018 · US
US11735199B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11735199-B2 |
| Application number | US-201816648217-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 14, 2018 |
| Priority date | Sep 18, 2017 |
| Publication date | Aug 22, 2023 |
| Grant date | Aug 22, 2023 |
Official abstract text for this publication.
Method for modifying a style of an audio object, and corresponding electronic device, computer readable program products and computer readable storage medium

The disclosure relates to a method for processing an input audio signal. According to an embodiment, the method includes obtaining a base audio signal being a copy of the input audio signal and generating an output audio signal from the base signal, the output audio signal having style features obtained by modifying the base signal so that a distance between base style features representative of a style of the base signal and a reference style feature decreases. The disclosure also relates to corresponding electronic device, computer readable program product and computer readable storage medium.
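The idea in the abstract (copy the input, then modify the copy so that its style features move closer to a reference style feature) can be illustrated with a small numerical sketch. Everything specific below is an assumption made for illustration and is not taken from the patent: the style feature is the mean energy in brick-wall FFT subbands, the distance is a squared Euclidean distance, and the modification is gradient descent with a backtracking step size. The function names `band_signals`, `style_feature`, `style_distance`, and `stylize` are hypothetical.

```python
import numpy as np

def band_signals(x, n_bands=8):
    """Split x into n_bands brick-wall subbands by masking its FFT (assumed filterbank)."""
    spectrum = np.fft.rfft(x)
    edges = np.linspace(0, spectrum.size, n_bands + 1).astype(int)
    bands = np.zeros((n_bands, x.size))
    for b in range(n_bands):
        masked = np.zeros_like(spectrum)
        masked[edges[b]:edges[b + 1]] = spectrum[edges[b]:edges[b + 1]]
        bands[b] = np.fft.irfft(masked, n=x.size)
    return bands

def style_feature(x, n_bands=8):
    """Toy style feature (an assumption): mean energy of each subband."""
    return np.mean(band_signals(x, n_bands) ** 2, axis=1)

def style_distance(x, ref_feature, n_bands=8):
    """Squared Euclidean distance between the signal's style feature and the reference."""
    return float(np.sum((style_feature(x, n_bands) - ref_feature) ** 2))

def stylize(base, ref_feature, n_bands=8, target=1e-6, max_iters=500):
    """Iteratively modify a copy of `base` until the style distance reaches `target`."""
    x = base.copy()                      # the base signal is a copy of the input
    step = 1.0
    history = [style_distance(x, ref_feature, n_bands)]
    for _ in range(max_iters):
        if history[-1] <= target:        # stop once the distance reaches the chosen value
            break
        bands = band_signals(x, n_bands)
        diff = np.mean(bands ** 2, axis=1) - ref_feature
        # Gradient of the distance: each band energy has gradient (2/N) * band signal.
        grad = (4.0 / x.size) * (diff[:, None] * bands).sum(axis=0)
        # Backtracking: accept a step only if it decreases the distance.
        while step > 1e-12:
            candidate = x - step * grad
            d = style_distance(candidate, ref_feature, n_bands)
            if d < history[-1]:
                x = candidate
                history.append(d)
                step *= 2.0
                break
            step *= 0.5
        else:
            break                        # no decreasing step found; give up
    return x, history

# Usage: white noise as a non-speech base, low-pass filtered noise as the reference.
rng = np.random.default_rng(0)
base = rng.standard_normal(1024)
reference = np.convolve(rng.standard_normal(1024), np.ones(8) / 8, mode="same")
output, history = stylize(base, style_feature(reference))
print(f"distance: {history[0]:.4f} -> {history[-1]:.6f} in {len(history) - 1} steps")
```

Because a step is accepted only when it lowers the distance, the recorded distances decrease at every iteration. The same portion of the signal is updated repeatedly, rather than being processed once frame by frame.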
Opening claim text (preview).
The invention claimed is:

1. An electronic device comprising at least one memory and one or several processors configured for: obtaining at least one base audio signal; and generating at least one output audio signal from said at least one base audio signal by iteratively modifying a same temporal portion of said at least one base audio signal to gradually transform said same temporal portion of said at least one base audio signal into a corresponding temporal portion of said at least one output audio signal such that a distance between at least one base style feature representative of a base style of said at least one base audio signal and at least one reference style feature representative of a reference style decreases, wherein said same temporal portion of said at least one base audio signal is iteratively modified until said distance reaches a value and wherein said at least one base audio signal comprises an audio content other than a speech content, the audio content being iteratively modified according to the reference style to be included in the at least one output audio signal.
2. The electronic device according to claim 1, wherein said at least one base audio signal comprises a speech content.
3. The electronic device according to claim 1, wherein said reference style is a style of at least one reference audio signal.
4. The electronic device according to claim 3 wherein said at least one reference audio signal comprises a speech content.
5. The electronic device according to claim 3, wherein said at least one reference audio signal comprises an audio content other than a speech content.
6. The electronic device according to claim 3, wherein at least one of said at least one reference style feature and said at least one base style feature is obtained by processing at least one of said at least one reference audio signal and said at least one base audio signal in at least one neural network.
7. The electronic device according to claim 3, wherein obtaining said at least one reference style feature comprises at least one of: subband filtering of said at least one reference audio signal; obtaining an envelope of said at least one filtered reference audio signal; and modulating said obtained envelope.
8. The electronic device according to claim 1, wherein obtaining said at least one base style feature comprises at least one of: subband filtering of said at least one base audio signal; obtaining an envelope of said at least one filtered base audio signal; and modulating said obtained envelope.
9. A method comprising: obtaining at least one base audio signal; and generating at least one output audio signal from said at least one base audio signal by iteratively modifying a same temporal portion of said at least one base audio signal to gradually transform said same temporal portion of said at least one base audio signal into a corresponding temporal portion of said at least one output audio signal such that a distance between at least one base style feature representative of a base style of said at least one base audio signal and at least one reference style feature representative of a reference style decreases, wherein said same temporal portion of said at least one base audio signal is iteratively modified until said distance reaches a value and wherein said at least one base audio signal comprises an audio content other than a speech content, the audio content being iteratively modified according to the reference style to be included in the at least one output audio signal.
10. The method according to claim 9, wherein said reference style is a style of at least one reference audio signal.
11. The method according to claim 10, wherein said at least one reference audio signal comprises a speech content.
12. The method according to claim 10, wherein said at least one reference audio signal comprises an audio content other than a speech content.
13. The method according to claim 10, wherein at least one of said at least one reference style feature and said at least one base style feature is obtained by processing at least one of said at least one reference audio signal and said at least one base audio signal in at least one neural network.
14. The method according to claim 10, wherein obtaining said at least one reference style feature comprises at least one of: subband filtering of said at least one reference audio signal; obtaining an envelope of said at least one filtered reference audio signal; and modulating said obtained envelope.
15. The method according to claim 9, wherein obtaining said at least one base style feature comprises at least one of: subband filtering of said at least one base audio signal; obtaining an envelope of said at least one filtered base audio signal; and modulating said obtained envelope.
16. A non-transitory computer readable storage medium, comprising program code instructions executable by a processor, for: obtaining at least one base audio signal; and generating at least one output audio signal from said at least one base audio signal by iteratively modifying a same temporal portion of said at least one base audio signal to gradually transform said same temporal portion of said at least one base audio signal into a corresponding temporal portion of said at least one output audio signal such that a distance between at least one base style feature representative of a base style of said at least one base audio signal and at least one reference style feature representative of a reference style decreases, wherein said same temporal portion of said at least one base audio signal is iteratively modified until said distance reaches a value and wherein said at least one base audio signal comprises an audio content other than a speech content, the audio content being iteratively modified according to the reference style to be included in the at least one output audio signal.
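Claims 7, 8, 14 and 15 name three feature-extraction steps: subband filtering, obtaining an envelope of the filtered signal, and modulating the envelope. The sketch below shows one possible reading of that chain, under assumptions that the claims do not state: the subband filter is a brick-wall FFT mask, the envelope is the magnitude of the analytic signal, and the "modulating" step is interpreted as measuring envelope power in a few modulation-frequency bands. The names `subband_filter`, `envelope`, `modulation_features`, and the `mod_edges` parameter are hypothetical.

```python
import numpy as np

def subband_filter(x, n_bands=8):
    """Split x into n_bands brick-wall subbands by masking its FFT (assumed filterbank)."""
    spectrum = np.fft.rfft(x)
    edges = np.linspace(0, spectrum.size, n_bands + 1).astype(int)
    bands = np.zeros((n_bands, x.size))
    for b in range(n_bands):
        masked = np.zeros_like(spectrum)
        masked[edges[b]:edges[b + 1]] = spectrum[edges[b]:edges[b + 1]]
        bands[b] = np.fft.irfft(masked, n=x.size)
    return bands

def envelope(y):
    """Amplitude envelope as the magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = y.size
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(y) * weights))

def modulation_features(x, n_bands=8, mod_edges=(1, 2, 8, 32, 128)):
    """Envelope power per subband and per modulation band (edges given as FFT bin indices)."""
    bands = subband_filter(x, n_bands)
    feats = np.zeros((n_bands, len(mod_edges) - 1))
    for b, band in enumerate(bands):
        env = envelope(band)
        # Power spectrum of the mean-removed envelope (assumed reading of "modulating").
        mod_power = np.abs(np.fft.rfft(env - env.mean())) ** 2 / env.size
        for m in range(len(mod_edges) - 1):
            feats[b, m] = mod_power[mod_edges[m]:mod_edges[m + 1]].sum()
    return feats

# Usage: a 200-cycle carrier whose amplitude is modulated at 4 cycles per window.
n = 1024
t = np.arange(n) / n
am_tone = (1 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 200 * t)
features = modulation_features(am_tone)
print(features.shape)
print(np.unravel_index(np.argmax(features), features.shape))
```

The resulting matrix has one row per subband and one column per modulation band, so a signal's "style" in this toy setting is a small table of how strongly each frequency region fluctuates at each rate. Such a table could serve as the base or reference style feature in a distance computation.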
Changing voice quality, e.g. pitch or formants · CPC title
Adapting to target pitch · CPC title
using neural networks · CPC title
specially adapted for particular use · CPC title
Voice conversion or morphing · CPC title