US2018366138A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2018366138-A1 |
| Application number | US-201715625966-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 16, 2017 |
| Priority date | Jun 16, 2017 |
| Publication date | Dec 20, 2018 |
| Grant date | — |
Official abstract text for this publication.
Several embodiments of a digital speech signal enhancer are described that use an artificial neural network that produces clean speech coding parameters based on noisy speech coding parameters as its input features. A vocoder parameter generator produces the noisy speech coding parameters from a noisy speech signal. A vocoder model generator processes the clean speech coding parameters into estimated clean speech spectral magnitudes. In one embodiment, a magnitude modifier modifies an original frequency spectrum of the noisy speech signal using the estimated clean speech spectral magnitudes, to produce an enhanced frequency spectrum, and a synthesis block converts the enhanced frequency spectrum into time domain, as an output speech sequence. Other embodiments are also described.
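The last two stages named in the abstract (magnitude modifier, then synthesis block) can be sketched for a single frame as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how the original spectrum is modified, so the rule used here (keep the noisy phase, substitute the estimated clean magnitude) is an assumption, as are the function names `dft`, `idft`, `modify_magnitudes`, and `enhance_frame`. The estimated clean magnitudes are taken as given; producing them is the job of the neural network and vocoder model generator.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def modify_magnitudes(noisy_spectrum, clean_magnitudes):
    """ASSUMED rule: keep each bin's noisy phase, replace its magnitude."""
    return [
        cmath.rect(magnitude, cmath.phase(bin_value))
        for bin_value, magnitude in zip(noisy_spectrum, clean_magnitudes)
    ]


def enhance_frame(noisy_frame, clean_magnitudes):
    """Magnitude modifier followed by synthesis back to the time domain."""
    original_spectrum = dft(noisy_frame)
    enhanced_spectrum = modify_magnitudes(original_spectrum, clean_magnitudes)
    return idft(enhanced_spectrum)
```

For the output to be a real signal, `clean_magnitudes` should have the same mirror symmetry as the magnitude spectrum of a real frame. A practical system would more likely use an FFT with windowing and overlap-add across frames; the naive transforms here only keep the sketch self-contained.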
Opening claim text (preview).
1. A digital speech signal enhancer, comprising: a neural network processor to process a plurality of noisy speech coding parameters that have been derived from an input speech sequence by a vocoder implementing a linear predictive model using a time-varying model for formant information, or formant and pitch information, to produce a plurality of estimated clean speech coding parameters; a vocoder model generator and spectral magnitude generator, configured to process the estimated clean speech coding parameters into estimated clean speech spectral magnitudes; a magnitude modifier configured to modify an original frequency spectrum of the input speech sequence using the estimated clean speech spectral magnitudes, to produce an enhanced frequency spectrum; and a synthesis block configured to convert the enhanced frequency spectrum into time domain, as an output speech sequence.
2. The speech signal enhancer of claim 1 wherein the noisy speech coding parameters are linear predictive parameters, or non-linear mappings of such linear predictive parameters to a Line Spectral Pairs domain or to a Log Area Ratios domain.
3. The speech signal enhancer of claim 2 wherein the noisy speech coding parameters are only formant type or short-term parameters, not long-term parameters.
4. The speech signal enhancer of claim 1 wherein the vocoder model generator is to process the estimated clean speech coding parameters into formant information, pitch information, or both, and the spectral magnitude generator is configured to process the formant information, pitch information, or both into the estimated clean speech spectral magnitudes.
5. The speech signal enhancer of claim 4 wherein the vocoder model generator is to process the estimated clean speech coding parameters into formant and pitch information which defines a formant filter for short-term prediction and a pitch filter for long-term prediction, and wherein the spectral magnitude generator computes a spectral envelope of a frequency response of a cascade of the formant and pitch filters.
6. The speech signal enhancer of claim 4 wherein the spectral magnitude generator is configured to evaluate the original frequency spectrum of the input speech sequence, when processing the formant information, pitch information, or both, into the estimated clean speech spectral magnitudes, to produce refined clean speech spectral magnitudes.
7. The speech signal enhancer of claim 6 wherein the spectral magnitude generator evaluates the original frequency spectrum of the input speech sequence by comparing a spectral valley in the original frequency spectrum to a spectral valley in the estimated clean speech spectral magnitudes, and scales the spectral valley in the estimated clean speech spectral magnitudes in accordance with the comparison, when producing the refined clean speech spectral magnitudes.
8. The speech signal enhancer of claim 6 wherein the spectral magnitude generator evaluates the original frequency spectrum of the input speech sequence by comparing a spectral peak in the original frequency spectrum to a spectral peak in the estimated clean speech spectral magnitudes, and scales the spectral peak in the estimated clean speech spectral magnitudes in accordance with the comparison, when producing the refined clean speech spectral magnitudes.
9. A digital speech signal enhancement process comprising: processing using a neural network a plurality of noisy speech coding parameters that have been derived from an input speech sequence by a vocoder implementing a linear predictive model using a time-varying model for formant information, or formant and pitch information, to produce a plurality of estimated clean speech coding parameters; processing the estimated clean speech coding parameters into estimated clean speech spectral magnitudes; modifying an original frequency spectrum of the input speech sequence using the estimated clean speech spectral magnitudes, to produce an enhanced frequency spectrum; and converting the enhanced frequency spectrum into time domain, as an output speech sequence.
10. The process of claim 9 wherein the noisy speech coding parameters are formant type or short-term linear predictive parameters, not long-term linear predictive parameters.
11. The process of claim 9 wherein processing the estimated clean speech coding parameters into estimated clean speech spectral magnitudes comprises: generating formant information, pitch information, or both; and processing the formant information, pitch information, or both into the estimated clean speech spectral magnitudes.
12. The process of claim 9 further comprising evaluating the original frequency spectrum of the input speech sequence, when processing the formant information, pitch information, or both, into the estimated clean speech spectral magnitudes, to produce refined clean speech spectral magnitudes.
13. The process of claim 12 wherein evaluating the original frequency spectrum of the input speech sequence comprises: comparing a spectral valley in the original frequency spectrum to a spectral valley in the estimated clean speech spectral magnitudes; and scaling the spectral valley in the estimated clean speech spectral magnitudes in accordance with the comparison, when producing the refined clean speech spectral magnitudes.
14. The process of claim 12 wherein evaluating the original frequency spectrum of the input speech sequence comprises: comparing a spectral peak in the original frequency spectrum to a spectral peak in the estimated clean speech spectral magnitudes; and scaling the spectral peak in the estimated clean speech spectral magnitudes in accordance with the comparison, when producing the refined clean speech spectral magnitudes.
15. A digital speech signal enhancer comprising: an artificial neural network to process a plurality of noisy, speech coding parameters that have been derived from an input speech sequence by a vocoder implementing a linear predictive model using a time-varying model for formant information, or formant and pitch information, to produce a plurality of clean, speech coding parameters; a speech coding model generator to process the clean, speech coding parameters into formant information and pitch information; a pitch filter configured in accordance with the pitch information; and a formant filter configured in accordance with the formant information, wherein the pitch and formant filters are coupled in cascade to filter an input excitation signal and produce a synthesized speech signal.
16. The speech signal enhancer of claim 15 further comprising: a linear predictive model generator to derive the formant information, or the formant and pitch information, and perceptual weighting information, from the input speech sequence; and a linear prediction analysis filter to filter the input speech sequence to produce a perceptually weighted excitation signal that is filtered by the pitch and formant filters, wherein the linear prediction analysis filter is configured in accordance with the formant information or the formant and pitch information, and in accordance with the perceptual weighting information.
17. The speech signal enhancer of claim 16 further comprising an excitation modifier that is configured to transform the perceptually weighted excitation signal from time domain to frequency domain, modify spectral magnitudes of the perceptually weighted excitation signal in the spectral domain, in accordance with the information from the linear predictive model generat
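Claims 5 and 15 refer to a formant filter (short-term prediction) and a pitch filter (long-term prediction) coupled in cascade. As a rough sketch of what the frequency response of such a cascade looks like, the code below evaluates its magnitude on a grid of frequencies from 0 to π. Everything specific here is an assumption made for illustration: the all-pole forms 1/(1 − Σ aₖ z⁻ᵏ) and 1/(1 − g z⁻ᵀ), the sign convention of the coefficients, the single-tap pitch filter, and the function names. It computes the raw magnitude response only; the spectral envelope step mentioned in claim 5 is not attempted.

```python
import cmath
import math


def formant_response(lpc_coeffs, omega):
    """Response of an ASSUMED all-pole formant filter 1 / (1 - sum a_k z^-k)."""
    denominator = 1 - sum(
        coeff * cmath.exp(-1j * omega * (k + 1))
        for k, coeff in enumerate(lpc_coeffs)
    )
    return 1 / denominator


def pitch_response(pitch_gain, pitch_lag, omega):
    """Response of an ASSUMED single-tap pitch filter 1 / (1 - g z^-T)."""
    return 1 / (1 - pitch_gain * cmath.exp(-1j * omega * pitch_lag))


def cascade_magnitudes(lpc_coeffs, pitch_gain, pitch_lag, num_bins):
    """Magnitude of the cascaded response at num_bins (>= 2) points in [0, pi]."""
    magnitudes = []
    for k in range(num_bins):
        omega = math.pi * k / (num_bins - 1)
        response = formant_response(lpc_coeffs, omega) * pitch_response(
            pitch_gain, pitch_lag, omega
        )
        magnitudes.append(abs(response))
    return magnitudes
```

The sketch assumes stable filters (for example, a pitch gain with absolute value below 1); otherwise a denominator can reach zero at some frequency. Setting `pitch_gain` to 0 reduces the cascade to the formant filter alone, which corresponds to the formant-only option in the claims.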
the extracted parameters being formant information · CPC title
Processing in the frequency domain · CPC title
using neural networks · CPC title
Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title
Noise filtering · CPC title