Speech Model-Based Neural Network-Assisted Signal Enhancement

US2018366138A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2018366138-A1
Application number: US-201715625966-A
Country: US
Kind code: A1
Filing date: Jun 16, 2017
Priority date: Jun 16, 2017
Publication date: Dec 20, 2018
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Several embodiments of a digital speech signal enhancer are described that use an artificial neural network that produces clean speech coding parameters based on noisy speech coding parameters as its input features. A vocoder parameter generator produces the noisy speech coding parameters from a noisy speech signal. A vocoder model generator processes the clean speech coding parameters into estimated clean speech spectral magnitudes. In one embodiment, a magnitude modifier modifies an original frequency spectrum of the noisy speech signal using the estimated clean speech spectral magnitudes, to produce an enhanced frequency spectrum, and a synthesis block converts the enhanced frequency spectrum into time domain, as an output speech sequence. Other embodiments are also described.
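
The last two stages named in the abstract (the magnitude modifier and the synthesis block) can be illustrated for a single frame. The following is a minimal sketch, not the patent's implementation. It assumes the modifier substitutes the estimated clean magnitudes while keeping the phase of the noisy spectrum, and it uses a naive DFT built from the standard library. The function names (`dft`, `idft`, `modify_magnitude`, `enhance_frame`) are hypothetical, and the neural network and vocoder stages that would supply `clean_magnitudes` are not shown.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real or complex frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def modify_magnitude(noisy_spectrum, clean_magnitudes):
    """Assumed rule: replace each bin's magnitude, keep the noisy phase."""
    return [
        cmath.rect(magnitude, cmath.phase(noisy_bin))
        for noisy_bin, magnitude in zip(noisy_spectrum, clean_magnitudes)
    ]


def enhance_frame(noisy_frame, clean_magnitudes):
    """Analysis -> magnitude modification -> synthesis for one frame."""
    noisy_spectrum = dft(noisy_frame)
    enhanced_spectrum = modify_magnitude(noisy_spectrum, clean_magnitudes)
    return idft(enhanced_spectrum)
```

As a sanity check on the sketch, passing the noisy frame's own spectral magnitudes as `clean_magnitudes` returns the frame unchanged, since only the magnitudes are ever altered. A real system would also need windowing and overlap-add across frames, which this example omits.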

First claim

Opening claim text (preview).

1. A digital speech signal enhancer, comprising: a neural network processor to process a plurality of noisy speech coding parameters that have been derived from an input speech sequence by a vocoder implementing a linear predictive model using a time-varying model for formant information, or formant and pitch information, to produce a plurality of estimated clean speech coding parameters; a vocoder model generator and spectral magnitude generator, configured to process the estimated clean speech coding parameters into estimated clean speech spectral magnitudes; a magnitude modifier configured to modify an original frequency spectrum of the input speech sequence using the estimated clean speech spectral magnitudes, to produce an enhanced frequency spectrum; and a synthesis block configured to convert the enhanced frequency spectrum into time domain, as an output speech sequence.

2. The speech signal enhancer of claim 1 wherein the noisy speech coding parameters are linear predictive parameters, or non-linear mappings of such linear predictive parameters to a Line Spectral Pairs domain or to a Log Area Ratios domain.

3. The speech signal enhancer of claim 2 wherein the noisy speech coding parameters are only formant type or short-term parameters, not long-term parameters.

4. The speech signal enhancer of claim 1 wherein the vocoder model generator is to process the estimated clean speech coding parameters into formant information, pitch information, or both, and the spectral magnitude generator is configured to process the formant information, pitch information, or both into the estimated clean speech spectral magnitudes.

5. The speech signal enhancer of claim 4 wherein the vocoder model generator is to process the estimated clean speech coding parameters into formant and pitch information which defines a formant filter for short-term prediction and a pitch filter for long-term prediction, and wherein the spectral magnitude generator computes a spectral envelope of a frequency response of a cascade of the formant and pitch filters.

6. The speech signal enhancer of claim 4 wherein the spectral magnitude generator is configured to evaluate the original frequency spectrum of the input speech sequence, when processing the formant information, pitch information, or both, into the estimated clean speech spectral magnitudes, to produce refined clean speech spectral magnitudes.

7. The speech signal enhancer of claim 6 wherein the spectral magnitude generator evaluates the original frequency spectrum of the input speech sequence by comparing a spectral valley in the original frequency spectrum to a spectral valley in the estimated clean speech spectral magnitudes, and scales the spectral valley in the estimated clean speech spectral magnitudes in accordance with the comparison, when producing the refined clean speech spectral magnitudes.

8. The speech signal enhancer of claim 6 wherein the spectral magnitude generator evaluates the original frequency spectrum of the input speech sequence by comparing a spectral peak in the original frequency spectrum to a spectral peak in the estimated clean speech spectral magnitudes, and scales the spectral peak in the estimated clean speech spectral magnitudes in accordance with the comparison, when producing the refined clean speech spectral magnitudes.

9. A digital speech signal enhancement process comprising: processing using a neural network a plurality of noisy speech coding parameters that have been derived from an input speech sequence by a vocoder implementing a linear predictive model using a time-varying model for formant information, or formant and pitch information, to produce a plurality of estimated clean speech coding parameters; processing the estimated clean speech coding parameters into estimated clean speech spectral magnitudes; modifying an original frequency spectrum of the input speech sequence using the estimated clean speech spectral magnitudes, to produce an enhanced frequency spectrum; and converting the enhanced frequency spectrum into time domain, as an output speech sequence.

10. The process of claim 9 wherein the noisy speech coding parameters are formant type or short-term linear predictive parameters, not long-term linear predictive parameters.

11. The process of claim 9 wherein processing the estimated clean speech coding parameters into estimated clean speech spectral magnitudes comprises: generating formant information, pitch information, or both; and processing the formant information, pitch information, or both into the estimated clean speech spectral magnitudes.

12. The process of claim 9 further comprising evaluating the original frequency spectrum of the input speech sequence, when processing the formant information, pitch information, or both, into the estimated clean speech spectral magnitudes, to produce refined clean speech spectral magnitudes.

13. The process of claim 12 wherein evaluating the original frequency spectrum of the input speech sequence comprises: comparing a spectral valley in the original frequency spectrum to a spectral valley in the estimated clean speech spectral magnitudes; and scaling the spectral valley in the estimated clean speech spectral magnitudes in accordance with the comparison, when producing the refined clean speech spectral magnitudes.

14. The process of claim 12 wherein evaluating the original frequency spectrum of the input speech sequence comprises: comparing a spectral peak in the original frequency spectrum to a spectral peak in the estimated clean speech spectral magnitudes; and scaling the spectral peak in the estimated clean speech spectral magnitudes in accordance with the comparison, when producing the refined clean speech spectral magnitudes.

15. A digital speech signal enhancer comprising: an artificial neural network to process a plurality of noisy, speech coding parameters that have been derived from an input speech sequence by a vocoder implementing a linear predictive model using a time-varying model for formant information, or formant and pitch information, to produce a plurality of clean, speech coding parameters; a speech coding model generator to process the clean, speech coding parameters into formant information and pitch information; a pitch filter configured in accordance with the pitch information; and a formant filter configured in accordance with the formant information, wherein the pitch and formant filters are coupled in cascade to filter an input excitation signal and produce a synthesized speech signal.

16. The speech signal enhancer of claim 15 further comprising: a linear predictive model generator to derive the formant information, or the formant and pitch information, and perceptual weighting information, from the input speech sequence; and a linear prediction analysis filter to filter the input speech sequence to produce a perceptually weighted excitation signal that is filtered by the pitch and formant filters, wherein the linear prediction analysis filter is configured in accordance with the formant information or the formant and pitch information, and in accordance with the perceptual weighting information.

17. The speech signal enhancer of claim 16 further comprising an excitation modifier that is configured to transform the perceptually weighted excitation signal from time domain to frequency domain, modify spectral magnitudes of the perceptually weighted excitation signal in the spectral domain, in accordance with the information from the linear predictive model generat
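
Claim 5 refers to the frequency response of a cascade of a formant (short-term) filter and a pitch (long-term) filter. As a rough illustration only, the sketch below evaluates the magnitude response of such a cascade under assumptions that the claim text does not state: an all-pole formant filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and a single-tap pitch filter 1/P(z) with P(z) = 1 - g z^-T. The function name `cascade_magnitudes` and its parameters are hypothetical, and the further step of extracting a spectral envelope from this response is not shown.

```python
import cmath
import math


def cascade_magnitudes(lpc, pitch_gain, pitch_lag, num_bins):
    """Magnitude response of 1 / (A(z) * P(z)) on num_bins points in [0, pi].

    lpc        -- a_1..a_p of the assumed A(z) = 1 + sum_i a_i z^-i
    pitch_gain -- g of the assumed P(z) = 1 - g z^-T
    pitch_lag  -- T, the pitch period in samples
    """
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    magnitudes = []
    for k in range(num_bins):
        omega = math.pi * k / (num_bins - 1)
        # Short-term (formant) polynomial evaluated on the unit circle.
        a_val = 1 + sum(
            coeff * cmath.exp(-1j * omega * (i + 1)) for i, coeff in enumerate(lpc)
        )
        # Long-term (pitch) polynomial evaluated on the unit circle.
        p_val = 1 - pitch_gain * cmath.exp(-1j * omega * pitch_lag)
        magnitudes.append(1.0 / (abs(a_val) * abs(p_val)))
    return magnitudes
```

With `pitch_gain` set to zero the result reduces to the formant filter's response alone, which corresponds to the formant-only case the claims also allow.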

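Claim 2 mentions non-linear mappings of linear predictive parameters to a Log Area Ratios domain. One common textbook route, shown here as a sketch and not as the method used in the patent, converts predictor coefficients to reflection coefficients with the step-down recursion and then applies LAR = ln((1 + k) / (1 - k)). The sign conventions (A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the orientation of the log ratio) are assumptions, since several conventions are in use, and the function names are hypothetical.

```python
import math


def lpc_to_reflection(lpc):
    """Step-down recursion: a_1..a_p of A(z) = 1 + sum_i a_i z^-i -> k_1..k_p."""
    coeffs = list(lpc)
    reflection = []
    for order in range(len(coeffs), 0, -1):
        k = coeffs[order - 1]  # highest-order coefficient is k_order
        if abs(k) >= 1.0:
            raise ValueError("filter is not stable: |k| >= 1")
        reflection.append(k)
        # Recover the predictor of the next lower order.
        coeffs = [
            (coeffs[i] - k * coeffs[order - 2 - i]) / (1.0 - k * k)
            for i in range(order - 1)
        ]
    reflection.reverse()
    return reflection


def reflection_to_lar(reflection):
    """Log area ratios under the assumed convention ln((1 + k) / (1 - k))."""
    return [math.log((1.0 + k) / (1.0 - k)) for k in reflection]


def lar_to_reflection(lars):
    """Inverse of reflection_to_lar: k = tanh(LAR / 2)."""
    return [math.tanh(value / 2.0) for value in lars]
```

The mapping is invertible for stable filters, so a network that outputs values in the LAR domain could in principle be mapped back to predictor coefficients; whether the patent's embodiments do this is not stated in the text shown here.
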
Assignees

Apple Inc

Inventors

Classifications

  • the extracted parameters being formant information · CPC title

  • Processing in the frequency domain · CPC title

  • G10L25/30 · Primary

    using neural networks · CPC title

  • Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title

  • Noise filtering · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018366138A1 cover?
Several embodiments of a digital speech signal enhancer are described that use an artificial neural network that produces clean speech coding parameters based on noisy speech coding parameters as its input features. A vocoder parameter generator produces the noisy speech coding parameters from a noisy speech signal. A vocoder model generator processes the clean speech coding parameters into est…
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G10L25/30. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 20, 2018 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).