Estimation system of spectral envelopes and group delays for sound analysis and synthesis, and audio signal synthesis system

US9368103B2 · US · B2

Patent metadata
Publication number: US-9368103-B2
Application number: US-201314418680-A
Country: US
Kind code: B2
Filing date: Jul 30, 2013
Priority date: Aug 1, 2012
Publication date: Jun 14, 2016
Grant date: Jun 14, 2016

Abstract

For high-accuracy analysis and high-quality synthesis of voice sound (singing and speech), provided herein are a system and a method for estimating from an audio signal spectral envelopes and group delays for sound analysis and synthesis with high accuracy and high temporal resolution. An estimation system of spectral envelopes and group delays includes a fundamental frequency estimation section, an amplitude spectrum acquisition section, a group delay extraction section, a spectral envelope integration section, and a group delay integration section. The spectral envelope integration section sequentially obtains a spectral envelope for sound synthesis by averaging overlapped spectra. The group delay integration section selects from a plurality of group delays a group delay corresponding to the maximum envelope of each frequency component of the spectral envelope and integrates the group delays thus selected to sequentially obtain a group delay for sound synthesis.
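
The two integration steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `integrate_frames`, the nested-list layout (frame index first, frequency bin second), and the toy numbers are all assumptions made for illustration. The sketch takes the per-frame amplitude spectra and group delays that fall inside one integration period, averages the overlapped spectra per frequency bin, and for each bin keeps the group delay from the frame whose amplitude is largest.

```python
def integrate_frames(amplitudes, group_delays):
    """Integrate overlapped frames (hypothetical helper, illustration only).

    amplitudes[t][k] and group_delays[t][k] hold the value for frame t
    and frequency bin k. Returns (envelope, selected_delay), one value per bin.
    """
    n_frames = len(amplitudes)
    n_bins = len(amplitudes[0])
    envelope = []
    selected_delay = []
    for k in range(n_bins):
        column = [amplitudes[t][k] for t in range(n_frames)]
        # Spectral envelope: average of the overlapped spectra at this bin.
        envelope.append(sum(column) / n_frames)
        # Group delay: taken from the frame that attains the maximum
        # amplitude (the "maximum envelope") at this bin.
        t_max = max(range(n_frames), key=lambda t: column[t])
        selected_delay.append(group_delays[t_max][k])
    return envelope, selected_delay


# Toy data: 3 overlapped frames, 3 frequency bins (made-up values).
amps = [[1.0, 4.0, 2.0],
        [3.0, 2.0, 2.0],
        [2.0, 3.0, 5.0]]
delays = [[0.1, 0.2, 0.3],
          [0.4, 0.5, 0.6],
          [0.7, 0.8, 0.9]]

envelope, delay = integrate_frames(amps, delays)
print(envelope)  # [2.0, 3.0, 3.0]
print(delay)     # [0.4, 0.2, 0.9]
```

In the claims, the set of overlapped frames is determined from the fundamental period of F0, and dependent claims refine the averaging (for example, a mean of the maximum and minimum envelopes); the sketch omits both.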

First claim

Opening claim text (preview).

The invention claimed is:

1. An estimation system of spectral envelopes and group delays for sound analysis and synthesis comprising at least one processor operable to function as:
    a fundamental frequency estimation section configured to estimate F0s from an audio signal at all points of time or at all points of sampling;
    an amplitude spectrum acquisition section configured to divide the audio signal into a plurality of frames, centering on each point of time or each point of sampling, by using a window having a window length changing with F0 at each point of time or each point of sampling, to perform Discrete Fourier Transform (DFT) analysis on the plurality of frames of the audio signal, and thus to acquire amplitude spectra at the respective frames;
    a group delay extraction section configured to extract group delays as phase frequency differentials at the respective frames by performing a group delay extraction algorithm accompanied by DFT analysis on the plurality of frames of the audio signal;
    a spectral envelope integration section configured to obtain overlapped spectra at a predetermined time interval by overlapping the amplitude spectra corresponding to the frames included in a certain period determined based on a fundamental period of F0, and to average the overlapped spectra to sequentially obtain a spectral envelope for sound synthesis; and
    a group delay integration section configured to select a group delay corresponding to a maximum envelope for each frequency component of the spectral envelope from the group delays at a predetermined time interval, and to integrate the thus selected group delays to sequentially obtain a group delay for sound synthesis.

2. The estimation system of spectral envelopes and group delays for sound analysis and synthesis according to claim 1, wherein: the fundamental frequency estimation section is configured to identify voiced segments and unvoiced segments in addition to the estimation of F0s and to interpolate the unvoiced segments with F0 values of the voiced segments or allocate predetermined values to the unvoiced segments as F0.

3. The estimation system of spectral envelopes and group delays for sound analysis and synthesis according to claim 1, wherein: the spectral envelope integration section is configured to obtain the spectral envelope for sound synthesis by calculating a mean value of the maximum envelope and a minimum envelope of the overlapped spectra.

4. The estimation system of spectral envelopes and group delays for sound analysis and synthesis according to claim 3, wherein: the spectral envelope integration section is configured to obtain the spectral envelope for sound synthesis by using, as the mean value, a median value of the maximum envelope and the minimum envelope of the overlapped spectra.

5. The estimation system of spectral envelopes and group delays for sound analysis and synthesis according to claim 4, wherein: the maximum envelope is transformed to fill in valleys of the minimum envelope and a transformed minimum envelope thus obtained is used as the minimum envelope in calculating the mean value.

6. The estimation system of spectral envelopes and group delays for sound analysis and synthesis according to claim 3, wherein: the maximum envelope is transformed to fill in valleys of the minimum envelope and a transformed minimum envelope thus obtained is used as the minimum envelope in calculating the mean value.

7. The estimation system of spectral envelopes and group delays for sound analysis and synthesis according to claim 3, wherein: the spectral envelope integration section is configured to obtain the spectral envelope for sound synthesis by replacing amplitude values of the spectral envelope of frequency bins under F0 with an amplitude value of the spectral envelope at F0.

8. The estimation system of spectral envelopes and group delays for sound analysis and synthesis according to claim 7, further comprising: a two-dimensional low-pass filter operable to filter the replaced spectral envelope.

9. The estimation system of spectral envelopes and group delays for sound analysis and synthesis according to claim 1, wherein: the group delay integration section is configured to store, by frequency, the group delays in the frames corresponding to the maximum envelopes for respective frequency components of the overlapped spectra, to compensate a time-shift of analysis of the stored group delays, and to normalize the stored group delays for use in sound synthesis.

10. The estimation system of spectral envelopes and group delays for sound analysis and synthesis according to claim 9, wherein: the group delay integration section is configured to obtain the group delay for sound synthesis by replacing values of group delay of frequency bins under F0 with a value of the group delay at F0.

11. The estimation system of spectral envelopes and group delays for sound analysis and synthesis according to claim 10, wherein: the group delay integration section is configured to smooth the replaced group delays for use in sound synthesis.

12. The estimation system of spectral envelopes and group delays for sound analysis and synthesis according to claim 11, wherein: in smoothing the replaced group delays for use in sound synthesis, the replaced group delays are converted with sin function and cos function to remove discontinuity due to the fundamental period, the converted group delays are subsequently filtered with a two-dimensional low-pass filter, and then the filtered group delays are converted to an original state with tan⁻¹ function for use in sound synthesis.

13. An audio signal synthesis system using the spectral envelopes and group delays for sound analysis and synthesis estimated by the estimation system according to claim 1, the audio signal synthesis system comprising at least one processor operable to function as:
    a reading section configured to read out, in a fundamental period for sound synthesis, the spectral envelopes and group delays for sound synthesis from a data file of the spectral envelopes and group delays for sound synthesis estimated by the estimation system, wherein the fundamental period for sound synthesis is a reciprocal of the fundamental frequency for sound synthesis;
    a conversion section configured to convert the read-out group delays into phase spectra;
    a unit waveform generation section configured to generate unit waveforms based on the read-out spectral envelopes and the phase spectra; and
    a synthesis section configured to output a synthesized audio signal obtained by performing overlap-add calculation on the generated unit waveforms in the fundamental period for sound synthesis.

14. The audio signal synthesis system according to claim 13, further comprising: a discontinuity suppression section configured to suppress an occurrence of discontinuity of the read-out group delays along a time axis in a low frequency range before the conversion section converts the read-out group delays.

15. The audio signal synthesis system according to claim 14, wherein: the discontinuity suppression section is configured to smooth group delays in the low frequency range after adding an optimal offset to the group delay for each voiced segment.

16. The audio signal synthesis system according to claim 15, further comprising: a compensation section configured to multiply the respective group delays by the fundamental period for sound synthesis as a multiplier coefficient after the conversion section converts the group delays or before the discontinuity suppression s
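
The smoothing described in claim 12 (convert with sin and cos, low-pass filter, convert back with tan⁻¹) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function name `smooth_wrapped_delays` and its parameters are hypothetical, a 1-D moving average along time stands in for the two-dimensional low-pass filter named in the claim, and the four-quadrant `math.atan2` is used for the tan⁻¹ step. The point it shows is why the sin/cos detour helps: group delays that wrap around the fundamental period (a value just below the period and a value just above zero are in fact close together) would be pulled toward the middle of the period by a direct average.

```python
import math


def smooth_wrapped_delays(delays, period, half_width=1):
    """Smooth group delays that wrap modulo `period` (hypothetical helper).

    delays: group delays over time for one frequency bin, in seconds.
    period: fundamental period in seconds; delays are treated modulo this.
    half_width: moving-average half width in frames (a 1-D stand-in for
                the two-dimensional low-pass filter named in the claim).
    """
    # Map each delay onto the unit circle so 0 and `period` coincide.
    angles = [2.0 * math.pi * d / period for d in delays]
    sines = [math.sin(a) for a in angles]
    cosines = [math.cos(a) for a in angles]

    def moving_average(values):
        out = []
        for i in range(len(values)):
            lo = max(0, i - half_width)
            hi = min(len(values), i + half_width + 1)
            out.append(sum(values[lo:hi]) / (hi - lo))
        return out

    smooth_sin = moving_average(sines)
    smooth_cos = moving_average(cosines)

    # Recover the angle with atan2, then wrap back into one period.
    return [
        (math.atan2(s, c) / (2.0 * math.pi) * period) % period
        for s, c in zip(smooth_sin, smooth_cos)
    ]


# Made-up delays that alternate across the wrap point of a 10 ms period.
period = 0.01
raw = [0.0095, 0.0005, 0.0095, 0.0005]
smoothed = smooth_wrapped_delays(raw, period)

# A direct average of the raw values lands mid-period, far from every input.
naive_mean = sum(raw) / len(raw)
print(naive_mean)
print(smoothed)
```

With these inputs the direct average sits near 0.005 s, the middle of the period, while each circularly smoothed value stays within a small distance of the wrap point.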

Assignees

Inventors

Classifications

  • the extracted parameters being formant information · CPC title

  • G10L13/02 · Primary

    Methods for producing synthetic speech; Speech synthesisers · CPC title

  • Pitch tracking · CPC title

  • Adapting to target pitch · CPC title

  • the extracted parameters being spectral information of each sub-band · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Nat Inst Of Advanced Ind Scien
What technology area does this patent fall under?
Primary CPC classification G10L13/02. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 14, 2016 (B2). Legal status and post-grant events are not shown on this page.