Spectrogram to waveform synthesis using convolutional networks
US-11462209-B2 · Oct 4, 2022 · US
US12586600B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12586600-B2 |
| Application number | US-202318163848-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 2, 2023 |
| Priority date | Feb 21, 2022 |
| Publication date | Mar 24, 2026 |
| Grant date | Mar 24, 2026 |
Official abstract text for this publication.
A method includes receiving a current spectrogram frame and reconstructing a phase of the current spectrogram frame by, for each corresponding committed spectrogram frame in a sequence of M number of committed spectrogram frames preceding the current spectrogram frame, obtaining a value of a committed phase of the corresponding committed spectrogram frame and estimating the phase of the current spectrogram frame based on a magnitude of the current spectrogram frame and the value of the committed phase of each corresponding committed spectrogram frame in the sequence of M number of committed spectrogram frames preceding the current spectrogram frame. The method also includes synthesizing, for the current spectrogram frame, a new time-domain audio waveform frame based on the estimated phase of the current spectrogram frame.
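The loop described in the abstract can be sketched in code. What follows is a minimal illustration, not the patented implementation. It assumes a Griffin-Lim style projection (inverse STFT followed by STFT) as the per-iteration phase estimator, which the text above does not name. The frame size, hop, window, and all function names (`stft`, `istft`, `reconstruct_phase`, `stream`) are hypothetical choices made for this example.

```python
import numpy as np

FRAME = 64                            # samples per STFT frame (assumed)
HOP = 16                              # hop between frames (assumed)
WINDOW = np.hanning(FRAME + 1)[:-1]   # periodic Hann window (assumed)


def istft(frames):
    """Overlap-add inverse STFT of complex frames, shape (T, FRAME // 2 + 1)."""
    out = np.zeros(HOP * (len(frames) - 1) + FRAME)
    norm = np.zeros_like(out)
    for t, spec in enumerate(frames):
        start = t * HOP
        out[start:start + FRAME] += np.fft.irfft(spec, n=FRAME) * WINDOW
        norm[start:start + FRAME] += WINDOW ** 2
    # The floor only guards against division by zero at the window edges.
    return out / np.maximum(norm, 1e-8)


def stft(signal, num_frames):
    """STFT of a real signal, returning num_frames complex frames."""
    return np.stack([
        np.fft.rfft(signal[t * HOP:t * HOP + FRAME] * WINDOW)
        for t in range(num_frames)
    ])


def reconstruct_phase(committed, magnitudes, iterations=3):
    """Estimate the phase of the current frame.

    committed:  complex array (<= M, bins); phases are fixed and never updated.
    magnitudes: real array (1 + lookahead, bins); row 0 is the current frame,
                the remaining rows are the uncommitted frames that follow it.
    """
    m = len(committed)
    # Uncommitted frames start with zero phase.
    uncommitted = magnitudes.astype(complex)
    for _ in range(iterations):
        window = np.concatenate([committed, uncommitted])
        # Assumed estimator: project the window onto a consistent spectrogram.
        projected = stft(istft(window), len(window))
        # Keep committed frames as they are; refresh only uncommitted phases,
        # recombining them with the known magnitudes.
        uncommitted = magnitudes * np.exp(1j * np.angle(projected[m:]))
    return uncommitted[0]


def stream(mag_frames, m=2, n=1, iterations=3):
    """Process magnitude frames one at a time with M committed (m >= 1)
    and N lookahead frames; returns the complex frames that were committed."""
    bins = mag_frames.shape[1]
    committed = np.zeros((0, bins), dtype=complex)
    output = []
    for t in range(len(mag_frames)):
        current = reconstruct_phase(committed, mag_frames[t:t + 1 + n], iterations)
        # Commit the current frame and keep only the last m committed frames.
        committed = np.concatenate([committed, current[None]])[-m:]
        output.append(current)
    return np.stack(output)


if __name__ == "__main__":
    samples = np.sin(2 * np.pi * 0.05 * np.arange(400))
    num_frames = (len(samples) - FRAME) // HOP + 1
    mags = np.abs(stft(samples, num_frames))
    waveform = istft(stream(mags, m=2, n=1))
    print(waveform.shape)
```

For brevity the sketch resynthesizes the whole waveform at the end with `istft`. A streaming variant would instead emit `HOP` new samples as each frame is committed.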
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method when executed by data processing hardware causes the data processing hardware to perform operations comprising: receiving a current spectrogram frame; reconstructing a phase of the current spectrogram frame by: for each corresponding committed spectrogram frame in a sequence of M number of committed spectrogram frames preceding the current spectrogram frame, obtaining a value of a committed phase of the corresponding committed spectrogram frame; and estimating the phase of the current spectrogram frame by performing one or more iterations within a sliding window that contains the current spectrogram frame, wherein performing each iteration of the one or more iterations within the sliding window comprises: estimating an uncommitted phase of the current spectrogram frame based on a sequence of N number of uncommitted spectrogram frames within the sliding window that are subsequent to the current spectrogram frame; and updating a complex-valued spectrogram representation within the sliding window by combining the value of the committed phase of each corresponding committed spectrogram frame in the sequence of M number of committed spectrogram frames preceding the current spectrogram frame, the estimated uncommitted phase, and a magnitude of the current spectrogram frame; and for the current spectrogram frame, synthesizing a new time-domain audio waveform frame based on the estimated phase of the current spectrogram frame.
2. The method of claim 1, wherein: the current spectrogram frame comprises a log-magnitude spectrogram frame output from a speech conversion model; and prior to reconstructing the phase of the current spectrogram frame, the phase of the current spectrogram frame is initialized with a value equal to zero.
3. The method of claim 1, wherein the M number of committed spectrogram frames preceding the current spectrogram frame is equal to one.
4. The method of claim 1, wherein the M number of committed spectrogram frames preceding the current spectrogram frame is at least two.
5. The method of claim 1, wherein estimating the uncommitted phase of the current spectrogram frame based on the N number of uncommitted spectrogram frames within the sliding window that are subsequent to the current spectrogram frame comprises: for each corresponding uncommitted spectrogram frame in the sequence of N number of uncommitted spectrogram frames within the sliding window that are subsequent to the current spectrogram frame, obtaining a value of an uncommitted phase of the corresponding uncommitted spectrogram frame; and estimating the uncommitted phase of the current spectrogram frame is based on the value of the uncommitted phase of each corresponding uncommitted spectrogram frame in the sequence of N number of committed spectrogram frames within the sliding window that are subsequent to the current spectrogram frame.
6. The method of claim 1, wherein the N number of uncommitted spectrogram frames and the M number of committed spectrogram frames are equal.
7. The method of claim 1, wherein the N number of uncommitted spectrogram frames and the M number of committed spectrogram frames are different.
8. The method of claim 1, wherein the N number of uncommitted spectrogram frames within the sliding window that are subsequent to the current spectrogram frame is equal to one.
9. The method of claim 1, wherein the N number of uncommitted spectrogram frames within the sliding window that are subsequent to the current spectrogram frame is at least two.
10. The method of claim 1, wherein the current spectrogram frame is in a Short-time Fourier transform (STFT) domain when reconstructing the phase of the current spectrogram frame.
11. The method of claim 10, wherein synthesizing the new time-domain audio waveform frame based on the estimated phase of the current spectrogram frame comprises running a streaming inverse STFT on an output frame corresponding to the current spectrogram frame, the output frame extracted using the estimated phase of the current spectrogram frame.
12. The method of claim 1, wherein the operations further comprise, after reconstructing the phase of the current spectrogram frame, designating the current spectrogram frame as a committed frame and storing the estimated phase of the current spectrogram frame as a committed phase.
13. The method of claim 1, wherein the data processing hardware resides on a user computing device or a server.
14. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: receiving a current spectrogram frame; reconstructing a phase of the current spectrogram frame by: for each corresponding committed spectrogram frame in a sequence of M number of committed spectrogram frames preceding the current spectrogram frame, obtaining a value of a committed phase of the corresponding committed spectrogram frame; and estimating the phase of the current spectrogram frame by performing one or more iterations within a sliding window that contains the current spectrogram frame, wherein performing each iteration of the one or more iterations within the sliding window comprises: estimating an uncommitted phase of the current spectrogram frame based on a sequence of N number of uncommitted spectrogram frames within the sliding window that are subsequent to the current spectrogram frame; and updating a complex-valued spectrogram representation within the sliding window by combining the value of the committed phase of each corresponding committed spectrogram frame in the sequence of M number of committed spectrogram frames preceding the current spectrogram frame, the estimated uncommitted phase, and a magnitude of the current spectrogram frame; and for the current spectrogram frame, synthesizing a new time-domain audio waveform frame based on the estimated phase of the current spectrogram frame.
15. The system of claim 14, wherein: the current spectrogram frame comprises a log-magnitude spectrogram frame output from a speech conversion model; and prior to reconstructing the phase of the current spectrogram frame, the phase of the current spectrogram frame is initialized with a value equal to zero.
16. The system of claim 14, wherein the M number of committed spectrogram frames preceding the current spectrogram frame is equal to one.
17. The system of claim 14, wherein the M number of committed spectrogram frames preceding the current spectrogram frame is at least two.
18. The system of claim 14, wherein estimating the uncommitted phase of the current spectrogram frame based on the N number of uncommitted spectrogram frames within the sliding window that are subsequent to the current spectrogram frame comprises: for each corresponding uncommitted spectrogram frame in the sequence of N number of uncommitted spectrogram frames within the sliding window that are subsequent to the current spectrogram frame, obtaining a value of an uncommitted phase of the corresponding uncommitted spectrogram frame; and estimating the uncommitted phase of the current spectrogram frame is further-based on the value of the uncommitted phase of each corresponding uncommitted spectrogram frame in the sequence of N number of committed spectrogram frames within the sliding window that are subsequent to
for improving intelligibility · CPC title
Changing voice quality, e.g. pitch or formants · CPC title
Processing in the frequency domain · CPC title
Details of the transformation process · CPC title
Transforming into visible information · CPC title