What technology area does this patent fall under?

Primary CPC classification G10L25/30. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 7, 2022 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Voice processing method, apparatus, and device and storage medium

US2022215848A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2022215848-A1
Application number	US-202217703713-A
Country	US
Kind code	A1
Filing date	Mar 24, 2022
Priority date	May 15, 2020
Publication date	Jul 7, 2022
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A voice processing method includes: determining a historical voice frame corresponding to a target voice frame; determining a frequency-domain characteristic of the historical voice frame; invoking a network model to predict the frequency-domain characteristic of the historical voice frame, to obtain a parameter set of the target voice frame, the parameter set including a plurality of types of parameters, the network model including a plurality of neural networks (NNs), and a number of the types of the parameters in the parameter set being determined according to a number of the NNs; and reconstructing the target voice frame according to the parameter set.
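The abstract's data flow can be illustrated with a minimal, stdlib-only Python sketch. It is not the patent's implementation. A naive DFT magnitude stands in for the STFT amplitude spectra of claim 3, and plain callables stand in for the trained first NN and second NNs of claim 4, whose architectures are not given on this page. The names `amplitude_spectrum`, `frequency_feature` and `predict_parameter_set`, and the parameter labels, are hypothetical. What the sketch shows is structural: with one NN per parameter type, the number of parameter types equals the number of NNs.

```python
import cmath
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a trained neural network: feature vector in, vector out.
NN = Callable[[List[float]], List[float]]


def amplitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of a naive DFT of one frame, bins 0..N/2 (the non-redundant half)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def frequency_feature(history: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the amplitude spectra of the historical frames (oldest first)."""
    feature: List[float] = []
    for frame in history:
        feature.extend(amplitude_spectrum(frame))
    return feature


def predict_parameter_set(feature: List[float], first_nn: NN,
                          heads: Dict[str, NN]) -> Dict[str, List[float]]:
    """First NN yields a 'virtual' target-frame feature; each head yields one parameter type.

    There is exactly one head per parameter type, so the number of parameter
    types in the returned set equals the number of heads.
    """
    virtual = first_nn(feature)
    return {name: net(virtual) for name, net in heads.items()}


# Example with toy lambdas in place of trained networks (hypothetical labels).
history = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
feat = frequency_feature(history)
params = predict_parameter_set(
    feat,
    first_nn=lambda f: f,
    heads={"lsf": lambda v: v[:2], "gain": lambda v: [max(v)]},
)
```

In a real system the toy lambdas would be replaced by trained networks; keeping the heads in a dictionary makes the one-network-per-parameter-type relation explicit.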

First claim

Opening claim text (preview).

What is claimed is:

1. A voice processing method, comprising: determining a historical voice frame corresponding to a target voice frame; determining a frequency-domain characteristic of the historical voice frame; invoking a network model to predict the frequency-domain characteristic of the historical voice frame, to obtain a parameter set of the target voice frame, the parameter set including a plurality of types of parameters, the network model including a plurality of neural networks (NNs), and a number of the types of the parameters in the parameter set being determined according to a number of the NNs; and reconstructing the target voice frame according to the parameter set.

2. The method according to claim 1, wherein determining the frequency-domain characteristic of the historical voice frame comprises: performing time-frequency transform on the historical voice frame to obtain a frequency-domain coefficient corresponding to the historical voice frame; and using the frequency-domain coefficient or an amplitude spectrum extracted from the frequency-domain coefficient as the frequency-domain characteristic of the historical voice frame.

3. The method according to claim 2, wherein performing the time-frequency transform comprises: performing short-term Fourier transform (STFT) on the historical voice frame, to obtain a plurality of sets of STFT coefficients corresponding to the historical voice frame; and using the frequency-domain coefficient or an amplitude spectrum extracted from the frequency-domain coefficient as the frequency-domain characteristic of the historical voice frame comprises: performing any one of: using the plurality of sets of STFT coefficients as the frequency-domain characteristic of the historical voice frame; and forming an amplitude coefficient sequence according to amplitude spectra corresponding to at least some of the STFT coefficients in each set of STFT coefficients, and using the amplitude coefficient sequence as the frequency-domain characteristic of the historical voice frame.

4. The method according to claim 1, wherein the network model includes a first NN and a plurality of second NNs; and invoking the network model comprises: invoking the first NN to predict the frequency-domain characteristic of the historical voice frame, to obtain a virtual frequency-domain characteristic of the target voice frame; invoking the second NNs to predict the virtual frequency-domain characteristic of the target voice frame, to obtain parameters corresponding to the second NNs; and establishing the parameter set of the target voice frame according to the parameters respectively corresponding to the plurality of second NNs.

5. The method according to claim 4, wherein the network model includes a third NN; and establishing the parameter set of the target voice frame according to the parameters respectively corresponding to the plurality of second NNs comprises: acquiring an energy parameter of the historical voice frame; invoking the third NN to predict the energy parameter of the historical voice frame, to obtain an energy parameter of the target voice frame; and establishing the parameter set of the target voice frame according to the parameters respectively corresponding to the plurality of second NNs and the energy parameter of the target voice frame, the target voice frame including m subframes, the energy parameter of the target voice frame including a gain value of each of the subframes of the target voice frame, and m being a positive integer.

6. The method according to claim 1, wherein reconstructing the target voice frame comprises: establishing a reconstruction filter according to the parameter set; acquiring an excitation signal of the historical voice frame; determining an excitation signal of the target voice frame according to the excitation signal of the historical voice frame; and filtering the excitation signal of the target voice frame according to the reconstruction filter, to obtain a reconstructed target voice frame.

7. The method according to claim 6, wherein the target voice frame is an nth voice frame in a voice signal transmitted by a voice over Internet protocol (VoIP) system, the historical voice frame includes an (n−t)th voice frame to an (n−1)th voice frame in the voice signal transmitted by the VoIP system, n and t being both positive integers, and the excitation signal of the historical voice frame includes an excitation signal of the (n−1)th voice frame; and determining the excitation signal of the target voice frame comprises determining the excitation signal of the (n−1)th voice frame as the excitation signal of the target voice frame.

8. The method according to claim 6, wherein the target voice frame is an nth voice frame in a voice signal transmitted by a VoIP system, the historical voice frame includes an (n−t)th voice frame to an (n−1)th voice frame in the voice signal transmitted by the VoIP system, n and t being both positive integers, and the excitation signal of the historical voice frame includes an excitation signal of each voice frame in the (n−t)th voice frame to the (n−1)th voice frame; and determining the excitation signal of the target voice frame comprises: averaging the excitation signals of the voice frames in the (n−t)th voice frame to the (n−1)th voice frame to obtain the excitation signal of the target voice frame; or performing weighted summation on the excitation signals of the voice frames in the (n−t)th voice frame to the (n−1)th voice frame to obtain the excitation signal of the target voice frame.

9. The method according to claim 6, wherein in response to determining that the target voice frame is an unvoiced frame, the parameter set includes a short-term correlation parameter of the target voice frame, and the reconstruction filter includes a linear predictive coding (LPC) filter; the target voice frame including k daughter frames, the short-term correlation parameter of the target voice frame including a line spectral frequency (LSF) of a kth daughter frame of the target voice frame and an interpolation factor of the target voice frame, and k being an integer greater than 1.

10. The method according to claim 9, wherein filtering the excitation signal of the target voice frame comprises: performing interpolation according to the LSF of the kth daughter frame and the interpolation factor of the target voice frame, to obtain an LSF of a daughter frame different from the kth daughter frame; determining an LPC coefficient of any one daughter frame according to an LSF of the any one daughter frame; performing LPC filtering according to the excitation signal of the target voice frame and the LPC coefficient of the any one daughter frame, to obtain any one reconstructed daughter frame; and synthesizing the k reconstructed daughter frames to obtain the reconstructed target voice frame.

11. The method according to claim 10, wherein the parameter set includes energy parameters respectively corresponding to the k daughter frames of the target voice frame; and the method further comprises: performing signal amplification on the any one reconstructed daughter frame according to the energy parameter of the any one daughter frame.

12. The method according to claim 6, wherein in response to determining that the target voice frame is a voiced frame, the parameter set includes a short-term correlation parameter of the target voice frame and a long-term correlation parameter of the target voice frame, and the reconstruction filter includes a long-term predictive (LTP) filter and an LPC …
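The reconstruction steps of claims 6 to 8, 10 and 11 can be sketched in stdlib-only Python. This is a minimal illustration under stated assumptions, not the patent's implementation. It assumes the LPC coefficients of each daughter frame are already known, so the LSF interpolation and LSF-to-LPC conversion of claim 10 are omitted. It also omits the LTP filter for voiced frames (claim 12) and assumes the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The function names `target_excitation`, `lpc_synthesis` and `reconstruct_frame` are hypothetical.

```python
from typing import List, Optional, Sequence


def target_excitation(history: Sequence[Sequence[float]], mode: str = "last",
                      weights: Optional[Sequence[float]] = None) -> List[float]:
    """Derive the target frame's excitation from historical excitations (oldest first)."""
    if mode == "last":  # claim 7: reuse the (n-1)th frame's excitation
        return list(history[-1])
    if mode == "mean":  # claim 8: average the (n-t)th..(n-1)th excitations
        return [sum(col) / len(history) for col in zip(*history)]
    if mode == "weighted":  # claim 8: weighted summation, one weight per frame
        if weights is None or len(weights) != len(history):
            raise ValueError("need one weight per historical frame")
        return [sum(w * x for w, x in zip(weights, col)) for col in zip(*history)]
    raise ValueError(f"unknown mode: {mode}")


def lpc_synthesis(excitation: Sequence[float], lpc: Sequence[float],
                  memory: Optional[Sequence[float]] = None) -> List[float]:
    """All-pole filter 1/A(z) with A(z) = 1 + a1*z^-1 + ... + ap*z^-p (assumed convention).

    `memory` holds the last p outputs, most recent first; zeros if omitted.
    """
    p = len(lpc)
    past = list(memory) if memory is not None else [0.0] * p
    out: List[float] = []
    for e in excitation:
        y = e - sum(a * yk for a, yk in zip(lpc, past))
        out.append(y)
        past = ([y] + past)[:p]  # shift the newest output into the state
    return out


def reconstruct_frame(excitation: Sequence[float],
                      subframe_lpcs: Sequence[Sequence[float]],
                      subframe_gains: Sequence[float]) -> List[float]:
    """Filter each daughter frame with its own LPC set, scale it by its gain, concatenate."""
    k = len(subframe_lpcs)
    if k == 0 or len(subframe_gains) != k or len(excitation) % k:
        raise ValueError("need k LPC sets, k gains, and a frame length divisible by k")
    size = len(excitation) // k
    frame: List[float] = []
    memory: Optional[List[float]] = None
    for i, (lpc, gain) in enumerate(zip(subframe_lpcs, subframe_gains)):
        seg = excitation[i * size:(i + 1) * size]
        sub = lpc_synthesis(seg, lpc, memory)
        memory = sub[::-1][:len(lpc)]  # carry unscaled filter state across daughter frames
        frame.extend(gain * s for s in sub)  # claim 11: per-daughter-frame amplification
    return frame
```

The gain is applied after filtering and the filter state is carried forward unscaled, so the daughter frames join as one continuous filter output before amplification; other orderings are possible and the page does not say which the patent uses.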

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

G10L19/08
Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters · CPC title
G10L19/005
Correction of errors induced by the transmission channel, if related to the coding algorithm · CPC title
G10L19/07
Line spectrum pair [LSP] vocoders · CPC title
G10L25/18
the extracted parameters being spectral information of each sub-band · CPC title
G10L25/30 (primary)
using neural networks · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 72001058


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022215848A1 cover?

A voice processing method includes: determining a historical voice frame corresponding to a target voice frame; determining a frequency-domain characteristic of the historical voice frame; invoking a network model to predict the frequency-domain characteristic of the historical voice frame, to obtain a parameter set of the target voice frame, the parameter set including a plurality of types of …

Who is the assignee on this patent?

Tencent Tech Shenzhen Co Ltd


Related patents

Apparatus, method or computer program for generating a bandwidth-enhanced audio signal using a neural network processor

Speech Model-Based Neural Network-Assisted Signal Enhancement

System and method of jitter buffer management

Encoder, decoder and method for encoding and decoding audio content using parameters for enhancing a concealment
