What technology area does this patent fall under?

Primary CPC classification G10L25/30. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 26, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Training method of speech signal processing model with shared layer, electronic device and storage medium

US11158304B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11158304-B2
Application number	US-201916655548-A
Country	US
Kind code	B2
Filing date	Oct 17, 2019
Priority date	Nov 24, 2017
Publication date	Oct 26, 2021
Grant date	Oct 26, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present invention provide a speech signal processing model training method, an electronic device and a storage medium. The embodiments of the present invention determines a target training loss function based on a training loss function of each of one or more speech signal processing tasks; inputs a task input feature of each speech signal processing task into a starting multi-task neural network, and updates model parameters of a shared layer and each of one or more task layers of the starting multi-task neural network corresponding to the one or more speech signal processing tasks by minimizing the target training loss function as a training objective, until the starting multi-task neural network converges, to obtain a speech signal processing model.
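The abstract says only that the target training loss is "based on" the per-task losses. Claim 4 of this publication describes one way to form it: multiply each task's loss by a weight and add the results. The following is a minimal sketch of that combination, not the patented implementation; the task names, loss values, and weights are hypothetical and chosen only for illustration.

```python
def target_loss(task_losses, task_weights):
    """Combine per-task losses into one target loss as a weighted sum.

    Both arguments map a task name to a number. The names used below
    are hypothetical; the patent does not prescribe them.
    """
    if task_losses.keys() != task_weights.keys():
        raise ValueError("every task needs exactly one loss and one weight")
    return sum(task_weights[name] * loss for name, loss in task_losses.items())


# Hypothetical per-task loss values and weights (illustrative only).
losses = {"echo_cancellation": 0.5, "speech_detection": 0.25}
weights = {"echo_cancellation": 1.0, "speech_detection": 2.0}

combined = target_loss(losses, weights)
print(combined)
```

Raising a weight makes the corresponding task contribute more to the combined objective, which in turn gives it more influence over how the shared layer is updated.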

First claim

Opening claim text (preview).

What is claimed is:

1. A speech signal processing model training method, applied to an electronic device, comprising: acquiring a sample speech and determining task input features each for one speech signal processing task among one or more speech signal processing tasks for the sample speech; establishing a starting multi-task neural network comprising one or more task layers corresponding to the one or more speech signal processing tasks and a shared layer common to the one or more speech signal processing tasks; determining a target training loss function based on separate training loss functions each for one speech signal processing task of the one or more speech signal processing tasks; and using the task input features of the one or more speech signal processing tasks as a training input of the starting multi-task neural network, and updating model parameters of the shared layer and the one or more task layers of the starting multi-task neural network by minimizing the target training loss function as a training objective, until the starting multi-task neural network converges, to obtain a speech signal processing model, wherein the starting multi-task neural network comprises a first multi-task neural network, the method further comprising: determining, from the one or more speech signal processing tasks, one or more speech signal processing tasks having a training complexity higher than a preset complexity threshold as one or more first-class speech signal processing tasks, the one or more first-class speech signal processing tasks having corresponding first-class task input features and a first-class target training loss function; and using the first-class task input features of the one or more first-class speech signal processing tasks of the sample speech as a training input of an initial untrained multi-task neural network, and updating parameters of a first shared layer and first task layers corresponding to the one or more first-class speech signal processing tasks by minimizing the first-class target training loss function as a training objective, until the initial untrained multi-task neural network converges, to obtain the first multi-task neural network.

2. The speech signal processing model training method according to claim 1, wherein updating the model parameters of the shared layer and the one or more task layers of the starting multi-task neural network by minimizing the target training loss function as a training objective comprises: for the shared layer, updating the model parameters of the shared layer based on the target training loss function by minimizing the target training loss function as the training objective; and for each of the one or more task layers corresponding to the one or more speech signal processing tasks, updating the model parameters of the one or more task layers based on the separate training loss function corresponding to the one or more speech signal processing tasks by minimizing the separate target training loss function as the training objective.

3. The speech signal processing model training method according to claim 1, further comprising: determining the first-class task input features of the one or more first-class speech signal processing tasks of the sample speech; and determining the first-class target training loss function based on training loss functions corresponding to the one or more first-class speech signal processing tasks.

4. The speech signal processing model training method according to claim 1, wherein determining the target training loss function based on the separate training loss functions each for one speech signal processing task among the one or more speech signal processing tasks comprises: for each of the one or more speech signal processing tasks, multiplying the corresponding separate training loss function by a corresponding weight to obtain a corresponding multiplication result for each speech signal processing task; and determining the target training loss function by adding each corresponding multiplication result of the one or more speech signal processing tasks.

5. The speech signal processing model training method according to claim 1, wherein: the shared layer comprises a long short term memory (LSTM) network, and the one or more task layers corresponding to the one or more speech signal processing tasks each comprises a fully connected multi-layer perceptron (MLP) network; and updating the model parameters of the shared layer and the one or more task layers of the starting multi-task neural network comprises: updating, in the LSTM network, connection parameters from an input layer to a hidden layer, connection parameters from the hidden layer to an output layer, or connection parameters between the hidden layers of the LSTM network; and updating, in the each fully connected MLP network, connection parameters from an input layer to a hidden layer or connection parameters from the hidden layer to an output layer of the each fully connected MLP network.

6. An electronic device, comprising: at least one memory and at least one processor; the memory storing a program, the processor invoking the program stored by the memory, and the program being configured for: acquiring a sample speech and determining task input features each for one speech signal processing task among one or more speech signal processing tasks for the sample speech; establishing a starting multi-task neural network model comprising one or more task layers corresponding to the one or more speech signal processing tasks and a shared layer common to the one or more speech signal processing tasks; determining a target training loss function based on separate training loss functions each for one speech signal processing task among the one or more speech signal processing tasks; using the task input features of the one or more speech signal processing tasks as a training input of the starting multi-task neural network, and updating model parameters of the shared layer and the one or more task layers of the starting multi-task neural network by minimizing the target training loss function as a training objective, until the starting multi-task neural network converges, to obtain a speech signal processing model; determining, from the one or more speech signal processing tasks, one or more speech signal processing tasks having a training complexity higher than a preset complexity threshold as one or more first-class speech signal processing tasks, the one or more first-class speech signal processing tasks having corresponding first-class task input features and a first-class target training loss function; and using the first-class task input features of the one or more first-class speech signal processing tasks of the sample speech as a training input of an initial untrained multi-task neural network, and updating parameters of a first shared layer and first task layers corresponding to the one or more first-class speech signal processing tasks by minimizing the first-class target training loss function as a training objective, until the initial untrained multi-task neural network converges, to obtain the starting multi-task neural network.

7. The electronic device according to claim 6, wherein the program is further configured for: for the shared layer, updating the model parameters of the shared layer based on the target training loss function by minimizing the target training loss function as the training objective; and for each of the one or more task layers corresponding to the one or more speech signal processing tasks, updating the model parameters of the one or more task layers based on the separate training loss function corresponding to the one or more speech signal processing tasks by minimiz
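Claims 1 and 2 describe a two-stage schedule: tasks whose training complexity exceeds a preset threshold are trained first on an untrained network, and the result then serves as the starting network for training on all tasks, with the shared layer updated from the combined target loss and each task layer updated from its own loss. The toy sketch below illustrates that flow only. It replaces the LSTM shared layer and MLP task layers of claim 5 with single scalar parameters, and every task name, complexity score, weight, threshold, and hyperparameter is an assumption made for illustration.

```python
XS = [0.5, 1.0, 1.5, 2.0]  # toy scalar inputs standing in for task input features

# Hypothetical tasks: the slope each head must learn, an assumed complexity
# score, and an assumed loss weight. None of these values come from the patent.
TASKS = {
    "denoise": {"slope": 2.0, "complexity": 0.9, "weight": 1.0},
    "voice_detect": {"slope": -1.0, "complexity": 0.2, "weight": 0.5},
}
THRESHOLD = 0.5  # assumed preset complexity threshold


def task_loss(shared, head, slope):
    """Mean squared error of one task; the prediction is head * shared * x."""
    return sum((head * shared * x - slope * x) ** 2 for x in XS) / len(XS)


def target_loss(shared, heads, names):
    """Weighted sum of the separate task losses over the given tasks."""
    return sum(
        TASKS[n]["weight"] * task_loss(shared, heads[n], TASKS[n]["slope"])
        for n in names
    )


def train(shared, heads, names, steps=500, lr=0.02):
    """Gradient descent over the named tasks only.

    The shared parameter follows the gradient of the weighted target loss;
    each task head follows the gradient of its own separate loss.
    """
    heads = dict(heads)
    for _ in range(steps):
        shared_grad = 0.0
        updated = {}
        for n in names:
            slope, weight = TASKS[n]["slope"], TASKS[n]["weight"]
            errors = [heads[n] * shared * x - slope * x for x in XS]
            d_head = sum(2 * e * shared * x for e, x in zip(errors, XS)) / len(XS)
            d_shared = sum(2 * e * heads[n] * x for e, x in zip(errors, XS)) / len(XS)
            shared_grad += weight * d_shared
            updated[n] = heads[n] - lr * d_head
        heads.update(updated)
        shared -= lr * shared_grad
    return shared, heads


# Select the tasks above the complexity threshold ("first-class" tasks).
hard = [n for n, t in TASKS.items() if t["complexity"] > THRESHOLD]

init_shared, init_heads = 1.0, {n: 0.1 for n in TASKS}
loss_before = target_loss(init_shared, init_heads, list(TASKS))

# Stage 1: train the untrained network on the first-class tasks only.
shared, heads = train(init_shared, init_heads, hard)
# Stage 2: continue from that network on all tasks.
shared, heads = train(shared, heads, list(TASKS))

loss_after = target_loss(shared, heads, list(TASKS))
print(hard, loss_before, loss_after)
```

One simplification to note: the sketch allocates a head for every task up front and simply leaves the low-complexity head untouched during the first stage, whereas the claim speaks of first task layers corresponding only to the first-class tasks.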

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

G06N3/09 · Supervised learning
G06N3/0442 · characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
G10L2021/02082 · the noise being echo, reverberation of the speech
G10L15/22 · Procedures used during a speech recognition process, e.g. man-machine dialogue
G10L2015/0635 · updating or merging of old and new templates; Mean values; Weighting

Patent family

Related publications grouped by family.

View patent family 66630868

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11158304B2 cover?: Embodiments of the present invention provide a speech signal processing model training method, an electronic device and a storage medium. The embodiments of the present invention determines a target training loss function based on a training loss function of each of one or more speech signal processing tasks; inputs a task input feature of each speech signal processing task into a starting mult…
Who is the assignee on this patent?: Tencent Tech Shenzhen Co Ltd

Related patents

Multi-task neural networks with task-specific paths

Deep multi-task representation learning

Joint Many-Task Neural Network Model for Multiple Natural Language Processing (NLP) Tasks

Training of front-end and back-end neural networks

Multilingual, acoustic deep neural networks

Method and system for joint training of hybrid neural networks for acoustic modeling in automatic speech recognition
