Multi-task neural networks with task-specific paths
US-2020380372-A1 · Dec 3, 2020 · US
US11158304B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11158304-B2 |
| Application number | US-201916655548-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 17, 2019 |
| Priority date | Nov 24, 2017 |
| Publication date | Oct 26, 2021 |
| Grant date | Oct 26, 2021 |
Official abstract text for this publication.
Embodiments of the present invention provide a speech signal processing model training method, an electronic device, and a storage medium. The embodiments determine a target training loss function based on a training loss function of each of one or more speech signal processing tasks; input a task input feature of each speech signal processing task into a starting multi-task neural network; and update model parameters of a shared layer and of each of one or more task layers of the starting multi-task neural network, the task layers corresponding to the one or more speech signal processing tasks, by minimizing the target training loss function as a training objective until the starting multi-task neural network converges, to obtain a speech signal processing model.
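The training loop described in the abstract can be illustrated with a deliberately tiny sketch. It is not the patented implementation: the shared layer is reduced to a single scalar weight, each task layer to a single scalar head, the per-task loss is assumed to be squared error, and the target loss is assumed to be the plain sum of the per-task losses. The function name `train`, the task names, the learning rate, and the convergence tolerance are all hypothetical.

```python
# Toy multi-task model: one shared scalar weight feeding one scalar head per task.
# Every name, loss choice, and hyperparameter here is an illustrative assumption.

def train(samples, lr=0.01, tol=1e-10, max_steps=20000):
    """Jointly fit the shared weight and the per-task head weights.

    samples: list of (x, targets), where targets maps task name -> target value.
    Returns (shared_weight, head_weights, final_loss).
    """
    shared = 0.5
    heads = {task: 0.5 for task in samples[0][1]}
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_steps):
        grad_shared = 0.0
        grad_heads = {task: 0.0 for task in heads}
        loss = 0.0
        for x, targets in samples:
            hidden = shared * x  # output of the shared layer
            for task, target in targets.items():
                error = heads[task] * hidden - target  # task layer output minus target
                loss += error ** 2  # per-task loss, summed into the target loss
                grad_heads[task] += 2 * error * hidden  # depends only on this task's loss
                grad_shared += 2 * error * heads[task] * x  # accumulates over all tasks
        # One gradient step on the shared layer and on every task layer.
        shared -= lr * grad_shared
        for task in heads:
            heads[task] -= lr * grad_heads[task]
        if abs(prev_loss - loss) < tol:  # simple stand-in for "until the network converges"
            break
        prev_loss = loss
    return shared, heads, loss


if __name__ == "__main__":
    data = [(1.0, {"task_a": 2.0, "task_b": -1.0}),
            (2.0, {"task_a": 4.0, "task_b": -2.0})]
    shared, heads, loss = train(data)
    print(shared, heads, loss)
```

In this sketch the shared weight receives gradient from every task, while each head receives gradient only from its own task's loss, which mirrors the split between the shared layer and the task layers in the abstract.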
Opening claim text (preview).
What is claimed is:

1. A speech signal processing model training method, applied to an electronic device, comprising: acquiring a sample speech and determining task input features each for one speech signal processing task among one or more speech signal processing tasks for the sample speech; establishing a starting multi-task neural network comprising one or more task layers corresponding to the one or more speech signal processing tasks and a shared layer common to the one or more speech signal processing tasks; determining a target training loss function based on separate training loss functions each for one speech signal processing task of the one or more speech signal processing tasks; and using the task input features of the one or more speech signal processing tasks as a training input of the starting multi-task neural network, and updating model parameters of the shared layer and the one or more task layers of the starting multi-task neural network by minimizing the target training loss function as a training objective, until the starting multi-task neural network converges, to obtain a speech signal processing model, wherein the starting multi-task neural network comprises a first multi-task neural network, the method further comprising: determining, from the one or more speech signal processing tasks, one or more speech signal processing tasks having a training complexity higher than a preset complexity threshold as one or more first-class speech signal processing tasks, the one or more first-class speech signal processing tasks having corresponding first-class task input features and a first-class target training loss function; and using the first-class task input features of the one or more first-class speech signal processing tasks of the sample speech as a training input of an initial untrained multi-task neural network, and updating parameters of a first shared layer and first task layers corresponding to the one or more first-class speech signal processing tasks by minimizing the first-class target training loss function as a training objective, until the initial untrained multi-task neural network converges, to obtain the first multi-task neural network.

2. The speech signal processing model training method according to claim 1, wherein updating the model parameters of the shared layer and the one or more task layers of the starting multi-task neural network by minimizing the target training loss function as a training objective comprises: for the shared layer, updating the model parameters of the shared layer based on the target training loss function by minimizing the target training loss function as the training objective; and for each of the one or more task layers corresponding to the one or more speech signal processing tasks, updating the model parameters of the one or more task layers based on the separate training loss function corresponding to the one or more speech signal processing tasks by minimizing the separate target training loss function as the training objective.

3. The speech signal processing model training method according to claim 1, further comprising: determining the first-class task input features of the one or more first-class speech signal processing tasks of the sample speech; and determining the first-class target training loss function based on training loss functions corresponding to the one or more first-class speech signal processing tasks.

4. The speech signal processing model training method according to claim 1, wherein determining the target training loss function based on the separate training loss functions each for one speech signal processing task among the one or more speech signal processing tasks comprises: for each of the one or more speech signal processing tasks, multiplying the corresponding separate training loss function by a corresponding weight to obtain a corresponding multiplication result for each speech signal processing task; and determining the target training loss function by adding each corresponding multiplication result of the one or more speech signal processing tasks.

5. The speech signal processing model training method according to claim 1, wherein: the shared layer comprises a long short term memory (LSTM) network, and the one or more task layers corresponding to the one or more speech signal processing tasks each comprises a fully connected multi-layer perceptron (MLP) network; and updating the model parameters of the shared layer and the one or more task layers of the starting multi-task neural network comprises: updating, in the LSTM network, connection parameters from an input layer to a hidden layer, connection parameters from the hidden layer to an output layer, or connection parameters between the hidden layers of the LSTM network; and updating, in the each fully connected MLP network, connection parameters from an input layer to a hidden layer or connection parameters from the hidden layer to an output layer of the each fully connected MLP network.

6. An electronic device, comprising: at least one memory and at least one processor; the memory storing a program, the processor invoking the program stored by the memory, and the program being configured for: acquiring a sample speech and determining task input features each for one speech signal processing task among one or more speech signal processing tasks for the sample speech; establishing a starting multi-task neural network model comprising one or more task layers corresponding to the one or more speech signal processing tasks and a shared layer common to the one or more speech signal processing tasks; determining a target training loss function based on separate training loss functions each for one speech signal processing task among the one or more speech signal processing tasks; using the task input features of the one or more speech signal processing tasks as a training input of the starting multi-task neural network, and updating model parameters of the shared layer and the one or more task layers of the starting multi-task neural network by minimizing the target training loss function as a training objective, until the starting multi-task neural network converges, to obtain a speech signal processing model; determining, from the one or more speech signal processing tasks, one or more speech signal processing tasks having a training complexity higher than a preset complexity threshold as one or more first-class speech signal processing tasks, the one or more first-class speech signal processing tasks having corresponding first-class task input features and a first-class target training loss function; and using the first-class task input features of the one or more first-class speech signal processing tasks of the sample speech as a training input of an initial untrained multi-task neural network, and updating parameters of a first shared layer and first task layers corresponding to the one or more first-class speech signal processing tasks by minimizing the first-class target training loss function as a training objective, until the initial untrained multi-task neural network converges, to obtain the starting multi-task neural network.

7. The electronic device according to claim 6, wherein the program is further configured for: for the shared layer, updating the model parameters of the shared layer based on the target training loss function by minimizing the target training loss function as the training objective; and for each of the one or more task layers corresponding to the one or more speech signal processing tasks, updating the model parameters of the one or more task layers based on the separate training loss function corresponding to the one or more speech signal processing tasks by minimiz
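Two steps recited in the claims lend themselves to a short sketch: the weighted sum of per-task losses in claim 4, and the selection of first-class tasks by a complexity threshold in claim 1. The following is one possible reading under stated assumptions, not the claimed implementation. The function names, the task names, the example weights, and the numeric complexity scores are hypothetical; the claim preview does not say how training complexity is measured.

```python
# Hypothetical helpers illustrating claims 1 and 4; all names and values are assumptions.

def target_loss(task_losses, task_weights):
    """Multiply each per-task loss by its weight and add the products (claim 4)."""
    return sum(task_weights[task] * loss for task, loss in task_losses.items())


def first_class_tasks(complexity, threshold):
    """Return the tasks whose complexity score exceeds the preset threshold (claim 1)."""
    return [task for task, score in complexity.items() if score > threshold]


def staged_schedule(complexity, threshold):
    """Stage 1 trains only the first-class tasks; stage 2 trains every task,
    starting from the network produced by stage 1."""
    return [first_class_tasks(complexity, threshold), list(complexity)]


if __name__ == "__main__":
    losses = {"echo_cancellation": 0.8, "speech_detection": 0.2}
    weights = {"echo_cancellation": 0.5, "speech_detection": 2.0}
    print(target_loss(losses, weights))

    # Made-up complexity scores, used only to show the selection step.
    complexity = {"echo_cancellation": 3, "speech_detection": 1}
    print(staged_schedule(complexity, threshold=2))
```

Keeping the weights in a separate mapping lets the same per-task losses be recombined for the first-class stage and for the full multi-task stage without recomputing them.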
Supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
the noise being echo, reverberation of the speech · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
updating or merging of old and new templates; Mean values; Weighting · CPC title