Training of student neural network with teacher neural networks
US-2020034703-A1 · Jan 30, 2020 · US
US11907845B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11907845-B2 |
| Application number | US-202016994656-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 17, 2020 |
| Priority date | Aug 17, 2020 |
| Publication date | Feb 20, 2024 |
| Grant date | Feb 20, 2024 |
Official abstract text for this publication.
Some embodiments of the present invention are directed to techniques for training teacher neural networks (TNNs) and student neural networks (SNNs). A training data set is received with a lossless set of data and a corresponding lossy set of data. Two branches of a TNN are established, with one branch trained using the lossless data (a lossless branch) and one trained using the lossy data (a lossy branch). Weights for the two branches are tied together. The lossy branch, now isolated from the lossless branch, generates a set of soft targets for initializing an SNN. These generated soft targets benefit from the training of lossless branch through the weights that were tied together between each branch, despite isolating the lossless branch from the lossy branch during soft-target generation.
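The abstract's tied-weight idea can be illustrated with a minimal sketch. It assumes each branch is a single linear layer and that soft targets are produced by a temperature-scaled softmax, which is a common choice in knowledge distillation; the abstract does not specify either. The names `Branch`, `softmax`, `shared`, and `temperature` are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution (soft targets).

    The temperature parameter is an assumption borrowed from common
    distillation practice; the source does not mention it.
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


class Branch:
    """Hypothetical teacher branch: one linear layer, logits = W @ features."""

    def __init__(self, weights):
        self.weights = weights  # list of rows; may be shared with another branch

    def logits(self, features):
        return [sum(w * x for w, x in zip(row, features)) for row in self.weights]


# Tied weights: both branches hold a reference to the same parameter object.
shared = [[0.8, -0.2], [0.1, 0.9], [-0.5, 0.3]]
lossless_branch = Branch(shared)
lossy_branch = Branch(shared)

# Stand-in for an update driven by training on lossless data: because the
# weights are tied, the lossy branch sees the change as well.
shared[0][0] = 1.2

# Soft-target generation uses the lossy branch alone; the lossless branch
# is not called at this stage.
lossy_features = [0.6, 0.4]  # made-up features of a lossy recording
soft_targets = softmax(lossy_branch.logits(lossy_features), temperature=2.0)
```

In this sketch the lossy branch benefits from the lossless branch only through the shared parameter object, which mirrors the abstract's point that the benefit flows through the tied weights rather than through any connection at generation time.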
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method (CIM) comprising: receiving a training data set, with the training data set including at least one lossy audio recording, at least one lossless audio recording and a corresponding label; training a teacher neural network with at least a first branch and a second branch, wherein training the teacher neural network comprises: determining a first branch output using the first branch to process the at least one lossless audio recording; determining a second branch output using the second branch to process the at least one lossy audio recording; quantifying a first loss value as a difference between the corresponding label and the first branch output; quantifying a second loss value as a difference between the second branch output and the first branch output; and modifying weights for the first branch and the second branch based on a sum of the first loss value and the second loss value; and responsive to receiving a test input data set including a lossy audio recording, generating a set of soft-targets for training a student network based, at least in part, on the second branch and the test input dataset.

2. The CIM of claim 1, wherein the first branch and the second branch of the teacher neural network are structured as a twofold Siamese network.

3. The CIM of claim 2, wherein the twofold Siamese network further includes weights for the first branch and the second branch.

4. The CIM of claim 3, wherein: the weights for the first branch and the second branch are determined based, at least in part, on outputs from the first branch and the second branch according to a backpropagation technique; and the weights for the first branch and the second branch are tied together.

5. The CIM of claim 1, wherein: the first branch of the teacher neural network is further trained using privileged knowledge that is isolated from the second branch of the teacher neural network; and the set of soft-targets are generated by the second branch of the teacher neural network isolated from the first branch of the teacher neural network.

6. The CIM of claim 1, further comprising: initializing a student neural network for identifying features in audio recordings of human speech based, at least in part, on the generated set of soft-targets.

7. A computer program product (CPP) comprising: a machine readable storage medium; and computer code stored on the machine readable storage medium, with the computer code including instructions for causing a processor set to perform operations including the following: receiving a training data set, with the training data set including at least one lossy audio recording, at least one lossless audio recording and a corresponding label, training a teacher neural network with at least a first branch and a second branch, wherein training the teacher neural network comprises: determining a first branch output using the first branch to process the at least one lossless audio recording; determining a second branch output using the second branch to process the at least one lossy audio recording; quantifying a first loss value as a difference between the corresponding label and the first branch output; quantifying a second loss value as a difference between the second branch output and the first branch output; and modifying weights for the first branch and the second branch based on a sum of the first loss value and the second loss value; and responsive to receiving a test input data set including a lossy audio recording, generating a set of soft-targets for training a student network based, at least in part, on the second branch and the test input dataset.

8. The CPP of claim 7, wherein the first branch and the second branch of the teacher neural network are structured as a twofold Siamese network.

9. The CPP of claim 8, wherein the twofold Siamese network further includes weights for the first branch and the second branch.

10. The CPP of claim 9, wherein: the weights for the first branch and the second branch are determined based, at least in part, on outputs from the first branch and the second branch according to a backpropagation technique; and the weights for the first branch and the second branch are tied together.

11. The CPP of claim 7, wherein: the first branch of the teacher neural network is further trained using privileged knowledge that is isolated from the second branch of the teacher neural network; and the set of soft-targets are generated by the second branch of the teacher neural network isolated from the first branch of the teacher neural network.

12. The CPP of claim 7, wherein the computer code further includes instructions for causing the processor(s) set to perform the following operations: initializing a student neural network for identifying features in audio recordings of human speech based, at least in part, on the generated set of soft-targets.

13. A computer system (CS) comprising: a processor set; a machine readable storage medium; and computer code stored on the machine readable storage medium, with the computer code including instructions for causing the processor set to perform operations including the following: receiving a training data set, with the training data set including at least one lossy audio recording, at least one lossless audio recording and a corresponding label, training a teacher neural network with at least a first branch and a second branch, wherein training the teacher neural network comprises: determining a first branch output using the first branch to process the at least one lossless audio recording; determining a second branch output using the second branch to process the at least one lossy audio recording; quantifying a first loss value as a difference between the corresponding label and the first branch output; quantifying a second loss value as a difference between the second branch output and the first branch output; and modifying weights for the first branch and the second branch based on a sum of the first loss value and the second loss value; and responsive to receiving a test input data set including a lossy audio recording, generating a set of soft-targets for training a student network based, at least in part, on the second branch and the test input dataset.

14. The CS of claim 13, wherein the first branch and the second branch of the teacher neural network are structured as a twofold Siamese network.

15. The CS of claim 14, wherein the twofold Siamese network further includes weights for the first branch and the second branch.

16. The CS of claim 15, wherein: the weights for the first branch and the second branch are determined based, at least in part, on outputs from the first branch and the second branch according to a backpropagation technique; and the weights for the first branch and the second branch are tied together.

17. The CS of claim 13, wherein: the first branch of the teacher neural network is further trained using privileged knowledge that is isolated from the second branch of the teacher neural network; and the set of soft-targets are generated by the second branch of the teacher neural network isolated from the first branch of the teacher neural network.

18. The CS of claim 13, wherein the computer code further includes instructions for causing the processor(s) set to perform the following operations: initializing a student neural network for identifying features in audio recordings of human speech based, at least in part, on the generate
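The training step recited in claim 1 (a first loss against the label, a second loss between the two branch outputs, and a weight update based on their sum) can be sketched as follows. This is an illustrative toy, not the patented implementation: it assumes a scalar-output linear model whose single weight vector is shared by both branches, reads "difference" as squared error, and uses plain gradient descent with a hand-derived gradient instead of a backpropagation framework. The feature vectors, label, learning rate, and the names `dot`, `train_step`, and `history` are made up for the example.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def train_step(w, x_lossless, x_lossy, label, lr=0.1):
    """One update of the shared (tied) weights using the summed loss.

    Assumed loss definitions (squared error is an assumption):
      first_loss  = (label - first_out) ** 2
      second_loss = (second_out - first_out) ** 2
    """
    first_out = dot(w, x_lossless)   # first branch processes lossless input
    second_out = dot(w, x_lossy)     # second branch processes lossy input
    first_loss = (label - first_out) ** 2
    second_loss = (second_out - first_out) ** 2

    # Gradient of (first_loss + second_loss) with respect to the shared weights.
    grad = [
        -2.0 * (label - first_out) * xl
        + 2.0 * (second_out - first_out) * (xc - xl)
        for xl, xc in zip(x_lossless, x_lossy)
    ]
    new_w = [wi - lr * g for wi, g in zip(w, grad)]
    return new_w, first_loss + second_loss


# Made-up features for one lossless/lossy recording pair and its label.
x_lossless = [1.0, 0.5]
x_lossy = [0.9, 0.6]
label = 1.0

w = [0.0, 0.0]   # shared weights used by both branches
history = []     # summed loss measured before each update
for _ in range(20):
    w, total_loss = train_step(w, x_lossless, x_lossy, label)
    history.append(total_loss)
```

Because both branches read the same `w`, a single update moves them together; the second loss term pushes the lossy-branch output toward the lossless-branch output, while the first term pulls the lossless-branch output toward the label.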
Convolutional networks [CNN, ConvNet] · CPC title
Transfer learning · CPC title
Supervised learning · CPC title
Backpropagation, e.g. using gradient descent · CPC title
Combinations of networks · CPC title