Training teacher machine learning models using lossless and lossy branches

US11907845B2 · US · B2

Patent metadata
Field · Value
Publication number · US-11907845-B2
Application number · US-202016994656-A
Country · US
Kind code · B2
Filing date · Aug 17, 2020
Priority date · Aug 17, 2020
Publication date · Feb 20, 2024
Grant date · Feb 20, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Some embodiments of the present invention are directed to techniques for training teacher neural networks (TNNs) and student neural networks (SNNs). A training data set is received with a lossless set of data and a corresponding lossy set of data. Two branches of a TNN are established, with one branch trained using the lossless data (a lossless branch) and one trained using the lossy data (a lossy branch). Weights for the two branches are tied together. The lossy branch, now isolated from the lossless branch, generates a set of soft targets for initializing an SNN. These generated soft targets benefit from the training of the lossless branch through the weights that were tied together between the branches, despite isolating the lossless branch from the lossy branch during soft-target generation.
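The soft-target step in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the patent's implementation: each branch is reduced to a single linear layer, the names `shared_weights`, `branch_logits`, and `soft_targets` are hypothetical, and the temperature-scaled softmax is a common way to produce soft targets that the abstract itself does not specify.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def branch_logits(weights, features):
    """One teacher branch, modeled as a single linear layer (assumption)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

# Tied weights: both branches refer to the same parameter object, so
# anything learned through the lossless branch is visible to the lossy one.
shared_weights = [[0.9, -0.2, 0.1],
                  [-0.3, 0.8, 0.4]]
lossless_branch = shared_weights
lossy_branch = shared_weights

def soft_targets(lossy_features, temperature=2.0):
    """Run only the lossy branch; the lossless branch is not consulted."""
    return softmax(branch_logits(lossy_branch, lossy_features), temperature)

# Hypothetical feature vector extracted from a lossy recording.
targets = soft_targets([1.0, 0.5, -0.5])
print(targets)
```

Because the two branch names point at one weight object, the sketch needs no copy step between training and soft-target generation; a real teacher network would share layer parameters in the same spirit.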

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method (CIM) comprising: receiving a training data set, with the training data set including at least one lossy audio recording, at least one lossless audio recording and a corresponding label; training a teacher neural network with at least a first branch and a second branch, wherein training the teacher neural network comprises: determining a first branch output using the first branch to process the at least one lossless audio recording; determining a second branch output using the second branch to process the at least one lossy audio recording; quantifying a first loss value as a difference between the corresponding label and the first branch output; quantifying a second loss value as a difference between the second branch output and the first branch output; and modifying weights for the first branch and the second branch based on a sum of the first loss value and the second loss value; and responsive to receiving a test input data set including a lossy audio recording, generating a set of soft-targets for training a student network based, at least in part, on the second branch and the test input dataset.

2. The CIM of claim 1, wherein the first branch and the second branch of the teacher neural network are structured as a twofold Siamese network.

3. The CIM of claim 2, wherein the twofold Siamese network further includes weights for the first branch and the second branch.

4. The CIM of claim 3, wherein: the weights for the first branch and the second branch are determined based, at least in part, on outputs from the first branch and the second branch according to a backpropagation technique; and the weights for the first branch and the second branch are tied together.

5. The CIM of claim 1, wherein: the first branch of the teacher neural network is further trained using privileged knowledge that is isolated from the second branch of the teacher neural network; and the set of soft-targets are generated by the second branch of the teacher neural network isolated from the first branch of the teacher neural network.

6. The CIM of claim 1, further comprising: initializing a student neural network for identifying features in audio recordings of human speech based, at least in part, on the generated set of soft-targets.

7. A computer program product (CPP) comprising: a machine readable storage medium; and computer code stored on the machine readable storage medium, with the computer code including instructions for causing a processor set to perform operations including the following: receiving a training data set, with the training data set including at least one lossy audio recording, at least one lossless audio recording and a corresponding label, training a teacher neural network with at least a first branch and a second branch, wherein training the teacher neural network comprises: determining a first branch output using the first branch to process the at least one lossless audio recording; determining a second branch output using the second branch to process the at least one lossy audio recording; quantifying a first loss value as a difference between the corresponding label and the first branch output; quantifying a second loss value as a difference between the second branch output and the first branch output; and modifying weights for the first branch and the second branch based on a sum of the first loss value and the second loss value; and responsive to receiving a test input data set including a lossy audio recording, generating a set of soft-targets for training a student network based, at least in part, on the second branch and the test input dataset.

8. The CPP of claim 7, wherein the first branch and the second branch of the teacher neural network are structured as a twofold Siamese network.

9. The CPP of claim 8, wherein the twofold Siamese network further includes weights for the first branch and the second branch.

10. The CPP of claim 9, wherein: the weights for the first branch and the second branch are determined based, at least in part, on outputs from the first branch and the second branch according to a backpropagation technique; and the weights for the first branch and the second branch are tied together.

11. The CPP of claim 7, wherein: the first branch of the teacher neural network is further trained using privileged knowledge that is isolated from the second branch of the teacher neural network; and the set of soft-targets are generated by the second branch of the teacher neural network isolated from the first branch of the teacher neural network.

12. The CPP of claim 7, wherein the computer code further includes instructions for causing the processor(s) set to perform the following operations: initializing a student neural network for identifying features in audio recordings of human speech based, at least in part, on the generated set of soft-targets.

13. A computer system (CS) comprising: a processor set; a machine readable storage medium; and computer code stored on the machine readable storage medium, with the computer code including instructions for causing the processor set to perform operations including the following: receiving a training data set, with the training data set including at least one lossy audio recording, at least one lossless audio recording and a corresponding label, training a teacher neural network with at least a first branch and a second branch, wherein training the teacher neural network comprises: determining a first branch output using the first branch to process the at least one lossless audio recording; determining a second branch output using the second branch to process the at least one lossy audio recording; quantifying a first loss value as a difference between the corresponding label and the first branch output; quantifying a second loss value as a difference between the second branch output and the first branch output; and modifying weights for the first branch and the second branch based on a sum of the first loss value and the second loss value; and responsive to receiving a test input data set including a lossy audio recording, generating a set of soft-targets for training a student network based, at least in part, on the second branch and the test input dataset.

14. The CS of claim 13, wherein the first branch and the second branch of the teacher neural network are structured as a twofold Siamese network.

15. The CS of claim 14, wherein the twofold Siamese network further includes weights for the first branch and the second branch.

16. The CS of claim 15, wherein: the weights for the first branch and the second branch are determined based, at least in part, on outputs from the first branch and the second branch according to a backpropagation technique; and the weights for the first branch and the second branch are tied together.

17. The CS of claim 13, wherein: the first branch of the teacher neural network is further trained using privileged knowledge that is isolated from the second branch of the teacher neural network; and the set of soft-targets are generated by the second branch of the teacher neural network isolated from the first branch of the teacher neural network.

18. The CS of claim 13, wherein the computer code further includes instructions for causing the processor(s) set to perform the following operations: initializing a student neural network for identifying features in audio recordings of human speech based, at least in part, on the generate
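The training step recited in the independent claims (two branch outputs, two loss values, one weight update based on their sum) can be sketched roughly as follows. The claims only say each loss is "a difference", so the squared-error form used here is an assumption, as are the single linear layer, the finite-difference gradient (standing in for backpropagation), the learning rate, and all names and numbers.

```python
def linear(weights, features):
    """Shared single-layer branch (assumption: both branches use tied weights)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def squared_difference(a, b):
    """Assumed loss form; the claims only require 'a difference'."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def total_loss(weights, lossless, lossy, label):
    first_out = linear(weights, lossless)   # first branch, lossless input
    second_out = linear(weights, lossy)     # second branch, lossy input
    first_loss = squared_difference(label, first_out)
    second_loss = squared_difference(second_out, first_out)
    return first_loss + second_loss         # sum drives the weight update

def train_step(weights, lossless, lossy, label, lr=0.05, eps=1e-6):
    """One gradient-descent step using central finite differences."""
    grads = [[0.0] * len(row) for row in weights]
    for i, row in enumerate(weights):
        for j in range(len(row)):
            original = row[j]
            row[j] = original + eps
            up = total_loss(weights, lossless, lossy, label)
            row[j] = original - eps
            down = total_loss(weights, lossless, lossy, label)
            row[j] = original               # restore before the next probe
            grads[i][j] = (up - down) / (2 * eps)
    return [[w - lr * g for w, g in zip(row, grad_row)]
            for row, grad_row in zip(weights, grads)]

# Hypothetical features for one lossless/lossy recording pair and its label.
weights = [[0.1, 0.0, -0.1], [0.0, 0.2, 0.1]]
lossless = [1.0, 0.5, -0.5]
lossy = [0.9, 0.6, -0.4]
label = [1.0, 0.0]

before = total_loss(weights, lossless, lossy, label)
for _ in range(50):
    weights = train_step(weights, lossless, lossy, label)
after = total_loss(weights, lossless, lossy, label)
print(before, after)
```

The second loss term pulls the lossy-input output toward the lossless-input output, which is how, in this sketch, information from the lossless recording reaches the weights later used on lossy input alone.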

Assignees

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Transfer learning · CPC title

  • Supervised learning · CPC title

  • G06N3/084 · Primary

    Backpropagation, e.g. using gradient descent · CPC title

  • Combinations of networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11907845B2 cover?
Some embodiments of the present invention are directed to techniques for training teacher neural networks (TNNs) and student neural networks (SNNs). A training data set is received with a lossless set of data and a corresponding lossy set of data. Two branches of a TNN are established, with one branch trained using the lossless data (a lossless branch) and one trained using the lossy data (a lo…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 20, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).