What technology area does this patent fall under?

Primary CPC classification G06N3/084. Mapped technology areas include Physics.

When was this patent published?

Publication date Feb 20, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Training teacher machine learning models using lossless and lossy branches

US11907845B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11907845-B2
Application number	US-202016994656-A
Country	US
Kind code	B2
Filing date	Aug 17, 2020
Priority date	Aug 17, 2020
Publication date	Feb 20, 2024
Grant date	Feb 20, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Some embodiments of the present invention are directed to techniques for training teacher neural networks (TNNs) and student neural networks (SNNs). A training data set is received with a lossless set of data and a corresponding lossy set of data. Two branches of a TNN are established, with one branch trained using the lossless data (a lossless branch) and one trained using the lossy data (a lossy branch). Weights for the two branches are tied together. The lossy branch, now isolated from the lossless branch, generates a set of soft targets for initializing an SNN. These generated soft targets benefit from the training of lossless branch through the weights that were tied together between each branch, despite isolating the lossless branch from the lossy branch during soft-target generation.
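The abstract does not say how the lossy branch's outputs are turned into soft targets. As a rough illustration only, the sketch below assumes the common knowledge-distillation convention of a temperature-scaled softmax over the branch's output scores. The function name `soft_targets`, the `temperature` parameter, and the example scores are hypothetical and do not come from the patent.

```python
import math


def soft_targets(logits, temperature=2.0):
    """Turn raw branch scores into a softened probability distribution.

    Assumption: a temperature-scaled softmax, as commonly used in
    knowledge distillation. The patent abstract does not specify this.
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores produced by the lossy branch for one lossy recording.
lossy_branch_scores = [3.0, 1.0, 0.2]
sharp = soft_targets(lossy_branch_scores, temperature=1.0)
soft = soft_targets(lossy_branch_scores, temperature=4.0)
```

A higher temperature flattens the distribution, so the student sees more of the teacher's relative preferences among the non-top classes.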

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method (CIM) comprising: receiving a training data set, with the training data set including at least one lossy audio recording, at least one lossless audio recording and a corresponding label; training a teacher neural network with at least a first branch and a second branch, wherein training the teacher neural network comprises: determining a first branch output using the first branch to process the at least one lossless audio recording; determining a second branch output using the second branch to process the at least one lossy audio recording; quantifying a first loss value as a difference between the corresponding label and the first branch output; quantifying a second loss value as a difference between the second branch output and the first branch output; and modifying weights for the first branch and the second branch based on a sum of the first loss value and the second loss value; and responsive to receiving a test input data set including a lossy audio recording, generating a set of soft-targets for training a student network based, at least in part, on the second branch and the test input dataset.

2. The CIM of claim 1, wherein the first branch and the second branch of the teacher neural network are structured as a twofold Siamese network.

3. The CIM of claim 2, wherein the twofold Siamese network further includes weights for the first branch and the second branch.

4. The CIM of claim 3, wherein: the weights for the first branch and the second branch are determined based, at least in part, on outputs from the first branch and the second branch according to a backpropagation technique; and the weights for the first branch and the second branch are tied together.

5. The CIM of claim 1, wherein: the first branch of the teacher neural network is further trained using privileged knowledge that is isolated from the second branch of the teacher neural network; and the set of soft-targets are generated by the second branch of the teacher neural network isolated from the first branch of the teacher neural network.

6. The CIM of claim 1, further comprising: initializing a student neural network for identifying features in audio recordings of human speech based, at least in part, on the generated set of soft-targets.

7. A computer program product (CPP) comprising: a machine readable storage medium; and computer code stored on the machine readable storage medium, with the computer code including instructions for causing a processor set to perform operations including the following: receiving a training data set, with the training data set including at least one lossy audio recording, at least one lossless audio recording and a corresponding label, training a teacher neural network with at least a first branch and a second branch, wherein training the teacher neural network comprises: determining a first branch output using the first branch to process the at least one lossless audio recording; determining a second branch output using the second branch to process the at least one lossy audio recording; quantifying a first loss value as a difference between the corresponding label and the first branch output; quantifying a second loss value as a difference between the second branch output and the first branch output; and modifying weights for the first branch and the second branch based on a sum of the first loss value and the second loss value; and responsive to receiving a test input data set including a lossy audio recording, generating a set of soft-targets for training a student network based, at least in part, on the second branch and the test input dataset.

8. The CPP of claim 7, wherein the first branch and the second branch of the teacher neural network are structured as a twofold Siamese network.

9. The CPP of claim 8, wherein the twofold Siamese network further includes weights for the first branch and the second branch.

10. The CPP of claim 9, wherein: the weights for the first branch and the second branch are determined based, at least in part, on outputs from the first branch and the second branch according to a backpropagation technique; and the weights for the first branch and the second branch are tied together.

11. The CPP of claim 7, wherein: the first branch of the teacher neural network is further trained using privileged knowledge that is isolated from the second branch of the teacher neural network; and the set of soft-targets are generated by the second branch of the teacher neural network isolated from the first branch of the teacher neural network.

12. The CPP of claim 7, wherein the computer code further includes instructions for causing the processor(s) set to perform the following operations: initializing a student neural network for identifying features in audio recordings of human speech based, at least in part, on the generated set of soft-targets.

13. A computer system (CS) comprising: a processor set; a machine readable storage medium; and computer code stored on the machine readable storage medium, with the computer code including instructions for causing the processor set to perform operations including the following: receiving a training data set, with the training data set including at least one lossy audio recording, at least one lossless audio recording and a corresponding label, training a teacher neural network with at least a first branch and a second branch, wherein training the teacher neural network comprises: determining a first branch output using the first branch to process the at least one lossless audio recording; determining a second branch output using the second branch to process the at least one lossy audio recording; quantifying a first loss value as a difference between the corresponding label and the first branch output; quantifying a second loss value as a difference between the second branch output and the first branch output; and modifying weights for the first branch and the second branch based on a sum of the first loss value and the second loss value; and responsive to receiving a test input data set including a lossy audio recording, generating a set of soft-targets for training a student network based, at least in part, on the second branch and the test input dataset.

14. The CS of claim 13, wherein the first branch and the second branch of the teacher neural network are structured as a twofold Siamese network.

15. The CS of claim 14, wherein the twofold Siamese network further includes weights for the first branch and the second branch.

16. The CS of claim 15, wherein: the weights for the first branch and the second branch are determined based, at least in part, on outputs from the first branch and the second branch according to a backpropagation technique; and the weights for the first branch and the second branch are tied together.

17. The CS of claim 13, wherein: the first branch of the teacher neural network is further trained using privileged knowledge that is isolated from the second branch of the teacher neural network; and the set of soft-targets are generated by the second branch of the teacher neural network isolated from the first branch of the teacher neural network.

18. The CS of claim 13, wherein the computer code further includes instructions for causing the processor(s) set to perform the following operations: initializing a student neural network for identifying features in audio recordings of human speech based, at least in part, on the generate…
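The training step recited in claim 1 (two branch outputs, two loss values, one weight update on their sum) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the patented implementation: each branch is a single linear layer, the two branches share one weight vector (the tied weights of claim 4), and "difference" is taken to mean squared error. The names `train_step` and `train`, the feature vectors, and the learning rate are all hypothetical.

```python
def dot(w, x):
    """Output of one branch: a linear layer with tied weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_step(w, lossless, lossy, label, lr=0.05):
    """One gradient-descent update on the sum of the two loss values.

    Assumptions (not from the patent): linear branches, squared-error
    losses, and a single weight vector shared by both branches.
    """
    out1 = dot(w, lossless)      # first branch output (lossless input)
    out2 = dot(w, lossy)         # second branch output (lossy input)
    loss1 = (label - out1) ** 2  # label vs. first branch output
    loss2 = (out2 - out1) ** 2   # second branch vs. first branch output
    # Gradient of (loss1 + loss2) with respect to the shared weights.
    grad = [
        -2.0 * (label - out1) * x1 + 2.0 * (out2 - out1) * (x2 - x1)
        for x1, x2 in zip(lossless, lossy)
    ]
    new_w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return new_w, loss1 + loss2


def train(w, lossless, lossy, label, steps=200, lr=0.05):
    """Repeat the update and record the summed loss before each step."""
    history = []
    for _ in range(steps):
        w, total = train_step(w, lossless, lossy, label, lr)
        history.append(total)
    return w, history


# Hypothetical feature vectors for one recording in lossless and lossy form.
lossless_x = [1.0, 0.5, -0.5]
lossy_x = [0.9, 0.6, -0.4]
weights, history = train([0.0, 0.0, 0.0], lossless_x, lossy_x, label=1.0)
```

Because the second loss pulls the lossy-branch output toward the lossless-branch output, the shared weights end up producing similar outputs for both versions of the recording, which is what lets the lossy branch be used on its own afterwards.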

Assignees

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/096
Transfer learning · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/084 · Primary
Backpropagation, e.g. using gradient descent · CPC title
G06N3/045
Combinations of networks · CPC title

Patent family

Related publications grouped by family.

View patent family 80222966

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11907845B2 cover?

Some embodiments of the present invention are directed to techniques for training teacher neural networks (TNNs) and student neural networks (SNNs). A training data set is received with a lossless set of data and a corresponding lossy set of data. Two branches of a TNN are established, with one branch trained using the lossless data (a lossless branch) and one trained using the lossy data (a lo…

Who is the assignee on this patent?

IBM

Related patents

Training of student neural network with teacher neural networks

Training of student neural network with switched teacher neural networks

Teacher and student learning for constructing mixed-domain model

Knowledge transfer in permutation invariant training for single-channel multi-talker speech recognition

Apparatus and method for student-teacher transfer learning network using knowledge bridge