Methods, devices and media providing an integrated teacher-student system

US11900260B2 · US · B2

Patent metadata
Field · Value
Publication number · US-11900260-B2
Application number · US-202016810524-A
Country · US
Kind code · B2
Filing date · Mar 5, 2020
Priority date · Mar 5, 2020
Publication date · Feb 13, 2024
Grant date · Feb 13, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, devices and processor-readable media for an integrated teacher-student machine learning system. One or more teacher-student modules are trained as part of the teacher neural network training. Each student sub-network uses a portion of the teacher neural network to generate an intermediate feature map, then provides the intermediate feature map to a student sub-network to generate inferences. The student sub-network may use a feature enhancement block to map the intermediate feature map to a subsequent feature map. A compression block may be used to compress intermediate feature map data for transmission in some embodiments.
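The arrangement in the abstract can be illustrated with a small sketch: an early part of the teacher produces an intermediate feature map, the rest of the teacher produces the teacher inference, and a student sub-network produces its own inference from the same feature map. The three sub-network functions below are hypothetical toy stand-ins, not the patent's networks, and the temperature-softened KL divergence is an assumed choice of distillation loss (claim 11 refers only to "a knowledge distillation loss function").

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions (assumed loss form)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy sub-networks, for illustration only.
def first_teacher_subnetwork(x):
    """Early teacher layers: input -> intermediate feature map (a ReLU here)."""
    return [max(0.0, v) for v in x]


def final_teacher_subnetwork(feature_map):
    """Remaining teacher layers: feature map -> teacher logits."""
    s = sum(feature_map)
    return [2.0 * s, -1.0 * s, 0.5]


def first_student_subnetwork(feature_map):
    """Lightweight student head: same feature map -> student logits."""
    s = sum(feature_map)
    return [1.5 * s, -0.5 * s, 0.0]


x = [0.2, -0.4, 1.0]
feature_map = first_teacher_subnetwork(x)            # shared by teacher and student
teacher_logits = final_teacher_subnetwork(feature_map)
student_logits = first_student_subnetwork(feature_map)
loss = kd_loss(student_logits, teacher_logits)       # supervision signal for the student
print(f"distillation loss: {loss:.4f}")
```

In a real system this loss would be back-propagated through the student sub-network and, per claim 11, through the first teacher sub-network as well. The sketch only computes the loss value and performs no gradient step.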

First claim

Opening claim text (preview).

The invention claimed is:

1. A method, comprising: operating a first teacher sub-network of a teacher neural network of an integrated system in an inference mode to generate a first intermediate feature map based on an input data received by the integrated system; and operating a first student sub-network of the integrated system in the inference mode to generate student inference data based on the first intermediate feature map, wherein: the first student sub-network has been trained when operating in a training mode, using subsequent inference data provided as a supervision signal, the subsequent inference data being generated by a second student sub-network of the integrated system that is trained using teacher inference data generated by the teacher neural network.

2. The method of claim 1, further comprising: using a first processor to operate a prior teacher sub-network of the teacher neural network in the inference mode to: generate a prior intermediate feature map based on the inference-mode data input; and provide compressed data based on the prior intermediate feature map; and receiving the compressed data at a second processor, the second processor being used to operate the first teacher sub-network and the first student sub-network in the inference mode.

3. The method of claim 2, wherein the compressed data is provided by: generating positional normalization data based on the prior intermediate feature map; down-sampling the prior intermediate feature map to generate an embedding vector; and generating the compressed data comprising the positional normalization data and the embedding vector.

4. The method of claim 1, wherein the first teacher sub-network comprises one or more layers of the teacher neural network.

5. The method of claim 1, wherein: the first student sub-network comprises a feature enhancement sub-network; and the feature enhancement sub-network has been trained, operating in a training mode, to perform a non-linear mapping between its input and output using penultimate feature map data provided by a subsequent teacher sub-network of the integrated system as a supervision signal.

6. The method of claim 5, wherein: the first intermediate feature map comprises a first intermediate feature map matrix; and generating student inference data comprises: generating as output an output feature map matrix based on the first intermediate feature map matrix, the output feature map matrix having different matrix dimensions than the first intermediate feature map matrix; and generating student inference data based on the output feature map matrix.

7. The method of claim 6, wherein the feature enhancement sub-network generates the output feature map matrix by applying at least one convolution operation, at least one down-sampling operation, and at least one concatenation operation to the first intermediate feature map matrix.

8. A method, comprising: receiving a feature map; generating positional normalization data based on the feature map; down-sampling the feature map to generate an embedding vector; generating compressed data comprising the positional normalization data and the embedding vector; and transmitting the compressed data over a communication link.

9. The method of claim 8, wherein the feature map is generated by a first processor operating a prior sub-network of a neural network; and further comprising: receiving the compressed data over the communication link at a second processor; and using the second processor to operate a first sub-network of the neural network to generate inference data based on the compressed data.

10. The method of claim 9, wherein: the prior sub-network of the neural network comprises a first layer of the neural network; and the first sub-network of the neural network comprises a second layer of the neural network.

11. A method, comprising: providing an integrated system comprising: a teacher neural network comprising, in series: a first teacher sub-network adapted to generate a first intermediate feature map based on a training-mode data input to the teacher neural network; and a final teacher sub-network adapted to generate teacher inference data based on the first intermediate feature map; and a first student sub-network adapted to generate first student inference data based on the first intermediate feature map; and training the integrated system by: propagating the training-mode data input forward through the teacher neural network to generate the first intermediate feature map and teacher inference data; propagating the first intermediate feature map forward through the first student sub-network to generate first student inference data; calculating a first knowledge distillation loss based on a knowledge distillation loss function applied to the first student inference data and the teacher inference data; and propagating the first knowledge distillation loss backward through the first student sub-network and the first teacher sub-network to train the first student sub-network and the first teacher sub-network.

12. The method of claim 11, wherein: the teacher neural network further comprises, in series prior to the first teacher sub-network, a prior teacher sub-network adapted to generate a prior intermediate feature map based on the training-mode data input; the integrated system further comprises a prior student sub-network adapted to generate prior student inference data based on the prior intermediate feature map; and training the integrated system further comprises: propagating the prior intermediate feature map forward through the prior student sub-network to generate prior student inference data; calculating a prior knowledge distillation loss based on a knowledge distillation loss function applied to the prior student inference data and the first student inference data; and propagating the prior knowledge distillation loss backward through the prior student sub-network and the prior teacher sub-network to train the prior student sub-network and the prior teacher sub-network.

13. The method of claim 11, further comprising, after the teacher neural network and the first student sub-network have been trained, jointly operating the first teacher sub-network and the first student sub-network in an inference mode to perform an inference task.

14. The method of claim 11, wherein the final teacher sub-network is further adapted to generate a penultimate feature map based on the first intermediate feature map, the method further comprising: providing a first feature enhancement sub-network as part of the first student sub-network; and training the first feature enhancement sub-network by: propagating the first intermediate feature map forward through the first feature enhancement sub-network to generate a first student feature map; calculating a feature enhancement loss based on the penultimate feature map compared to the first student feature map; and propagating the feature enhancement loss backward through the first feature enhancement sub-network to train the first feature enhancement sub-network.

15. The method of claim 11, further comprising, before training the integrated system, pre-training the teacher neural network.

16. A device, comprising: a processor; and a memory having stored thereon instructions for carrying out the steps of the method of claim 1.

17. A device, comprising: a processor; a communication link; and a memory having stored thereon instructions for carrying out the steps of the metho
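The compression steps recited in claims 3 and 8 can be sketched as follows. The claims do not define the operations in detail, so this sketch makes two labeled assumptions: "positional normalization data" is taken to be the per-position mean and standard deviation computed across channels, and "down-sampling" is taken to be global average pooling that yields one value per channel. The function names and the dictionary layout are hypothetical.

```python
import math


def positional_normalization_data(fmap):
    """Per-position mean and std across channels (assumed meaning).

    fmap is a nested list indexed as [channel][row][column].
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    means = [[0.0] * width for _ in range(height)]
    stds = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            values = [fmap[c][i][j] for c in range(channels)]
            mu = sum(values) / channels
            var = sum((v - mu) ** 2 for v in values) / channels
            means[i][j] = mu
            stds[i][j] = math.sqrt(var)
    return means, stds


def downsample_to_embedding(fmap):
    """Global average pooling: one value per channel (assumed down-sampling)."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in fmap
    ]


def compress(fmap):
    """Bundle the positional statistics and the embedding vector for transmission."""
    means, stds = positional_normalization_data(fmap)
    return {
        "pono_mean": means,
        "pono_std": stds,
        "embedding": downsample_to_embedding(fmap),
    }


# Toy feature map: 4 channels, each 2 x 2.
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 2.0], [1.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
compressed = compress(fmap)

channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
original_size = channels * height * width            # values in the raw feature map
compressed_size = 2 * height * width + channels      # mean map + std map + embedding
print(original_size, compressed_size)
```

Under these assumptions the payload is 2·H·W + C values instead of C·H·W, so the saving grows with the channel count and is small for a toy map like this one. How the receiving processor reconstructs a usable feature map from the payload is not covered by this sketch.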

Assignees

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Supervised learning · CPC title

  • Distributed learning, e.g. federated learning · CPC title

  • G06N3/088 · Primary

    Non-supervised learning, e.g. competitive learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11900260B2 cover?
Methods, devices and processor-readable media for an integrated teacher-student machine learning system. One or more teacher-student modules are trained as part of the teacher neural network training. Each student sub-network uses a portion of the teacher neural network to generate an intermediate feature map, then provides the intermediate feature map to a student sub-network to generate infer…
Who is the assignee on this patent?
Sridhar Deepak, Lu Juwei, Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/088. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 13, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).