Learning efficient object detection models with knowledge distillation
US-2018268292-A1 · Sep 20, 2018 · US
US11900260B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11900260-B2 |
| Application number | US-202016810524-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 5, 2020 |
| Priority date | Mar 5, 2020 |
| Publication date | Feb 13, 2024 |
| Grant date | Feb 13, 2024 |
Official abstract text for this publication.
Methods, devices and processor-readable media for an integrated teacher-student machine learning system. One or more teacher-student modules are trained as part of the teacher neural network training. Each student sub-network uses a portion of the teacher neural network to generate an intermediate feature map, then provides the intermediate feature map to a student sub-network to generate inferences. The student sub-network may use a feature enhancement block to map the intermediate feature map to a subsequent feature map. A compression block may be used to compress intermediate feature map data for transmission in some embodiments.
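The compression block mentioned at the end of the abstract is recited in more detail in claims 3 and 8: positional normalization data is generated from the feature map, the feature map is down-sampled into an embedding vector, and both are packaged as the compressed data. The following is a minimal sketch of that flow, not the patented implementation. It assumes that "positional normalization data" means the per-position mean and standard deviation taken across channels, and that down-sampling is global average pooling per channel; neither choice is stated in the text above. All function names and the dictionary keys are hypothetical.

```python
import math


def positional_normalization(feature_map):
    """Per-position statistics across channels (assumed interpretation).

    feature_map is a list of C channels, each an H x W list of lists.
    Returns (means, stds), each an H x W list of lists.
    """
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    means = [[0.0] * width for _ in range(height)]
    stds = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [feature_map[c][y][x] for c in range(channels)]
            mean = sum(values) / channels
            variance = sum((v - mean) ** 2 for v in values) / channels
            means[y][x] = mean
            stds[y][x] = math.sqrt(variance)
    return means, stds


def downsample_to_embedding(feature_map):
    """Global average pooling (assumed): one value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def compress(feature_map):
    """Bundle the positional statistics and the embedding vector."""
    means, stds = positional_normalization(feature_map)
    return {
        "pos_mean": means,
        "pos_std": stds,
        "embedding": downsample_to_embedding(feature_map),
    }


# Toy feature map: 2 channels, 2 x 2 spatial positions.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
compressed = compress(feature_map)
```

Under these assumptions a C x H x W feature map is reduced to 2·H·W + C values, so the payload only shrinks when the channel count is well above two. For example, a 256 x 16 x 16 map (65,536 values) would become 768 values. The toy input here is too small to show any saving and is only meant to make the data flow concrete.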
Opening claim text (preview).
The invention claimed is:

1. A method, comprising: operating a first teacher sub-network of a teacher neural network of an integrated system in an inference mode to generate a first intermediate feature map based on an input data received by the integrated system; and operating a first student sub-network of the integrated system in the inference mode to generate student inference data based on the first intermediate feature map, wherein: the first student sub-network has been trained when operating in a training mode, using subsequent inference data provided as a supervision signal, the subsequent inference data being generated by a second student sub-network of the integrated system that is trained using teacher inference data generated by the teacher neural network.

2. The method of claim 1, further comprising: using a first processor to operate a prior teacher sub-network of the teacher neural network in the inference mode to: generate a prior intermediate feature map based on the inference-mode data input; and provide compressed data based on the prior intermediate feature map; and receiving the compressed data at a second processor, the second processor being used to operate the first teacher sub-network and the first student sub-network in the inference mode.

3. The method of claim 2, wherein the compressed data is provided by: generating positional normalization data based on the prior intermediate feature map; down-sampling the prior intermediate feature map to generate an embedding vector; and generating the compressed data comprising the positional normalization data and the embedding vector.

4. The method of claim 1, wherein the first teacher sub-network comprises one or more layers of the teacher neural network.

5. The method of claim 1, wherein: the first student sub-network comprises a feature enhancement sub-network; and the feature enhancement sub-network has been trained, operating in a training mode, to perform a non-linear mapping between its input and output using penultimate feature map data provided by a subsequent teacher sub-network of the integrated system as a supervision signal.

6. The method of claim 5, wherein: the first intermediate feature map comprises a first intermediate feature map matrix; and generating student inference data comprises: generating as output an output feature map matrix based on the first intermediate feature map matrix, the output feature map matrix having different matrix dimensions than the first intermediate feature map matrix; and generating student inference data based on the output feature map matrix.

7. The method of claim 6, wherein the feature enhancement sub-network generates the output feature map matrix by applying at least one convolution operation, at least one down-sampling operation, and at least one concatenation operation to the first intermediate feature map matrix.

8. A method, comprising: receiving a feature map; generating positional normalization data based on the feature map; down-sampling the feature map to generate an embedding vector; generating compressed data comprising the positional normalization data and the embedding vector; and transmitting the compressed data over a communication link.

9. The method of claim 8, wherein the feature map is generated by a first processor operating a prior sub-network of a neural network; and further comprising: receiving the compressed data over the communication link at a second processor; and using the second processor to operate a first sub-network of the neural network to generate inference data based on the compressed data.

10. The method of claim 9, wherein: the prior sub-network of the neural network comprises a first layer of the neural network; and the first sub-network of the neural network comprises a second layer of the neural network.

11. A method, comprising: providing an integrated system comprising: a teacher neural network comprising, in series: a first teacher sub-network adapted to generate a first intermediate feature map based on a training-mode data input to the teacher neural network; and a final teacher sub-network adapted to generate teacher inference data based on the first intermediate feature map; and a first student sub-network adapted to generate first student inference data based on the first intermediate feature map; and training the integrated system by: propagating the training-mode data input forward through the teacher neural network to generate the first intermediate feature map and teacher inference data; propagating the first intermediate feature map forward through the first student sub-network to generate first student inference data; calculating a first knowledge distillation loss based on a knowledge distillation loss function applied to the first student inference data and the teacher inference data; and propagating the first knowledge distillation loss backward through the first student sub-network and the first teacher sub-network to train the first student sub-network and the first teacher sub-network.

12. The method of claim 11, wherein: the teacher neural network further comprises, in series prior to the first teacher sub-network, a prior teacher sub-network adapted to generate a prior intermediate feature map based on the training-mode data input; the integrated system further comprises a prior student sub-network adapted to generate prior student inference data based on the prior intermediate feature map; and training the integrated system further comprises: propagating the prior intermediate feature map forward through the prior student sub-network to generate prior student inference data; calculating a prior knowledge distillation loss based on a knowledge distillation loss function applied to the prior student inference data and the first student inference data; and propagating the prior knowledge distillation loss backward through the prior student sub-network and the prior teacher sub-network to train the prior student sub-network and the prior teacher sub-network.

13. The method of claim 11, further comprising, after the teacher neural network and the first student sub-network have been trained, jointly operating the first teacher sub-network and the first student sub-network in an inference mode to perform an inference task.

14. The method of claim 11, wherein the final teacher sub-network is further adapted to generate a penultimate feature map based on the first intermediate feature map, the method further comprising: providing a first feature enhancement sub-network as part of the first student sub-network; and training the first feature enhancement sub-network by: propagating the first intermediate feature map forward through the first feature enhancement sub-network to generate a first student feature map; calculating a feature enhancement loss based on the penultimate feature map compared to the first student feature map; and propagating the feature enhancement loss backward through the first feature enhancement sub-network to train the first feature enhancement sub-network.

15. The method of claim 11, further comprising, before training the integrated system, pre-training the teacher neural network.

16. A device, comprising: a processor; and a memory having stored thereon instructions for carrying out the steps of the method of claim 1.

17. A device, comprising: a processor; a communication link; and a memory having stored thereon instructions for carrying out the steps of the metho
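
Claims 11 and 12 describe a chain of supervision: the first student sub-network is trained against the teacher inference data, and the prior (earlier) student sub-network is trained against the first student's inference data. The claims refer only to "a knowledge distillation loss function" without naming one. The sketch below assumes a common choice, the KL divergence between temperature-softened softmax distributions, and uses made-up logits in place of real network outputs. The names `softmax`, `kd_loss`, the temperature value, and all logit values are illustrative assumptions, and the backward pass through the student and teacher sub-networks is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, target_logits, temperature=2.0):
    """KL(target || student) on softened distributions, scaled by T^2.

    This particular loss is an assumption; the claims do not specify it.
    """
    target = softmax(target_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(
        t * math.log(t / s) for t, s in zip(target, student) if t > 0
    )
    return temperature ** 2 * kl


# Hypothetical outputs for one training input (3 classes).
teacher_logits = [4.0, 1.0, 0.5]        # from the final teacher sub-network
first_student_logits = [3.0, 1.5, 0.5]  # from the first student sub-network
prior_student_logits = [2.0, 1.8, 1.0]  # from the prior student sub-network

# Claim 11: first student is supervised by the teacher inference data.
first_kd = kd_loss(first_student_logits, teacher_logits)

# Claim 12: prior student is supervised by the first student inference data.
prior_kd = kd_loss(prior_student_logits, first_student_logits)
```

In a real training loop each loss would then be propagated backward through the corresponding student sub-network and the teacher sub-network feeding it, as the claims recite; an automatic differentiation framework would normally handle that step.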
Convolutional networks [CNN, ConvNet] · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Supervised learning · CPC title
Distributed learning, e.g. federated learning · CPC title
Non-supervised learning, e.g. competitive learning · CPC title