Neural network method and apparatus
US-2019340504-A1 · Nov 7, 2019 · US
US12333428B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12333428-B2 |
| Application number | US-201917434563-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 27, 2019 |
| Priority date | Feb 27, 2019 |
| Publication date | Jun 17, 2025 |
| Grant date | Jun 17, 2025 |
Official abstract text for this publication.
A neural network model processing method includes obtaining a first low-bit neural network model through training, where the model includes a first operation layer and a second operation layer. Each operation layer includes at least one operation. One or more values of a parameter and/or data used for the operation are represented using N bits, and N is a positive integer less than 8. The neural network model processing method further includes compressing the model to obtain a second low-bit neural network model, where the compressed model includes a third operation layer. The third operation layer is equivalent to the first operation layer and the second operation layer, and an operation layer other than the third operation layer in the at least one operation layer is the same as an operation layer other than the first operation layer and the second operation layer in the at least two operation layers.
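The compression step described in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `Affine` and `Relu` layer types, the `merge_affine` rule, and the `compress` and `run` helpers are assumptions chosen only to show two consecutive operation layers being replaced by one equivalent layer while the remaining layer is left unchanged. The sketch uses small integer parameters but does not model or enforce the N-bit representation.

```python
from dataclasses import dataclass
from typing import Callable, List

Layer = Callable[[int], int]


@dataclass(frozen=True)
class Affine:
    """Hypothetical operation layer computing y = scale * x + shift."""
    scale: int
    shift: int

    def __call__(self, x: int) -> int:
        return self.scale * x + self.shift


@dataclass(frozen=True)
class Relu:
    """Hypothetical operation layer computing y = max(0, x)."""

    def __call__(self, x: int) -> int:
        return max(0, x)


def merge_affine(first: Affine, second: Affine) -> Affine:
    """Assumed merge rule: second(first(x)) = s2 * (s1 * x + b1) + b2."""
    return Affine(second.scale * first.scale,
                  second.scale * first.shift + second.shift)


def compress(layers: List[Layer]) -> List[Layer]:
    """Search for the first pair of adjacent Affine layers and merge it.

    All other layers are copied to the result unchanged.
    """
    result: List[Layer] = []
    merged = False
    i = 0
    while i < len(layers):
        current = layers[i]
        following = layers[i + 1] if i + 1 < len(layers) else None
        if (not merged and isinstance(current, Affine)
                and isinstance(following, Affine)):
            result.append(merge_affine(current, following))
            merged = True
            i += 2  # both source layers are consumed by the merged layer
        else:
            result.append(current)
            i += 1
    return result


def run(layers: List[Layer], x: int) -> int:
    """Feed x through the layers in order."""
    for layer in layers:
        x = layer(x)
    return x


# A three-layer "first model" and its two-layer compressed counterpart.
first_model: List[Layer] = [Affine(2, 1), Affine(3, -2), Relu()]
second_model = compress(first_model)
```

In this toy case the merged layer takes the same input as the first layer and produces the same output as the second, so `run(first_model, x)` and `run(second_model, x)` agree for every integer `x`. A real low-bit model would also have to check that the merged parameters remain representable in N bits, which this sketch does not attempt.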
Opening claim text (preview).
What is claimed is:

1. A neural network model processing method, the neural network model processing method comprising: obtaining a first low-bit neural network model through training, wherein the first low-bit neural network model comprises at least three operation layers, wherein the at least three operation layers comprise a first operation layer and a second operation layer, wherein each of the at least three operation layers comprises at least one operation, wherein one or more values of one or more of a parameter or data used for the at least one operation are represented using N bits, and wherein N is a positive integer less than eight; compressing the first low-bit neural network model to obtain a second low-bit neural network model, wherein the second low-bit neural network model comprises at least two operation layers, wherein the at least two operation layers comprise a third operation layer, wherein the third operation layer is equivalent to a combination of the first operation layer and the second operation layer, and wherein an operation layer other than the third operation layer in the at least two operation layers is the same as an operation layer other than the first operation layer and the second operation layer in the at least three operation layers; searching the at least three operation layers for the first operation layer and the second operation layer; combining the first operation layer and the second operation layer to obtain the third operation layer, wherein an input of the first operation layer is the same as an input of the third operation layer, wherein an output of the first operation layer is an input of the second operation layer, and wherein an output of the second operation layer is the same as an output of the third operation layer; and constructing the second low-bit neural network model based on the third operation layer and the operation layer other than the first operation layer and the second operation layer in the at least three operation layers.

2. The neural network model processing method of claim 1, wherein the first operation layer comprises at least one first operation, wherein the second operation layer comprises at least one second operation, and wherein the neural network model processing method further comprises: combining the at least one first operation and the at least one second operation according to a preset rule to obtain at least one third operation; and constructing the third operation layer based on the at least one third operation, wherein the third operation layer comprises the at least one third operation.

3. The neural network model processing method of claim 1, wherein the first operation layer comprises at least one first operation, wherein the second operation layer comprises at least one second operation, and wherein the neural network model processing method further comprises constructing the third operation layer based on the at least one first operation and the at least one second operation, wherein the third operation layer comprises the at least one first operation and the at least one second operation.

4. The neural network model processing method of claim 1, further comprising storing the second low-bit neural network model.

5. The neural network model processing method of claim 1, further comprising sending the second low-bit neural network model to a terminal device.

6. The neural network model processing method of claim 1, wherein the neural network model processing method is performed by a terminal device.

7. The neural network model processing method of claim 1, wherein the neural network model processing method is performed by a server.

8. A neural network model processing method performed by a terminal device, the neural network model processing method comprising: obtaining a second low-bit neural network model; and updating a first low-bit neural network model to the second low-bit neural network model, wherein the first low-bit neural network model comprises at least three operation layers, wherein the at least three operation layers comprise a first operation layer and a second operation layer, wherein each of the at least three operation layers comprises at least one operation, wherein one or more values of a parameter or data used for the at least one operation are represented using N bits, wherein N is a positive integer less than eight, wherein the second low-bit neural network model comprises at least two operation layers, wherein the at least two operation layers comprise a third operation layer, wherein the third operation layer is equivalent to a combination of the first operation layer and the second operation layer, wherein an operation layer other than the third operation layer in the at least two operation layers is the same as an operation layer other than the first operation layer and the second operation layer in the at least three operation layers, wherein the second low-bit neural network model is based on the third operation layer and the operation layer other than the first operation layer and the second operation layer in the at least three operation layers, wherein the third operation layer is an operation layer obtained by combining the first operation layer and the second operation layer, wherein an input of the first operation layer is the same as an input of the third operation layer, wherein an output of the first operation layer is an input of the second operation layer, and wherein an output of the second operation layer is the same as an output of the third operation layer.

9. The neural network model processing method of claim 8, wherein the first operation layer comprises at least one first operation, wherein the second operation layer comprises at least one second operation, wherein the third operation layer comprises at least one third operation, and wherein the at least one third operation is obtained by combining the at least one first operation and the at least one second operation according to a preset rule.

10. The neural network model processing method of claim 8, wherein the first operation layer comprises at least one first operation, wherein the second operation layer comprises at least one second operation, and wherein the third operation layer comprises the at least one first operation and the at least one second operation.

11. The neural network model processing method of claim 8, further comprising receiving the second low-bit neural network model from a server.

12. The neural network model processing method of claim 8, further comprising locally obtaining the second low-bit neural network model.

13. A neural network model processing system, comprising: a terminal device; and a server communicatively coupled to the terminal device and configured to: obtain a first low-bit neural network model through training, wherein the first low-bit neural network model comprises at least three operation layers, wherein the at least three operation layers comprise a first operation layer and a second operation layer, wherein each of the at least three operation layers comprises at least one operation, wherein one or more values of one or more of a parameter or data used for the at least one operation are represented using N bits, and wherein N is a positive integer less than eight; compress the first low-bit neural network model to obtain a second low-bit neural network model, wherein the second low-bit neural network model comprises at least two operation layers, wherein the at least two operation layers comprise a third operation layer, wherein the third operation layer is equivalent to a combination of the f