What technology area does this patent fall under?

Primary CPC classification G06N3/04. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jun 17, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Neural network model processing method and apparatus

US12333428B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12333428-B2
Application number	US-201917434563-A
Country	US
Kind code	B2
Filing date	Feb 27, 2019
Priority date	Feb 27, 2019
Publication date	Jun 17, 2025
Grant date	Jun 17, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A neural network model processing method includes obtaining a first low-bit neural network model through training, where the model includes a first operation layer and a second operation layer. Each operation layer includes at least one operation. Values/a value of a parameter and/or data used for the operation are/is represented by using N bits, and N is a positive integer less than 8. The neural network model processing method further includes compressing the model to obtain a second low-bit neural network model, where the compressed model includes a third operation layer. The third operation layer is equivalent to the first operation layer and the second operation layer, and an operation layer other than the third operation layer in the at least one operation layer is the same as an operation layer other than the first operation layer and the second operation layer in the at least two operation layers.
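The merging idea in the abstract can be illustrated with a minimal sketch. It assumes a model is an ordered list of layers and that the only mergeable layers are integer affine operations, with a hypothetical combining rule (composition of the two affine maps). The names Affine, Relu, find_pair, merge, compress, and run are illustrative and do not come from the patent, and the sketch does not check that the merged parameters still fit in N bits.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Affine:
    """Hypothetical operation layer: y = scale * x + shift, integer parameters."""
    scale: int
    shift: int

    def __call__(self, x: int) -> int:
        return self.scale * x + self.shift


@dataclass(frozen=True)
class Relu:
    """Hypothetical operation layer that is left untouched by compression."""

    def __call__(self, x: int) -> int:
        return max(0, x)


Layer = Callable[[int], int]


def find_pair(model: List[Layer]) -> Optional[int]:
    """Search for the first index i where layers i and i+1 are both Affine."""
    for i in range(len(model) - 1):
        if isinstance(model[i], Affine) and isinstance(model[i + 1], Affine):
            return i
    return None


def merge(first: Affine, second: Affine) -> Affine:
    """Assumed rule: second(first(x)) = (c*a)*x + (c*b + d)."""
    return Affine(second.scale * first.scale,
                  second.scale * first.shift + second.shift)


def compress(model: List[Layer]) -> List[Layer]:
    """Build a new model in which one adjacent Affine pair becomes one layer."""
    i = find_pair(model)
    if i is None:
        return list(model)
    return model[:i] + [merge(model[i], model[i + 1])] + model[i + 2:]


def run(model: List[Layer], x: int) -> int:
    """Feed x through the layers in order."""
    for layer in model:
        x = layer(x)
    return x


first_model = [Affine(2, 1), Affine(3, -4), Relu()]   # three operation layers
second_model = compress(first_model)                  # two operation layers
```

Here the merged layer takes the same input as the first layer and produces the same output as the second, and the remaining Relu layer is carried over unchanged, so run(first_model, x) and run(second_model, x) agree for every integer x.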

First claim

Opening claim text (preview).

What is claimed is:

1. A neural network model processing method, the neural network model processing method comprising: obtaining a first low-bit neural network model through training, wherein the first low-bit neural network model comprises at least three operation layers, wherein the at least three operation layers comprise a first operation layer and a second operation layer, wherein each of the at least three operation layers comprises at least one operation, wherein one or more values of one or more of a parameter or data used for the at least one operation are represented using N bits, and wherein N is a positive integer less than eight; compressing the first low-bit neural network model to obtain a second low-bit neural network model, wherein the second low-bit neural network model comprises at least two operation layers, wherein the at least two operation layers comprise a third operation layer, wherein the third operation layer is equivalent to a combination of the first operation layer and the second operation layer, and wherein an operation layer other than the third operation layer in the at least two operation layers is the same as an operation layer other than the first operation layer and the second operation layer in the at least three operation layers; searching the at least three operation layers for the first operation layer and the second operation layer; combining the first operation layer and the second operation layer to obtain the third operation layer, wherein an input of the first operation layer is the same as an input of the third operation layer, wherein an output of the first operation layer is an input of the second operation layer, and wherein an output of the second operation layer is the same as an output of the third operation layer; and constructing the second low-bit neural network model based on the third operation layer and the operation layer other than the first operation layer and the second operation layer in the at least three operation layers.

2. The neural network model processing method of claim 1, wherein the first operation layer comprises at least one first operation, wherein the second operation layer comprises at least one second operation, and wherein the neural network model processing method further comprises: combining the at least one first operation and the at least one second operation according to a preset rule to obtain at least one third operation; and constructing the third operation layer based on the at least one third operation, wherein the third operation layer comprises the at least one third operation.

3. The neural network model processing method of claim 1, wherein the first operation layer comprises at least one first operation, wherein the second operation layer comprises at least one second operation, and wherein the neural network model processing method further comprises constructing the third operation layer based on the at least one first operation and the at least one second operation, wherein the third operation layer comprises the at least one first operation and the at least one second operation.

4. The neural network model processing method of claim 1, further comprising storing the second low-bit neural network model.

5. The neural network model processing method of claim 1, further comprising sending the second low-bit neural network model to a terminal device.

6. The neural network model processing method of claim 1, wherein the neural network model processing method is performed by a terminal device.

7. The neural network model processing method of claim 1, wherein the neural network model processing method is performed by a server.

8. A neural network model processing method performed by a terminal device, the neural network model processing method comprising: obtaining a second low-bit neural network model; and updating a first low-bit neural network model to the second low-bit neural network model, wherein the first low-bit neural network model comprises at least three operation layers, wherein the at least three operation layers comprise a first operation layer and a second operation layer, wherein each of the at least three operation layers comprises at least one operation, wherein one or more values of a parameter or data used for the at least one operation are represented using N bits, wherein N is a positive integer less than eight, wherein the second low-bit neural network model comprises at least two operation layers, wherein the at least two operation layers comprise a third operation layer, wherein the third operation layer is equivalent to a combination of the first operation layer and the second operation layer, wherein an operation layer other than the third operation layer in the at least two operation layers is the same as an operation layer other than the first operation layer and the second operation layer in the at least three operation layers, wherein the second low-bit neural network model is based on the third operation layer and the operation layer other than the first operation layer and the second operation layer in the at least three operation layers, wherein the third operation layer is an operation layer obtained by combining the first operation layer and the second operation layer, wherein an input of the first operation layer is the same as an input of the third operation layer, wherein an output of the first operation layer is an input of the second operation layer, and wherein an output of the second operation layer is the same as an output of the third operation layer.

9. The neural network model processing method of claim 8, wherein the first operation layer comprises at least one first operation, wherein the second operation layer comprises at least one second operation, wherein the third operation layer comprises at least one third operation, and wherein the at least one third operation is obtained by combining the at least one first operation and the at least one second operation according to a preset rule.

10. The neural network model processing method of claim 8, wherein the first operation layer comprises at least one first operation, wherein the second operation layer comprises at least one second operation, and wherein the third operation layer comprises the at least one first operation and the at least one second operation.

11. The neural network model processing method of claim 8, further comprising receiving the second low-bit neural network model from a server.

12. The neural network model processing method of claim 8, further comprising locally obtaining the second low-bit neural network model.

13. A neural network model processing system, comprising: a terminal device; and a server communicatively coupled to the terminal device and configured to: obtain a first low-bit neural network model through training, wherein the first low-bit neural network model comprises at least three operation layers, wherein the at least three operation layers comprise a first operation layer and a second operation layer, wherein each of the at least three operation layers comprises at least one operation, wherein one or more values of one or more of a parameter or data used for the at least one operation are represented using N bits, and wherein N is a positive integer less than eight; compress the first low-bit neural network model to obtain a second low-bit neural network model, wherein the second low-bit neural network model comprises at least two operation layers, wherein the at least two operation layers comprise a third operation layer, wherein the third operation layer is equivalent to a combination of the f

Assignees

Huawei Tech Co Ltd

Inventors

Classifications

G06N3/0464: Convolutional networks [CNN, ConvNet]
G06N3/0495: Quantised networks; Sparse networks; Compressed networks
G06N3/04 (Primary): Architecture, e.g. interconnection topology
G06N3/045: Combinations of networks
G06N3/08 (Primary): Learning methods
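For readers unfamiliar with the "Quantised networks" class, the sketch below shows one common way to represent values with N bits where N is less than 8, as the abstract requires. It is a generic uniform min-max quantizer, not the scheme used in the patent (which this page does not describe); the function names quantize and dequantize and the sample weights are assumptions made for illustration.

```python
from typing import List, Tuple


def quantize(values: List[float], n_bits: int) -> Tuple[List[int], float, float]:
    """Map floats to unsigned integer codes in [0, 2**n_bits - 1]."""
    if not 1 <= n_bits < 8:
        raise ValueError("n_bits must be a positive integer less than 8")
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    # Fall back to a unit step when all values are identical.
    scale = (hi - lo) / levels or 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [c * scale + lo for c in codes]


weights = [-1.0, -0.5, 0.0, 0.5, 1.0]          # illustrative values only
codes, scale, lo = quantize(weights, n_bits=2)  # each code fits in 2 bits
approx = dequantize(codes, scale, lo)
```

With 2 bits there are only four representable levels, so each recovered value differs from the original by at most half a quantization step.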

Patent family

Related publications grouped by family.

Patent family ID: 72238781


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12333428B2 cover?: A neural network model processing method includes obtaining a first low-bit neural network model through training, where the model includes a first operation layer and a second operation layer. Each operation layer includes at least one operation. Values/a value of a parameter and/or data used for the operation are/is represented by using N bits, and N is a positive integer less than 8. The neu…
Who is the assignee on this patent?: Huawei Tech Co Ltd

Related patents

Neural network method and apparatus

Method and apparatus for optimizing model applicable to pattern recognition, and terminal device

Method and apparatus for compressing neural network

Neural network accelerator with parameters resident on chip

Techniques for general-purpose lossless data compression using a recurrent neural network

Model compression and fine-tuning
