Who is the assignee on this patent?

Shanghai Cambricon Inf Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06N3/084 (Backpropagation, e.g. using gradient descent). Mapped technology areas include Physics.

When was this patent published?

Publication date Sep 17, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Neural network quantization parameter determination method and related products

US12093148B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12093148-B2
Application number	US-202117547972-A
Country	US
Kind code	B2
Filing date	Dec 10, 2021
Priority date	Jun 12, 2019
Publication date	Sep 17, 2024
Grant date	Sep 17, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technical solution involves a board card including a storage component, an interface apparatus, a control component, and an artificial intelligence chip. The artificial intelligence chip is connected to the storage component, the control component, and the interface apparatus, respectively; the storage component is used to store data; the interface apparatus is used to implement data transfer between the artificial intelligence chip and an external device; and the control component is used to monitor a state of the artificial intelligence chip. The board card is used to perform an artificial intelligence operation.

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for adjusting a data bit width in a convolution neural network layer during a neural network computation, comprising: obtaining a data bit width used to perform a quantization on data to be quantized, wherein the data to be quantized includes at least one type of neurons, weights, gradients, or biases, the data bit width indicates the data bit width of the quantized data after the data to be quantized being quantized; performing a quantization on a group of data to be quantized based on the data bit width to convert the group of data to be quantized to a group of quantized data, wherein the group of quantized data has the data bit width; comparing the group of data to be quantized with the group of quantized data to determine a quantization error correlated with the data bit width; adjusting the data bit width based on the determined quantization error; and applying the adjusted data bit width during quantization in the convolution neural network layer.

2. The method of claim 1, wherein the comparing of the group of data to be quantized with the group of quantized data to determine the quantization error correlated with the data bit width includes: determining a quantization interval according to the data bit width; and determining the quantization error according to the quantization interval, the group of the quantized data and the group of data to be quantized.

3. The method of claim 2, wherein the determining the quantization error according to the quantization interval, the group of the quantized data and the group of the data to be quantized includes: inversely quantizing the group of quantized data according to the quantization interval to obtain a group of inversely quantized data, wherein a data format of the group of inversely quantized data is the same with a data format of the group of the data to be quantized; and determining a quantization error according to the group of inversely quantized data and the group of data to be quantized.

4. The method of claim 1, wherein the adjusting the data bit width based on the determined quantization error includes: comparing the quantization error and a preset threshold, wherein the preset threshold includes at least one of a first threshold and a second threshold; and adjusting the data bit width according to a comparison result.

5. The method of claim 4, wherein the adjusting the data bit width according to the comparison result includes: increasing the data bit width when the quantization error is greater than or equal to the first threshold; wherein the increasing the data bit width includes: increasing the data bit width according to a first preset bit width stride to determine an adjusted data bit width; wherein the method further comprises: iteratively performing the quantization on the group of data to be quantized based on the adjusted data bit width to convert the group of data to be quantized to another group of quantized data, wherein the other group of quantized data has the adjusted data bit width; and comparing the group of data to be quantized with the other group of quantized data to determine another quantization error correlated with an adjusted data bit width until the other quantization error is less than the first preset threshold.

6. The method of claim 4, wherein the adjusting the data bit width according to the comparison result includes: decreasing the data bit width when the quantization error is less than or equal to a second threshold; wherein the decreasing the data bit width includes: decreasing the data bit width according to a second preset bit width stride to determine an adjusted bit width; wherein the method further comprises: iteratively performing the quantization on the group of data to be quantized based on the adjusted data bit width to convert the group of data to be quantized to another group of quantized data, wherein the other group of quantized data has the adjusted data bit width; and determining another quantization error correlated with the adjusted data bit width based on the group of data to be quantized and the other group of quantized data, until the other quantization error is greater than the second preset threshold.

7. The method of claim 4, wherein the adjusting the data bit width according to the comparison result includes: maintaining the data bit width when the quantization error is between the first threshold and the second threshold.

8. The method of claim 1, further comprising: updating a quantization parameter configured to perform the quantization on the group of data to be quantized based on the group of data to be quantized and the adjusted bit width; and performing the quantization on the group of data to be quantized based on an updated quantization parameter.

9. The method of claim 9 , wherein the determining the target iteration interval according to the data variation range of the data to be quantized includes: determining the target iteration interval according to the first error, wherein the target iteration interval is negatively correlated with the first error.
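The loop described in claims 1 through 7 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the symmetric max-abs quantization scheme, the relative mean absolute error metric, the stride values, the bit width limits, and every function and parameter name below are hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple


def quantize(data: Sequence[float], bit_width: int) -> Tuple[List[int], float]:
    """Quantize to signed integers of the given bit width.

    Assumption: symmetric uniform quantization whose interval is derived
    from the largest absolute value in the group.
    """
    q_max = 2 ** (bit_width - 1) - 1
    abs_max = max(abs(x) for x in data)
    interval = abs_max / q_max if abs_max > 0 else 1.0
    quantized = [max(-q_max, min(q_max, round(x / interval))) for x in data]
    return quantized, interval


def quantization_error(data: Sequence[float], bit_width: int) -> float:
    """Quantize, inversely quantize, and compare with the original data.

    Assumption: the error is the summed absolute difference divided by the
    summed absolute value of the original data.
    """
    quantized, interval = quantize(data, bit_width)
    restored = [q * interval for q in quantized]  # inverse quantization
    total = sum(abs(x) for x in data)
    if total == 0:
        return 0.0
    return sum(abs(r - x) for r, x in zip(restored, data)) / total


def adjust_bit_width(
    data: Sequence[float],
    bit_width: int,
    first_threshold: float,
    second_threshold: float,
    increase_stride: int = 2,  # hypothetical "first preset bit width stride"
    decrease_stride: int = 1,  # hypothetical "second preset bit width stride"
    min_bits: int = 2,
    max_bits: int = 32,
) -> int:
    """Return an adjusted bit width based on the quantization error."""
    error = quantization_error(data, bit_width)
    if error >= first_threshold:
        # Widen until the error drops below the first threshold.
        while error >= first_threshold and bit_width < max_bits:
            bit_width = min(max_bits, bit_width + increase_stride)
            error = quantization_error(data, bit_width)
    elif error <= second_threshold:
        # Narrow until the error rises above the second threshold.
        while error <= second_threshold and bit_width > min_bits:
            bit_width = max(min_bits, bit_width - decrease_stride)
            error = quantization_error(data, bit_width)
    # Otherwise the error lies between the thresholds: keep the bit width.
    return bit_width


if __name__ == "__main__":
    weights = [0.013, -0.42, 0.77, 0.105, -0.9, 0.3333]
    print(adjust_bit_width(weights, 4, first_threshold=0.01, second_threshold=0.0001))
```

The `min_bits` and `max_bits` guards are additions for the sketch so that the loops always terminate; the claims themselves only state the threshold conditions.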

Assignees

Shanghai Cambricon Inf Tech Co Ltd

Inventors

Classifications

G06N3/09
Supervised learning · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/0495
Quantised networks; Sparse networks; Compressed networks · CPC title
G06N5/02
Knowledge representation; Symbolic representation · CPC title
G06N3/084 · Primary
Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

View patent family 69185300
