Computing device and method

US11620130B2 · US · B2

Patent metadata
Publication number: US-11620130-B2
Application number: US-201916715009-A
Country: US
Kind code: B2
Filing date: Dec 16, 2019
Priority date: Feb 13, 2018
Publication date: Apr 4, 2023
Grant date: Apr 4, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
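As a rough illustration of what representing input data by fixed-point data can mean, the sketch below quantizes floating-point values to signed integers that share one power-of-two scale. The function names `to_fixed` and `from_fixed`, the convention that a stored integer `q` stands for `q * 2**point_pos`, and the saturating behavior are assumptions made for illustration; the abstract does not specify them.

```python
def to_fixed(x: float, point_pos: int, bitnum: int) -> int:
    """Quantize x to a signed bitnum-bit integer with step 2**point_pos.

    Assumed convention (not taken from the abstract): the stored integer q
    represents the real value q * 2**point_pos.
    """
    q = round(x / 2.0 ** point_pos)          # nearest integer code (ties to even)
    lo = -(1 << (bitnum - 1))                # most negative code, e.g. -128 for 8 bits
    hi = (1 << (bitnum - 1)) - 1             # most positive code, e.g. 127 for 8 bits
    return max(lo, min(hi, q))               # saturate instead of wrapping around


def from_fixed(q: int, point_pos: int) -> float:
    """Recover the real value that the integer code q stands for."""
    return q * 2.0 ** point_pos


# Hypothetical weights, 8-bit codes, step 2**-4 = 0.0625.
weights = [0.5, -0.25, 0.8125]
codes = [to_fixed(w, -4, 8) for w in weights]
restored = [from_fixed(q, -4) for q in codes]
```

These particular weights are exact multiples of the step, so they survive the round trip unchanged; values outside the representable range are clamped to the nearest end of it.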

First claim

Opening claim text (preview).

What is claimed is:

1. A computation device, comprising: a storage unit, a conversion unit, an operation unit, and a controller unit; wherein the storage unit comprises a cache and a register; wherein: the controller unit is configured to determine a decimal point position of first input data and a bit width of fixed-point data, wherein the bit width of the fixed-point data is the bit width of the first input data converted into the fixed-point data; the operation unit is configured to initialize the decimal point position of the first input data and adjust the decimal point position of the first input data; and store the adjusted decimal point position of the first input data in the cache of the storage unit; the controller unit is configured to obtain the first input data and a plurality of operation instructions from the register, and obtain the adjusted decimal point position of the first input data and the first input data from the cache; and transmit the adjusted decimal point position of the first input data and the first input data to the conversion unit; the conversion unit is configured to convert the first input data into a second input data according to the adjusted decimal point position of the first input data; and wherein the initializing the decimal point position of the first input data by the operation unit includes: initializing the decimal point position of the first input data according to a maximum absolute value of the first input data.

2. The computation device of claim 1, wherein initializing the decimal point position of the first input data according to the maximum absolute value of the first input data by the operation unit includes: the operation unit initializes the decimal point position of the first input data according to a first preset formula and the maximum absolute value of the first input data, wherein the preset first formula is $s_a = \lceil \log_2 a_{max} - bitnum + 1 \rceil$, the $s_a$ represents the initialized decimal point position of the first input data, the $a_{max}$ represents the maximum absolute value of the first input data, and the $bitnum$ represents the bit width of the fixed-point data converted from the first input data.

3. The computation device of claim 1, wherein adjusting the decimal point position of the first input data by the operation unit includes: adjusting the decimal point position of the first input data upwardly by a single step according to the maximum absolute value of the first input data, or adjusting the decimal point position of the first input data upwardly step by step according to the maximum absolute value of the first input data, or adjusting the decimal point position of the first input data upwardly by a single step according to the first input data distribution, or adjusting the decimal point position of the first input data upwardly step by step according to the first input data distribution, or adjusting the decimal point position of the first input data downwardly according to the absolute value of the first input data.

4. The computation device of claim 1, wherein the computation device is configured to execute a machine learning computation, and wherein: the controller unit is further configured to transmit the plurality of operation instructions to the operation unit, the conversion unit is further configured to transmit the second input data to the operation unit, and the operation unit is further configured to perform operations on the second input data according to the plurality of operation instructions to obtain an operation result.

5. The computation device of claim 4, wherein the machine learning computation includes an artificial neural network operation, the first input data includes an input neuron and a weight, and the computation result is an output neuron.

6. The computation device of claim 4, wherein the operation unit includes a primary processing circuit and a plurality of secondary processing circuits, wherein: the primary processing circuit is configured to perform pre-processing on the second input data and to transmit data and the plurality of operation instructions between the plurality of secondary processing circuits and the primary processing circuit, the plurality of secondary processing circuits is configured to perform an intermediate operation to obtain a plurality of intermediate results according to the second input data and the plurality of operation instructions transmitted from the primary processing circuit, and to transmit the plurality of intermediate results to the primary processing circuit, and the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results to obtain the computation result of the computation instruction.

7. The computation device of claim 6, further comprising a storage unit and a direct memory access (DMA) unit, wherein: the storage unit includes any combination of a register and a cache, wherein the cache includes a scratch pad cache and is configured to store the first input data, and the register is configured to store scalar data in the first input data, and the DMA unit is configured to read data from the storage unit or store data in the storage unit.

8. The computation device of claim 1, wherein when the first input data is fixed-point data, the operation unit further includes: a derivation unit configured to derive a decimal point position of at least one intermediate result according to the decimal point position of the first input data, wherein the at least one intermediate result is obtained by operating according to the first input data.

9. The computation device of claim 8, wherein the operation unit further includes: a data cache unit configured to cache the at least one intermediate result.

10. The computation device of claim 4, wherein the operation unit includes a tree module; wherein: the tree module includes a root port coupled with the primary processing circuit and a plurality of branch ports coupled with the plurality of secondary processing circuits, and the tree module is configured to forward data and the plurality of operation instructions transmitted among the primary processing circuit and the plurality of secondary processing circuits; and wherein the tree module is an n-tree structure, the n being an integer greater than or equal to two.

11. The computation device of claim 4, wherein the operation unit further includes a branch processing circuit; wherein: the primary processing circuit is configured to: determine that the input neurons are broadcast data and the weights are distribution data, divide the distribution data into a plurality of data blocks, and transmit at least one of the plurality of data blocks, the broadcast data, and at least one of the plurality of operation instructions to the branch processing circuit; and wherein the branch processing circuit is configured to forward the data blocks, the broadcast data, and the plurality of operation instructions transmitted among the primary processing circuit and the plurality of secondary processing circuits, the plurality of secondary processing circuits is configured to perform operations on the data blocks received and the broadcast data received according to the plurality of operation instructions to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the branch processing circuit, and the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results received from the branch processing circuit to obtain a computation result of the computation instruction, and to send t
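The initialization formula recited in claim 2 can be sketched in a few lines of Python. Only the formula itself comes from the claim text. The helper names, the assumption that a fixed-point code `q` stands for `q * 2**point_pos`, and the overflow test used for the step-by-step upward adjustment are illustrative guesses at one plausible reading of claim 3, not the patent's stated method.

```python
import math


def init_point_position(values, bitnum):
    """Claim 2 formula: s_a = ceil(log2(a_max) - bitnum + 1)."""
    a_max = max(abs(v) for v in values)
    if a_max == 0:
        raise ValueError("log2 is undefined when every value is zero")
    return math.ceil(math.log2(a_max) - bitnum + 1)


def max_representable(point_pos, bitnum):
    """Largest positive value, assuming a code q stands for q * 2**point_pos."""
    return ((1 << (bitnum - 1)) - 1) * 2.0 ** point_pos


def adjust_up_step(point_pos, values, bitnum):
    """Hypothetical step-by-step upward adjustment: move the point position up
    by one whenever the largest magnitude no longer fits (an assumption)."""
    a_max = max(abs(v) for v in values)
    if a_max >= max_representable(point_pos, bitnum):
        return point_pos + 1
    return point_pos


# Hypothetical input data converted to 8-bit fixed-point.
data = [0.3, -1.7, 0.9]
s = init_point_position(data, 8)             # ceil(log2(1.7) - 7) = -6
limit = max_representable(s, 8)              # 127 * 2**-6 = 1.984375
s_after = adjust_up_step(s, [3.0], 8)        # 3.0 does not fit, so s moves to -5
```

Under this reading, the initial position is the smallest power-of-two step at which the largest magnitude still roughly fits in `bitnum` bits, and the adjustment coarsens the step only when later data would overflow.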

Assignees

Shanghai Cambricon Inf Tech Co Ltd

Inventors

Classifications

  • Learning methods

  • Quantised networks; Sparse networks; Compressed networks

  • Convolutional networks [CNN, ConvNet]

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

  • Computations with decimal numbers {radix 12 or 20. (G06F7/4824 takes precedence)}

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11620130B2 cover?
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit t…
Who is the assignee on this patent?
Shanghai Cambricon Inf Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F13/28. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 4, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).