Computing device and method

US11740898B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11740898-B2
Application number: US-201916714974-A
Country: US
Kind code: B2
Filing date: Dec 16, 2019
Priority date: Feb 13, 2018
Publication date: Aug 29, 2023
Grant date: Aug 29, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
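As an illustration of the fixed-point representation the abstract mentions, the following is a minimal sketch and not the patented circuit. It assumes a signed 16-bit container, reads "decimal point position" as the number of fractional bits, and saturates on overflow; the bit width, rounding mode, saturation policy, and the function names `float_to_fixed` and `fixed_to_float` are all assumptions that do not appear on this page.

```python
def float_to_fixed(value: float, point_position: int, bit_width: int = 16) -> int:
    """Quantize a float to a signed fixed-point integer.

    Assumption: point_position is the number of fractional bits, so the
    stored integer is value * 2**point_position, rounded and saturated.
    """
    scaled = round(value * 2.0 ** point_position)  # Python rounds halves to even
    lowest = -(1 << (bit_width - 1))               # e.g. -32768 for 16 bits
    highest = (1 << (bit_width - 1)) - 1           # e.g.  32767 for 16 bits
    return max(lowest, min(highest, scaled))       # saturate instead of wrapping


def fixed_to_float(raw: int, point_position: int) -> float:
    """Recover the real value represented by a fixed-point integer."""
    return raw / 2.0 ** point_position


if __name__ == "__main__":
    raw = float_to_fixed(1.5, point_position=8)
    print(raw, fixed_to_float(raw, point_position=8))
```

With more fractional bits the representable step gets finer but the representable range shrinks, which is presumably why the point position is configured rather than fixed once.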

First claim

Opening claim text (preview).

What is claimed is:

1. A computation device, comprising: an operation device, a controller circuit, and a conversion circuit; wherein the controller circuit is configured to: obtain a configuration instruction before the operation device performs operations, wherein an opcode field of the configuration instruction comprises a decimal point position and a data type of data involved in the operations; and parse the configuration instruction to obtain the decimal point position and the data type of the data involved in the operations; wherein the controller circuit is further configured to obtain input data and determine whether the data type of the input data is consistent with that of the data involved in the operations; when it is determined that the data type of the input data is inconsistent with that of the data involved in the operations, the controller circuit transmits the input data, the decimal point position, and the data type of the data involved in the operations to the conversion circuit; and wherein the conversion circuit is configured to perform data type conversion on the input data according to the decimal point position and the data type of the data involved in the operations to obtain converted input data, wherein the data type of the converted input data is consistent with that of the data involved in the operations.

2. The computation device of claim 1, wherein obtaining the configuration instruction by the controller circuit before the operation device performs the operations refers to obtaining the configuration instruction before the operation device performs the operations on data of an i-th layer of a multi-layer neural network.

3. The computation device of claim 2, wherein: the computation device is configured to execute a machine learning computation, the controller circuit is further configured to transmit the converted input data to the operation device, and when the data type of the input data is consistent with that of the data involved in the operations, the controller circuit transmits the input data to the operation device, and the operation device is configured to perform the operations on the converted input data or the input data to obtain an operation result.

4. The computation device of claim 3, wherein: the machine learning computation includes an artificial neural network operation, the first input data includes an input neuron and a weight, and the computation result is an output neuron.

5. The computation device of claim 3, wherein the operation device includes a primary processing circuit and a plurality of secondary processing circuits, and wherein: the primary processing circuit is configured to perform pre-processing on the second input data and to transmit data and the plurality of operation instructions between the plurality of secondary processing circuits and the primary processing circuit, the plurality of secondary processing circuits is configured to perform an intermediate operation to obtain a plurality of intermediate results according to the second input data and the plurality of operation instructions transmitted from the primary processing circuit, and to transmit the plurality of intermediate results to the primary processing circuit, and the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results to obtain the computation result of the computation instruction.

6. The computation device of claim 5, further comprising a storage unit and a direct memory access (DMA) unit, wherein: the storage unit includes any combination of a register and a cache, the cache includes a scratch pad cache and is configured to store the first input data, and the register is configured to store scalar data in the first input data.

7. The computation device of claim 5, wherein the operation device includes a tree module circuit; wherein: the tree module circuit includes a root port coupled with the primary processing circuit and a plurality of branch ports coupled with the plurality of secondary processing circuits, and the tree module circuit is configured to forward data and the plurality of operation instructions transmitted between the primary processing circuit and the plurality of secondary processing circuits; and wherein the tree module circuit is an n-tree structure, wherein the n is an integer greater than or equal to two.

8. The computation device of claim 7, wherein: the primary processing circuit is configured to perform a combined ranking processing on the plurality of intermediate results received from the plurality of processing circuits to obtain a computation result of the computation instruction, or the primary processing circuit is configured to a combined ranking processing and an activation processing on the plurality of intermediate results received from the plurality of processing circuits to obtain the computation result of the computation instruction.

9. The computation device of claim 7, wherein the primary processing circuit includes one or any combination of an activation processing circuit and an addition processing circuit, wherein: the activation processing circuit is configured to perform an activation operation on data in the primary processing circuit, and the addition processing circuit is configured to perform an addition operation or an accumulation operation, the plurality of secondary processing circuit includes: a multiplication processing circuit configured to perform a multiplication operation on the data blocks received to obtain a product result, and an accumulation processing circuit configured to perform an accumulation operation on the product results to obtain the plurality of intermediate results.

10. The computation device of claim 5, wherein the operation device further includes a branch processing circuit, wherein: the primary processing circuit is configured to: determine that the input neurons are broadcast data and the weights are distribution data, divide the distribution data into a plurality of data blocks, and transmit at least one of the plurality of data blocks, the broadcast data, and at least one of the plurality of operation instructions to the branch processing circuit, the branch processing circuit is configured to forward the data blocks, the broadcast data, and the plurality of operation instructions transmitted between the primary processing circuit and the plurality of secondary processing circuits, the plurality of secondary processing circuits is configured to perform operations on the data blocks received and the broadcast data received according to the plurality of operation instructions to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the branch processing circuit, and the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results received from the branch processing circuit to obtain a computation result of the computation instruction, and to send the computation result of the computation instruction to the controller circuit.

11. The computation device of claim 5, wherein the plurality of secondary processing circuits is distributed in an array, wherein: each secondary processing circuit is coupled with adjacent other secondary processing circuits, and the primary processing circuit is coupled with K secondary processing circuits of the plurality of secondary processing circuits, the K secondary processing circuits include n secondary processing circuits in the first row, n secondary processing circuits in the m-th row, and m secondary processing
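
The control flow recited in claim 1 (parse the configuration instruction, compare data types, convert only on a mismatch) can be sketched in software as follows. This is a hedged illustration rather than the claimed hardware: the `Config` and `Data` classes, the `"fixed"`/`"float"` labels, the dictionary layout of the instruction, and the rule that the decimal point position counts fractional bits are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    point_position: int  # assumed: number of fractional bits
    data_type: str       # assumed labels: "fixed" or "float"


@dataclass(frozen=True)
class Data:
    data_type: str
    values: List[float]
    point_position: Optional[int] = None  # set only for fixed-point data


def parse_configuration(instruction: dict) -> Config:
    """Controller step: read point position and data type from the opcode field."""
    field = instruction["opcode_field"]  # hypothetical instruction layout
    return Config(field["point_position"], field["data_type"])


def convert(data: Data, config: Config) -> Data:
    """Conversion step: make the data type match the configured one."""
    if config.data_type == "fixed":
        scale = 2.0 ** config.point_position
        return Data("fixed", [round(v * scale) for v in data.values],
                    config.point_position)
    # Fixed to float: undo the scaling using the input's own point position.
    scale = 2.0 ** (data.point_position or 0)
    return Data("float", [v / scale for v in data.values])


def prepare_input(data: Data, config: Config) -> Data:
    """Controller step: forward unchanged if types agree, otherwise convert."""
    if data.data_type == config.data_type:
        return data  # only the type is compared, as in claim 1
    return convert(data, config)


if __name__ == "__main__":
    cfg = parse_configuration(
        {"opcode_field": {"point_position": 4, "data_type": "fixed"}})
    print(prepare_input(Data("float", [1.5, -0.25]), cfg))
```

Per claim 2, the configuration would be obtained before the operations on each layer, so in this sketch `parse_configuration` would be called once per layer and `prepare_input` once per input to that layer.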

Assignees

Shanghai Cambricon Inf Tech Co Ltd

Inventors

Classifications

  • Learning methods · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • Format conversion instructions, e.g. Floating-Point to Integer, decimal conversion · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11740898B2 cover?
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit t…
Who is the assignee on this patent?
Shanghai Cambricon Inf Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/30025. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 29, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).