Systems and methods for deep learning processor
US-2017316312-A1 · Nov 2, 2017 · US
US11740898B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11740898-B2 |
| Application number | US-201916714974-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 16, 2019 |
| Priority date | Feb 13, 2018 |
| Publication date | Aug 29, 2023 |
| Grant date | Aug 29, 2023 |
Official abstract text for this publication.
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
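The fixed-point representation and the conversion unit mentioned in the abstract can be illustrated with a minimal sketch. It is not the patented circuit. It assumes that the "decimal point position" named in claim 1 counts fractional bits (so a stored integer `q` stands for `q / 2**point_position`), that values are signed 16-bit, and that out-of-range values saturate. `Config`, `float_to_fixed`, `fixed_to_float`, and `prepare_input` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union

Number = Union[int, float]


@dataclass(frozen=True)
class Config:
    """Hypothetical stand-in for a parsed configuration instruction."""
    point_position: int  # assumption: number of fractional bits
    data_type: str       # "fixed" or "float"
    bit_width: int = 16  # assumption: signed 16-bit fixed-point storage


def float_to_fixed(x: float, cfg: Config) -> int:
    """Quantize a float to a signed fixed-point integer, saturating on overflow."""
    scale = 2 ** cfg.point_position
    lo = -(2 ** (cfg.bit_width - 1))
    hi = 2 ** (cfg.bit_width - 1) - 1
    return max(lo, min(hi, round(x * scale)))


def fixed_to_float(q: int, cfg: Config) -> float:
    """Recover the real value represented by a fixed-point integer."""
    return q / (2 ** cfg.point_position)


def prepare_input(values: Sequence[Number], input_type: str, cfg: Config) -> List[Number]:
    """Convert only when the input type differs from the configured type."""
    if input_type == cfg.data_type:
        return list(values)  # types already consistent: pass through unchanged
    if cfg.data_type == "fixed":
        return [float_to_fixed(v, cfg) for v in values]
    return [fixed_to_float(v, cfg) for v in values]


cfg = Config(point_position=8, data_type="fixed")
converted = prepare_input([0.5, -0.25], "float", cfg)
```

With 8 fractional bits, the smallest representable step is 1/256, so `point_position` trades range for precision in this model.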
Opening claim text (preview).
What is claimed is:
1. A computation device, comprising: an operation device, a controller circuit, and a conversion circuit; wherein the controller circuit is configured to: obtain a configuration instruction before the operation device performs operations, wherein an opcode field of the configuration instruction comprises a decimal point position and a data type of data involved in the operations; and parse the configuration instruction to obtain the decimal point position and the data type of the data involved in the operations; wherein the controller circuit is further configured to obtain input data and determine whether the data type of the input data is consistent with that of the data involved in the operations; when it is determined that the data type of the input data is inconsistent with that of the data involved in the operations, the controller circuit transmits the input data, the decimal point position, and the data type of the data involved in the operations to the conversion circuit; and wherein the conversion circuit is configured to perform data type conversion on the input data according to the decimal point position and the data type of the data involved in the operations to obtain converted input data, wherein the data type of the converted input data is consistent with that of the data involved in the operations.
2. The computation device of claim 1, wherein obtaining the configuration instruction by the controller circuit before the operation device performs the operations refers to obtaining the configuration instruction before the operation device performs the operations on data of an i-th layer of a multi-layer neural network.
3. The computation device of claim 2, wherein: the computation device is configured to execute a machine learning computation, the controller circuit is further configured to transmit the converted input data to the operation device, and when the data type of the input data is consistent with that of the data involved in the operations, the controller circuit transmits the input data to the operation device, and the operation device is configured to perform the operations on the converted input data or the input data to obtain an operation result.
4. The computation device of claim 3, wherein: the machine learning computation includes an artificial neural network operation, the first input data includes an input neuron and a weight, and the computation result is an output neuron.
5. The computation device of claim 3, wherein the operation device includes a primary processing circuit and a plurality of secondary processing circuits, and wherein: the primary processing circuit is configured to perform pre-processing on the second input data and to transmit data and the plurality of operation instructions between the plurality of secondary processing circuits and the primary processing circuit, the plurality of secondary processing circuits is configured to perform an intermediate operation to obtain a plurality of intermediate results according to the second input data and the plurality of operation instructions transmitted from the primary processing circuit, and to transmit the plurality of intermediate results to the primary processing circuit, and the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results to obtain the computation result of the computation instruction.
6. The computation device of claim 5, further comprising a storage unit and a direct memory access (DMA) unit, wherein: the storage unit includes any combination of a register and a cache, the cache includes a scratch pad cache and is configured to store the first input data, and the register is configured to store scalar data in the first input data.
7. The computation device of claim 5, wherein the operation device includes a tree module circuit; wherein: the tree module circuit includes a root port coupled with the primary processing circuit and a plurality of branch ports coupled with the plurality of secondary processing circuits, and the tree module circuit is configured to forward data and the plurality of operation instructions transmitted between the primary processing circuit and the plurality of secondary processing circuits; and wherein the tree module circuit is an n-tree structure, wherein the n is an integer greater than or equal to two.
8. The computation device of claim 7, wherein: the primary processing circuit is configured to perform a combined ranking processing on the plurality of intermediate results received from the plurality of processing circuits to obtain a computation result of the computation instruction, or the primary processing circuit is configured to a combined ranking processing and an activation processing on the plurality of intermediate results received from the plurality of processing circuits to obtain the computation result of the computation instruction.
9. The computation device of claim 7, wherein the primary processing circuit includes one or any combination of an activation processing circuit and an addition processing circuit, wherein: the activation processing circuit is configured to perform an activation operation on data in the primary processing circuit, and the addition processing circuit is configured to perform an addition operation or an accumulation operation, the plurality of secondary processing circuit includes: a multiplication processing circuit configured to perform a multiplication operation on the data blocks received to obtain a product result, and an accumulation processing circuit configured to perform an accumulation operation on the product results to obtain the plurality of intermediate results.
10. The computation device of claim 5, wherein the operation device further includes a branch processing circuit, wherein: the primary processing circuit is configured to: determine that the input neurons are broadcast data and the weights are distribution data, divide the distribution data into a plurality of data blocks, and transmit at least one of the plurality of data blocks, the broadcast data, and at least one of the plurality of operation instructions to the branch processing circuit, the branch processing circuit is configured to forward the data blocks, the broadcast data, and the plurality of operation instructions transmitted between the primary processing circuit and the plurality of secondary processing circuits, the plurality of secondary processing circuits is configured to perform operations on the data blocks received and the broadcast data received according to the plurality of operation instructions to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the branch processing circuit, and the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results received from the branch processing circuit to obtain a computation result of the computation instruction, and to send the computation result of the computation instruction to the controller circuit.
11. The computation device of claim 5, wherein the plurality of secondary processing circuits is distributed in an array, wherein: each secondary processing circuit is coupled with adjacent other secondary processing circuits, and the primary processing circuit is coupled with K secondary processing circuits of the plurality of secondary processing circuits, the K secondary processing circuits include n secondary processing circuits in the first row, n secondary processing circuits in the m-th row, and m secondary processing
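The primary/secondary data flow described in claims 5, 9, and 10 can be sketched in software as follows. This is only an illustration under stated assumptions, not the claimed hardware: the weights are treated as rows of a matrix split into contiguous blocks, each secondary circuit is modeled as a function that multiplies and accumulates, intermediate results are concatenated in block order, and ReLU stands in for the unspecified activation. The function names `split_into_blocks`, `secondary_circuit`, `primary_circuit`, and `relu` are hypothetical.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


def relu(v: float) -> float:
    """Assumed activation; the claims do not name a specific function."""
    return max(0, v)


def split_into_blocks(rows: Sequence[Row], n_blocks: int) -> List[Sequence[Row]]:
    """Divide the distribution data (weight rows) into contiguous blocks."""
    size = max(1, -(-len(rows) // n_blocks))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def secondary_circuit(weight_block: Sequence[Row], neurons: Row) -> List[float]:
    """Multiply each weight row by the broadcast neurons, then accumulate."""
    return [sum(w * x for w, x in zip(row, neurons)) for row in weight_block]


def primary_circuit(
    weights: Sequence[Row],
    neurons: Row,
    n_secondary: int,
    activation: Callable[[float], float] = relu,
) -> List[float]:
    """Distribute weight blocks, broadcast neurons, then post-process results."""
    blocks = split_into_blocks(weights, n_secondary)
    # Each secondary circuit receives one block plus the same broadcast data.
    intermediate = [secondary_circuit(block, neurons) for block in blocks]
    # Post-processing: combine results in block order, then apply activation.
    combined = [v for part in intermediate for v in part]
    return [activation(v) for v in combined]


weights = [[1, 2], [3, 4], [-1, -1]]
output_neurons = primary_circuit(weights, neurons=[1, 1], n_secondary=2)
```

In this model the input neurons are sent whole to every secondary circuit while the weights are partitioned, mirroring the broadcast/distribution split in claim 10. A tree or branch circuit would only change how the blocks are routed, not the arithmetic.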
Learning methods · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Format conversion instructions, e.g. Floating-Point to Integer, decimal conversion · CPC title