Processing device and related products

US11354133B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11354133-B2
Application number  US-201916663210-A
Country             US
Kind code           B2
Filing date         Oct 24, 2019
Priority date       Aug 31, 2017
Publication date    Jun 7, 2022
Grant date          Jun 7, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A matrix-multiplying-vector operation method and a processing device for performing the same are provided. The matrix-multiplying-vector method includes distributing, by a main processing circuit, basic data blocks of the matrix and broadcasting the vector to a plurality of the basic processing circuits. That way, the basic processing circuits can perform inner-product operations between the basic data blocks and the broadcasted vector in parallel. The results are then provided back to main processing circuit for combining. The technical solutions proposed by the present disclosure provide short operation time and low energy consumption.
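
The flow described in the abstract (divide, distribute, broadcast, compute in parallel, combine) can be sketched in software. This is only an illustration under stated assumptions, not the patented hardware: the names `matvec`, `inner_products`, `num_circuits`, and `rows_per_block` are hypothetical, basic data blocks are assumed to be groups of consecutive matrix rows, and a thread pool stands in for the plurality of basic processing circuits.

```python
from concurrent.futures import ThreadPoolExecutor


def inner_products(block, vector):
    """Role of one basic processing circuit: one inner product per row of its block."""
    return [sum(a * x for a, x in zip(row, vector)) for row in block]


def matvec(matrix, vector, num_circuits=4, rows_per_block=1):
    """Role of the main processing circuit: divide, distribute, broadcast, combine."""
    # Assumption: a basic data block is a group of consecutive rows.
    blocks = [matrix[i:i + rows_per_block]
              for i in range(0, len(matrix), rows_per_block)]
    # Each worker gets a different block but the same vector (the "broadcast").
    with ThreadPoolExecutor(max_workers=num_circuits) as pool:
        partials = list(pool.map(lambda b: inner_products(b, vector), blocks))
    # Combine: concatenate the per-block results in block order.
    return [y for part in partials for y in part]


result = matvec([[1, 2], [3, 4], [5, 6]], [10, 1], num_circuits=2)
print(result)  # [12, 34, 56]
```

With three row blocks and two workers, at least one worker handles more than one block, which mirrors the case where the blocks outnumber the basic processing circuits.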

First claim

Opening claim text (preview).

What is claimed is:

1. A matrix-multiplying-vector operation method, performed by a processing device comprising a main processing circuit and a plurality of basic processing circuits, the matrix-multiplying-vector operation method comprising: receiving, by the main processing circuit, a matrix, a vector, and a matrix-multiplying-vector operation instruction; dividing, by the main processing circuit, the matrix into a plurality of basic data blocks; distributing, by the main processing circuit, the plurality of basic data blocks to the plurality of basic processing circuits, wherein: each of the plurality of basic data blocks is distributed to one of the plurality of the basic processing circuits and at least two basic processing circuits receive different basic data blocks; and at least two of the plurality of basic data blocks are distributed to a same basic processing circuit at one time when a number of the plurality of basic data blocks is larger than a number of the plurality of basic processing circuits; broadcasting, by the main processing circuit, at least a subset of the vector to the plurality of basic processing circuits simultaneously, wherein each of the plurality of basic processing circuits receives a same copy of the subset of the vector; performing, by each of the plurality of basic processing circuits, one or more inner-product operations on one or more basic data blocks distributed to that basic processing circuit and the same copy of the subset of the vector broadcast to that basic processing circuit to obtain a processing result, wherein the plurality of basic processing circuits perform the respective inner-product operations in parallel; providing, by the plurality of basic processing circuits, the respective processing results to the main processing circuit; and combining, by the main processing circuit, the processing results provided by the plurality of basic processing circuits to obtain a computation result of the matrix-multiplying-vector operation instruction.

2. The matrix-multiplying-vector operation method of claim 1, wherein distributing the plurality of basic data blocks to the plurality of basic processing circuits includes: distributing the plurality of basic data blocks to the plurality of basic processing circuits non-repetitively and in an arbitrary order.

3. The matrix-multiplying-vector operation method of claim 1, wherein distributing, by the main processing circuit, the plurality of basic data blocks to the plurality of basic processing circuits includes: when the number of the plurality of basic data blocks is smaller than or equal to the number of the plurality of basic processing circuits, distributing, by the main processing circuit, each of the plurality of basic data blocks to a separate basic processing circuit.

4. The matrix-multiplying-vector operation method of claim 1, wherein the processing device further includes multiple branch processing circuits configured to connect the main processing circuit to the plurality of basic processing circuits, and the matrix-multiplying-vector operation method further includes: transmitting, by the multiple branch processing circuits, data among the main processing circuit and the plurality of basic processing circuits.

5. The matrix-multiplying-vector operation method of claim 1, wherein the main processing circuit includes at least one of a vector arithmetic unit circuit, an arithmetic logic unit (ALU) circuit, an accumulator circuit, a matrix transposition circuit, a direct memory access (DMA) circuit, or a data rearrangement circuit.

6. The matrix-multiplying-vector operation method of claim 1, wherein each of the plurality of basic processing circuits includes at least one of an inner-product arithmetic unit circuit or an accumulator circuit.

7. The matrix-multiplying-vector operation method of claim 1, wherein the matrix is a weight matrix of a fully connected layer of a neural network, and the vector is input data of a single sample to the fully connected layer.

8. The matrix-multiplying-vector operation method of claim 1, wherein at least one dimension of the matrix has a same size as that of the vector.

9. A processing device comprising a main processing circuit and a plurality of basic processing circuits, wherein: the main processing circuit is configured to: receive a matrix, a vector, and a matrix-multiplying-vector operation instruction; divide the matrix into a plurality of basic data blocks; distribute the plurality of basic data blocks to the plurality of basic processing circuits, wherein: each of the plurality of basic data blocks is distributed to one of the plurality of the basic processing circuits and at least two basic processing circuits receive different basic data blocks; and at least two of the plurality of basic data blocks are distributed to a same basic processing circuit at one time when a number of the plurality of basic data blocks is larger than a number of the plurality of basic processing circuits; and broadcast at least a subset of the vector to the plurality of basic processing circuits simultaneously, wherein each of the plurality of basic processing circuits receives a same copy of the subset of the vector; each of the plurality of basic processing circuits is configured to: perform one or more inner-product operations on one or more basic data blocks distributed to that basic processing circuit and the same copy of the subset of the vector broadcast to that basic processing circuit to obtain a processing result; and provide the processing result to the main processing circuit; wherein the plurality of basic processing circuits are configured to perform the respective inner-product operations in parallel; and the main processing circuit is further configured to combine the processing results provided by the plurality of basic processing circuits to obtain a computation result of the matrix-multiplying-vector operation instruction.

10. The processing device of claim 9, wherein the main processing circuit is configured to distribute the plurality of basic data blocks to the plurality of basic processing circuits non-repetitively and in an arbitrary order.

11. The processing device of claim 9, wherein the main processing circuit is configured to: distribute each of the plurality of basic data blocks to a separate basic processing circuit when the number of the plurality of basic data blocks is smaller than or equal to the number of the plurality of basic processing circuits.

12. The processing device of claim 9, further comprising multiple branch processing circuits configured to connect the main processing circuit to the plurality of basic processing circuits, wherein the multiple branch processing circuits are configured to: transmit data among the main processing circuit and the plurality of basic processing circuits.

13. The processing device of claim 12, wherein each of the multiple branch processing circuit is connected between the main processing circuit and at least one of the plurality of basic processing circuits.

14. The processing device of claim 9, wherein the main processing circuit includes at least one of a vector arithmetic unit circuit, an arithmetic logic unit (ALU) circuit, an accumulator circuit, a matrix transposition circuit, a direct memory access (DMA) circuit, or a data rearrangement circuit.

15. The processing device of claim 9, wherein each of the plurality of basic processing circuits includes at least one of an inner-product arithmetic unit circuit or an accumulator circuit.

16. The processing de
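
The distribution conditions in claims 1 to 3 (and their device counterparts in claims 9 to 11) can be illustrated with a small sketch. The claims only constrain the outcome: each block goes to exactly one circuit, blocks are not repeated, and circuits share blocks only when blocks outnumber circuits. The round-robin rule below is an assumption chosen for simplicity, and `distribute` is a hypothetical helper, not something named in the patent. The sketch also does not model the "at one time" timing of the transfer.

```python
def distribute(num_blocks, num_circuits):
    """Assign each block index to exactly one circuit index (round-robin assumption)."""
    assignment = {c: [] for c in range(num_circuits)}
    for b in range(num_blocks):
        # Each block is handed out once, so the distribution is non-repetitive.
        assignment[b % num_circuits].append(b)
    return assignment


few = distribute(3, 4)   # blocks <= circuits: every block gets a separate circuit
many = distribute(6, 4)  # blocks > circuits: some circuits hold two blocks
print(few)   # {0: [0], 1: [1], 2: [2], 3: []}
print(many)  # {0: [0, 4], 1: [1, 5], 2: [2], 3: [3]}
```

Any other assignment that satisfies the same conditions would fit the claim language equally well, since the claims allow an arbitrary order.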

Assignees

Cambricon Tech Corp Ltd

Inventors

Classifications

  • Preprocessing · CPC title

  • Activation functions · CPC title

  • Combinations of networks · CPC title

  • G06N3/063 · Primary

    using electronic means · CPC title

  • Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11354133B2 cover?
A matrix-multiplying-vector operation method and a processing device for performing the same are provided. The matrix-multiplying-vector method includes distributing, by a main processing circuit, basic data blocks of the matrix and broadcasting the vector to a plurality of the basic processing circuits. That way, the basic processing circuits can perform inner-product operations between the ba…
Who is the assignee on this patent?
Cambricon Tech Corp Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 7, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).