Vector processing unit
US-10261786-B2 · Apr 16, 2019 · US
US11520581B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11520581-B2 |
| Application number | US-202117327957-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 24, 2021 |
| Priority date | Mar 9, 2017 |
| Publication date | Dec 6, 2022 |
| Grant date | Dec 6, 2022 |
Official abstract text for this publication.
A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.
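As a rough illustration of the arrangement the abstract describes, the following sketch models a banked vector memory shared by several processor units, each of which performs one arithmetic operation on operands it reads from memory. It is a toy software model, not the patented hardware: the class and function names, the bank count, and the one-bank-per-unit mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VectorMemory:
    """Toy banked memory. Bank count and addressing are illustrative only."""
    num_banks: int
    words_per_bank: int
    banks: list = field(init=False)

    def __post_init__(self):
        # One list of words per bank, all initialised to zero.
        self.banks = [[0] * self.words_per_bank for _ in range(self.num_banks)]

    def write(self, bank, addr, value):
        self.banks[bank][addr] = value

    def read(self, bank, addr):
        return self.banks[bank][addr]


def vector_add(mem, num_units, a_addr, b_addr, out_addr):
    """Each hypothetical processor unit i adds two operands held in bank i.

    Assumption: unit i is paired with bank i. The abstract only says the
    memory is in data communication with every processor unit.
    """
    for unit in range(num_units):
        a = mem.read(unit, a_addr)
        b = mem.read(unit, b_addr)
        mem.write(unit, out_addr, a + b)


# Usage: four units, each adding one pair of operands from its own bank.
mem = VectorMemory(num_banks=4, words_per_bank=3)
for i in range(4):
    mem.write(i, 0, i)
    mem.write(i, 1, 10 * i)
vector_add(mem, num_units=4, a_addr=0, b_addr=1, out_addr=2)
result = [mem.read(i, 2) for i in range(4)]
```

In hardware the per-unit additions would proceed concurrently; the sequential loop here only stands in for that parallelism.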
Opening claim text (preview).
What is claimed is:

1. A system comprising: a plurality of vector processing units; and a plurality of matrix units coupled to the plurality of vector processing units such that data communications can be exchanged, each matrix unit being configured to perform multiplications between weights of a neural network and activation inputs to generate accumulated values, wherein each vector processing unit is arranged in a corresponding vector processing unit (VPU) lane, and wherein each vector processing unit comprises: a plurality of processor units arranged across multiple sub-lanes of the VPU lane, wherein each processor unit comprises an arithmetic logic unit (ALU) configured to perform arithmetic operations associated with vectorized computations for a multi-dimensional data array; and a corresponding vector memory in data communication with the plurality of processor units, wherein the vector memory includes memory banks configured to store data used by the plurality of processor units to perform the arithmetic operations, wherein the plurality of processor units and the corresponding vector memory are tightly coupled within an area of the vector processing unit such that data communications can be exchanged at a high bandwidth based on the placement of respective processor units relative to one another and based on the placement of the vector memory relative to each processor unit.

2. The system of claim 1, wherein each processor unit of the plurality of processor units comprises at least one ALU.

3. The system of claim 1, wherein each vector processor unit comprises 16 ALUs.

4. The system of claim 1, wherein the vector memory comprises static random access memory (SRAM).

5. The system of claim 1, wherein the system is configured to allow transfer of 32 bytes between the vector memory and the plurality of processor units during a single clock cycle.

6. The system of claim 1, wherein the vector processing unit is configured to perform vector computations based on concurrent use of two or more of the ALUs.

7. The system of claim 1, wherein each ALU is configured to perform a 32-bit arithmetic operation between streams of vector data that represent operands for the arithmetic operation.

8. The system of claim 1, wherein the plurality of matrix units and the plurality of vector processing units represent a processor core of an integrated circuit chip; and the processor core is configured to process a single instruction stream at least across the multiple sub-lanes.

9. The system of claim 1, wherein: units of the system are configured to operate on streams of data; a first stream of data progresses in a first direction toward the plurality of matrix units; and a second, different stream of data progresses in a second direction away from the plurality of matrix units.

10. The system of claim 1, wherein at least one processor unit comprises a plurality of ALUs, and wherein multiple ALUs within a single processor unit are configured to execute arithmetic operations simultaneously during a single processor clock cycle.

11. The system of claim 1, wherein multiple ALUs within a single processor unit are configured to execute arithmetic operations simultaneously during a single processor clock cycle.

12. The system of claim 1, wherein the system is configured to perform at least 2048 operations in a single clock cycle.

13. The system of claim 1, wherein each operation includes a 32-bit word.

14. The system of claim 1, wherein a VPU lane is configured to move 8 vectors from a corresponding memory unit of the VPU lane to 8 sub-lanes of the VPU lane within a single clock cycle.
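The numeric figures in the dependent claims can be checked against one another with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: the constants are taken from claims 3, 5, 7, 12, 13 and 14, while the even spread of ALUs across sub-lanes and the one-operation-per-ALU-per-cycle rate are assumptions that the claims shown here do not state.

```python
# Figures quoted in the claims.
BYTES_PER_CYCLE = 32       # claim 5: memory <-> processor units, per clock cycle
WORD_BITS = 32             # claims 7 and 13: 32-bit operations and words
ALUS_PER_VPU = 16          # claim 3
SUB_LANES_PER_LANE = 8     # claim 14
MIN_OPS_PER_CYCLE = 2048   # claim 12

# 32 bytes per cycle expressed as 32-bit words.
words_per_cycle = BYTES_PER_CYCLE * 8 // WORD_BITS

# Assumption: the 16 ALUs are spread evenly over the 8 sub-lanes.
alus_per_sub_lane = ALUS_PER_VPU // SUB_LANES_PER_LANE

# Assumption: every ALU completes exactly one operation per clock cycle.
# Ceiling division gives the fewest vector processing units that would
# reach the claim 12 figure under that assumption.
min_vpus = -(-MIN_OPS_PER_CYCLE // ALUS_PER_VPU)

print(words_per_cycle, alus_per_sub_lane, min_vpus)
```

Under these assumptions the 32-byte transfer corresponds to eight 32-bit words per cycle, each sub-lane holds two ALUs, and at least 128 vector processing units would be needed to reach 2048 operations per cycle. The unit count is an inference from the quoted figures, not a number recited in the claims above.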
Electrical coupling · CPC title
organised in groups of units sharing resources, e.g. clusters · CPC title
Machine learning · CPC title
using electronic means · CPC title
Synchronisation or serialisation instructions · CPC title