US11636327B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11636327-B2 |
| Application number | US-201715859203-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 29, 2017 |
| Priority date | Dec 29, 2017 |
| Publication date | Apr 25, 2023 |
| Grant date | Apr 25, 2023 |
Official abstract text for this publication.
An apparatus to facilitate processing of a sparse matrix for arbitrary graph data is disclosed. The apparatus includes a graphics processing unit having a data management unit (DMU) that includes a scheduler for scheduling matrix operations, an active logic for tracking active input operands, and a skip logic for tracking unimportant input operands to be skipped by the scheduler. Processing circuitry is coupled to the DMU. The processing circuitry comprises a plurality of processing elements including logic to read operands and a multiplication unit to multiply two or more operands for the arbitrary graph data.
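The skip behavior described in the abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware: it assumes a column-compressed (CSC) layout, and the function name `spmv_skip_zeros` and its parameters are hypothetical. It treats only zero inputs as "unimportant"; the abstract does not say how redundant operands are detected, so that case is left out.

```python
def spmv_skip_zeros(col_ptr, row_idx, values, x, n_rows):
    """Multiply a CSC-encoded sparse matrix by x, skipping zero inputs.

    col_ptr[j]..col_ptr[j+1] delimits the stored entries of column j;
    row_idx[k] and values[k] give the row and weight of entry k.
    Returns (output vector, number of input operands skipped).
    """
    y = [0.0] * n_rows
    # "Active" operands: inputs that can contribute to the output.
    active = [j for j, xj in enumerate(x) if xj != 0]
    skipped = len(x) - len(active)
    for j in active:  # only active columns are ever scheduled
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += values[k] * x[j]
    return y, skipped


# Example matrix (hypothetical data):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
col_ptr = [0, 2, 3, 5]
row_idx = [0, 2, 1, 0, 2]
values = [1, 4, 3, 2, 5]

print(spmv_skip_zeros(col_ptr, row_idx, values, [1, 0, 2], 3))
# prints ([5.0, 0.0, 14.0], 1)
```

Column 1 is never read because its input operand is zero, which is the saving that a skip mechanism is meant to capture.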
Opening claim text (preview).
What is claimed is:

1. An apparatus to facilitate processing a sparse matrix for arbitrary graph data, comprising: a graphics processing unit, including: a data management unit (DMU) having a scheduler to schedule matrix operations, an active circuitry to track active input operands, and a skip circuitry to track zero and redundant input operands to be skipped by the scheduler; and processing circuitry coupled to the DMU, the processing circuitry comprising a plurality of processing elements including circuitry to read operands, and a multiplication unit to multiply two or more operands for the arbitrary graph data, wherein the DMU configures the processing circuitry coupled to the DMU to bypass an operation having zero or redundant input operands associated with an irregular neural network having an arbitrary connection across non-adjacent layers of the neural network.

2. The apparatus of claim 1, wherein the scheduler to schedule non-zero and non-redundant operands at the multiplication unit.

3. The apparatus of claim 1, further comprising: memory having pointer circuitry to store base pointers for input and output vectors; and memory to store input and output vectors.

4. The apparatus of claim 1, wherein each processing element includes the circuitry to read operands, pointer circuitry for providing a column pointer to a memory address of a weighted coefficient of a matrix, data circuitry to generate and send a weighted coefficient value that is identified by the column pointer to the multiplication unit.

5. The apparatus of claim 4, wherein the data circuitry sends an identifier of a memory address or a position of the output vector to an output buffer.

6. The apparatus of claim 1, wherein the arbitrary connection across the non-adjacent layers of the neural network introduces the operations having the redundant or zero input operands.

7. A hardware accelerator to facilitate processing a sparse matrix for an arbitrary irregular neural network, comprising: a data management unit (DMU) having a scheduler to schedule matrix operations and an auxiliary buffer to store active input operands; and a plurality of processing elements coupled to the DMU, each processing element includes an input buffer for edge data and message data, and customizable circuitry to support an input vertex program for the arbitrary neural network, wherein the customizable circuitry to support an input vertex program supports an activate function.

8. The hardware accelerator of claim 7, wherein the customizable circuitry to support an input vertex program additionally supports customized functions including multiply, accumulate and send message functions.

9. The hardware accelerator of claim 8, wherein each processing element further comprises on-chip memory to receive vector data from off-chip memory via the DMU.

10. The hardware accelerator of claim 9, wherein the DMU to obtain updated vector data from the on-chip memory based on the customized functions and then to send the updated vector data to the off-chip memory.

11. The hardware accelerator of claim 7, wherein the hardware accelerator supports arbitrary connections across non-adjacent layers of the arbitrary irregular neural network.

12. A graphics processing unit, comprising: a sparsity management unit to manage sparsity operations, wherein the sparsity management unit comprises: a value check mechanism to detect unimportant values within input vectors, the unimportant values including zero operands and redundant operands, and skip operations for the unimportant values of the input vectors, and a scheduler to determine scheduling of computations based on scheduling important values and skipping unimportant values of input vectors that are detected by the value check mechanism; a block floating point (FP) management unit 3120 to support block FP operations; and a variable and mix precision compute unit to support variable and mix precision operations.

13. The graphics processing unit of claim 12, wherein the scheduler is to bypass computations associated with unimportant values for an irregular neural network having an arbitrary connection across non-adjacent layers of the neural network.

14. The graphics processing unit of claim 12, wherein the block FP management unit includes select circuitry to select a shared exponent for input vectors if the input vectors have block FP and thus different exponents.

15. The graphics processing unit of claim 14, wherein the block FP management unit includes align circuitry to cause alignment of a mantissa for the input vector that has a change in exponent.

16. The graphics processing unit of claim 12, wherein the variable and mix precision compute unit include computations units and accumulators to perform computations for input vectors, wherein the computations include at least one of spatial and temporal computations including any spatial and temporal combinations.

17. A method for training of data, comprising: obtaining a first sparse matrix encoded with compressed sparse row (CSR) and a second dense matrix; offloading the second dense matrix in a coalesced manner from memory to a shared local memory (SLM); determining a minimum number of workgroups to launch to minimize a number of redundant global memory loads to the SLM; selecting a work group size for each of the minimum number of workgroups; and launching the minimum number of workgroups for execution on a graphics processing unit (GPU), wherein the minimum number of workgroups is determined based on a total number of hardware threads supported by the GPU and a number of data elements associated with a hardware thread, wherein the redundant global memory loads are associated with redundant operands associated with an irregular neural network having an arbitrary connection across non-adjacent layers of the neural network.

18. The method of claim 17, wherein the number of data elements associated with each hardware thread is determined based on a single instruction multiple data (SIMD) width associated with each hardware thread.

19. The method of claim 18, further comprising: applying a load balancing technique for hardware threads such that each hardware thread completes a first block of data and processes a second block of data that is available.

20. The method of claim 19, further comprising: generating outputs for a Sparse Dense general matrix vector multiplication (GEMV) GPU implementation for training of data.
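Claims 14 and 15 refer to selecting a shared exponent and aligning mantissas for block floating point. As a rough software analogy only, the sketch below shows one common way to do this: take the largest exponent in the block as the shared exponent and scale every value to an integer mantissa relative to it. The function names, the 8-bit default mantissa width, and the rounding and clamping policy are all assumptions for illustration; the claims do not specify them.

```python
import math


def to_block_fp(values, mantissa_bits=8):
    """Encode a block of floats with one shared exponent (illustrative).

    Returns (shared_exponent, integer_mantissas).
    """
    # Select step: the shared exponent is the largest exponent in the block.
    exps = [math.frexp(v)[1] for v in values if v != 0.0]
    shared = max(exps) if exps else 0
    # Align step: express every value in units of 2**(shared - mantissa_bits),
    # which shifts smaller values right relative to the largest one.
    scale = 2.0 ** (shared - mantissa_bits)
    limit = 2 ** mantissa_bits - 1  # clamp in case rounding overflows
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return shared, mantissas


def from_block_fp(shared, mantissas, mantissa_bits=8):
    """Decode mantissas back to floats using the shared exponent."""
    scale = 2.0 ** (shared - mantissa_bits)
    return [m * scale for m in mantissas]


shared, mants = to_block_fp([1.0, 0.5, 0.25])
print(shared, mants)                 # prints 1 [128, 64, 32]
print(from_block_fp(shared, mants))  # prints [1.0, 0.5, 0.25]
```

Values much smaller than the largest element lose low-order bits when aligned, which is the usual precision trade-off of sharing one exponent across a block.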