Parallel Processing Of Data
US-2024338235-A1 · Oct 10, 2024 · US
US2025094378A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025094378-A1 |
| Application number | US-202418966083-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 2, 2024 |
| Priority date | Jan 11, 2024 |
| Publication date | Mar 20, 2025 |
| Grant date | — |
Official abstract text for this publication.
A single instruction, multiple thread (SIMT) processor of an aspect includes a register file having a plurality of sets of registers. Each of the plurality of sets of registers corresponds to a different thread of a parallel thread group. The SIMT processor also includes a storage coupled with the register file. The storage has a plurality of sets of one or more data element storage locations. Each of the plurality of sets of one or more data element storage locations corresponds to a different thread of the parallel thread group. Each of the sets of one or more data element storage locations is to store a copy of one or more data elements from only a subset of the set of registers for the corresponding thread. Other SIMT processors, methods, and systems are also disclosed.
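The arrangement in the abstract can be pictured with a small software model. The following is a minimal sketch, not the patented hardware: the class names `RegisterFile` and `ThreadStorage`, the method `capture`, and all sizes are hypothetical and chosen only for illustration. It shows one set of registers per thread, plus a smaller per-thread storage that keeps copies of data elements from only a few of that thread's registers. A register is selected by a thread index and a register index, while a storage set is selected by the thread index alone, as claim 5 describes.

```python
class RegisterFile:
    """One set of registers per thread of the parallel thread group."""

    def __init__(self, num_threads, regs_per_thread):
        self.regs = [[0] * regs_per_thread for _ in range(num_threads)]

    def read(self, thread_idx, reg_idx):
        # A register is selected by BOTH a thread index and a register index.
        return self.regs[thread_idx][reg_idx]

    def write(self, thread_idx, reg_idx, value):
        self.regs[thread_idx][reg_idx] = value


class ThreadStorage:
    """Hypothetical model of the per-thread storage coupled with the register file."""

    def __init__(self, num_threads, slots_per_thread):
        self.slots_per_thread = slots_per_thread
        self.slots = [[0] * slots_per_thread for _ in range(num_threads)]

    def capture(self, register_file, reg_indices):
        """Copy the named registers into every thread's storage set.

        Only a subset of each thread's registers fits, so the number of
        register indices is limited by the number of slots per thread.
        """
        if len(reg_indices) > self.slots_per_thread:
            raise ValueError("more registers requested than storage locations")
        for thread_idx, slot_set in enumerate(self.slots):
            for slot, reg_idx in enumerate(reg_indices):
                slot_set[slot] = register_file.read(thread_idx, reg_idx)

    def read(self, thread_idx):
        # A storage set is selected by the thread index only.
        return tuple(self.slots[thread_idx])
```

Because the storage holds copies, a later write to the register file does not change what the storage returns for that thread until the next `capture`.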
Opening claim text (preview).
What is claimed is:
1. A single instruction, multiple thread (SIMT) processor comprising: a register file having a plurality of sets of registers, each of the plurality of sets of registers corresponding to a different thread of a parallel thread group; and a storage coupled with the register file, the storage having a plurality of sets of one or more data element storage locations, each of the plurality of sets of one or more data element storage locations corresponding to a different thread of the parallel thread group, each of the sets of one or more data element storage locations to store a copy of one or more data elements from only a subset of the set of registers for the corresponding thread.
2. The SIMT processor of claim 1, further comprising: an instruction unit to receive a SIMT instruction; and a plurality of processor elements coupled with the instruction unit, each of the processor elements to perform operations corresponding to the SIMT instructions for a different corresponding thread of the parallel thread group, wherein the operations are to be performed on both one or more data elements output from the register file and the one or more data elements output from the storage.
3. The SIMT processor of claim 1, wherein the plurality of sets of registers include at least eight sets of registers, wherein each of the plurality of sets of registers includes at least eight registers, and wherein each of the plurality of sets of one or more data element storage locations is to store the copy of the one or more data elements written from no more than four registers of the set of registers.
4. The SIMT processor of claim 3, wherein the plurality of sets of registers include at least sixteen sets of registers, wherein each of the plurality of sets of registers includes at least sixteen registers, and wherein each of the sets of one or more data element storage locations is to store the copy of the one or more data elements written from no more than three registers of the set of registers.
5. The SIMT processor of claim 1, wherein a register of the plurality of sets of registers is to be selected based on both a thread index that is to be used to select a set of registers of the plurality of sets of registers corresponding to a thread indexed by the thread index and a registers index that is to be used to select a register of the selected set of registers indexed by the register index, and wherein a set of one or more data element storage locations of the plurality of sets of one or more data element storage locations is to be selected by the thread index but not the register index, the thread index to be used to select the set of one or more data element storage locations corresponding to a thread indexed by the thread index.
6. The SIMT processor of claim 1, wherein the SIMT processor is a general-purpose graphics processing unit (GPGPU).
7. The SIMT processor of claim 1, wherein the SIMT processor is a soft SIMT processor.
8. The SIMT processor of claim 1, further comprising: an instruction unit to receive a SIMT instruction, the SIMT instruction having a first register index, wherein, based on the SIMT instruction: the register file is to output a first data element from a first register indicated by the first register index in each of the plurality of sets of registers; and the storage is to store each of the first data elements in the set of one or more data element storage locations for the corresponding thread.
9. The SIMT processor of claim 8, wherein the SIMT instruction has a second register index, wherein, based on the SIMT instruction: the register file is to output a second data element from a second register indicated by the second register index in each of the plurality of sets of registers; and the storage is to store each of the second data elements in the set of one or more data element storage locations for the corresponding thread.
10. The SIMT processor of claim 1, further comprising: an instruction unit to receive a SIMT instruction, the SIMT instruction having a first register index, wherein, based on the SIMT instruction: the register file is to output a first data element from a first register indicated by the first register index in each of the plurality of sets of registers; and the storage is to output the copy of the one or more data elements from each of the plurality of sets of one or more data element storage locations; and a plurality of processor elements coupled with the instruction unit, each of the processor elements to perform one or more operations corresponding to the SIMT instruction for a different corresponding thread of the parallel thread group, on the first data element output from the set of registers corresponding to the thread and the copy of the one or more data elements output from the set of one or more data element storage locations corresponding to the thread.
11. The SIMT processor of claim 10, wherein the SIMT instruction has a second register index, wherein, based on the SIMT instruction: the register file is to output a second data element from a second register indicated by the second register index in each of the plurality of sets of registers; and the storage is to output the copy of two data elements from each of the plurality of sets of one or more data element storage locations; and each of the processor elements is to perform the one or more operations corresponding to the SIMT instruction for the different corresponding thread of the parallel thread group, on the first and second data elements output from the set of registers corresponding to the thread and the copy of the two data elements output from the set of one or more data element storage locations corresponding to the thread.
12. The SIMT processor of claim 11, wherein the one or more operations are to generate one of a real component and an imaginary component of a product of a multiplication of two complex numbers.
13. A single instruction, multiple thread (SIMT) processor comprising: an instruction unit to receive a SIMT instruction, the SIMT instruction having a first register index; a register file having a plurality of sets of registers, each of the plurality of sets of registers corresponding to a different thread of a parallel thread group; and a storage coupled with the register file, the storage having a plurality of sets of one or more data element storage locations, each of the plurality of sets of one or more data element storage locations corresponding to a different thread of the parallel thread group, wherein, based on the SIMT instruction: the register file is to output a copy of a first data element from a first register indicated by the first register index in each of the plurality of sets of registers; and the storage is to store each of the copies of the first data elements in the set of one or more data element storage locations for the corresponding thread.
14. The SIMT processor of claim 13, wherein the SIMT instruction has a second register index, and wherein, based on the SIMT instruction: the register file is to output a copy of a second data element from a second register indicated by the second register index in each of the plurality of sets of registers; and the storage is to store each of the copies of the second data elements in the set of one or more data element storage locations for the corresponding thread.
15. The SIMT processor of claim 13, wherein the plurality of sets of registers include at least eight sets of registers, wherein each of the plurality of sets o
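Claims 8 to 12 describe a two-step pattern: one SIMT instruction copies two registers per thread into the storage, and a later SIMT instruction combines two further registers with those stored copies to produce the real or imaginary component of a complex product. The sketch below illustrates that pattern under stated assumptions. The function names `simt_store`, `simt_cmul_real`, and `simt_cmul_imag` are hypothetical, the register file is modeled as a plain list of per-thread lists, and the choice of which operand lives in the storage is an illustrative guess rather than something the claims fix.

```python
def simt_store(regs, storage, r1, r2):
    """Model of claims 8-9: for every thread, copy registers r1 and r2
    into that thread's storage set (here, a 2-tuple per thread)."""
    for t in range(len(regs)):
        storage[t] = (regs[t][r1], regs[t][r2])


def simt_cmul_real(regs, storage, r1, r2):
    """Model of claims 10-12: real component of (a_re + i*a_im) * (b_re + i*b_im).

    a_re, a_im come from the register file (indices r1, r2);
    b_re, b_im come from the per-thread storage.
    """
    return [regs[t][r1] * storage[t][0] - regs[t][r2] * storage[t][1]
            for t in range(len(regs))]


def simt_cmul_imag(regs, storage, r1, r2):
    """Imaginary component of the same product, one result per thread."""
    return [regs[t][r1] * storage[t][1] + regs[t][r2] * storage[t][0]
            for t in range(len(regs))]


# Two threads; each register set holds a = (r0, r1) and b = (r2, r3).
regs = [
    [1, 2, 3, 4],   # thread 0: a = 1 + 2i, b = 3 + 4i
    [2, 0, 0, 1],   # thread 1: a = 2,      b = i
]
storage = [(0, 0)] * len(regs)

simt_store(regs, storage, 2, 3)            # stash b for every thread
real = simt_cmul_real(regs, storage, 0, 1)
imag = simt_cmul_imag(regs, storage, 0, 1)
print(list(zip(real, imag)))               # [(-5, 10), (0, 2)]
```

In this model each multiply instruction names only two register indices yet operates on four data elements per thread, which is one plausible reading of why the claims pair a register-file read with a storage read.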
from multiple instruction streams, e.g. multistreaming · CPC title
Organisation of register space, e.g. banked or distributed register file · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
comprising a single central processing unit · CPC title