Vector processor with vector and element reduction method
US-2024004647-A1 · Jan 4, 2024 · US
US9519617B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9519617-B2 |
| Application number | US-201213548933-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 13, 2012 |
| Priority date | Jul 14, 2011 |
| Publication date | Dec 13, 2016 |
| Grant date | Dec 13, 2016 |
Official abstract text for this publication.
A vector processor includes a plurality of execution units arranged in parallel, a register file, and a plurality of load units. The register file includes a plurality of registers coupled to the execution units. Each of the load units is configured to load, in a single transaction, a plurality of the registers with data retrieved from memory. The loaded registers correspond to different execution units. Each of the load units is configured to distribute the data to the registers in accordance with an instruction selectable distribution. The instruction selectable distribution specifies one of a plurality of distributions. Each of the distributions specifies a data sequence that differs from the sequence in which the data is stored in memory.
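The instruction selectable distribution described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the mode names (`ALTERNATE_PAIR`, `DOWNSAMPLE_2`, `UPSAMPLE_2`), the function `vector_load`, and the eight-register width are hypothetical, and the three behaviours follow one reading of claims 1 to 3.

```python
from enum import Enum


class Distribution(Enum):
    # Hypothetical mode names; the source describes behaviours, not mnemonics.
    ALTERNATE_PAIR = "alternate_pair"  # two values repeated across alternate registers
    DOWNSAMPLE_2 = "downsample_2"      # alternate memory lanes -> adjacent registers
    UPSAMPLE_2 = "upsample_2"          # each value copied into two adjacent registers


def vector_load(memory, base, num_regs, dist):
    """Model one load transaction that fills num_regs registers from memory.

    The returned list stands in for the registers, one per execution unit.
    """
    if dist is Distribution.ALTERNATE_PAIR:
        a, b = memory[base], memory[base + 1]
        return [a if i % 2 == 0 else b for i in range(num_regs)]
    if dist is Distribution.DOWNSAMPLE_2:
        # Read every other memory location, pack results side by side.
        return [memory[base + 2 * i] for i in range(num_regs)]
    if dist is Distribution.UPSAMPLE_2:
        # Read adjacent locations, duplicate each into two neighbours.
        return [memory[base + i // 2] for i in range(num_regs)]
    raise ValueError(f"unknown distribution: {dist}")


memory = list(range(100, 116))
print(vector_load(memory, 0, 8, Distribution.ALTERNATE_PAIR))
print(vector_load(memory, 0, 8, Distribution.DOWNSAMPLE_2))
print(vector_load(memory, 0, 8, Distribution.UPSAMPLE_2))
```

In each mode the register sequence differs from the order in which the data sits in memory, which is the property the abstract attributes to every distribution.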
Opening claim text (preview).
What is claimed is:
1. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; and a plurality of load units, each of the load units configured to: retrieve two values from memory in a single transaction; and load each of the two values to a plurality of alternate registers of the plurality of registers in a single transaction.
2. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; and a plurality of load units, each of the load units configured to: retrieve values from locations of memory via alternate memory lanes in a single transaction; and load the values to adjacent registers of the plurality of registers in a single transaction.
3. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; and a plurality of load units, each of the load units configured to: retrieve values from locations of the memory via adjacent memory lanes in a single transaction; and load a copy of each of the values into a plurality of adjacent registers in a single transaction.
4. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of store units, each of the store units configured to: retrieve a plurality of values from adjacent ones of the registers in a single transaction; and write, in a single transaction, each of the values into memory at a location offset from a location of an immediately preceding write by one more than a number of values retrieved from the registers.
5. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of store units, each of the store units configured to: retrieve a plurality of values from adjacent ones of the registers in a single transaction; and write a sub-plurality of the retrieved values to locations in memory via adjacent memory lanes in a single transaction, the sub-plurality selected in accordance with a template value stored in a register of the vector processor.
6. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of store units, each of the store units configured to: retrieve a plurality of values from adjacent ones of the registers in a single transaction; and write the values to alternate locations in the memory in a single transaction.
7. A processor comprising: a scalar processor core; and a vector coprocessor core coupled to the scalar processor core; the vector coprocessor core configured to execute vector instructions passed by the scalar processor core, the vector coprocessor core comprising: a plurality of execution units arranged to execute an instruction in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of load units configured to execute a load instruction specifying upsampling by a factor of two while moving a plurality of data values from memory to the registers in a single transaction; and a plurality of store units configured to execute a store instruction specifying downsampling by a factor of two while moving a plurality of data values from the registers to memory in a single transaction.
8. A processor comprising: a scalar processor core; and a vector coprocessor core coupled to the scalar processor core; the vector coprocessor core configured to execute vector instructions passed by the scalar processor core, the vector coprocessor core comprising: a plurality of execution units arranged to execute an instruction in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of load units configured to execute a load instruction specifying expansion of data stored in memory in compacted form while moving a plurality of data values from memory to the registers in a single transaction, the expansion based on a template stored in a register of the vector coprocessor core; and a plurality of store units configured to execute a store instruction specifying compaction of data stored in the registers while moving a plurality of data values from the registers to memory in a single transaction, the compaction based on a template stored in a register of the vector coprocessor core.
9. A processor comprising: a scalar processor core; and a vector coprocessor core coupled to the scalar processor core; the vector coprocessor core configured to execute vector instructions passed by the scalar processor core, the vector coprocessor core comprising: a plurality of execution units arranged to execute an instruction in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of load units configured to execute a store instruction specifying a selectable distribution that causes at least one of the store units to move values retrieved from a plurality of adjacent ones of the registers to locations in memory via alternate memory lanes in a single transaction.
10. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; and a plurality of load units, at least one of the load units configured to move a predetermined number of values in adjacent memory locations to adjacent registers as controlled by expansion control information, the expansion control information having a number of bits equal to the number of registers, with a number of 1 bits equal to the predetermined number of values, a register storing all 0s if a corresponding bit of the expansion control information 0 and a next of the predetermined number of values if the corresponding bit of the expansion control information is 1.
11. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of store units, at least one of the store units configured to move values in adjacent registers to a predetermined number of adjacent memory locations as controlled by collation control information, the collation control information having a number of bits equal to the number of adjacent registers, with a number of 1 bits equal to the predetermined number of values, a memory storing a value stored in a next adjacent register having corresponding bit of the collation control information of 1.
12. A processor comprising: a scalar processor core; and a vector coprocessor core coupled to the scalar processor core; the vector coprocessor core configured to execute vector instructions passed by the scalar processor core, the vector coprocessor core comprising: a plurality of execution units arranged to execute an instruction in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of load units, at least one of the load units configured to move a predetermined number of values in adjacent memory locations to adjacent registers as controlled by expansion control information, the expansion control information having a number of bits equal to the number of registers, with
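The expansion and collation control information recited in claims 10 and 11 can be modelled in a few lines. This is a minimal sketch, not the claimed circuit: the function names `expand_load` and `collate_store` are hypothetical, the control information is represented as a list of 0/1 bits, and mapping list index 0 to the first register is an assumption the claim text does not fix.

```python
def expand_load(memory, base, template):
    """Model an expanding load: one register per template bit.

    A 1 bit takes the next value from adjacent memory locations;
    a 0 bit leaves the corresponding register holding all zeros.
    """
    regs = []
    addr = base
    for bit in template:
        if bit:
            regs.append(memory[addr])
            addr += 1
        else:
            regs.append(0)
    return regs


def collate_store(memory, base, regs, template):
    """Model a collating store: keep only registers whose bit is 1.

    Selected values are written to adjacent memory locations.
    Returns the number of values written.
    """
    addr = base
    for value, bit in zip(regs, template):
        if bit:
            memory[addr] = value
            addr += 1
    return addr - base


# Assumed bit order: index 0 corresponds to the first register.
template = [1, 0, 1, 1, 0, 0, 1, 0]
packed = [7, 8, 9, 10]

regs = expand_load(packed, 0, template)
print(regs)  # [7, 0, 8, 9, 0, 0, 10, 0]

out = [0, 0, 0, 0]
written = collate_store(out, 0, regs, template)
print(written, out)  # 4 [7, 8, 9, 10]
```

With the same template, the collating store reverses the expanding load, and the number of 1 bits equals the number of values moved, matching the "predetermined number of values" wording in both claims.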
Vector processors · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
LOAD or STORE instructions; Clear instruction · CPC title
Organisation of register space, e.g. banked or distributed register file · CPC title
Architectures of general purpose stored program computers (with program plugboard G06F15/08; multicomputers G06F15/16) · CPC title