Processor with instruction variable data distribution

US9519617B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9519617-B2
Application number  US-201213548933-A
Country             US
Kind code           B2
Filing date         Jul 13, 2012
Priority date       Jul 14, 2011
Publication date    Dec 13, 2016
Grant date          Dec 13, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A vector processor includes a plurality of execution units arranged in parallel, a register file, and a plurality of load units. The register file includes a plurality of registers coupled to the execution units. Each of the load units is configured to load, in a single transaction, a plurality of the registers with data retrieved from memory. The loaded registers corresponding to different execution units. Each of the load units is configured to distribute the data to the registers in accordance with an instruction selectable distribution. The instruction selectable distribution specifies one of plurality of distributions. Each of the distributions specifies a data sequence that differs from the sequence in which the data is stored in memory.
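The idea of an instruction-selectable distribution can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function name `load_distributed`, the mode names, and the eight-lane width are hypothetical, and the three patterns are loosely modelled on the load behaviours described in claims 1 to 3.

```python
def load_distributed(memory, base, num_lanes, mode):
    """Model one load transaction that fills num_lanes registers,
    one register per parallel execution unit.

    The mode names are hypothetical labels for this sketch.
    """
    if mode == "downsample2":
        # Read every other memory location into adjacent registers.
        return [memory[base + 2 * i] for i in range(num_lanes)]
    if mode == "upsample2":
        # Copy each adjacent memory value into two adjacent registers.
        return [memory[base + i // 2] for i in range(num_lanes)]
    if mode == "repeat_pair":
        # Read two values and place each in alternate registers.
        return [memory[base + i % 2] for i in range(num_lanes)]
    raise ValueError(f"unknown distribution: {mode}")


# Assumed example data: 16 memory words holding 100..115.
memory = list(range(100, 116))
for mode in ("downsample2", "upsample2", "repeat_pair"):
    print(mode, load_distributed(memory, 0, 8, mode))
```

In each mode the register order differs from the order in which the data sits in memory, which is the property the abstract attributes to every selectable distribution.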

First claim

Opening claim text (preview).

What is claimed is:

1. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; and a plurality of load units, each of the load units configured to: retrieve two values from memory in a single transaction; and load each of the two values to a plurality of alternate registers of the plurality of registers in a single transaction.

2. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; and a plurality of load units, each of the load units configured to: retrieve values from locations of memory via alternate memory lanes in a single transaction; and load the values to adjacent registers of the plurality of registers in a single transaction.

3. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; and a plurality of load units, each of the load units configured to: retrieve values from locations of the memory via adjacent memory lanes in a single transaction; and load a copy of each of the values into a plurality of adjacent registers in a single transaction.

4. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of store units, each of the store units configured to: retrieve a plurality of values from adjacent ones of the registers in a single transaction; and write, in a single transaction, each of the values into memory at a location offset from a location of an immediately preceding write by one more than a number of values retrieved from the registers.

5. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of store units, each of the store units configured to: retrieve a plurality of values from adjacent ones of the registers in a single transaction; and write a sub-plurality of the retrieved values to locations in memory via adjacent memory lanes in a single transaction, the sub-plurality selected in accordance with a template value stored in a register of the vector processor.

6. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of store units, each of the store units configured to: retrieve a plurality of values from adjacent ones of the registers in a single transaction; and write the values to alternate locations in the memory in a single transaction.

7. A processor comprising: a scalar processor core; and a vector coprocessor core coupled to the scalar processor core; the vector coprocessor core configured to execute vector instructions passed by the scalar processor core, the vector coprocessor core comprising: a plurality of execution units arranged to execute an instruction in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of load units configured to execute a load instruction specifying upsampling by a factor of two while moving a plurality of data values from memory to the registers in a single transaction; and a plurality of store units configured to execute a store instruction specifying downsampling by a factor of two while moving a plurality of data values from the registers to memory in a single transaction.

8. A processor comprising: a scalar processor core; and a vector coprocessor core coupled to the scalar processor core; the vector coprocessor core configured to execute vector instructions passed by the scalar processor core, the vector coprocessor core comprising: a plurality of execution units arranged to execute an instruction in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of load units configured to execute a load instruction specifying expansion of data stored in memory in compacted form while moving a plurality of data values from memory to the registers in a single transaction, the expansion based on a template stored in a register of the vector coprocessor core; and a plurality of store units configured to execute a store instruction specifying compaction of data stored in the registers while moving a plurality of data values from the registers to memory in a single transaction, the compaction based on a template stored in a register of the vector coprocessor core.

9. A processor comprising: a scalar processor core; and a vector coprocessor core coupled to the scalar processor core; the vector coprocessor core configured to execute vector instructions passed by the scalar processor core, the vector coprocessor core comprising: a plurality of execution units arranged to execute an instruction in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of load units configured to execute a store instruction specifying a selectable distribution that causes at least one of the store units to move values retrieved from a plurality of adjacent ones of the registers to locations in memory via alternate memory lanes in a single transaction.

10. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; and a plurality of load units, at least one of the load units configured to move a predetermined number of values in adjacent memory locations to adjacent registers as controlled by expansion control information, the expansion control information having a number of bits equal to the number of registers, with a number of 1 bits equal to the predetermined number of values, a register storing all 0s if a corresponding bit of the expansion control information 0 and a next of the predetermined number of values if the corresponding bit of the expansion control information is 1.

11. A vector processor comprising: a plurality of execution units arranged in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of store units, at least one of the store units configured to move values in adjacent registers to a predetermined number of adjacent memory locations as controlled by collation control information, the collation control information having a number of bits equal to the number of adjacent registers, with a number of 1 bits equal to the predetermined number of values, a memory storing a value stored in a next adjacent register having corresponding bit of the collation control information of 1.

12. A processor comprising: a scalar processor core; and a vector coprocessor core coupled to the scalar processor core; the vector coprocessor core configured to execute vector instructions passed by the scalar processor core, the vector coprocessor core comprising: a plurality of execution units arranged to execute an instruction in parallel; a register file, comprising a plurality of registers coupled to the execution units; a plurality of load units, at least one of the load units configured to move a predetermined number of values in adjacent memory locations to adjacent registers as controlled by expansion control information, the expansion control information having a number of bits equal to the number of registers, with
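The expansion and collation behaviour described in claims 10 and 11 can be sketched in software as follows. This is a minimal model under assumptions, not the claimed circuit: the names `expand_load` and `collate_store` are hypothetical, the control information is represented as a Python list of 0/1 values with element i controlling register i (the claims do not fix a bit ordering in the text shown here), and memory is a plain list.

```python
def expand_load(memory, base, control_bits):
    """Expanding load: one register per control bit.

    A 1 bit takes the next value from adjacent memory locations;
    a 0 bit leaves the corresponding register holding zero.
    """
    registers = []
    next_addr = base
    for bit in control_bits:
        if bit:
            registers.append(memory[next_addr])
            next_addr += 1
        else:
            registers.append(0)
    return registers


def collate_store(memory, base, registers, control_bits):
    """Collating store: write only registers whose control bit is 1,
    packing them into adjacent memory locations starting at base.

    Returns the number of values written.
    """
    next_addr = base
    for value, bit in zip(registers, control_bits):
        if bit:
            memory[next_addr] = value
            next_addr += 1
    return next_addr - base


# Assumed example: 8 registers, 4 packed values in memory.
control = [1, 0, 1, 1, 0, 0, 1, 0]
packed = [7, 8, 9, 10]

regs = expand_load(packed, 0, control)
print(regs)  # [7, 0, 8, 9, 0, 0, 10, 0]

out = [0] * 4
written = collate_store(out, 0, regs, control)
print(written, out)  # 4 [7, 8, 9, 10]
```

With the same control bits, the collating store undoes the expanding load, which matches the pairing of expansion and compaction in claim 8.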

Classifications

  • Vector processors · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • LOAD or STORE instructions; Clear instruction · CPC title

  • Organisation of register space, e.g. banked or distributed register file · CPC title

  • Architectures of general purpose stored program computers (with program plugboard G06F15/08; multicomputers G06F15/16) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9519617B2 cover?
A vector processor includes a plurality of execution units arranged in parallel, a register file, and a plurality of load units. The register file includes a plurality of registers coupled to the execution units. Each of the load units is configured to load, in a single transaction, a plurality of the registers with data retrieved from memory. The loaded registers corresponding to different exe…
Who is the assignee on this patent?
Hung Ching-Yu, Inamori Shinri, Sankaran Jagadeesh, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06F15/8053. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 13, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).