Vector reduction processor
US-2018285316-A1 · Oct 4, 2018 · US
US2025156184A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025156184-A1 |
| Application number | US-202218844296-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 15, 2022 |
| Priority date | Mar 11, 2022 |
| Publication date | May 15, 2025 |
| Grant date | — |
Official abstract text for this publication.
An apparatus has processing circuitry (16) to perform data processing, and instruction decoding circuitry (10) to control the processing circuitry to perform the data processing in response to decoding of program instructions defined according to a scalable vector instruction set architecture supporting vector instructions operating on vectors of scalable vector length to enable the same instruction sequence to be executed on apparatuses with hardware supporting different maximum vector lengths. The instruction decoding circuitry and the processing circuitry support a sub-vector-supporting instruction which treats a given vector as comprising a plurality of sub-vectors with each sub-vector comprising a plurality of vector elements. In response to the sub-vector-supporting instruction, the instruction decoding circuitry controls the processing circuitry to perform an operation for the given vector at sub-vector granularity. Each sub-vector has an equal sub-vector length.
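To make the idea of "sub-vector granularity" concrete, the following is a minimal software sketch, not the claimed hardware. It assumes a fixed 128-bit sub-vector (the size named in claim 7) and models a vector register as a flat Python list. The names `SUBVECTOR_BITS`, `split_into_subvectors`, and `reverse_within_subvectors` are hypothetical and chosen only for illustration; the reversal is just one example of an operation applied independently inside each sub-vector.

```python
SUBVECTOR_BITS = 128  # assumption: fixed sub-vector size, as in claim 7


def split_into_subvectors(vector, element_bits):
    """Split a flat list of elements into equal-length sub-vectors.

    The number of sub-vectors depends on the vector length, which the
    caller does not need to know in advance.
    """
    if SUBVECTOR_BITS % element_bits != 0:
        raise ValueError("element size must divide the sub-vector size")
    per_sub = SUBVECTOR_BITS // element_bits
    if len(vector) % per_sub != 0:
        raise ValueError("vector length must be a multiple of the sub-vector length")
    return [vector[i:i + per_sub] for i in range(0, len(vector), per_sub)]


def reverse_within_subvectors(vector, element_bits):
    """Hypothetical per-sub-vector operation: reverse the element order
    inside each sub-vector, leaving the order of sub-vectors unchanged."""
    result = []
    for sub in split_into_subvectors(vector, element_bits):
        result.extend(reversed(sub))
    return result


# The same call works for a 128-bit, 256-bit, or 512-bit vector of 32-bit elements.
for length_in_elements in (4, 8, 16):
    print(reverse_within_subvectors(list(range(length_in_elements)), 32))
```

The loop at the end runs the same function over three vector lengths without any change to the code, which mirrors the abstract's point that one instruction sequence can execute on hardware with different maximum vector lengths.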
Opening claim text (preview).
1. An apparatus comprising: processing circuitry to perform data processing; and instruction decoding circuitry to control the processing circuitry to perform the data processing in response to decoding of program instructions defined according to a scalable vector instruction set architecture supporting vector instructions operating on vectors of scalable vector length to enable the same instruction sequence to be executed on apparatuses with hardware supporting different maximum vector lengths; in which: the instruction decoding circuitry and the processing circuitry are configured to support a sub-vector-supporting instruction which treats a given vector as comprising a plurality of sub-vectors with each sub-vector comprising a plurality of vector elements, each sub-vector having an equal sub-vector length; and in response to the sub-vector-supporting instruction, the instruction decoding circuitry is configured to control the processing circuitry to perform an operation for the given vector at sub-vector granularity.
2. The apparatus according to claim 1, in which each sub-vector has a sub-vector length which is known at compile time for a given instruction sequence to be executed using the sub-vector-supporting instruction.
3. The apparatus according to claim 1, in which how many sub-vectors are comprised by the given vector is unknown at compile time for the given instruction sequence.
4. The apparatus according to claim 1, in which in response to the sub-vector-supporting instruction, the instruction decoding circuitry is configured to control the processing circuitry to process each of the sub-vectors in response to the same instance of executing the sub-vector-supporting instruction.
5. The apparatus according to claim 1, in which each sub-vector has a sub-vector length of an architecturally-defined fixed size which is independent of a vector length used for the given vector.
6. The apparatus according to claim 5, in which the architecturally-defined fixed size corresponds to an architecturally-defined maximum vector length prescribed for vector instructions processed according to a predetermined non-scalable vector instruction set architecture.
7. The apparatus according to claim 5, in which the architecturally-defined fixed size is 128 bits.
8. The apparatus according to claim 1, in which each vector element of each sub-vector has a variable element size, and the sub-vector length is independent of which element size is used for each vector element within each sub-vector.
9. The apparatus according to claim 1, in which for at least one sub-vector-supporting instruction, the operation performed at sub-vector granularity is an operation performed, for each sub-vector, on vector elements within that sub-vector, independent of elements in other sub-vectors.
10. The apparatus according to claim 1, in which for at least one sub-vector-supporting instruction, the operation performed at sub-vector granularity is an operation performed, for each element position within a sub-vector, on respective vector elements at that element position within each of the plurality of sub-vectors.
11. The apparatus according to claim 1, in which for at least one sub-vector-supporting instruction, the operation performed at sub-vector granularity is an operation to set, or perform an operation depending on, selected predicate bits of a predicate value, where the selected predicate bits are predicate bits corresponding to sub-vector-sized portions of a vector.
12. The apparatus according to claim 1, in which, in response to a sub-vector-supporting permute instruction, the instruction decoder is configured to control the processing circuitry to set, for each sub-vector of a vector result, the sub-vector to a permutation of one or more vector elements selected from among vector elements within a correspondingly-positioned sub-vector of at least one vector operand.
13. The apparatus according to claim 1, in which, in response to a sub-vector-supporting reduction instruction, the instruction decoder is configured to control the processing circuitry to perform at least one reduction operation at sub-vector granularity, each reduction operation to reduce a plurality of vector elements of an operand vector to a single data value within a result.
14. The apparatus according to claim 13, in which, for an intra-sub-vector sub-vector-supporting reduction instruction, for each reduction operation the plurality of vector elements comprise the respective vector elements within a corresponding sub-vector of the operand vector.
15. The apparatus according to claim 13, in which, for an inter-sub-vector sub-vector-supporting reduction instruction, for each reduction operation the plurality of vector elements comprise the vector elements at corresponding element positions within a plurality of sub-vectors of the operand vector.
16. The apparatus according to claim 1, in which in response to a sub-vector-supporting load/store instruction, the instruction decoder is configured to control the processing circuitry to perform a load/store operation to transfer, at sub-vector granularity, one or more sub-vectors between a memory system and at least one vector register.
17. The apparatus according to claim 16, in which the sub-vector-supporting load/store instruction is a predicated instruction associated with a predicate value; and in response to the sub-vector-supporting load/store instruction, the instruction decoder is configured to control the processing circuitry to control, based on predicate bits selected from the predicate value at sub-vector granularity, whether each transfer of the one or more sub-vectors is performed or masked.
18. The apparatus according to claim 1, in which in response to a sub-vector-supporting increment/decrement instruction, the instruction decoder is configured to control the processing circuitry to increment or decrement an operand value based on how many sub-vector-sized portions of a vector are indicated as active by bits of a predicate value selected from the predicate value at sub-vector granularity.
19. The apparatus according to claim 18, in which the predicate value is one of: a predicate value specified as a predicate operand by the sub-vector-supporting increment/decrement instruction; and a predicate value implied by a predicate pattern identifier specified by the sub-vector-supporting increment/decrement instruction, the predicate pattern identifier specifying a predetermined pattern of predicate bits at sub-vector granularity.
20. The apparatus according to claim 1, in which in response to a sub-vector-supporting predicate setting instruction, the instruction decoder is configured to control the processing circuitry to perform a predicate setting operation to set bits of a predicate value at sub-vector-granularity, to indicate which sub-vectors of a vector are active.
21. The apparatus according to claim 20, in which the predicate setting operation comprises setting the predicate value based on one of: a predicate pattern identifier specifying a predetermined pattern of predicate bits to be applied at sub-vector granularity; and sub-vector-granularity comparison operations based on a comparison of a first operand and a second operand.
22. A method comprising: decoding, using instruction decoding circuitry, program instructions defined according to a scalable vector instruction set architecture supporting ve
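The difference between the intra-sub-vector and inter-sub-vector reductions of claims 14 and 15, and the predicate-based increment of claim 18, can be sketched in software as follows. This is an illustrative model only, not the claimed circuitry: vectors are flat Python lists, `per_sub` (elements per sub-vector) is a hypothetical parameter, and all function names are invented for this example. The predicate model is also an assumption: it uses one bit per element and samples the bit at the lowest element position of each sub-vector, whereas the claims do not specify which bit is selected.

```python
import functools
import operator


def _subvectors(vector, per_sub):
    """Split a flat list into consecutive sub-vectors of per_sub elements."""
    return [vector[i:i + per_sub] for i in range(0, len(vector), per_sub)]


def intra_subvector_reduce(vector, per_sub, op):
    """Claim 14 style: one result per sub-vector, reducing the elements
    inside that sub-vector."""
    return [functools.reduce(op, sub) for sub in _subvectors(vector, per_sub)]


def inter_subvector_reduce(vector, per_sub, op):
    """Claim 15 style: one result per element position, reducing the
    elements at that position across all sub-vectors."""
    return [functools.reduce(op, column)
            for column in zip(*_subvectors(vector, per_sub))]


def count_active_subvectors(predicate_bits, per_sub):
    """Assumed predicate model: a sub-vector is active when the predicate
    bit at its lowest element position is set."""
    return sum(1 for i in range(0, len(predicate_bits), per_sub)
               if predicate_bits[i])


def increment_by_active_subvectors(value, predicate_bits, per_sub):
    """Claim 18 style: increment an operand by the number of active
    sub-vector-sized portions."""
    return value + count_active_subvectors(predicate_bits, per_sub)


vector = [1, 2, 3, 4, 5, 6, 7, 8]  # two sub-vectors of four elements
print(intra_subvector_reduce(vector, 4, operator.add))  # [10, 26]
print(inter_subvector_reduce(vector, 4, operator.add))  # [6, 8, 10, 12]
print(increment_by_active_subvectors(10, [1, 0, 0, 0, 0, 1, 1, 1], 4))  # 11
```

In this model the intra-sub-vector result grows with the number of sub-vectors, while the inter-sub-vector result always has one sub-vector's worth of elements regardless of the vector length.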
Arithmetic instructions · CPC title
LOAD or STORE instructions; Clear instruction · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
using a mask · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title