Sub-vector-supporting instruction for scalable vector instruction set architecture

US2025156184A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2025156184-A1
Application number: US-202218844296-A
Country: US
Kind code: A1
Filing date: Dec 15, 2022
Priority date: Mar 11, 2022
Publication date: May 15, 2025
Grant date:

Abstract

Official abstract text for this publication.

An apparatus has processing circuitry (16) to perform data processing, and instruction decoding circuitry (10) to control the processing circuitry to perform the data processing in response to decoding of program instructions defined according to a scalable vector instruction set architecture supporting vector instructions operating on vectors of scalable vector length to enable the same instruction sequence to be executed on apparatuses with hardware supporting different maximum vector lengths. The instruction decoding circuitry and the processing circuitry support a sub-vector-supporting instruction which treats a given vector as comprising a plurality of sub-vectors with each sub-vector comprising a plurality of vector elements. In response to the sub-vector-supporting instruction, the instruction decoding circuitry controls the processing circuitry to perform an operation for the given vector at sub-vector granularity. Each sub-vector has an equal sub-vector length.
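
To make the idea of "sub-vector granularity" concrete, the following is a minimal software sketch, not the claimed hardware. It assumes a fixed sub-vector length of 128 bits (the size named in claim 7), models a vector as a Python list of elements, and uses a hypothetical operation (reversing the elements inside each sub-vector) purely for illustration. The names `SUBVECTOR_BITS`, `split_subvectors` and `reverse_within_subvectors` are invented for this example and do not appear in the publication.

```python
SUBVECTOR_BITS = 128  # assumed fixed sub-vector length (claim 7 names 128 bits)


def split_subvectors(vector, element_bits):
    """Split a vector (list of elements) into equal-length sub-vectors."""
    per_sub = SUBVECTOR_BITS // element_bits  # elements per sub-vector
    if len(vector) % per_sub != 0:
        raise ValueError("vector must hold a whole number of sub-vectors")
    return [vector[i:i + per_sub] for i in range(0, len(vector), per_sub)]


def reverse_within_subvectors(vector, element_bits):
    """Hypothetical operation: reverse element order inside each sub-vector."""
    result = []
    for sub in split_subvectors(vector, element_bits):
        result.extend(reversed(sub))
    return result


# The same routine runs unchanged for different hardware vector lengths;
# only the number of sub-vectors changes, not the sub-vector length.
for vector_bits in (128, 256):
    v = list(range(vector_bits // 32))  # 32-bit elements
    print(vector_bits, reverse_within_subvectors(v, 32))
# prints:
# 128 [3, 2, 1, 0]
# 256 [3, 2, 1, 0, 7, 6, 5, 4]
```

In this sketch the sub-vector length is known when the code is written, while the number of sub-vectors depends on the vector length supplied at run time, which mirrors the distinction drawn in claims 2 and 3.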

First claim

Opening claim text (preview).

1. An apparatus comprising: processing circuitry to perform data processing; and instruction decoding circuitry to control the processing circuitry to perform the data processing in response to decoding of program instructions defined according to a scalable vector instruction set architecture supporting vector instructions operating on vectors of scalable vector length to enable the same instruction sequence to be executed on apparatuses with hardware supporting different maximum vector lengths; in which: the instruction decoding circuitry and the processing circuitry are configured to support a sub-vector-supporting instruction which treats a given vector as comprising a plurality of sub-vectors with each sub-vector comprising a plurality of vector elements, each sub-vector having an equal sub-vector length; and in response to the sub-vector-supporting instruction, the instruction decoding circuitry is configured to control the processing circuitry to perform an operation for the given vector at sub-vector granularity.

2. The apparatus according to claim 1, in which each sub-vector has a sub-vector length which is known at compile time for a given instruction sequence to be executed using the sub-vector-supporting instruction.

3. The apparatus according to claim 1, in which how many sub-vectors are comprised by the given vector is unknown at compile time for the given instruction sequence.

4. The apparatus according to claim 1, in which in response to the sub-vector-supporting instruction, the instruction decoding circuitry is configured to control the processing circuitry to process each of the sub-vectors in response to the same instance of executing the sub-vector-supporting instruction.

5. The apparatus according to claim 1, in which each sub-vector has a sub-vector length of an architecturally-defined fixed size which is independent of a vector length used for the given vector.

6. The apparatus according to claim 5, in which the architecturally-defined fixed size corresponds to an architecturally-defined maximum vector length prescribed for vector instructions processed according to a predetermined non-scalable vector instruction set architecture.

7. The apparatus according to claim 5, in which the architecturally-defined fixed size is 128 bits.

8. The apparatus according to claim 1, in which each vector element of each sub-vector has a variable element size, and the sub-vector length is independent of which element size is used for each vector element within each sub-vector.

9. The apparatus according to claim 1, in which for at least one sub-vector-supporting instruction, the operation performed at sub-vector granularity is an operation performed, for each sub-vector, on vector elements within that sub-vector, independent of elements in other sub-vectors.

10. The apparatus according to claim 1, in which for at least one sub-vector-supporting instruction, the operation performed at sub-vector granularity is an operation performed, for each element position within a sub-vector, on respective vector elements at that element position within each of the plurality of sub-vectors.

11. The apparatus according to claim 1, in which for at least one sub-vector-supporting instruction, the operation performed at sub-vector granularity is an operation to set, or perform an operation depending on, selected predicate bits of a predicate value, where the selected predicate bits are predicate bits corresponding to sub-vector-sized portions of a vector.

12. The apparatus according to claim 1, in which, in response to a sub-vector-supporting permute instruction, the instruction decoder is configured to control the processing circuitry to set, for each sub-vector of a vector result, the sub-vector to a permutation of one or more vector elements selected from among vector elements within a correspondingly-positioned sub-vector of at least one vector operand.

13. The apparatus according to claim 1, in which, in response to a sub-vector-supporting reduction instruction, the instruction decoder is configured to control the processing circuitry to perform at least one reduction operation at sub-vector granularity, each reduction operation to reduce a plurality of vector elements of an operand vector to a single data value within a result.

14. The apparatus according to claim 13, in which, for an intra-sub-vector sub-vector-supporting reduction instruction, for each reduction operation the plurality of vector elements comprise the respective vector elements within a corresponding sub-vector of the operand vector.

15. The apparatus according to claim 13, in which, for an inter-sub-vector sub-vector-supporting reduction instruction, for each reduction operation the plurality of vector elements comprise the vector elements at corresponding element positions within a plurality of sub-vectors of the operand vector.

16. The apparatus according to claim 1, in which in response to a sub-vector-supporting load/store instruction, the instruction decoder is configured to control the processing circuitry to perform a load/store operation to transfer, at sub-vector granularity, one or more sub-vectors between a memory system and at least one vector register.

17. The apparatus according to claim 16, in which the sub-vector-supporting load/store instruction is a predicated instruction associated with a predicate value; and in response to the sub-vector-supporting load/store instruction, the instruction decoder is configured to control the processing circuitry to control, based on predicate bits selected from the predicate value at sub-vector granularity, whether each transfer of the one or more sub-vectors is performed or masked.

18. The apparatus according to claim 1, in which in response to a sub-vector-supporting increment/decrement instruction, the instruction decoder is configured to control the processing circuitry to increment or decrement an operand value based on how many sub-vector-sized portions of a vector are indicated as active by bits of a predicate value selected from the predicate value at sub-vector granularity.

19. The apparatus according to claim 18, in which the predicate value is one of: a predicate value specified as a predicate operand by the sub-vector-supporting increment/decrement instruction; and a predicate value implied by a predicate pattern identifier specified by the sub-vector-supporting increment/decrement instruction, the predicate pattern identifier specifying a predetermined pattern of predicate bits at sub-vector granularity.

20. The apparatus according to claim 1, in which in response to a sub-vector-supporting predicate setting instruction, the instruction decoder is configured to control the processing circuitry to perform a predicate setting operation to set bits of a predicate value at sub-vector-granularity, to indicate which sub-vectors of a vector are active.

21. The apparatus according to claim 20, in which the predicate setting operation comprises setting the predicate value based on one of: a predicate pattern identifier specifying a predetermined pattern of predicate bits to be applied at sub-vector granularity; and sub-vector-granularity comparison operations based on a comparison of a first operand and a second operand.

22. A method comprising: decoding, using instruction decoding circuitry, program instructions defined according to a scalable vector instruction set architecture supporting ve
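
The difference between the intra-sub-vector and inter-sub-vector reductions of claims 14 and 15, and the counting of active sub-vectors in claim 18, can be illustrated with a small software model. This is a sketch under stated assumptions rather than the claimed implementation: vectors and predicates are plain Python lists, addition stands in for whichever reduction operation an instruction would use, the predicate is assumed to hold one bit per element, and the first bit of each sub-vector-sized group is assumed to be the "selected" bit. All function names are hypothetical.

```python
def split(vector, per_sub):
    """Split a list into sub-vectors of per_sub elements each."""
    if len(vector) % per_sub != 0:
        raise ValueError("vector must hold a whole number of sub-vectors")
    return [vector[i:i + per_sub] for i in range(0, len(vector), per_sub)]


def intra_subvector_sum(vector, per_sub):
    """One result per sub-vector: reduce the elements inside each sub-vector."""
    return [sum(sub) for sub in split(vector, per_sub)]


def inter_subvector_sum(vector, per_sub):
    """One result per element position: reduce across all sub-vectors."""
    return [sum(column) for column in zip(*split(vector, per_sub))]


def count_active_subvectors(predicate, bits_per_sub):
    """Count sub-vector-sized portions whose selected predicate bit is set.

    Assumption: the selected bit is the first bit of each group.
    """
    return sum(1 for group in split(predicate, bits_per_sub) if group[0])


v = [1, 2, 3, 4, 5, 6, 7, 8]          # two sub-vectors of four elements
print(intra_subvector_sum(v, 4))      # [10, 26]
print(inter_subvector_sum(v, 4))      # [6, 8, 10, 12]
print(count_active_subvectors([1, 0, 0, 0, 0, 0, 0, 0], 4))  # 1
```

The intra-sub-vector form yields as many results as there are sub-vectors, so its result count scales with the hardware vector length, whereas the inter-sub-vector form always yields one sub-vector's worth of results.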

Assignees

Inventors

Classifications

  • Arithmetic instructions · CPC title

  • LOAD or STORE instructions; Clear instruction · CPC title

  • G06F9/3887 (Primary)

    controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • using a mask · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025156184A1 cover?
An apparatus has processing circuitry (16) to perform data processing, and instruction decoding circuitry (10) to control the processing circuitry to perform the data processing in response to decoding of program instructions defined according to a scalable vector instruction set architecture supporting vector instructions operating on vectors of scalable vector length to enable the same in…
Who is the assignee on this patent?
Advanced Risc Mach Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/3887. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 15, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).