Data processing apparatus and method for performing scan operations

US9355061B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9355061-B2
Application number: US-201414165967-A
Country: US
Kind code: B2
Filing date: Jan 28, 2014
Priority date: Jan 28, 2014
Publication date: May 31, 2016
Grant date: May 31, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data processing apparatus and method are provided for executing a vector scan instruction. The data processing apparatus comprises a vector register store configured to store vector operands, and processing circuitry configured to perform operations on vector operands retrieved from said vector register store. Further, control circuitry is configured to control the processing circuitry to perform the operations required by one or more instructions, said one or more instructions including a vector scan instruction specifying a vector operand comprising N vector elements and defining a scan operation to be performed on a sequence of vector elements within the vector operand. The control circuitry is responsive to the vector scan instruction to partition the N vector elements of the specified vector operand into P groups of adjacent vector elements, where P is between 2 and N/2, and to control the processing circuitry to perform a partitioned scan operation yielding the same result as the defined scan operation. The processing circuitry is configured to perform the partitioned scan operation by performing separate scan operations on those vector elements of the sequence contained within each group to produce intermediate results for each group, and to perform a computation operation to combine the intermediate results into a final result vector operand containing a sequence of result vector elements. The partitioned scan operation approach of the present invention enables a balance to be achieved between energy consumption and performance.
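The partitioned scan described in the abstract can be sketched in software. The following is a minimal illustration, not the patented hardware: it assumes the scan is an inclusive prefix operation (addition by default), that P divides N evenly, and that groups are combined left to right. The function name `partitioned_scan` and its parameters are hypothetical and do not come from the patent.

```python
from itertools import accumulate
import operator


def partitioned_scan(elements, p, op=operator.add):
    """Inclusive scan computed group by group (hypothetical helper).

    Assumes p divides len(elements) and 2 <= p <= N/2, mirroring the
    range of P stated in the abstract.
    """
    n = len(elements)
    if n % p != 0 or not 2 <= p <= n // 2:
        raise ValueError("p must divide n and satisfy 2 <= p <= n/2")
    size = n // p

    # Step 1: separate scans within each group of adjacent elements,
    # producing the intermediate results for each group.
    intermediate = [
        list(accumulate(elements[start:start + size], op))
        for start in range(0, n, size)
    ]

    # Step 2: computation operation that combines the intermediate
    # results. Each later group is offset by the last result element
    # of the group before it.
    result = list(intermediate[0])
    for group in intermediate[1:]:
        carry = result[-1]
        result.extend(op(carry, x) for x in group)
    return result


values = [1, 2, 3, 4, 5, 6, 7, 8]
print(partitioned_scan(values, 2))
print(partitioned_scan(values, 4) == list(accumulate(values)))
```

With P = 2 each group scan covers N/2 elements, and with P = N/2 each covers only two. The abstract presents the choice of P as the way to balance energy consumption against performance; this sketch only shows that the result matches an unpartitioned scan.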

First claim

Opening claim text (preview).

We claim:

1. A data processing apparatus comprising: a vector register store configured to store vector operands; processing circuitry configured to perform operations on vector operands retrieved from said vector register store; control circuitry configured to control the processing circuitry to perform the operations required by one or more instructions, said one or more instructions including a vector scan instruction specifying a vector operand comprising N vector elements and defining a scan operation to be performed on a sequence of vector elements within the vector operand; the control circuitry being responsive to the vector scan instruction to partition the N vector elements of the specified vector operand into P groups of adjacent vector elements, where P is between 2 and N/2, and to control the processing circuitry to perform a partitioned scan operation yielding the same result as the defined scan operation, the processing circuitry being configured to perform the partitioned scan operation by performing separate scan operations on those vector elements of the sequence contained within each group to produce intermediate results for each group, and to perform a computation operation to combine the intermediate results into a final result vector operand containing a sequence of result vector elements.

2. A data processing apparatus as claimed in claim 1, wherein said sequence of vector elements on which the scan operation is to be performed comprise all N vector elements within the specified vector operand.

3. A data processing apparatus as claimed in claim 1, wherein the control circuitry is configured to determine said sequence of vector elements on which the scan operation is to be performed with reference to predicate control data.

4. A data processing apparatus as claimed in claim 1, wherein said vector scan instruction further specifies a scalar carry-in value forming an input for the defined scan operation, and the processing circuitry is configured to employ the scalar carry-in value as an input during performance of the partitioned scan operation.

5. A data processing apparatus as claimed in claim 4, wherein the processing circuitry is configured to employ the scalar carry-in value as an input to the computation operation in order to combine the scalar carry-in value with the intermediate results during generation of the final result vector operand.

6. A data processing apparatus as claimed in claim 1, wherein said processing circuitry comprises SIMD processing circuitry providing a plurality of lanes of parallel processing, each lane being configured to operate on one vector element of each vector operand provided to the SIMD processing circuitry.

7. A data processing apparatus as claimed in claim 6, wherein the number of lanes of parallel processing is equal to the number of vector elements in each of said P groups.

8. A data processing apparatus as claimed in claim 6, wherein the processing circuitry is configured to perform said separate scan operations sequentially.

9. A data processing apparatus as claimed in claim 8, wherein said SIMD processing circuitry comprises a plurality of pipeline stages used to implement each separate scan operation, and performance of said separate scan operations is partially overlapped.

10. A data processing apparatus as claimed in claim 6, wherein the number of lanes of parallel processing is a multiple M of the number of vector elements in each of said P groups, and the processing circuitry is configured to perform M separate scan operations in parallel, with each of the M separate scan operations being allocated to a different subset of the lanes.

11. A data processing apparatus as claimed in claim 6, wherein the control circuitry is responsive to the vector scan instruction to partition the N vector elements of the specified vector operand into 2 groups.

12. A data processing apparatus as claimed in claim 11, wherein the SIMD processing circuitry comprises one or more SIMD processing units used to perform said partitioned scan operation, and at least one of said one or more SIMD processing units has N/2 lanes of parallel processing.

13. A data processing apparatus as claimed in claim 1, wherein the processing circuitry is configured to perform the computation operation by performing separate computation operations on the intermediate results for each group, each separate computation operation comprising combining the intermediate results for the associated group with a carry-in value.

14. A data processing apparatus as claimed in claim 13, wherein the groups are ordered from a first group to a final group, and the processing circuitry is configured to perform the separate computation operations staggered in time such that, for all groups other than the first group, the carry-in value is provided by one of the result vector elements generated by the separate computation operation performed for the preceding group.

15. A data processing apparatus as claimed in claim 14, wherein for the first group, the carry-in value is provided by a scalar carry-in value specified by the vector scan instruction.

16. A data processing apparatus as claimed in claim 1, wherein each separate scan operation is performed in one or more parts.

17. A data processing apparatus as claimed in claim 16, wherein the processing circuitry is configured to perform each part of each separate scan operation in multiple pipeline stages.

18. A data processing apparatus as claimed in claim 13, wherein: each separate scan operation is performed in one or more parts, and the processing circuitry is configured to perform each part of each separate scan operation in multiple pipeline stages; the processing circuitry is configured to perform each separate computation operation in one or more pipeline stages.

19. A data processing apparatus as claimed in claim 18, wherein the processing circuitry is configured to perform each separate computation operation in less pipeline stages than are used to perform each part of each separate scan operation.

20. A data processing apparatus as claimed in claim 1, wherein the control circuitry is responsive to the vector scan instruction to partition the N vector elements of the specified vector operand into N/2 groups.

21. A data processing apparatus as claimed in claim 1, wherein the processing circuitry comprises one or more scalar processing units used to perform the partitioned scan operation under control of the control circuitry.

22. A data processing apparatus as claimed in claim 3, wherein if the predicate control data identifies that all of the adjacent vector elements in a particular group are not within said sequence of vector elements on which the scan operation is to be performed, the processing circuitry is configured to omit processing of at least one of the separate scan operation for that particular group and the associated part of the computation operation.

23. A data processing apparatus as claimed in claim 3, wherein said predicate control data is used to perform at least one of movement and modification of one or more vector elements within the vector operand.

24. A data processing apparatus as claimed in claim 23, wherein said at least one of movement and modification of one or more vector elements within the vector operand is performed prior to performance of said partitioned scan operation.

25. A data processing apparatus as claim
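The carry-in chaining of claims 13 to 15 and the predicate handling of claims 3 and 22 to 24 can also be sketched in software. This is a rough illustration under several assumptions that the claim text does not settle: the scan is an inclusive sum, P divides N, inactive elements are replaced by zero before the scan, and the lanes of a fully inactive group simply repeat the incoming carry. The name `predicated_partitioned_scan` and all of its parameters are hypothetical.

```python
from itertools import accumulate


def predicated_partitioned_scan(elements, predicate, p, carry_in=0):
    """Inclusive sum scan with a scalar carry-in and a per-element
    predicate (hypothetical helper; semantics are assumptions)."""
    n = len(elements)
    if len(predicate) != n:
        raise ValueError("predicate must have one flag per element")
    if n % p != 0 or not 2 <= p <= n // 2:
        raise ValueError("p must divide n and satisfy 2 <= p <= n/2")
    size = n // p

    result = []
    # The first group takes the scalar carry-in (compare claim 15).
    carry = carry_in
    for start in range(0, n, size):
        mask = predicate[start:start + size]
        if not any(mask):
            # Whole group inactive: skip its scan and its combine
            # step (compare claim 22). Assumption: lanes hold carry.
            result.extend([carry] * size)
            continue
        # Assumption: inactive elements are zeroed before the scan,
        # one possible "modification" in the sense of claims 23-24.
        group = [
            x if active else 0
            for x, active in zip(elements[start:start + size], mask)
        ]
        intermediate = list(accumulate(group))
        # Separate computation operation for this group (claim 13).
        combined = [carry + x for x in intermediate]
        result.extend(combined)
        # The last result element feeds the next group (claim 14).
        carry = combined[-1]
    return result


values = [1, 2, 3, 4, 5, 6, 7, 8]
mask = [True, False, True, True, False, False, True, True]
print(predicated_partitioned_scan(values, mask, 4))
```

The loop runs the groups strictly one after another. The staggering in time and the pipeline overlap described in claims 9, 14 and 17 to 19 are hardware scheduling matters that this sketch does not model.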

Assignees

Inventors

Classifications

  • Runtime instruction translation, e.g. macros · CPC title

  • G06F9/3001 · Primary

    Arithmetic instructions · CPC title

  • Pipelining a single stage, e.g. superpipelining · CPC title

  • Register arrangements · CPC title

  • controlled in tandem, e.g. multiplier-accumulator · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9355061B2 cover?
A data processing apparatus and method are provided for executing a vector scan instruction. The data processing apparatus comprises a vector register store configured to store vector operands, and processing circuitry configured to perform operations on vector operands retrieved from said vector register store. Further, control circuitry is configured to control the processing circuitry to per…
Who is the assignee on this patent?
Advanced Risc Mach Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/3001. Mapped technology areas include Physics.
When was this patent published?
Publication date May 31, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).