Task execution in a SIMD processing unit with parallel groups of processing lanes

US11734788B2 · US · B2

Patent metadata

Publication number: US-11734788-B2
Application number: US-202117515278-A
Country: US
Kind code: B2
Filing date: Oct 29, 2021
Priority date: Dec 18, 2013
Publication date: Aug 22, 2023
Grant date: Aug 22, 2023

Abstract

A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
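The abstract does not say how the control module chooses which work items go into which task. The following Python sketch is a hypothetical model, not the patented implementation. It assumes blocks of four work items (pixel quads), four lanes per group with each lane running one block over four consecutive cycles, and a simple policy of sorting blocks by their validity pattern before packing them into tasks. The names `assemble_tasks`, `wasted_slots`, `BLOCK_SIZE` and `LANES_PER_GROUP` are illustrative only. The sketch counts the lane-cycles spent on invalid work items in cycles that cannot be skipped.

```python
from typing import List, Sequence, Tuple

# Illustrative assumptions, not taken from the patent text:
BLOCK_SIZE = 4       # work items per block (one pixel quad)
LANES_PER_GROUP = 4  # lanes per group = blocks per task

Block = Tuple[bool, ...]  # validity flag of each work item in one block


def assemble_tasks(blocks: Sequence[Block], sort_by_validity: bool) -> List[List[Block]]:
    """Pack blocks into tasks of up to LANES_PER_GROUP blocks.

    With sort_by_validity=True, blocks with identical validity patterns end up
    next to each other, so their invalid work items fall in the same cycle.
    """
    order = sorted(blocks) if sort_by_validity else list(blocks)
    return [order[i:i + LANES_PER_GROUP] for i in range(0, len(order), LANES_PER_GROUP)]


def wasted_slots(task: List[Block]) -> int:
    """Count lane-cycles spent on invalid work items in cycles that must run.

    Lane l executes block task[l]; in cycle c it executes work item c.
    A cycle with no valid work item in any lane is skipped and wastes nothing.
    """
    wasted = 0
    for cycle in range(BLOCK_SIZE):
        column = [block[cycle] for block in task]
        if any(column):                   # at least one valid item: cycle runs
            wasted += column.count(False)  # invalid items occupy lanes for nothing
    return wasted


if __name__ == "__main__":
    A = (True, True, True, False)  # quad whose last pixel is invalid
    B = (True, True, True, True)   # fully valid quad
    arrival = [A, B] * 4           # blocks in arrival order
    for sort in (False, True):
        tasks = assemble_tasks(arrival, sort_by_validity=sort)
        print(sort, sum(wasted_slots(t) for t in tasks))
    # False 4
    # True 0
```

In arrival order, every task mixes both kinds of quad, so the fourth cycle always has some valid work item and must run with two lanes idle. After sorting, the partially invalid quads share a task whose fourth cycle has no valid work item and can be skipped outright.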

Claims (preview)

What is claimed is:

1. A single instruction multiple data (SIMD) processing unit configured to process a plurality of tasks which each include up to a predetermined maximum number of work items, wherein the work items of a task are arranged for executing a common sequence of instructions on respective data items, wherein blocks of work items within a task relate to respective blocks of data items, each block of data items being a pixel quad, the SIMD processing unit comprising: a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles, wherein each of the processing lanes of the group is configured to execute instructions of a respective block of work items over a plurality of consecutive processing cycles; and logic coupled to the group of processing lanes configured to cause the group of processing lanes to skip a particular processing cycle in response to making a determination that there are no valid work items scheduled for execution over the group of processing lanes in the particular processing cycle.

2. The SIMD processing unit of claim 1, wherein the logic is configured to set indicators to indicate how the work items have been assembled into the tasks.

3. The SIMD processing unit of claim 2, further comprising: a store configured to store the processed data items output from the group of processing lanes; and storing logic configured to determine addresses for storing the processed data items in the store based on the indicators.

4. The SIMD processing unit of claim 1, wherein the logic is configured to assemble the work items into the tasks such that work items of a block of work items are grouped together into the same task.

5. The SIMD processing unit of claim 1, wherein the work items are assembled into blocks of work items such that each work item within a block of work items can be used to perform a pre-processing operation on the block of work items before it is passed to the group of processing lanes.

6. The SIMD processing unit of claim 5, wherein the pre-processing operation is a gradient operation configured to determine the rate of change of a varying quantity between different pixels in a pixel quad.

7. The SIMD processing unit of claim 1, wherein there are no valid work items scheduled for execution over the group of processing lanes in the particular processing cycle if all of the work items which are scheduled for execution over the group of processing lanes in the particular processing cycle are invalid work items.

8. The SIMD processing unit of claim 1, wherein there is not a valid work item scheduled for execution in a processing lane in a particular processing cycle if there is not a work item which is scheduled for execution in the processing lane in the particular processing cycle.

9. The SIMD processing unit of claim 8, wherein work items which are not ready for execution when the task is due to be sent to the group of parallel processing lanes are not scheduled for execution.

10. The SIMD processing unit of claim 1, wherein some of the tasks comprise fewer than the predetermined maximum number of work items, and wherein the SIMD processing unit comprises a plurality of parallel groups of processing lanes, each group being configured to execute instructions of work items of a respective task over a plurality of processing cycles.

11. The SIMD processing unit of claim 10, wherein the logic coupled to the groups of processing lanes is further configured to cause a particular group of processing lanes to skip a particular processing cycle, independently of the other groups of processing lanes, if there are no valid work items scheduled for execution in any of the processing lanes of the particular group in the particular processing cycle.

12. The SIMD processing unit of claim 11, wherein the logic is configured to cause a particular group of processing lanes to skip a particular processing cycle whilst work items are scheduled to execute in the other groups of processing lanes in the particular processing cycle.

13. The SIMD processing unit of claim 1, wherein there are three levels of validity for pixels of a pixel quad, a first level of validity being full validity, a second level of validity being partial invalidity and a third level of validity being full invalidity, and wherein the logic is configured to: skip a first particular processing cycle comprising work items corresponding to pixels of the third level of validity when instructions are to be executed on pixels of the first and second levels of validity, but instructions are not to be executed on pixels of the third level of validity; and skip a second particular processing cycle comprising work items corresponding to pixels of the second level of validity when instructions are to be executed on pixels of the first level of validity, but instructions are not to be executed on pixels of the second level of validity.

14. The SIMD processing unit of claim 13, wherein the instructions to be executed on pixels of the first and second levels of validity form part of a texturing operation to be performed on a particular pixel of a pixel quad, and wherein the particular pixel is of the first level of validity, the neighbouring pixels in the pixel quad excluding the diagonal neighbour of the particular pixel are of the second level of validity, and the diagonal neighbour in the pixel quad is of the third level of validity.

15. A method of using a single instruction multiple data (SIMD) processing unit to process a plurality of tasks which each include up to a predetermined maximum number of work items, wherein the work items of a task are arranged for executing a common sequence of instructions on respective data items, wherein blocks of work items within a task relate to respective blocks of data items, each block of data items being a pixel quad, wherein the SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles, the method comprising: executing instructions of work items of the particular task using the group of processing lanes, wherein each of the processing lanes of the group executes instructions of a respective block of work items over a plurality of consecutive processing cycles; and causing the group of processing lanes to skip a particular processing cycle in response to making a determination that there are no valid work items scheduled for execution over the group of processing lanes in the particular processing cycle.

16. The method of claim 15, further comprising setting indicators to indicate how the work items have been assembled into the tasks.

17. The method of claim 16, further comprising: determining addresses for storing the processed data items in a store based on the indicators; and storing, at the determined addresses in the store, the processed data items output from the group of processing lanes.

18. The method of claim 15, wherein said assembling the work items into the tasks comprises grouping work items of a block of work items relating to a pixel quad together into the same task.

19. The method of claim 15, wherein some of the tasks comprise fewer than the predetermined maximum number of work items, and wherein the SIMD processing unit comprises a plurality of parallel groups of processing lanes, the method comprising: executing, at each group of processing lanes, instructions of work items of a respective task over a plural…
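Claims 1, 7, 8, 11 and 13 together describe when a group of lanes may skip a cycle. The Python sketch below models that rule under illustrative assumptions that the claims do not state: each lane executes one pixel quad as four consecutive work items; a slot with no scheduled work item is represented as `None` and treated as not valid (claim 8); and the three validity levels of claim 13 form an ordered enum, with each instruction tagged by the lowest level it must run on. The names `Validity`, `cycles_to_run` and `min_level`, and the quad ordering used in the example, are hypothetical. Each group is evaluated on its own, mirroring the independent skipping of claim 11.

```python
from enum import IntEnum
from typing import List, Optional, Sequence


class Validity(IntEnum):
    """Validity levels of claim 13; the numeric ordering is an assumption."""
    INVALID = 0  # third level: full invalidity
    PARTIAL = 1  # second level: partial invalidity
    FULL = 2     # first level: full validity


# Validity of each work item in a lane's block; None means no work item is
# scheduled in that slot, which counts as not valid (as in claim 8).
Block = Sequence[Optional[Validity]]


def cycles_to_run(group: Sequence[Block], min_level: Validity) -> List[int]:
    """Cycles a lane group must execute for an instruction that applies to
    work items at min_level or above; every other cycle is skipped.

    Lane l executes block group[l]; in cycle c it handles work item c.
    """
    n_cycles = max((len(block) for block in group), default=0)
    return [
        c for c in range(n_cycles)
        if any(c < len(b) and b[c] is not None and b[c] >= min_level for b in group)
    ]


if __name__ == "__main__":
    V = Validity
    # Claim 14 layout, assuming quad order (pixel, horizontal neighbour,
    # vertical neighbour, diagonal neighbour).
    quad = (V.FULL, V.PARTIAL, V.PARTIAL, V.INVALID)
    group0 = [quad] * 4                                   # aligned quads
    group1 = [(V.FULL,) * 4, (V.FULL, None, None, None)]  # partly filled task
    # Each group decides independently (claim 11).
    for level in (V.PARTIAL, V.FULL):
        print(level.name, [cycles_to_run(g, level) for g in (group0, group1)])
    # PARTIAL [[0, 1, 2], [0, 1, 2, 3]]
    # FULL [[0], [0, 1, 2, 3]]
```

With the claim 14 layout, instructions that must cover full and partially valid pixels (the texturing case) run for three cycles and skip the diagonal neighbour's cycle, while instructions restricted to fully valid pixels run for a single cycle. The fully valid group runs all four cycles in both cases, independently of the first group.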

Assignees

Imagination Tech Ltd

Classifications

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

  • using a mask

  • Divergence aspects

  • G06F9/3887 (primary)

    controlled by a single instruction for multiple data lanes [SIMD]

  • G06T1/20 (primary)

    Processor architectures; Processor configuration, e.g. pipelining

Frequently asked questions

Who is the assignee on this patent?
Imagination Tech Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/30036. Mapped technology areas include Physics.
When was this patent published?
It was published on Aug 22, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.