Task execution in a SIMD processing unit with parallel groups of processing lanes

US12112396B2 · US · B2

Patent metadata
Publication number: US-12112396-B2
Application number: US-202318236036-A
Country: US
Kind code: B2
Filing date: Aug 21, 2023
Priority date: Dec 18, 2013
Publication date: Oct 8, 2024
Grant date: Oct 8, 2024

Abstract

Official abstract text for this publication.

A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
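The effect described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the patented implementation: it assumes each lane executes one block (for example a pixel quad) at one work item per cycle, and that a cycle can be dropped only when every lane holds an invalid work item in that cycle. The names `cycles_needed`, `align_invalid` and `assemble` are hypothetical.

```python
from typing import List

# Validity flags of the work items in one block, e.g. the four pixels of a quad.
Block = List[bool]


def cycles_needed(task: List[Block]) -> int:
    """Count the cycles in which at least one lane holds a valid work item."""
    # zip(*task) walks the task cycle by cycle: item i of every block runs in cycle i.
    return sum(1 for cycle in zip(*task) if any(cycle))


def align_invalid(block: Block) -> Block:
    """Re-order a block so valid items come first and invalid items share the last cycles.

    Assumption: a real unit would also record the original positions (the
    "indicators" of claim 5) so that results can be stored at the right addresses.
    """
    return sorted(block, reverse=True)


def assemble(blocks: List[Block], lanes: int) -> List[List[Block]]:
    """Group blocks with similar invalid counts into tasks of up to `lanes` blocks."""
    ordered = sorted((align_invalid(b) for b in blocks), key=lambda b: b.count(False))
    return [ordered[i:i + lanes] for i in range(0, len(ordered), lanes)]


blocks = [
    [True, True, True, True],
    [True, False, True, False],
    [True, True, True, True],
    [False, True, False, True],
]

# Baseline: take the blocks in arrival order, two lanes per task.
naive_tasks = [blocks[i:i + 2] for i in range(0, len(blocks), 2)]
naive_cycles = sum(cycles_needed(t) for t in naive_tasks)

# Validity-aware assembly: invalid items line up across lanes and whole cycles vanish.
assembled_cycles = sum(cycles_needed(t) for t in assemble(blocks, lanes=2))

print(naive_cycles, assembled_cycles)
```

In this toy input the partially invalid blocks end up in the same task with their invalid items in the same cycles, so that task needs fewer cycles than it would in arrival order.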

First claim

Opening claim text (preview).

What is claimed is:

1. A single instruction multiple data (SIMD) processing unit configured to process a plurality of work items, wherein the work items are arranged for executing a common sequence of instructions on respective pixels of a pixel quad, the SIMD processing unit comprising: a group of processing lanes, each processing lane in the group being configured to execute instructions of a respective block of work items over a plurality of processing cycles; and logic configured to cause the group of processing lanes to skip a particular processing cycle in response to making a determination that there are no valid work items scheduled for execution over the group of processing lanes in the particular processing cycle.

2. The SIMD processing unit of claim 1, wherein the work items are divided into tasks such that each task includes up to a predetermined maximum number of work items.

3. The SIMD processing unit of claim 2, wherein each of the processing lanes of the group is configured to execute instructions of a respective block of work items of a particular task over the plurality of processing cycles.

4. The SIMD processing unit of claim 1, wherein the logic is coupled to the group of processing lanes.

5. The SIMD processing unit of claim 1, wherein the logic is configured to set indicators to indicate how the work items have been assembled into the tasks.

6. The SIMD processing unit of claim 5, further comprising: a store configured to store the processed data items output from the group of processing lanes; and storing logic configured to determine addresses for storing the processed data items in the store based on the indicators.

7. The SIMD processing unit of claim 2, wherein the logic is configured to assemble the work items into the tasks such that work items of a block of work items relating to the same pixel quad are grouped together into the same task.

8. The SIMD processing unit of claim 1, wherein the work items are assembled into blocks of work items such that each work item within a block of work items can be used to perform a pre-processing operation on the block of work items before it is passed to the group of processing lanes.

9. The SIMD processing unit of claim 8, wherein the pre-processing operation is a gradient operation configured to determine the rate of change of a varying quantity between different pixels in a pixel quad.

10. The SIMD processing unit of claim 1, wherein there are no valid work items scheduled for execution over the group of processing lanes in the particular processing cycle if all of the work items which are scheduled for execution over the group of processing lanes in the particular processing cycle are invalid work items.

11. The SIMD processing unit of claim 10, wherein the logic is configured to assemble the work items into tasks so that blocks of work items are grouped together into tasks based on the number of invalid work items in the respective blocks of work items.

12. The SIMD processing unit of claim 10, wherein the logic is configured to assemble the work items into tasks so that work items within a block of work items are re-ordered to thereby align the invalid work items from different blocks of work items of the task across the group of processing lanes.

13. The SIMD processing unit of claim 1, wherein there is not a valid work item scheduled for execution in a processing lane in a particular processing cycle if there is not a work item which is scheduled for execution in the processing lane in the particular processing cycle.

14. The SIMD processing unit of claim 13, wherein work items which are not ready for execution when the task is due to be sent to the group of parallel processing lanes are not scheduled for execution.

15. The SIMD processing unit of claim 2, wherein some of the tasks comprise fewer than the predetermined maximum number of work items, and wherein the SIMD processing unit comprises a plurality of parallel groups of processing lanes, each group being configured to execute instructions of work items of a respective task over a plurality of processing cycles.

16. The SIMD processing unit of claim 15, wherein the logic coupled to the groups of processing lanes is further configured to cause a particular group of processing lanes to skip a particular processing cycle, independently of the other groups of processing lanes, if there are no valid work items scheduled for execution in any of the processing lanes of the particular group in the particular processing cycle.

17. The SIMD processing unit of claim 1, wherein there are three levels of validity for pixels of a pixel quad, a first level of validity being full validity, a second level of validity being partial invalidity and a third level of validity being full invalidity, and wherein the logic is configured to: skip a first particular processing cycle comprising work items corresponding to pixels of the third level of validity when instructions are to be executed on pixels of the first and second levels of validity, but instructions are not to be executed on pixels of the third level of validity; and skip a second particular processing cycle comprising work items corresponding to pixels of the second level of validity when instructions are to be executed on pixels of the first level of validity, but instructions are not to be executed on pixels of the second level of validity.

18. The SIMD processing unit of claim 17, wherein the instructions to be executed on pixels of the first and second levels of validity form part of a texturing operation to be performed on a particular pixel of a pixel quad, and wherein the particular pixel is of the first level of validity, the neighbouring pixels in the pixel quad excluding the diagonal neighbour of the particular pixel are of the second level of validity, and the diagonal neighbour in the pixel quad is of the third level of validity.

19. A method of using a single instruction multiple data (SIMD) processing unit to process a plurality of work items, wherein the work items are arranged for executing a common sequence of instructions on respective pixels of a pixel quad, wherein the SIMD processing unit comprises a group of processing lanes, each processing lane in the group being configured to execute instructions of a respective block of work items over a plurality of processing cycles, the method comprising: executing instructions of work items using the group of processing lanes; and causing the group of processing lanes to skip a particular processing cycle in response to making a determination that there are no valid work items scheduled for execution over the group of processing lanes in the particular processing cycle.

20. A non-transitory computer readable storage medium having stored thereon an integrated circuit dataset description that when inputted causes an integrated circuit manufacturing system to generate a single instruction multiple data (SIMD) processing unit configured to process a plurality of work items, wherein the work items are arranged for executing a common sequence of instructions on respective pixels of a pixel quad, the SIMD processing unit comprising: a group of processing lanes, each processing lane in the group being configured to execute instructions of a respective block of work items over a plurality of processing cycles; and logic configured to cause the group of processing lanes to skip a particular processing cycle in response to making a determination that there are no valid work items sc
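The three validity levels of claims 17 and 18 can be pictured with a short sketch. It is an illustration under stated assumptions rather than the claimed logic: the quad is assumed to be indexed 0 to 3 in row-major order, and the names `Validity`, `quad_levels` and `skip_cycle` are hypothetical.

```python
from enum import IntEnum
from typing import List, Sequence


class Validity(IntEnum):
    FULL = 1     # first level: full validity
    PARTIAL = 2  # second level: partial invalidity
    INVALID = 3  # third level: full invalidity


def quad_levels(target: int) -> List[Validity]:
    """Validity of the four pixels of a 2x2 quad when only `target` is textured.

    Assumption: pixels are indexed 0..3 in row-major order, so the diagonal
    neighbour of pixel i is i ^ 3.
    """
    levels = []
    for i in range(4):
        if i == target:
            levels.append(Validity.FULL)
        elif i == (target ^ 3):
            levels.append(Validity.INVALID)
        else:
            levels.append(Validity.PARTIAL)
    return levels


def skip_cycle(cycle: Sequence[Validity], needed: Validity) -> bool:
    """Return True if no lane in this cycle holds a work item the instruction runs on.

    `needed` is the least valid level the instruction still executes on:
    PARTIAL for instructions run on the first and second levels, FULL for
    instructions run on the first level only.
    """
    return all(level > needed for level in cycle)


print(quad_levels(0))
print(skip_cycle([Validity.INVALID, Validity.INVALID], Validity.PARTIAL))
print(skip_cycle([Validity.PARTIAL, Validity.INVALID], Validity.FULL))
```

Encoding the levels as ordered integers lets one comparison cover both skip conditions of claim 17: a cycle is skipped when every work item in it is less valid than the instruction requires.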

Assignees

Imagination Tech Ltd

Classifications

  • Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

  • Parallel decoding, e.g. parallel decode units

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

  • using a mask

  • Divergence aspects

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12112396B2 cover?
A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are inv…
Who is the assignee on this patent?
Imagination Tech Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/30036. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 8, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).