Allocation of memory resources to SIMD workgroups

US10990448B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10990448-B2
Application number  US-201816132703-A
Country             US
Kind code           B2
Filing date         Sep 17, 2018
Priority date       Sep 15, 2017
Publication date    Apr 27, 2021
Grant date          Apr 27, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A memory subsystem for use with a single-instruction multiple-data (SIMD) processor comprising a plurality of processing units configured for processing one or more workgroups each comprising a plurality of SIMD tasks, the memory subsystem comprising: a shared memory partitioned into a plurality of memory portions for allocation to tasks that are to be processed by the processor; and a resource allocator configured to, in response to receiving a memory resource request for first memory resources in respect of a first-received task of a workgroup, allocate to the workgroup a block of memory portions sufficient in size for each task of the workgroup to receive memory resources in the block equivalent to the first memory resources.
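The allocation idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class `WorkgroupAllocator`, its method names, the use of a task index to place each task inside the block, and the first-fit search are all hypothetical choices made for illustration. The `release` method models the per-task deallocation described in the claims.

```python
class WorkgroupAllocator:
    """Toy model: shared memory is a list of portions, each free (None) or owned."""

    def __init__(self, num_portions):
        self.owner = [None] * num_portions  # None = unallocated portion
        self.blocks = {}                    # workgroup id -> (start, portions per task)

    def _find_free_run(self, length):
        """Return the lowest start of `length` contiguous free portions, or None."""
        run = 0
        for i, o in enumerate(self.owner):
            run = run + 1 if o is None else 0
            if run == length:
                return i - length + 1
        return None

    def request(self, wg, task, portions, tasks_in_wg):
        """Service a request for task index `task` (assumed 0-based) of workgroup `wg`."""
        if wg not in self.blocks:
            # First-received task: size the block for every task of the workgroup.
            total = portions * tasks_in_wg
            start = self._find_free_run(total)
            if start is None:
                return None
            for i in range(start, start + total):
                self.owner[i] = (wg, None)  # reserved for wg, not yet given to a task
            self.blocks[wg] = (start, portions)
        start, per_task = self.blocks[wg]
        base = start + task * per_task
        for i in range(base, base + per_task):
            self.owner[i] = (wg, task)
        return base

    def release(self, wg, task):
        """Free one task's portions without waiting for the rest of the workgroup."""
        start, per_task = self.blocks[wg]
        base = start + task * per_task
        for i in range(base, base + per_task):
            self.owner[i] = None


alloc = WorkgroupAllocator(16)
first = alloc.request("wgA", task=0, portions=2, tasks_in_wg=4)  # reserves 8 portions
other = alloc.request("wgB", task=0, portions=3, tasks_in_wg=2)  # lands after wgA's block
print(first, other)
```

Because the whole block is reserved when the first task arrives, later tasks of the same workgroup cannot be starved of memory by tasks of other workgroups. The model never retires a workgroup's entry in `blocks`; a fuller version would do so once every task has been released.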

First claim

Opening claim text (preview).

The invention claimed is:

1. A memory subsystem for use with a single-instruction multiple-data (SIMD) processor comprising a plurality of processing units configured for processing one or more workgroups each comprising a plurality of SIMD tasks, the memory subsystem comprising: a shared memory partitioned into a plurality of memory portions for allocation to tasks that are to be processed by the processor; and a resource allocator configured to, in response to receiving a memory resource request for first memory resources in respect of a first-received task of a workgroup, allocate to the entire workgroup a block of memory portions sufficient in size for each task of the workgroup to receive memory resources in the block equivalent to the first memory resources.

2. A memory subsystem as claimed in claim 1, wherein the resource allocator is configured to allocate the block as a contiguous block of memory portions.

3. A memory subsystem as claimed in claim 1, wherein the resource allocator is configured to, on servicing the first-received task of the workgroup, allocate to that task the requested first memory resources from the block and reserve the remaining memory portions of the block so as to prevent allocation to tasks of other workgroups.

4. A memory subsystem as claimed in claim 1 wherein the resource allocator is configured to, in response to subsequently receiving a memory resource request in respect of a second task of the workgroup, allocate memory resources of the block to that second task.

5. A memory subsystem as claimed in claim 1, wherein the resource allocator is arranged to receive memory resource requests from a plurality of different requestors and to, in response to allocating the block of memory portions to the workgroup, preferentially service memory requests received from the requestor from which the first-received task of that workgroup was received.

6. A memory subsystem as claimed in claim 1, wherein the resource allocator is further configured to, in response to receiving an indication that processing of a task of the workgroup has completed, deallocate the memory resources allocated to that task without waiting for processing of the workgroup to complete.

7. A memory subsystem as claimed in claim 1, wherein the shared memory is further partitioned into a plurality of non-overlapping windows each comprising a plurality of memory portions and the resource allocator is configured to maintain a window pointer indicating a current window in which allocation of memory portions will be attempted in response to a next-received memory request.

8. A memory subsystem as claimed in claim 7, wherein the resource allocator is embodied in a binary logic circuit and the window length of each window is such that the availability of all of the memory portions of each window can be checked in a single clock cycle of the binary logic circuit.

9. A memory subsystem as claimed in claim 1, wherein the resource allocator is further configured to maintain a fine status array arranged to indicate whether each memory portion of the shared memory is allocated to a task.

10. A memory subsystem as claimed in claim 9, wherein the resource allocator is configured to, in response to receiving the memory resource request in respect of the first-received task of the workgroup, search a current window for a contiguous block of memory portions which are indicated by the fine status array as being available for allocation, the resource allocator being configured to, if such a contiguous block is identified in the current window, allocate that contiguous block to the workgroup.

11. A memory subsystem as claimed in claim 10, wherein the resource allocator is configured to allocate the contiguous block of memory portions such that the block starts at a lowest possible position in the window.

12. A memory subsystem as claimed in claim 9, wherein the shared memory is further partitioned into a plurality of non-overlapping windows each comprising a plurality of memory portions, wherein the resource allocator is further configured to maintain a coarse status array arranged to indicate, for each non-overlapping window of the shared memory, whether all the memory portions of the non-overlapping window are unallocated, the resource allocator being configured to, in parallel with searching a current non-overlapping window for a contiguous block of memory portions, check the coarse status array to determine whether the size of the requested block can be accommodated by one or more subsequent non-overlapping windows; the resource allocator being configured to, if both a sufficiently large contiguous block cannot be identified in the current window and the requested block can be accommodated by one or more subsequent windows, allocate the block to the workgroup comprising memory portions starting at a first memory portion of the current window in a contiguous block with the subsequent window(s) and extending into those subsequent window(s).

13. A memory subsystem as claimed in claim 12, wherein the resource allocator is further configured to, in parallel with searching the current window, form an overflow metric representing the memory resources of the required block of memory portions which cannot be accommodated in the current window starting at the first memory portion of the current window in a contiguous block of unallocated memory portions immediately adjacent to the subsequent window, the resource allocator being configured to, if both a sufficiently large contiguous block cannot be identified in the current window and the requested block cannot be accommodated by one or more subsequent windows, subsequently attempt allocation of a block to the workgroup by searching the subsequent window, starting at the first memory portion of the subsequent window, for a contiguous block of unallocated memory portions sufficient in total size to accommodate the overflow metric.

14. A memory subsystem as claimed in claim 9, wherein the fine status array is an array of bits in which each bit corresponds to one memory portion of the shared memory and the value of each bit indicates whether the corresponding memory portion is unallocated or not.

15. A memory subsystem as claimed in claim 12, wherein the coarse status array is an array of bits in which each bit corresponds to one window of the shared memory and the value of each bit indicates whether the respective window is entirely unallocated or not.

16. A memory subsystem as claimed in claim 14, wherein the shared memory is further partitioned into a plurality of non-overlapping windows each comprising a plurality of memory portions, wherein the resource allocator is further configured to maintain a coarse status array arranged to indicate, for each non-overlapping window of the shared memory, whether all the memory portions of the non-overlapping window are unallocated, wherein the coarse status array is an array of bits in which each bit corresponds to one window of the shared memory and the value of each bit indicates whether the respective window is entirely unallocated or not and wherein the resource allocator is configured to form each bit of the coarse status array by performing an OR reduction of all of the bits of the fine status array which correspond to memory portions lying in the window corresponding to that bit of the coarse status array.

17. A memory subsystem as claimed in claim 7, wherein the window length of each window is a power of two.

18. A memory subsystem as claimed in claim 1, wherein the resource allocator may maintai…
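The window search and the fine and coarse status arrays of claims 9 to 16 can be sketched in software as follows. This is an illustrative reading of the claim language rather than the patented circuit. The names `WINDOW`, `coarse_status`, `find_in_window`, `trailing_free` and `allocate` are hypothetical, the bit polarity (1 = allocated) is an assumption that makes the OR reduction of claim 16 yield 0 for an entirely free window, and the checks that the claims perform in parallel are done sequentially here. The fallback search of claim 13 is not modelled.

```python
WINDOW = 8  # hypothetical window length in portions (a power of two, as in claim 17)


def coarse_status(fine):
    """OR-reduce each window of the fine array: 0 means the window is entirely free."""
    return [int(any(fine[w:w + WINDOW])) for w in range(0, len(fine), WINDOW)]


def find_in_window(fine, window, size):
    """Lowest start of `size` contiguous free portions inside one window, or None."""
    base = window * WINDOW
    run = 0
    for i in range(base, base + WINDOW):
        run = run + 1 if fine[i] == 0 else 0
        if run == size:
            return i - size + 1
    return None


def trailing_free(fine, window):
    """Count free portions at the end of a window, adjacent to the next window."""
    end = (window + 1) * WINDOW
    count = 0
    while count < WINDOW and fine[end - 1 - count] == 0:
        count += 1
    return count


def allocate(fine, window, size):
    """Try the current window first, then spill into entirely free later windows."""
    start = find_in_window(fine, window, size)
    if start is None:
        tail = trailing_free(fine, window)
        overflow = size - tail          # portions that do not fit in this window
        extra = -(-overflow // WINDOW)  # ceiling division: later windows needed
        later = coarse_status(fine)[window + 1:window + 1 + extra]
        if len(later) < extra or any(later):
            return None                 # no room; claim 13's fallback is omitted
        start = (window + 1) * WINDOW - tail
    fine[start:start + size] = [1] * size  # mark the block as allocated
    return start


fine = [0] * 32                 # fine status array: one bit per memory portion
print(allocate(fine, 0, 3))     # fits at the lowest position of window 0
print(allocate(fine, 0, 4))     # fits after the first block in window 0
print(allocate(fine, 0, 5))     # spills from the tail of window 0 into window 1
print(coarse_status(fine))
```

Keeping one coarse bit per window means the spill decision needs only a handful of bits rather than a scan of every portion in the later windows, which is the motivation the claims give for maintaining both arrays.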

Assignees

Imagination Tech Ltd

Inventors

Classifications

  • for multiprocessing or multitasking · CPC title

  • with a shared cache · CPC title

  • User address space allocation, e.g. contiguous or non contiguous base addressing · CPC title

  • Mechanisms to release resources · CPC title

  • G06F9/5016 (Primary)

    the resource being the memory · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10990448B2 cover?
A memory subsystem for use with a single-instruction multiple-data (SIMD) processor comprising a plurality of processing units configured for processing one or more workgroups each comprising a plurality of SIMD tasks, the memory subsystem comprising: a shared memory partitioned into a plurality of memory portions for allocation to tasks that are to be processed by the processor; and a resource…
Who is the assignee on this patent?
Imagination Tech Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/5016. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 27, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).