Efficient multi-context thread distribution

US10452397B2 · US · B2

Patent metadata
Publication number: US-10452397-B2
Application number: US-201715477022-A
Country: US
Kind code: B2
Filing date: Apr 1, 2017
Priority date: Apr 1, 2017
Publication date: Oct 22, 2019
Grant date: Oct 22, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to determine a first number of threads to be scheduled for each context of a plurality of contexts in a multi-context processing system, allocate a second number of streaming multiprocessors (SMs) to the respective plurality of contexts, and dispatch threads from the plurality of contexts only to the streaming multiprocessor(s) allocated to the respective plurality of contexts. Other embodiments are also disclosed and claimed.
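
The three steps named in the abstract (count threads per context, allocate SMs, dispatch only to the allocated SMs) can be illustrated with a small host-side simulation. This is a minimal sketch, not the patented hardware logic: the function names, the largest-remainder rounding rule, and the round-robin placement are all assumptions made for illustration, since the abstract does not say how fractional shares are rounded or how threads are ordered within a context's SMs.

```python
def allocate_sms(thread_counts: dict, total_sms: int) -> dict:
    """Split total_sms across contexts in proportion to their thread counts.

    Rounding uses the largest-remainder method, which is an assumption;
    the source only says allocation follows the thread ratio.
    """
    total_threads = sum(thread_counts.values())
    if total_threads == 0 or total_sms <= 0:
        return {ctx: 0 for ctx in thread_counts}
    # Ideal fractional share of SMs for each context.
    exact = {ctx: n * total_sms / total_threads for ctx, n in thread_counts.items()}
    # Start from the whole-number part of each share.
    alloc = {ctx: int(share) for ctx, share in exact.items()}
    leftover = total_sms - sum(alloc.values())
    # Hand remaining SMs to the contexts with the largest fractional parts.
    by_remainder = sorted(thread_counts, key=lambda c: exact[c] - alloc[c], reverse=True)
    for ctx in by_remainder[:leftover]:
        alloc[ctx] += 1
    return alloc


def assign_sm_ids(alloc: dict) -> dict:
    """Give each context a disjoint, contiguous range of SM indices."""
    ranges = {}
    next_id = 0
    for ctx, count in alloc.items():
        ranges[ctx] = list(range(next_id, next_id + count))
        next_id += count
    return ranges


def dispatch(thread_counts: dict, sm_ranges: dict) -> dict:
    """Place each context's threads round-robin on that context's SMs only."""
    placement = {}  # sm_id -> list of (context, thread_index)
    for ctx, n in thread_counts.items():
        sms = sm_ranges[ctx]
        if not sms:
            continue  # a context that received no SM is not handled in this sketch
        for t in range(n):
            placement.setdefault(sms[t % len(sms)], []).append((ctx, t))
    return placement


if __name__ == "__main__":
    counts = {"A": 300, "B": 100}   # hypothetical per-context thread counts
    alloc = allocate_sms(counts, total_sms=8)
    ranges = assign_sm_ids(alloc)
    placement = dispatch(counts, ranges)
    print(alloc, ranges)
```

With a 3:1 thread ratio and 8 SMs, context A is given 6 SMs and context B is given 2, and no SM ever holds threads from both contexts.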

First claim

Opening claim text (preview).

The invention claimed is:

1. A graphics multiprocessor comprising: an instruction cache to receive a stream of instructions from a pipeline manager; an instruction unit to execute the stream of instructions; a general-purpose graphics processing compute block comprising a plurality of streaming multiprocessors (SMs), each streaming multiprocessor comprising a plurality of graphics processing cores; a shared memory communicatively coupled to the plurality of graphics processing cores; and a processing unit to: determine a first number of threads to be scheduled for each context of a plurality of contexts in a multi-context processing system; allocate a second number of the streaming multiprocessors (SMs) to the respective plurality of contexts based on a ratio of the threads between the plurality of contexts; and dispatch threads from the plurality of contexts only to the streaming multiprocessor(s) allocated to the respective plurality of contexts.

2. The graphics multiprocessor of claim 1, the graphics processing unit to: determine whether one or more of the plurality of contexts have one or more extra threads that do not fit within the second number of streaming multiprocessors allocated to plurality of contexts.

3. The graphics multiprocessor of claim 2, the graphics processing unit, in response to a determination that one or more of the plurality of contexts have one or more extra threads that do not fit within the second number of streaming multiprocessors allocated to plurality of contexts, is to: implement a process to assign the one or more extra threads to one or more streaming multiprocessors (SMs) which are assigned to a different context.

4. The graphics multiprocessor of claim 3, the graphics processing unit to: obtain a cache footprint usage parameter for each of the threads to be scheduled for each context of a plurality of contexts in a multi-context processing system; and store the cache footprint usage parameter in a command buffer as a kernel thread meta-data.

5. The graphics multiprocessor of claim 4, the graphics processing unit to: forward the cache footprint usage parameter to a thread dispatcher.

6. The graphics multiprocessor of claim 5, the graphics processing unit to: use the cache footprint usage parameter to allocate one or more extra threads to one or more streaming multiprocessors (SMs) which are assigned to a different context.

7. The graphics multiprocessor of claim 1, wherein the second number of streaming multiprocessors (SMs) are allocated to the respective plurality of contexts based on a ratio of the number of contexts per thread.

8. An electronic device, comprising: a display; an instruction cache to receive a stream of instructions from a pipeline manager; an instruction unit to execute the stream of instructions; a general-purpose graphics processing compute block comprising a plurality of streaming multiprocessors (SMs), each streaming multiprocessor comprising a plurality of graphics processing cores; a shared memory communicatively coupled to the plurality of graphics processing cores; and a processing unit to: determine a first number of threads to be scheduled for each context of a plurality of contexts in a multi-context processing system; allocate a second number of the streaming multiprocessors (SMs) to the respective plurality of contexts based on a ratio of the threads between the plurality of contexts; and dispatch threads from the plurality of contexts only to the streaming multiprocessor(s) allocated to the respective plurality of contexts.

9. The electronic device of claim 8, the graphics processing unit to: determine whether one or more of the plurality of contexts have one or more extra threads that do not fit within the second number of streaming multiprocessors allocated to plurality of contexts.

10. The electronic device of claim 9, the graphics processing unit, in response to a determination that one or more of the plurality of contexts have one or more extra threads that do not fit within the second number of streaming multiprocessors allocated to plurality of contexts, is to: implement a process to assign the one or more extra threads to one or more streaming multiprocessors (SMs) which are assigned to a different context.

11. The electronic device of claim 10, the graphics processing unit to: obtain a cache footprint usage parameter for each of the threads to be scheduled for each context of a plurality of contexts in a multi-context processing system; and store the cache footprint usage parameter in a command buffer as a kernel thread meta-data.

12. The electronic device of claim 11, the graphics processing unit to: forward the cache footprint usage parameter to a thread dispatcher.

13. The electronic device of claim 12, the graphics processing unit to: use the cache footprint usage parameter to allocate one or more extra threads to one or more streaming multiprocessors (SMs) which are assigned to a different context.

14. The electronic device of claim 8, wherein the second number of streaming multiprocessors (SMs) are allocated to the respective plurality of contexts based on a ratio of the number of contexts per thread.

15. A method comprising: receiving, in an instruction cache, a stream of instructions from a pipeline manager; executing, in an instruction unit, the stream of instructions; determining, in a general purpose graphics processing unit comprising a plurality of streaming multiprocessors (SMs), each streaming multiprocessor comprising a plurality of graphics processing cores, a first number of threads to be scheduled for each context of a plurality of contexts in a multi-context processing system; allocating a second number of the streaming multiprocessors (SMs) to the respective plurality of contexts based on a ratio of the threads between the plurality of contexts; and dispatching threads from the plurality of contexts only to the streaming multiprocessor(s) allocated to the respective plurality of contexts.

16. The method of claim 15, further comprising: determining whether one or more of the plurality of contexts have one or more extra threads that do not fit within the second number of streaming multiprocessors allocated to plurality of contexts.

17. The method of claim 16, further comprising implementing a process to assign the one or more extra threads to one or more streaming multiprocessors (SMs) which are assigned to a different context.

18. The method of claim 17, further comprising: obtaining a cache footprint usage parameter for each of the threads to be scheduled for each context of a plurality of contexts in a multi-context processing system; and storing the cache footprint usage parameter in a command buffer as a kernel thread meta-data.

19. The method of claim 18, further comprising: forwarding the cache footprint usage parameter to a thread dispatcher.

20. The method of claim 19, further comprising: using the cache footprint usage parameter to allocate one or more extra threads to one or more streaming multiprocessors (SMs) which are assigned to a different context.

21. The method of claim 15, wherein the second number of streaming multiprocessors (SMs) are allocated to the respective plurality of contexts based on a ratio of the number of contexts per thread.

22. One or more non-transitory computer-readable medium comprising one or more instructions that when executed on a general purpose graphics p
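
The dependent claims describe extra threads that do not fit on a context's own SMs being assigned to SMs of a different context, with a per-thread cache footprint usage parameter carried in the command buffer and forwarded to the thread dispatcher. The sketch below is one possible reading, not the claimed implementation. The claims do not say how the parameter steers placement, so the policy here (spill the smallest footprints first, onto the foreign SM with the most remaining cache headroom) is an assumption, as are the class names, the kilobyte unit, and the per-SM slot and cache capacities.

```python
from dataclasses import dataclass, field


@dataclass
class KernelThread:
    context: str
    thread_id: int
    cache_footprint_kb: int  # hypothetical metadata carried in the command buffer


@dataclass
class SM:
    sm_id: int
    owner: str          # context this SM is allocated to
    thread_slots: int   # hypothetical per-SM thread capacity
    cache_kb: int       # hypothetical per-SM cache budget
    resident: list = field(default_factory=list)

    def free_slots(self) -> int:
        return self.thread_slots - len(self.resident)

    def cache_headroom(self) -> int:
        return self.cache_kb - sum(t.cache_footprint_kb for t in self.resident)


def dispatch_with_overflow(command_buffer: list, sms: list) -> list:
    """Fill each context's own SMs first, then spill extra threads.

    Returns the threads that could not be placed anywhere.
    """
    extras = []
    for t in command_buffer:
        own = [s for s in sms if s.owner == t.context and s.free_slots() > 0]
        if own:
            # Least-loaded SM among those allocated to this thread's context.
            min(own, key=lambda s: len(s.resident)).resident.append(t)
        else:
            extras.append(t)  # does not fit within the context's allocation

    unplaced = []
    # Assumed policy: small footprints disturb a foreign cache least, so go first.
    for t in sorted(extras, key=lambda t: t.cache_footprint_kb):
        foreign = [
            s for s in sms
            if s.owner != t.context
            and s.free_slots() > 0
            and s.cache_headroom() >= t.cache_footprint_kb
        ]
        if foreign:
            max(foreign, key=lambda s: s.cache_headroom()).resident.append(t)
        else:
            unplaced.append(t)
    return unplaced


if __name__ == "__main__":
    sms = [SM(0, "A", thread_slots=2, cache_kb=64), SM(1, "B", thread_slots=2, cache_kb=64)]
    buffer = [KernelThread("A", i, 16) for i in range(3)] + [KernelThread("B", 0, 8)]
    left = dispatch_with_overflow(buffer, sms)
    for s in sms:
        print(s.sm_id, s.owner, [(t.context, t.thread_id) for t in s.resident])
    print("unplaced:", left)
```

In this example the third thread of context A has no free slot on A's SM, so it is placed on the SM owned by context B, which still has a slot and enough cache headroom. A thread whose footprint exceeds every foreign SM's headroom is returned as unplaced rather than forced onto a cache it would thrash.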

Assignees

Intel Corp

Inventors

Classifications

  • Graphics controllers · CPC title

  • the resource being the memory · CPC title

  • the resource being a machine, e.g. CPUs, Servers, Terminals · CPC title

  • Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • Power processing, i.e. workload management for processors involved in display operations, such as CPUs or GPUs · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10452397B2 cover?
Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to determine a first number of threads to be scheduled for each context of a plurality of contexts in a multi-context processing system, allocate a second number of streaming multiprocessors (SMs) to the respective plur…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30123. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 22, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).