Dynamic instruction latency management in SIMD machines

US10540260B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10540260-B2
Application number  US-201815903393-A
Country             US
Kind code           B2
Filing date         Feb 23, 2018
Priority date       Feb 23, 2018
Publication date    Jan 21, 2020
Grant date          Jan 21, 2020

Abstract

Official abstract text for this publication.

In one example, an apparatus comprises processing circuitry to analyze a program at compile time to determine a set of latency parameters associated with instruction sets implemented to execute the program and select a latency management technique based at least in part on the set of latency parameters associated with instruction sets implemented to execute the program. Other examples may be described and claimed.

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a general purpose graphic processor, comprising: an instruction cache to receive a stream of instructions generated by a graphics program; an instruction unit to execute the stream of instructions; a general-purpose graphics processing compute block comprising a plurality of graphics compute units; a plurality of registers coupled to the plurality of graphics processing compute units; a hardware unit to selectively enable the instruction unit to execute the stream of instructions out of order; and a processor communicatively coupled to the general purpose graphics processor to: select a first hardware thread count to be applied to the plurality of graphics compute units to execute the stream of instructions; analyze the graphics program to be executed on the general purpose graphics processor at compile time to determine a set of latency parameters associated with the execution of the stream of instructions at the first hardware thread count; select a latency management technique to be implemented by the general purpose graphics processor based at least in part on the set of latency parameters associated with instruction sets implemented to execute the program; determine the set of latency parameters associated with instruction sets implemented to execute the program at a current hardware thread count; determine a composite latency score for the program from the set of latency parameters at the current thread count; and compare the composite latency score at the current hardware thread count to a threshold.

2. The apparatus of claim 1, the processor to: generate an instruction for the general purpose graphics processor to select a second hardware thread count, greater than the first hardware thread count.

3. The apparatus of claim 1, the processor to: set the hardware thread count for execution of the program to the current hardware thread when the composite latency score falls beneath the threshold.

4. The apparatus of claim 1, the processor to: generate an instruction for the general purpose graphics processor to enable out-of-order instruction scheduling for hardware executing the program.

5. An electronic device, comprising: a computer readable memory: a graphics processor comprising; an instruction cache to receive a stream of instructions generated by a graphics program; an instruction unit to execute the stream of instructions; a general-purpose graphics processing compute block comprising a plurality of graphics compute units; a plurality of registers coupled to the plurality of graphics processing compute units; a hardware unit to selectively enable the instruction unit to execute the stream of instructions out of order; and a processor communicatively coupled to the general purpose graphics processor to: select a first hardware thread count to be applied to the plurality of graphics compute units to execute the stream of instructions; analyze the graphics program to be executed on the general purpose graphics processor at compile time to determine a set of latency parameters associated with the execution of the stream of instructions at the first hardware thread count; and select a latency management technique to be implemented by the general purpose graphics processor based at least in part on the set of latency parameters associated with instruction sets implemented to execute the program: determine the set of latency parameters associated with instruction count; determine a composite latency score for the program from the set of latency parameters at the current thread count; and compare the composite latency score at the current hardware thread count to a threshold.

6. The electronic device of claim 5 further comprising processing circuitry to: generate an instruction for the general purpose graphics processor to select a second hardware thread count, greater than the first hardware thread count.

7. The electronic device of claim 6, the processor to: implement an alternate latency management technique in response to a determination that the program latency cannot be adequately addressed by adjusting a thread count allocated to the instruction sets implemented to execute the program.

8. The electronic device of claim 7, the processor to: generate an instruction for the general purpose graphics processor to enable out-of-order instruction scheduling for hardware executing the program.

9. The electronic device of claim 5, the processor to: set the hardware thread count for execution of the program to the current hardware thread when the composite latency score falls beneath the threshold.
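
Illustrative sketch (not part of the patent text). The claims recite a sequence of steps: pick a hardware thread count, determine latency parameters at that count, fold them into a composite latency score, compare the score to a threshold, and either keep the count, try a greater one, or fall back to out-of-order scheduling. The Python below is one possible reading of that flow under stated assumptions. Every name in it (`select_latency_management`, `composite_score`, `toy_analyze`, `Decision`, `Technique`) is hypothetical, the composite score is assumed to be a plain sum because the claims do not say how it is computed, and the numbers in `toy_analyze` are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Sequence


class Technique(Enum):
    THREAD_COUNT = "adjust hardware thread count"
    OUT_OF_ORDER = "enable out-of-order instruction scheduling"


@dataclass(frozen=True)
class Decision:
    technique: Technique
    thread_count: int
    score: float


def composite_score(params: Sequence[float]) -> float:
    # Assumption: the composite is a plain sum of the latency parameters.
    return float(sum(params))


def select_latency_management(
    analyze: Callable[[int], Sequence[float]],
    thread_counts: Sequence[int],
    threshold: float,
) -> Decision:
    """Try thread counts in increasing order and stop at the first whose
    composite score falls beneath the threshold; if none does, fall back
    to out-of-order scheduling at the largest count tried."""
    candidates = sorted(thread_counts)
    if not candidates:
        raise ValueError("thread_counts must not be empty")
    for count in candidates:
        score = composite_score(analyze(count))
        if score < threshold:
            return Decision(Technique.THREAD_COUNT, count, score)
    return Decision(Technique.OUT_OF_ORDER, count, score)


def toy_analyze(count: int) -> Sequence[float]:
    # Invented model: more threads hide more memory latency, while a
    # fixed dependency stall remains regardless of thread count.
    memory_stall = max(0.0, 400.0 - 100.0 * count)
    dependency_stall = 20.0
    return [memory_stall, dependency_stall]


if __name__ == "__main__":
    print(select_latency_management(toy_analyze, [1, 2, 4, 8], threshold=50.0))
    print(select_latency_management(toy_analyze, [1, 2, 4, 8], threshold=10.0))
```

With a threshold of 50.0 the toy model settles on a thread count of 4. With a threshold of 10.0 no thread count is enough, so the sketch returns the out-of-order fallback, loosely mirroring the alternate technique of claims 7 and 8.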

Assignees

Intel Corp

Inventors

Classifications

  • Reducing the execution time required by the program code · CPC title

  • Register allocation; Assignment of physical memory space to logical memory space · CPC title

  • by tracing the execution of the program · CPC title

  • Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • Software pipelining · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10540260B2 cover?
In one example, an apparatus comprises processing circuitry to analyze a program at compile time to determine a set of latency parameters associated with instruction sets implemented to execute the program and select a latency management technique based at least in part on the set of latency parameters associated with instruction sets implemented to execute the program. Other examples may be de…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F11/3636. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 21, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).