Initiation of cache flushes and invalidations on graphics processors

US9563561B2 · US · B2

Patent metadata
Field                Value
Publication number   US-9563561-B2
Application number   US-201313926328-A
Country              US
Kind code            B2
Filing date          Jun 25, 2013
Priority date        Jun 25, 2013
Publication date     Feb 7, 2017
Grant date           Feb 7, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems may provide for receiving, at a graphics processor, a workload from a host processor and using a kernel on the graphics processor to issue a thread group for execution of the workload on the graphics processor. Additionally, one or more coherency messages may be initiated, by the graphics processor, in response to a thread-related condition of one or more caches on the graphics processor. In one example, the thread-related condition is associated with the execution of the workload on the graphics processor and indicates that the one or more caches on the graphics processor are not coherent with a system memory associated with the host processor.
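The abstract's central idea is that the graphics processor itself issues flush or invalidate messages once a thread's work leaves a cache out of step with system memory, without a host round trip. The toy model below is a minimal Python sketch of that idea under stated assumptions, not Intel's implementation. The `Cache` class, the `run_thread` helper, and the dictionary standing in for system memory are hypothetical names for illustration. The sketch also assumes that "not coherent" means the cache holds dirty data or a clean line that no longer matches memory. Real hardware tracks this per cache line in logic, not in software.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """Hypothetical write-back cache on the graphics processor (toy model)."""
    lines: dict = field(default_factory=dict)  # address -> cached value
    dirty: set = field(default_factory=set)    # addresses written but not yet in memory

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def read(self, addr, memory):
        if addr not in self.lines:             # miss: fill the line from system memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def is_coherent_with(self, memory):
        # Returns False (the assumed "thread-related condition") when the cache
        # holds dirty data or a clean line that no longer matches memory.
        return not self.dirty and all(
            memory.get(addr) == value for addr, value in self.lines.items()
        )

    def flush(self, memory):
        # Flush message: write every dirty line back to system memory.
        for addr in sorted(self.dirty):
            memory[addr] = self.lines[addr]
        self.dirty.clear()

    def invalidate(self):
        # Invalidate message: drop all lines so later reads refetch from memory.
        # This toy version discards dirty data, so flush first if any exists.
        self.lines.clear()
        self.dirty.clear()


def run_thread(tid, cache, memory, log):
    """One thread of the group: do its work, then issue its own coherency message."""
    cache.write(0x100 + tid, tid * tid)        # toy workload: square the thread id
    if not cache.is_coherent_with(memory):     # condition detected on the GPU side
        cache.flush(memory)                    # sub-kernel-level flush, no host round trip
        log.append(("flush", tid))


memory = {}                                    # stands in for the host's system memory
cache = Cache()
log = []
for tid in range(4):                           # a thread group of four threads
    run_thread(tid, cache, memory, log)
print(memory)                                  # {256: 0, 257: 1, 258: 4, 259: 9}

memory[0x100] = 99                             # host updates memory behind the cache
cache.invalidate()                             # cached copy of 0x100 is now stale
print(cache.read(0x100, memory))               # 99
```

Each thread carries its own coherency message here, which loosely mirrors the claim language "each of the plurality of threads includes a corresponding coherency message". Flushing after every thread is simple but chatty. The barrier-based variant in the dependent claims batches these messages instead.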

First claim

Claim text (claims 1-23).

We claim:

1. A system comprising: a host processor; a system memory associated with the host processor; a bus coupled to the host processor; and a graphics processor coupled to the bus, the graphics processor to receive a workload from the host processor and including, a plurality of caches, and a kernel to issue a thread group for execution of the workload on the graphics processor in response to the graphics processor detecting a thread-related condition of one or more of the plurality of caches, wherein the graphics processor is to initiate one or more coherency messages in response to the thread-related condition of one or more of the plurality of caches, and the thread-related condition is to be associated with the execution of the workload on the graphics processor, wherein the thread group contains a plurality of threads and each of the plurality of threads includes a corresponding coherency message, and wherein the graphics processor is to enable cache coherency operations to be initiated at a kernel or sub-kernel level.

2. The system of claim 1, wherein the thread-related condition indicates that the one or more of the plurality of caches are not coherent with the system memory.

3. The system of claim 1, wherein the one or more coherency messages are to include one or more of a flush message and an invalidate message.

4. The system of claim 1, wherein the thread group is to generate the one or more coherency messages based on one or more instructions from the kernel.

5. The system of claim 1, wherein the graphics processor further includes a barrier module to initiate the one or more coherency messages when each thread in the thread group has encountered a barrier command.

6. The system of claim 5, wherein the barrier module is to identify the one or more of the plurality of caches based on one or more barrier messages from the thread group and direct the one or more coherency messages to the one or more of the plurality of caches.

7. An apparatus comprising: a graphics processor to receive a workload from a host processor, the graphics processor including, a plurality of caches, and a kernel to issue a thread group for execution of the workload on the graphics processor in response to the graphics processor detecting a thread-related condition of one or more of the plurality of caches, wherein the graphics processor is to initiate one or more coherency messages in response to the thread-related condition of one or more of the plurality of caches, and the thread-related condition is to be associated with the execution of the workload on the graphics processor, wherein the thread group contains a plurality of threads and each of the plurality of threads includes a corresponding coherency message, and wherein the graphics processor is to enable cache coherency operations to be initiated at a kernel or sub-kernel level.

8. The apparatus of claim 7, wherein the thread-related condition indicates that the one or more of the plurality of caches are not coherent with a system memory associated with the host processor.

9. The apparatus of claim 7, wherein the one or more coherency messages are to include one or more of a flush message and an invalidate message.

10. The apparatus of claim 7, wherein the thread group is to generate the one or more coherency messages based on one or more instructions from the kernel.

11. The apparatus of claim 7, wherein the graphics processor further includes a barrier module to initiate the one or more coherency messages when each thread in the thread group has encountered a barrier command.

12. The apparatus of claim 11, wherein the barrier module is to identify the one or more of the plurality of caches based on one or more barrier messages from the thread group and direct the one or more coherency messages to the one or more of the plurality of caches.

13. A method comprising: receiving, at a graphics processor, a workload from a host processor; using, in response to the graphics processor detecting a thread-related condition of one or more of the plurality of caches, a kernel on the graphics processor to issue a thread group for execution of the workload on the graphics processor, wherein the thread-related condition is associated with the execution of the workload on the graphics processor; and initiating, by the graphics processor, one or more coherency messages in response to a thread-related condition of one or more caches on the graphics processor, wherein the thread group contains a plurality of threads and each of the plurality of threads includes a corresponding coherency message, and wherein the coherency operations are initiated at a kernel or sub-kernel level.

14. The method of claim 13, wherein the thread-related condition indicates that the one or more caches on the graphics processor are not coherent with a system memory associated with the host processor.

15. The method of claim 13, wherein the one or more coherency messages include one or more of a flush message and an invalidate message.

16. The method of claim 13, further including generating the one or more coherency messages based on one or more instructions from the kernel.

17. The method of claim 13, wherein the one or more coherency messages are initiated when each thread in the thread group has encountered a barrier command.

18. The method of claim 17, further including: identifying the one or more caches based on one or more barrier messages from the thread group; and directing the one or more coherency messages to the one or more caches.

19. At least one non-transitory computer readable storage medium comprising a set of instructions which, when executed by a graphics processor, cause a computer to: receive, at the graphics processor, a workload from a host processor; use, in response to the graphics processor detecting a thread-related condition of one or more of the plurality of caches, a kernel on the graphics processor to issue a thread group for execution of the workload on the graphics processor, wherein the thread-related condition is to be associated with the execution of the workload on the graphics processor; and initiate, by the graphics processor, one or more coherency messages in response to a thread-related condition of one or more caches on the graphics processor, wherein the thread group contains a plurality of threads and each of the plurality of threads includes a corresponding coherency message, and wherein the coherency operations are to be initiated at a kernel or sub-kernel level.

20. The at least one non-transitory computer readable storage medium of claim 19, wherein the thread-related condition indicates that the one or more caches on the graphics processor are not coherent with a system memory associated with the host processor.

21. The at least one non-transitory computer readable storage medium of claim 19, wherein the one or more coherency messages are to include one or more of a flush message and an invalidate message.

22. The at least one non-transitory computer readable storage medium of claim 19, wherein the instructions, if executed, cause a computer to generate the one or more coherency messages based on one or more instructions from the kernel.

23. The at least one non-transitory computer readable storage medium of claim 19, wherein the one or more coherency messages are to be initiated when each thread in the thread group has encountered a barrier command.
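Claims 5, 6, 11, 12, 17 and 18 add a barrier module. Each thread sends a barrier message that identifies caches. The module waits until every thread in the group has encountered the barrier command, then directs coherency messages only to the identified caches. The Python sketch below models that behavior under stated assumptions and is not the patented hardware. `BarrierModule`, `barrier_message`, the `(cache, op)` request format, and the cache names `data_cache` and `texture_cache` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BarrierModule:
    """Hypothetical model of the claimed barrier module for one thread group."""
    group_size: int                                # threads in the thread group
    arrived: set = field(default_factory=set)      # ids of threads that hit the barrier
    pending: set = field(default_factory=set)      # (cache, op) pairs named so far
    sent: list = field(default_factory=list)       # coherency messages actually issued

    def barrier_message(self, tid, requests):
        """Thread `tid` reached the barrier and names the caches it needs handled."""
        self.arrived.add(tid)
        self.pending.update(requests)              # duplicate requests merge in the set
        if len(self.arrived) == self.group_size:
            self._release()

    def _release(self):
        # Every thread has encountered the barrier: send one message per
        # (cache, op) pair, only to the caches identified, then reset.
        for cache, op in sorted(self.pending):
            self.sent.append((op, cache))
        self.arrived.clear()
        self.pending.clear()


barrier = BarrierModule(group_size=3)
barrier.barrier_message(0, {("texture_cache", "invalidate")})
barrier.barrier_message(1, {("data_cache", "flush")})
print(barrier.sent)   # [] (thread 2 has not arrived yet)
barrier.barrier_message(2, {("data_cache", "flush")})
print(barrier.sent)   # [('flush', 'data_cache'), ('invalidate', 'texture_cache')]
```

Deferring the messages to the barrier has two effects in this sketch. A cache named by several threads receives a single message, and no message is sent until the last thread has finished with the data. The sketch does not model ordering between caches or acknowledgements from them.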

Assignees

Intel Corp

Inventors

Classifications

  • In image processor or graphics adapter · CPC title

  • with software control, e.g. non-cacheable data · CPC title

  • with cache invalidating means (G06F12/0815 takes precedence) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9563561B2 cover?
Methods and systems may provide for receiving, at a graphics processor, a workload from a host processor and using a kernel on the graphics processor to issue a thread group for execution of the workload on the graphics processor. Additionally, one or more coherency messages may be initiated, by the graphics processor, in response to a thread-related condition of one or more caches on the graph…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F12/0837. Mapped technology areas include Physics.
When was this patent published?
Published Feb 7, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).