Software solution for cooperative memory-side and processor-side data prefetching

US9798528B2 · US · B2

Patent metadata
Publication number: US-9798528-B2
Application number: US-53131306-A
Country: US
Kind code: B2
Filing date: Sep 13, 2006
Priority date: Sep 13, 2006
Publication date: Oct 24, 2017
Grant date: Oct 24, 2017

Abstract

A solution for cooperative data prefetching that enables software control of a memory-side data prefetch and/or a processor-side data prefetch is provided. In one embodiment, the invention provides a solution for generating an application, in which access to application data for the application is improved (e.g., optimized) in program code for the application. In particular, a push request, for performing a memory-side data prefetch of the application data, and a prefetch request, for performing a processor-side data prefetch, are added to the program code. The memory-side data prefetch results in the application data being copied from a first data store to a second data store that is faster than the first data store while the processor-side data prefetch results in the application data being copied from the second data store to a third data store that is faster than the second data store.
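The three-store data path in the abstract can be pictured with a toy model. This is an illustration only, not the patent's implementation. It assumes each data store is a set of addresses and that copies complete instantly. The stores are named RAM, L2 and L1 after claim 1. The class and method names (`MemoryHierarchy`, `push`, `prefetch`, `load`) are hypothetical.

```python
"""Toy model of the cooperative memory-side / processor-side prefetch path.

Assumptions (not from the patent): each data store is a set of addresses,
copies are instantaneous, and all names here are hypothetical.
"""


class MemoryHierarchy:
    """Three data stores, from slowest to fastest: RAM, L2, L1."""

    def __init__(self, ram_contents):
        self.ram = set(ram_contents)  # first (slowest) data store
        self.l2 = set()               # second data store, faster than RAM
        self.l1 = set()               # third data store, faster than L2

    def push(self, addr):
        """Memory-side prefetch: a near memory processor copies RAM -> L2."""
        if addr in self.ram:
            self.l2.add(addr)

    def prefetch(self, addr):
        """Processor-side prefetch: the main processor copies L2 -> L1.

        Simplification: in this model the prefetch only copies data that is
        already in L2; it does not fall back to fetching from RAM.
        """
        if addr in self.l2:
            self.l1.add(addr)

    def load(self, addr):
        """Return the name of the fastest store that currently holds addr."""
        for name, store in (("L1", self.l1), ("L2", self.l2), ("RAM", self.ram)):
            if addr in store:
                return name
        raise KeyError(addr)
```

For instance, after `mem = MemoryHierarchy(range(16))`, `mem.push(3)` and `mem.prefetch(3)`, the call `mem.load(3)` returns `"L1"`. An address that was only prefetched on the processor side, never pushed, is still served from `"RAM"` in this model. That outcome reflects the model's simplification. It shows why the push has to run first for the prefetch to find the data in L2.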

Claims

What is claimed is:

1. A method of generating an application, the method comprising: loading source code for an application into a memory of a computing system, compiling the source code into program code for the application, and during the compilation of the source code: for each reference to application data determined as likely to generate a memory miss, adding at a position in the program code that is several operations prior to the reference a push request into the program code for a memory-side data prefetch to the program code for the referenced application data, wherein the memory-side data prefetch causes a near memory processor (NMP) to copy referenced application data for the application from random access memory to an L2 cache; and thereafter adding a prefetch request for a processor-side data prefetch to the program code at a position in the program code that is a number operations prior to the reference equivalent to a number of operations required to complete the prefetch, wherein the processor-side data prefetch causes a main processor that is different than the NMP to copy the application data from the L2 cache to an L1 cache.

2. The method of claim 1, further comprising analyzing a memory access pattern for the application to identify at least one of: the application data, a location in the program code for the push request, or a location in the program code for the prefetch request.

3. The method of claim 2, wherein the analyzing includes determining that a request to access the application data is likely to incur a memory miss.

4. The method of claim 2, wherein the analyzing includes analyzing at least one of: application data dependencies or application data structure types at compile-time.

5. The method of claim 2, wherein the analyzing includes analyzing application data access patterns of the application during runtime.

6. The method of claim 1, further comprising translating source code for the application into the program code.

7. The method of claim 1, wherein the push request causes a program to be executed by an execution environment separately from the application.

8. The method of claim 7, wherein the improving further includes defining a custom program for the program.

9. The method of claim 1, further comprising: performing a set of high level optimizations on the program code prior to the improving; and performing a set of low level optimizations on the program code after the improving.

10. A computer hardware system for generating an application, the system comprising: at least one processor, wherein the at least one processor is configured to load source code for an application into a memory of the computing hardware system, compiling the source code into program code for the application, and during the compilation of the source code: for each reference to application data determined as likely to generate a memory miss, add at a position in the program code that is several operations prior to the reference a push request into the program code for a memory-side data prefetch to program code of the application for the referenced application data, wherein the memory-side data prefetch causes a near memory processor (NMP) to copy referenced application data for the application from random access memory to an L2 cache; and thereafter add a prefetch request for a processor-side data prefetch to the program code at a position in the program code that is a number operations prior to the reference equivalent to a number of operations required to complete the prefetch, wherein the processor-side data prefetch causes a main processor that is different than the NMP to copy the application data from the L2 cache to an L1 cache.

11. The system of claim 10, wherein the at least one processor is further configured to analyze a memory access pattern for the application to identify at least one of: the application data, a location in the program code for the push request, and a location in the program code for the prefetch request.

12. The system of claim 11, wherein the analyzing includes determining that a request to access the application data is likely to incur a memory miss.

13. The system of claim 11, wherein the analyzing includes analyzing at least one of: application data dependencies and application data structure types at compile-time.

14. The system of claim 11, wherein the system for analyzing includes a system for analyzing application data access patterns of the application during runtime.

15. The system of claim 10, wherein the at least one processor is further configured to translate source code for the application into the program code.

16. The system of claim 10, wherein the push request causes a program to be executed by the execution environment separately from the application.

17. The system of claim 10, wherein the at least one processor is further configured to: perform a set of high level optimizations on the program code prior improving access to the application data; and perform a set of low level optimizations on the program code after the improving access to the application data.

18. A computer program product comprising at least one non-transitory computer-readable storage medium having stored therein computer usable program code for generating an application, which when executed by a computer hardware system, causes the computer hardware system to perform: loading source code for an application into a memory of a computing system, compiling the source code into program code for the application, and during the compilation of the source code: for each reference to application data determined as likely to generate a memory miss, adding at a position in the program code that is several operations prior to the reference a push request into the program code for a memory-side data prefetch to the program code for the referenced application data, wherein the memory-side data prefetch causes a near memory processor (NMP) to copy referenced application data for the application from random access memory to an L2 cache; and thereafter adding a prefetch request for a processor-side data prefetch to the program code at a position in the program code that is a number operations prior to the reference equivalent to a number of operations required to complete the prefetch, wherein the processor-side data prefetch causes a main processor that is different than the NMP to copy the application data from the L2 cache to an L1 cache.

19. The computer program product of claim 18, wherein the computer hardware system is further configured to perform analyzing a memory access pattern for the application to identify at least one of: the application data, a location in the program code for the push request, and a location in the program code for the prefetch request.

20. The computer program product of claim 18, wherein the computer hardware system is further configured to perform translating source code for the application into the program code.

21. The computer program product of claim 18, wherein the push request causes a program to be executed by the execution environment separately from the application.

22. The computer program product of claim 21, wherein the computer hardware system is further configured to perform defining a custom program for the program.
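The compile-time step shared by claims 1, 10 and 18 can be sketched as a pass over a flat list of operations. This is a minimal sketch under stated assumptions, not the patented compiler. Program code is modelled as hypothetical `(opcode, operand)` tuples. The `likely_miss` predicate stands in for the access-pattern analysis of claims 2 to 5. Distances count original operations only. The check that the push distance exceeds the prefetch distance is an assumption, made so the data reaches L2 before the processor-side prefetch reads it.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

# Hypothetical representation of one operation: (opcode, operand).
Op = Tuple[str, str]


def insert_cooperative_prefetches(
    code: List[Op],
    likely_miss: Callable[[Op], bool],
    push_distance: int,
    prefetch_distance: int,
) -> List[Op]:
    """Return a new op list with push and prefetch requests ahead of likely misses.

    push_distance: how many original operations before the reference the
        memory-side push request is placed (assumed > prefetch_distance).
    prefetch_distance: operations needed to complete the processor-side
        prefetch; the prefetch request is placed that many operations earlier.
    """
    if push_distance <= prefetch_distance:
        raise ValueError("push request must be issued before the prefetch request")

    # Map each original index to the requests emitted just before that op.
    before = defaultdict(list)
    for i, op in enumerate(code):
        if op[0] == "load" and likely_miss(op):
            # Clamp at 0 when the reference is too close to the start.
            before[max(i - push_distance, 0)].append(("push", op[1]))
            before[max(i - prefetch_distance, 0)].append(("prefetch", op[1]))

    out: List[Op] = []
    for i, op in enumerate(code):
        out.extend(before[i])  # requests scheduled ahead of this op
        out.append(op)
    return out
```

For example, with `push_distance=4` and `prefetch_distance=2`, a load of `a[i]` at original position 4 gets its push request emitted before operation 0 and its prefetch request before operation 2. The input list is left unchanged. In a real compiler the distances would come from latency estimates rather than fixed constants.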

Classifications

  • G06F8/4442 (primary): Reducing the number of cache misses; Data prefetching (cache prefetching G06F12/0862)
  • Prefetching based on hints or prefetch instructions
  • with multilevel cache hierarchies
  • Transformation of program code
  • with prefetch

Frequently asked questions

What does patent US9798528B2 cover?
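It covers a compiler-based technique for software-controlled, cooperative memory-side and processor-side data prefetching. For each reference to application data that is likely to cause a memory miss, the compiler inserts two requests. The first is a push request placed several operations ahead of the reference, which causes a near memory processor (NMP) to copy the data from random access memory into the L2 cache. The second is a prefetch request placed as many operations ahead as the prefetch needs to complete, which causes the main processor to copy the data from the L2 cache into the L1 cache.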
Who is the assignee on this patent?
Gao Yaoqing, Cascaval Gheorghe C, Kielstra Allan H, and 4 more
What technology area does this patent fall under?
Primary CPC classification G06F8/4442. Mapped technology areas include Physics.
When was this patent published?
Published on Oct 24, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).