Apparatus and method for low-latency invocation of accelerators

US10089113B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10089113-B2
Application number: US-201615282082-A
Country: US
Kind code: B2
Filing date: Sep 30, 2016
Priority date: Dec 28, 2012
Publication date: Oct 2, 2018
Grant date: Oct 2, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method are described for providing low-latency invocation of accelerators. For example, a system according to one embodiment comprises: a processor includes a plurality of simultaneous multithreading (SMT) cores, at least one shared cache circuit to be shared among two or more of the SMT cores; and at least one of the SMT cores including at least one level 2 (L2) cache circuit to store both instructions and data and communicatively coupled to the instruction cache circuit and the data cache circuit, a communication interconnect circuit including a peripheral component interconnect express (PCIe) circuit to communicatively couple one or more of the SMT cores to an accelerator device and a memory access circuit to identify an accelerator context save/restore region in a memory responsive to a context save/restore value, the accelerator context save/restore region to share an accelerator context state.
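The save/restore mechanism in the abstract can be pictured with a small software model. This is a minimal sketch, not the patented circuit: the names `Memory`, `Accelerator`, `identify_region`, `CONTEXT_WORDS`, and the address 16 are hypothetical, and the only facts taken from the text are that a pointer value identifies a region in memory and that the region holds the accelerator's context state.

```python
from dataclasses import dataclass, field

CONTEXT_WORDS = 4  # hypothetical size of the accelerator context, in words


@dataclass
class Memory:
    """Application memory space, modeled as a flat list of words."""
    words: list = field(default_factory=lambda: [0] * 64)


def identify_region(pointer, memory):
    """Map a context save/restore pointer to the address range it identifies."""
    if pointer < 0 or pointer + CONTEXT_WORDS > len(memory.words):
        raise ValueError("save/restore region falls outside the memory space")
    return range(pointer, pointer + CONTEXT_WORDS)


@dataclass
class Accelerator:
    """Toy accelerator whose whole context state is a short list of words."""
    state: list = field(default_factory=lambda: [0] * CONTEXT_WORDS)

    def save_context(self, memory, pointer):
        # Copy the context state into the region the pointer identifies.
        for address, word in zip(identify_region(pointer, memory), self.state):
            memory.words[address] = word

    def restore_context(self, memory, pointer):
        # Reload the context state from the same region.
        self.state = [memory.words[a] for a in identify_region(pointer, memory)]


memory = Memory()
pointer_register = 16  # hypothetical register value holding the pointer
accelerator = Accelerator(state=[7, 8, 9, 10])

accelerator.save_context(memory, pointer_register)
accelerator.state = [0] * CONTEXT_WORDS  # context lost, e.g. after a switch
accelerator.restore_context(memory, pointer_register)
```

Because the region lives in the application's own memory space, the model needs no separate driver-owned buffer: whoever holds the pointer value can locate the saved state.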

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a plurality of processors; a first interconnect to communicatively couple two or more of the plurality of processors; a second interconnect to communicatively couple one or more of the plurality of processors to one or more other system components; and a system memory communicatively coupled to one or more of the processors; at least one processor comprising: a plurality of simultaneous multithreading (SMT) cores, each of the SMT cores to perform out-of-order instruction execution for a plurality of threads; at least one shared cache circuit to be shared among two or more of the SMT cores; at least one of the SMT cores comprising: an instruction fetch circuit to fetch instructions of one or more of the threads, an instruction decode circuit to decode the instructions, a register renaming circuit to rename registers of a register file, an instruction cache circuit to store instructions to be executed, a data cache circuit to store data; at least one level 2 (L2) cache circuit to store both instructions and data and communicatively coupled to the instruction cache circuit and the data cache circuit; a communication interconnect circuit including a peripheral component interconnect express (PCIe) circuit, the PCIe circuit to communicatively couple one or more of the SMT cores to an accelerator device, the PCIe circuit to provide the accelerator device access to resources of one or more of the processors including the at least one shared cache circuit; and a memory access circuit to identify an accelerator context save/restore region of the accelerator device in a memory space that stores data from execution of an application, the application invoking the accelerator device during the application's execution, the accelerator context save/restore region pointed by a context save/restore pointer, and the accelerator context save/restore region to store an accelerator context state.

2. The system as in claim 1 wherein the accelerator device is to restore its context state from the context save/restore region.

3. The system as in claim 1 wherein the context save/restore pointer identifies a memory address.

4. The system as in claim 1 further comprising: a register to store the accelerator context save/restore pointer.

5. The system as in claim 1 further comprising: at least one storage device communicatively coupled to one or more of the processors.

6. The system as in claim 1 further comprising: at least one communication device communicatively coupled to one or more of the processors.

7. The system as in claim 1 wherein the system memory comprises a dynamic random access memory.

8. A method comprising: communicatively coupling two or more of a plurality of processors; communicatively coupling one or more of the plurality of processors to one or more other system components; and communicatively coupling a system memory to one or more of the processors; performing out-of-order instruction execution for a plurality of threads on a plurality of simultaneous multithreading (SMT) cores; sharing at least one shared cache among two or more of the SMT cores; fetching instructions of one or more of the threads; decoding the instructions; renaming registers of a register file; storing instructions to be executed in an instruction cache circuit; storing data in a data cache circuit; storing both instructions and data in at least one level 2 (L2) cache circuit communicatively coupled to the instruction cache circuit and the data cache circuit; communicatively coupling one or more of the SMT cores to an accelerator device, wherein the one or more SMT cores are communicatively coupled to the accelerator device through a peripheral component interconnect express (PCIe) circuit; providing the accelerator device access to resources of one or more of the plurality of processors including the at least one shared cache circuit through the PCIe circuit; and identifying an accelerator context save/restore region of the accelerator device in a memory space that stores data from execution of an application, the application invoking the accelerator device during the application's execution, the accelerator context save/restore region pointed by a context save/restore pointer, and the accelerator context save/restore region to store an accelerator context state.

9. The method as in claim 8 wherein the accelerator device is to restore its context state from the context save/restore region.

10. The method as in claim 8 wherein the context save/restore pointer identifies a memory address.

11. The method as in claim 8 further comprising: storing the accelerator context save/restore pointer in a register.

12. The method as in claim 8 further comprising: communicatively coupling at least one storage device to one or more of the processors.

13. The method as in claim 8 further comprising: communicatively coupling at least one communication device to one or more of the processors.

14. The method as in claim 8 wherein the system memory comprises a dynamic random access memory.

15. An apparatus comprising: means for communicatively coupling two or more of a plurality of processors; means for communicatively coupling one or more of the plurality of processors to one or more other system components; and means for communicatively coupling a system memory to one or more of the processors; means for performing out-of-order instruction execution for a plurality of threads on a plurality of simultaneous multithreading (SMT) cores; means for sharing at least one shared cache among two or more of the SMT cores; means for fetching instructions of one or more of the threads; means for decoding the instructions; means for renaming registers of a register file; means for storing instructions to be executed in an instruction cache circuit; means for storing data in a data cache circuit; means for storing both instructions and data in at least one level 2 (L2) cache circuit communicatively coupled to the instruction cache circuit and the data cache circuit; means for communicatively coupling one or more of the SMT cores to an accelerator device, the means for communicatively coupling one or more of the SMT cores to the accelerator device includes a peripheral component interconnect express (PCIe) circuit; means for providing the accelerator device access to resources of the apparatus including the at least one shared cache circuit through the PCIe circuit; and means for identifying an accelerator context save/restore region of the accelerator device in a memory space that stores data from execution of an application, the application invoking the accelerator device during the application's execution, the accelerator context save/restore region pointed by a context save/restore pointer, and the accelerator context save/restore region to store an accelerator context state.

16. The apparatus as in claim 15 wherein the accelerator device is to restore its context state from the context save/restore region.

17. The apparatus as in claim 15 wherein the context save/restore pointer identifies a memory address.

18. The apparatus as in claim 15 further comprising: means for storing the accelerator context save/restore pointer in a register.

19. The apparatus as in claim 15 further comprising: means for communicatively coupling at least one storage device to one or more of the processors.

20. The apparatus as in claim 15 further comprisin
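The cache arrangement recited in the independent claims (a per-core instruction cache and data cache, a unified L2 coupled to both, and a cache shared among cores) can be illustrated with a toy lookup model. This is only a sketch under stated assumptions: the claims name the circuits and their coupling but not a lookup order or fill policy, so the L1, then L2, then shared ordering and the fill-on-miss behavior below are assumptions, and the names `Cache`, `SmtCore`, and `load` are hypothetical. Accelerator access to the shared cache over PCIe is not modeled.

```python
class Cache:
    """Minimal cache: an address-to-value map with a name for reporting."""

    def __init__(self, name):
        self.name = name
        self.lines = {}

    def lookup(self, address):
        return self.lines.get(address)

    def fill(self, address, value):
        self.lines[address] = value


class SmtCore:
    """Core with private L1I, L1D and unified L2, plus a cache shared with peers."""

    def __init__(self, shared_cache, memory):
        self.icache = Cache("L1I")
        self.dcache = Cache("L1D")
        self.l2 = Cache("L2")  # unified: holds both instructions and data
        self.shared = shared_cache
        self.memory = memory

    def load(self, address, is_instruction=False):
        """Return (value, name of the level that served the request)."""
        l1 = self.icache if is_instruction else self.dcache
        levels = (l1, self.l2, self.shared)  # assumed lookup order
        for depth, cache in enumerate(levels):
            value = cache.lookup(address)
            if value is not None:
                # Assumed policy: copy the line into every level above the hit.
                for upper in levels[:depth]:
                    upper.fill(address, value)
                return value, cache.name
        value = self.memory[address]
        for cache in levels:
            cache.fill(address, value)
        return value, "memory"


memory = {0x40: 123}
shared = Cache("shared")
core0 = SmtCore(shared, memory)
core1 = SmtCore(shared, memory)

first = core0.load(0x40)                        # cold miss, served by memory
second = core0.load(0x40)                       # now present in core0's L1D
fetch = core0.load(0x40, is_instruction=True)   # L1I misses, unified L2 hits
other_core = core1.load(0x40)                   # core1 hits the shared cache
```

The third and fourth calls show the two kinds of sharing the claim text distinguishes: the unified L2 serves both the instruction and data paths of one core, while the shared cache serves a different core entirely.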

Assignees

Intel Corp

Inventors

Classifications

  • Instruction code · CPC title

  • within a central processing unit [CPU] · CPC title

  • to perform miscellaneous control operations, e.g. NOP · CPC title

  • Instruction analysis, e.g. decoding, instruction word fields · CPC title

  • LOAD or STORE instructions; Clear instruction · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10089113B2 cover?
An apparatus and method are described for providing low-latency invocation of accelerators. For example, a system according to one embodiment comprises: a processor includes a plurality of simultaneous multithreading (SMT) cores, at least one shared cache circuit to be shared among two or more of the SMT cores; and at least one of the SMT cores including at least one level 2 (L2) cache circuit …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30076. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 2, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).