Systems, methods, and apparatuses for heterogeneous computing

US11093277B2 · US · B2

Patent metadata
Publication number: US-11093277-B2
Application number: US-202016913265-A
Country: US
Kind code: B2
Filing date: Jun 26, 2020
Priority date: Dec 31, 2016
Publication date: Aug 17, 2021
Grant date: Aug 17, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
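
As a rough software analogy for the dispatch described in the abstract, the sketch below selects the processing elements for which a code fragment has native instructions. It is an illustration only: the patent describes a hardware scheduler, and every name here (`ProcessingElement`, `CodeFragment`, `dispatch`, the `isa` labels, the instruction mnemonics) is a hypothetical choice made for this example, not something taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ProcessingElement:
    name: str  # hypothetical identifier, e.g. "gpu0"
    isa: str   # hypothetical label for the element's native instruction set


@dataclass
class CodeFragment:
    name: str
    # Hypothetical mapping: instruction-set label -> native instructions
    native_code: Dict[str, List[str]]


def dispatch(fragment: CodeFragment,
             elements: List[ProcessingElement]) -> List[Tuple[str, List[str]]]:
    """Pair the fragment's instructions with each element they are native to."""
    plan = []
    for pe in elements:
        instructions = fragment.native_code.get(pe.isa)
        if instructions is not None:
            plan.append((pe.name, instructions))
    if not plan:
        raise ValueError(f"no element can run fragment {fragment.name!r} natively")
    return plan


elements = [
    ProcessingElement("core0", "scalar"),
    ProcessingElement("gpu0", "simd"),
    ProcessingElement("acc0", "matrix"),
]
fragment = CodeFragment("vector_add", {"simd": ["vadd", "vmul"]})
plan = dispatch(fragment, elements)
```

Here `plan` contains a single entry for `gpu0`, because it is the only element whose assumed instruction-set label matches the fragment's native code.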

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a multi-chip package comprising: an interposer substrate; a graphics processor die coupled to the interposer substrate, the graphics processor die comprising: a plurality of data parallel processing circuits to simultaneously perform operations on a plurality of data elements, at least one data parallel processing circuit comprising: local operand storage to store a plurality of source matrix data elements and a plurality of result matrix data elements of one or more source matrices and result matrices, respectively; execution circuitry comprising a plurality of dot-product execution circuits to execute a plurality of dot-product instructions in parallel to perform a corresponding plurality of dot product operations on at least a portion of the plurality of source matrix data elements to generate the plurality of destination matrix data elements, wherein the one or more source matrices comprise a first matrix and wherein a first dot product operation of the simultaneous dot-product operations comprises a dot-product of one or more source matrix data elements from the first matrix and one or more source vector data elements; a first multi-protocol on-chip communication fabric coupled to the data parallel processing circuits; a memory controller coupled to the first multi-protocol on-chip communication fabric; a plurality of memory dies stacked vertically on the interposer substrate; and a plurality of memory channels integrated through the interposer substrate to couple the memory controller to the memory dies.

2. The apparatus of claim 1 wherein the plurality of source matrix data elements comprise floating point data elements.

3. The apparatus of claim 1 wherein the first matrix comprises a sparse matrix and wherein the first dot-product operation is to enable a Row-Oriented Sparse Matrix Dense Vector (spMdV) multiplication operation.

4. The apparatus of claim 1 wherein the first matrix comprises a sparse matrix and wherein the first dot-product operation is to enable a Scale and Update operation.

5. The apparatus of claim 1 further comprising: virtualization circuitry to map the plurality of data parallel processing circuits to virtual functions and to assign one or more of the virtual functions to one or more virtual machines.

6. The apparatus of claim 5 wherein the virtualization circuitry comprises one or more control registers to store an indication of the mapping between the virtual functions and the plurality of parallel data processing circuits.

7. The apparatus of claim 6 wherein the first multi-protocol on-chip communication fabric comprises an arbiter to implement Quality of Service (QoS) and/or Virtual Channels for communication over the first multi-protocol on-chip communication fabric.

8. The apparatus of claim 1 wherein each memory channel is to interconnect the memory controller to one memory die of the plurality of memory dies.

9. The apparatus of claim 1 further comprising: an interconnect to couple the memory controller to a system memory device, wherein the system memory device is coupled to a host processor.

10. The apparatus of claim 9 further comprising: memory management circuitry to map a shared virtual memory (SVM) space across the system memory device and the memory dies, the SVM space to be shared by the host processor and the graphics processor die, allowing the host processor and graphics processor die to access the system memory die and the memory dies using a consistent set of virtual memory addresses.

11. The apparatus of claim 10 wherein the memory management circuitry comprises: an input-output memory management unit (IOMMU) to provide access by the plurality of data parallel processing circuits to page tables of the host processor.

12. The apparatus of claim 9 wherein the interconnect comprises a Peripheral Component Interconnect Express (PCIe) interconnect.

13. A system comprising: a system memory; a multi-chip package comprising: an interposer substrate; a graphics processor die coupled to the interposer substrate, the graphics processor die comprising: a plurality of data parallel processing circuits to simultaneously perform operations on a plurality of data elements, at least one data parallel processing circuit comprising: local operand storage to store a plurality of source matrix data elements and a plurality of result matrix data elements of one or more source matrices and result matrices, respectively; execution circuitry comprising a plurality of dot-product execution circuits to execute a plurality of dot-product instructions in parallel to perform a corresponding plurality of dot product operations on at least a portion of the plurality of source matrix data elements to generate the plurality of destination matrix data elements, wherein the one or more source matrices comprise a first matrix and wherein a first dot product operation of the simultaneous dot-product operations comprises a dot-product of one or more source matrix data elements from the first matrix and one or more source vector data elements; a first multi-protocol on-chip communication fabric coupled to the data parallel processing circuits; a memory controller coupled to the first multi-protocol on-chip communication fabric; a plurality of memory dies stacked vertically on the interposer substrate; a plurality of memory channels integrated through the interposer substrate to couple the memory controller to the memory dies; and a Peripheral Component Interconnect Express (PCIe) interface coupled to the first multi-protocol on-chip communication fabric, the PCIe interface to couple the graphics processor die to the system memory.

14. The system of claim 13 wherein the plurality of source matrix data elements comprise floating point data elements.

15. The system of claim 13 wherein the first matrix comprises a sparse matrix and wherein the first dot-product operation is to enable a Row-Oriented Sparse Matrix Dense Vector (spMdV) multiplication operation.

16. The system of claim 13 wherein the first matrix comprises a sparse matrix and wherein the first dot-product operation is to enable a Scale and Update operation.

17. The system of claim 13 further comprising: virtualization circuitry to map the plurality of data parallel processing circuits to virtual functions and to assign one or more of the virtual functions to one or more virtual machines.

18. The system of claim 17 wherein the virtualization circuitry comprises one or more control registers to store an indication of the mapping between the virtual functions and the plurality of parallel data processing circuits.

19. The system of claim 18 wherein the first multi-protocol on-chip communication fabric comprises an arbiter to implement Quality of Service (QoS) and/or Virtual Channels for communication over the first multi-protocol on-chip communication fabric.

20. The system of claim 13 wherein each memory channel is to interconnect the memory controller to one memory die of the plurality of memory dies.

21. The system of claim 13 further comprising: an interconnect to couple the memory controller to a system memory device, wherein the system memory device is coupled to a host processor.

22. The system of claim 21 further comprising: memory management circuitry to map a shared virtual memory (SVM) space across the system memory device and the memory dies, the SVM space to be shared by the host processor
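
Claims 3, 4, 15, and 16 name two ways of multiplying a sparse matrix by a dense vector. The sketch below illustrates both in plain Python under stated assumptions: the row-oriented form is shown over a compressed sparse row (CSR) layout, where each output element is one dot product, and "scale and update" is read here as the column-oriented form over a compressed sparse column (CSC) layout, where each vector element scales a matrix column and accumulates into the result. The storage layouts, function names, and that reading of "scale and update" are assumptions made for illustration; the claims themselves specify neither a format nor an algorithm.

```python
from typing import List, Sequence


def spmdv_row_oriented(values: Sequence[float], col_idx: Sequence[int],
                       row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Assumed CSR layout: y[i] is the dot product of sparse row i with x."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def spmdv_scale_and_update(values: Sequence[float], row_idx: Sequence[int],
                           col_ptr: Sequence[int], x: Sequence[float],
                           n_rows: int) -> List[float]:
    """Assumed CSC layout: scale column j by x[j] and add it into y."""
    y = [0.0] * n_rows
    for j in range(len(col_ptr) - 1):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += x[j] * values[k]
    return y


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]],  x = [1, 2, 3]
x = [1.0, 2.0, 3.0]
y_rows = spmdv_row_oriented([1.0, 2.0, 3.0, 4.0, 5.0], [0, 2, 1, 0, 2], [0, 2, 3, 5], x)
y_cols = spmdv_scale_and_update([1.0, 4.0, 3.0, 2.0, 5.0], [0, 2, 1, 0, 2], [0, 2, 3, 5], x, 3)
```

Both calls compute the same product A·x. In the row-oriented version each row's dot product is independent of the others, which is the property that lets several dot-product circuits work in parallel as claim 1 describes.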

Assignees

Intel Corp

Classifications

  • Energy efficient computing, e.g. low power processors, power management or thermal management

  • Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution

  • controlled by a single instruction for multiple threads [SIMT] in parallel

  • the resource being a machine, e.g. CPUs, Servers, Terminals

  • Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11093277B2 cover?
Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30038. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 17, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).