What technology area does this patent fall under?

Primary CPC classification G06F9/30038. Mapped technology areas include Physics.

When was this patent published?

Publication date: Aug 17, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems, methods, and apparatuses for heterogeneous computing

US11093277B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11093277-B2
Application number	US-202016913265-A
Country	US
Kind code	B2
Filing date	Jun 26, 2020
Priority date	Dec 31, 2016
Publication date	Aug 17, 2021
Grant date	Aug 17, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.

First claim

Opening claim text (preview).

What is claimed is:
1. An apparatus comprising: a multi-chip package comprising: an interposer substrate; a graphics processor die coupled to the interposer substrate, the graphics processor die comprising: a plurality of data parallel processing circuits to simultaneously perform operations on a plurality of data elements, at least one data parallel processing circuit comprising: local operand storage to store a plurality of source matrix data elements and a plurality of result matrix data elements of one or more source matrices and result matrices, respectively; execution circuitry comprising a plurality of dot-product execution circuits to execute a plurality of dot-product instructions in parallel to perform a corresponding plurality of dot product operations on at least a portion of the plurality of source matrix data elements to generate the plurality of destination matrix data elements, wherein the one or more source matrices comprise a first matrix and wherein a first dot product operation of the simultaneous dot-product operations comprises a dot-product of one or more source matrix data elements from the first matrix and one or more source vector data elements; a first multi-protocol on-chip communication fabric coupled to the data parallel processing circuits; a memory controller coupled to the first multi-protocol on-chip communication fabric; a plurality of memory dies stacked vertically on the interposer substrate; and a plurality of memory channels integrated through the interposer substrate to couple the memory controller to the memory dies.
2. The apparatus of claim 1 wherein the plurality of source matrix data elements comprise floating point data elements.
3. The apparatus of claim 1 wherein the first matrix comprises a sparse matrix and wherein the first dot-product operation is to enable a Row-Oriented Sparse Matrix Dense Vector (spMdV) multiplication operation.
4. The apparatus of claim 1 wherein the first matrix comprises a sparse matrix and wherein the first dot-product operation is to enable a Scale and Update operation.
5. The apparatus of claim 1 further comprising: virtualization circuitry to map the plurality of data parallel processing circuits to virtual functions and to assign one or more of the virtual functions to one or more virtual machines.
6. The apparatus of claim 5 wherein the virtualization circuitry comprises one or more control registers to store an indication of the mapping between the virtual functions and the plurality of parallel data processing circuits.
7. The apparatus of claim 6 wherein the first multi-protocol on-chip communication fabric comprises an arbiter to implement Quality of Service (QoS) and/or Virtual Channels for communication over the first multi-protocol on-chip communication fabric.
8. The apparatus of claim 1 wherein each memory channel is to interconnect the memory controller to one memory die of the plurality of memory dies.
9. The apparatus of claim 1 further comprising: an interconnect to couple the memory controller to a system memory device, wherein the system memory device is coupled to a host processor.
10. The apparatus of claim 9 further comprising: memory management circuitry to map a shared virtual memory (SVM) space across the system memory device and the memory dies, the SVM space to be shared by the host processor and the graphics processor die, allowing the host processor and graphics processor die to access the system memory die and the memory dies using a consistent set of virtual memory addresses.
11. The apparatus of claim 10 wherein the memory management circuitry comprises: an input-output memory management unit (IOMMU) to provide access by the plurality of data parallel processing circuits to page tables of the host processor.
12. The apparatus of claim 9 wherein the interconnect comprises a Peripheral Component Interconnect Express (PCIe) interconnect.
13. A system comprising: a system memory; a multi-chip package comprising: an interposer substrate; a graphics processor die coupled to the interposer substrate, the graphics processor die comprising: a plurality of data parallel processing circuits to simultaneously perform operations on a plurality of data elements, at least one data parallel processing circuit comprising: local operand storage to store a plurality of source matrix data elements and a plurality of result matrix data elements of one or more source matrices and result matrices, respectively; execution circuitry comprising a plurality of dot-product execution circuits to execute a plurality of dot-product instructions in parallel to perform a corresponding plurality of dot product operations on at least a portion of the plurality of source matrix data elements to generate the plurality of destination matrix data elements, wherein the one or more source matrices comprise a first matrix and wherein a first dot product operation of the simultaneous dot-product operations comprises a dot-product of one or more source matrix data elements from the first matrix and one or more source vector data elements; a first multi-protocol on-chip communication fabric coupled to the data parallel processing circuits; a memory controller coupled to the first multi-protocol on-chip communication fabric; a plurality of memory dies stacked vertically on the interposer substrate; a plurality of memory channels integrated through the interposer substrate to couple the memory controller to the memory dies; and a Peripheral Component Interconnect Express (PCIe) interface coupled to the first multi-protocol on-chip communication fabric, the PCIe interface to couple the graphics processor die to the system memory.
14. The system of claim 13 wherein the plurality of source matrix data elements comprise floating point data elements.
15. The system of claim 13 wherein the first matrix comprises a sparse matrix and wherein the first dot-product operation is to enable a Row-Oriented Sparse Matrix Dense Vector (spMdV) multiplication operation.
16. The system of claim 13 wherein the first matrix comprises a sparse matrix and wherein the first dot-product operation is to enable a Scale and Update operation.
17. The system of claim 13 further comprising: virtualization circuitry to map the plurality of data parallel processing circuits to virtual functions and to assign one or more of the virtual functions to one or more virtual machines.
18. The system of claim 17 wherein the virtualization circuitry comprises one or more control registers to store an indication of the mapping between the virtual functions and the plurality of parallel data processing circuits.
19. The system of claim 18 wherein the first multi-protocol on-chip communication fabric comprises an arbiter to implement Quality of Service (QoS) and/or Virtual Channels for communication over the first multi-protocol on-chip communication fabric.
20. The system of claim 13 wherein each memory channel is to interconnect the memory controller to one memory die of the plurality of memory dies.
21. The system of claim 13 further comprising: an interconnect to couple the memory controller to a system memory device, wherein the system memory device is coupled to a host processor.
22. The system of claim 21 further comprising: memory management circuitry to map a shared virtual memory (SVM) space across the system memory device and the memory dies, the SVM space to be shared by the host processor
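
Claims 3 and 15 refer to a Row-Oriented Sparse Matrix Dense Vector (spMdV) multiplication built from dot-product operations. As a rough software illustration only (the claims describe hardware circuits, and nothing below is taken from the patent's description), the following Python sketch computes one dot product per matrix row. The compressed sparse row (CSR) layout and the names `spmdv_row_oriented`, `values`, `col_idx`, and `row_ptr` are assumptions chosen for this example.

```python
def spmdv_row_oriented(values, col_idx, row_ptr, x):
    """Multiply a CSR sparse matrix by a dense vector, one row at a time.

    values  : non-zero entries of the matrix, stored row by row (assumed layout)
    col_idx : column index of each entry in `values`
    row_ptr : row r occupies values[row_ptr[r]:row_ptr[r + 1]]
    x       : dense source vector
    """
    y = []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        # Each output element is the dot product of one sparse row with x.
        # The row dot products are independent of each other, so in principle
        # they could be computed in parallel.
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y


# Hypothetical 3x3 example matrix:
#   [[2, 0, 1],
#    [0, 0, 3],
#    [4, 5, 0]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 2, 0, 1]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

print(spmdv_row_oriented(values, col_idx, row_ptr, x))  # [5.0, 9.0, 14.0]
```

Only the stored non-zero entries are touched, which is the usual reason for pairing a sparse matrix with a dense vector in this row-oriented form.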

Assignees

Intel Corp

Classifications

Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
G06F9/3836: Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
G06F9/3888: controlled by a single instruction for multiple threads [SIMT] in parallel
G06F9/5027: the resource being a machine, e.g. CPUs, Servers, Terminals
G06F9/45504: Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
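
The FAQ on this page maps the primary CPC symbol G06F9/30038 to "Physics" because the first letter of a CPC symbol names its section, and section G is Physics. The sketch below shows one way to split a symbol into its parts. The function name `parse_cpc`, the regular expression, and the shortened section titles are assumptions for illustration, not an official parser.

```python
import re

# Top-level CPC sections; titles are abbreviated here for readability.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General tagging of new technological developments",
}

# Section letter, two-digit class, subclass letter, main group, "/", subgroup.
_CPC_PATTERN = re.compile(r"([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})")


def parse_cpc(symbol):
    """Split a CPC symbol such as 'G06F9/30038' into its hierarchy levels."""
    match = _CPC_PATTERN.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a recognizable CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }


print(parse_cpc("G06F9/30038")["section_title"])  # Physics
print(parse_cpc("Y02D10/00")["subclass"])         # Y02D
```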

Patent family

Related publications grouped by family.

Patent family ID: 62709975

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11093277B2 cover?: Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein …
Who is the assignee on this patent?: Intel Corp

Related patents

Enhanced application request based scheduling on heterogeneous elements of information technology infrastructure

Integrated heterogeneous processing units

Layout of transmission vias for memory device

Control Groups for Network Testing

Processors having heterogeneous cores with different instructions and/or architecural features that are presented to software as homogeneous virtual cores
