Multi-purpose interface for configuration data and user fabric data
US-10833679-B2 · Nov 10, 2020 · US
US12294368B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12294368-B2 |
| Application number | US-202117485119-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 24, 2021 |
| Priority date | Sep 24, 2021 |
| Publication date | May 6, 2025 |
| Grant date | May 6, 2025 |
Official abstract text for this publication.
The present disclosure is directed to 3-D stacked architecture for Programmable Fabrics and Central Processing Units (CPUs). The 3-D stacked orientation enables reconfigurability of the fabric, and allows the fabric to function using coarse-grained and fine-grained acceleration for offloading CPU processing. Additionally, the programmable fabric may be able to function to interface with multiple other compute chiplet components in the 3-D stacked orientation. This enables multiple compute components to communicate without the need for offloading the data communications between the compute chiplets.
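The abstract's split between fine-grained and coarse-grained acceleration can be pictured with a small software model. The sketch below is illustrative only and is not taken from the patent: the names `Mode`, `Partition`, `FabricModel`, `configure_fine`, and `configure_coarse` are hypothetical, and the model assumes that a fine-grained partition is tied to one processor core while a coarse-grained partition is tied to a region of memory reserved for the fabric.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Mode(Enum):
    """Hypothetical partition states used only for this illustration."""
    UNCONFIGURED = "unconfigured"
    FINE_GRAINED = "fine-grained"      # assumed: coupled to one core
    COARSE_GRAINED = "coarse-grained"  # assumed: uses memory reserved for the fabric


@dataclass
class Partition:
    index: int
    mode: Mode = Mode.UNCONFIGURED
    core: Optional[int] = None   # set only for fine-grained partitions
    reserved_bytes: int = 0      # set only for coarse-grained partitions


@dataclass
class FabricModel:
    """Toy model of a programmable fabric divided into partitions."""
    num_partitions: int
    partitions: List[Partition] = field(init=False)

    def __post_init__(self) -> None:
        self.partitions = [Partition(i) for i in range(self.num_partitions)]

    def _next_free(self) -> Partition:
        # Return the first partition that has not been configured yet.
        for p in self.partitions:
            if p.mode is Mode.UNCONFIGURED:
                return p
        raise RuntimeError("no unconfigured partition left")

    def configure_fine(self, core: int) -> Partition:
        # Dedicate one partition to a specific core.
        p = self._next_free()
        p.mode, p.core = Mode.FINE_GRAINED, core
        return p

    def configure_coarse(self, reserved_bytes: int) -> Partition:
        # Dedicate one partition to work on a reserved memory region.
        p = self._next_free()
        p.mode, p.reserved_bytes = Mode.COARSE_GRAINED, reserved_bytes
        return p

    def summary(self) -> Dict[str, int]:
        # Count partitions per mode.
        counts = {m.value: 0 for m in Mode}
        for p in self.partitions:
            counts[p.mode.value] += 1
        return counts


# Both kinds of partition coexist in the same fabric model.
fabric = FabricModel(num_partitions=4)
fabric.configure_fine(core=0)
fabric.configure_fine(core=1)
fabric.configure_coarse(reserved_bytes=1 << 20)
print(fabric.summary())
```

The model only tracks which partitions are assigned to which role. It says nothing about how a real device would be programmed, which the abstract does not specify.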
Opening claim text (preview).
What is claimed is:
1. A system comprising: a processor having one or more cores; and a programmable fabric device, wherein the processor is stacked in a three-dimensional orientation above the programmable fabric device, and wherein the programmable fabric device comprises: a programmable fabric comprising a plurality of partitions configured to perform fine-grained acceleration operations; and one or more interfaces configured to provide connections between the programmable fabric and the processor, wherein the programmable fabric device is operable to: receive one or more sets of data from a processor pipeline via the one or more interfaces; configure a first portion of the programmable fabric comprising the plurality of partitions coupled to one or more executions units of the one or more cores of the processor to perform the fine-grained acceleration operations, wherein the fine-grained acceleration operations comprise extending an instruction-set architecture of the processor to initiate a custom opcode space to interface with the programmable fabric; receive one or more additional sets of data from the processor pipeline; and configure a second portion of the programmable fabric comprising one or more system memory portions reserved for the programmable fabric to interface with the processor to perform coarse-grained acceleration operations.
2. The system of claim 1, wherein the fine-grained acceleration operations comprise performing operations that read and write data to and from a register file, an L1 cache, an L2 cache, or any other per-core cache of the processor.
3. The system of claim 2, wherein the one or more interfaces comprise one or more ports for sections of the register file, the L1 cache, the L2 cache, or any other per-core cache of the processor.
4. The system of claim 1, wherein the one or more interfaces comprise a three-dimensional integrated circuit face-to-face die stacking packaging-based interface.
5. The system of claim 1, wherein a number of the plurality of partitions corresponds to at least the number of the one or more cores of the processor.
6. The system of claim 1, wherein the fine-grained acceleration and the coarse-grained acceleration operations are performed concurrently by configuring the first portion of the programmable fabric for the fine-grained acceleration operations and configuring the second portion of the programmable fabric for the coarse-grained acceleration operations.
7. The system of claim 1, wherein the coarse-grained acceleration operations comprise performing operations using one or more compute express link (CXL) devices that utilize a shared memory component with the processor.
8. The system of claim 1, wherein the one or more interfaces comprise one or more input/outputs (I/Os) of the programmable fabric, one or more external general-purpose input/outputs (GPIOs), or both.
9. The system of claim 1, wherein a workload architecture of the processor leverages the custom opcode space to define a set of custom instructions to perform the fine-grained acceleration operations.
10. The system of claim 9, wherein the set of custom instructions are leveraged by a compiler, curated libraries, or both.
11. The system of claim 1, where in the programmable fabric device comprises a field-programmable gate array (FPGA).
12. A method of data transfer between a processor stacked in a three-dimensional orientation above a programmable fabric device comprising: receiving, via the processor, one or more sets of data from a processor pipeline via one or more interfaces; configuring, via the processor, a first portion of a programmable fabric of the programmable fabric device comprising a plurality of partitions coupled to one or more execution units of one or more cores of the processor to perform fine-grained acceleration operations, wherein the fine-grained acceleration operations comprise extending an instruction-set architecture of the processor to initiate a custom opcode space to interface with the programmable fabric; receiving, via the processor, one or more additional sets of data from the processor pipeline; and configuring, via the processor, a second portion of the programmable fabric comprising one or more system memory portions reserved for the programmable fabric to interface with the processor to perform coarse-grained acceleration operations on the one or more additional sets of data.
13. The method of claim 12, comprising performing, via the processor, the fine-grained acceleration and the coarse-grained acceleration operations concurrently by configuring the first portion of the programmable fabric for the fine-grained acceleration operations and configuring the second portion of the programmable fabric for the coarse-grained acceleration operations concurrently.
14. The method of claim 12, comprising performing, via the processor, the coarse-grained acceleration operations with one or more external general-purpose input/outputs (GPIOs) operable to provide an interface for the programmable fabric with the processor to perform the coarse-grained acceleration operations.
15. The method of claim 12, wherein the coarse-grained acceleration operations comprise performing operations using one or more compute express link (CXL) devices that utilize a shared memory component with the processor.
16. The method of claim 12, wherein performing the fine-grained acceleration operations comprises performing operations that read and write data to and from a register file, an L1 cache, an L2 cache, or any other per core cache of the processor.
17. A system comprising: one or more compute chiplets; a programmable fabric base die, wherein the one or more compute chiplets are stacked in a three-dimensional orientation above the programmable fabric base die, and wherein the programmable fabric base die comprises one or more interfaces configured to provide connections between the programmable fabric base die and the one or more compute chiplets, wherein the programmable fabric base die is operable to: enable data transfer between the one or more compute chiplets that are three-dimensionally stacked above the programmable fabric base die; and receive, via the one or more compute chiplets, one or more sets of data via the one or more interfaces; configure, via the one or more compute chiplets, a first portion of the programmable fabric base die comprising a plurality of partitions coupled to one or more portions of the one or more compute chiplets to perform fine-grained acceleration operations, wherein the fine-grained acceleration operations comprise extending an instruction-set architecture of the one or more compute chiplets to initiate a custom opcode space to interface with the programmable fabric base die; receive, via the one or more compute chiplets, one or more additional sets of data via the one or more interfaces; and configure, via the one or more compute chiplets, a second portion of the programmable fabric base die comprising one or more system memory portions reserved for the programmable fabric base die to interface with the one or more compute chiplets to perform coarse-grained acceleration operations on the one or more additional sets of data.
18. The system of claim 17, wherein the one or more compute chiplets comprises one or more Central Processing Unit (CPU) chiplets, one or more Graphical Processing Unit (GPU) chiplets, one or more Dual accelerator (DL) chiplets, or any combination thereof.
19. The system of claim 17, wherein the one or more com
for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD] · CPC title
for speeding up configuration or reconfiguration · CPC title
using semiconductor devices (H03K19/173 takes precedence; wherein the semiconductor devices are only diode rectifiers H03K19/12) · CPC title
Reconfigurable logic embedded in CPU, e.g. reconfigurable unit · CPC title
Reconfigurable logic implemented as a co-processor (instruction execution using a coprocessor G06F9/3877) · CPC title