US10073796B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10073796-B2 |
| Application number | US-201715449401-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 3, 2017 |
| Priority date | Jun 26, 2014 |
| Publication date | Sep 11, 2018 |
| Grant date | Sep 11, 2018 |
Official abstract text for this publication.
Method and apparatus for sending packets using optimized PIO write sequences without sfences. Sequences of Programmed Input/Output (PIO) write instructions to write packet data to a PIO send memory are received at a processor supporting out of order execution. The PIO write instructions are received in an original order and executed out of order, with each PIO write instruction writing a store unit of data to a store buffer or a store block of data to the store buffer. Logic is provided for the store buffer to detect when store blocks are filled, resulting in the data in those store blocks being drained via PCIe posted writes that are written to send blocks in the PIO send memory at addresses defined by the PIO write instructions. Logic is employed for detecting the fill size of packets and when a packet's send blocks have been filled, enabling the packet data to be eligible for egress.
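The fill-and-drain behavior described in the abstract can be illustrated with a small software model. This is a sketch under stated assumptions, not the patented hardware logic: the class names `StoreBuffer` and `PioSendMemory`, the helper `issue_packet_writes`, the zero padding of the last block, and passing the packet length as a parameter are all hypothetical choices made for illustration. The 8-byte store unit and 64-byte store block sizes follow dependent claims 2 and 3, and out-of-order execution is approximated by shuffling the write order.

```python
import random

STORE_UNIT_BYTES = 8     # 64-bit store unit (per claim 3)
STORE_BLOCK_BYTES = 64   # 64B store block and send block (per claim 2)
UNITS_PER_BLOCK = STORE_BLOCK_BYTES // STORE_UNIT_BYTES


class PioSendMemory:
    """Hypothetical model of the PIO send memory on the adaptor."""

    def __init__(self):
        self.send_blocks = {}   # send block address -> 64 bytes of data
        self.drain_order = []   # order in which posted writes arrived

    def posted_write(self, block_addr, payload):
        # Models one 64B posted write landing in a send block.
        self.send_blocks[block_addr] = payload
        self.drain_order.append(block_addr)

    def packet_ready(self, first_block_addr, packet_len):
        # A packet is eligible for egress once every send block it spans is filled.
        n_blocks = -(-packet_len // STORE_BLOCK_BYTES)  # ceiling division
        return all(first_block_addr + i * STORE_BLOCK_BYTES in self.send_blocks
                   for i in range(n_blocks))

    def read_packet(self, first_block_addr, packet_len):
        n_blocks = -(-packet_len // STORE_BLOCK_BYTES)
        data = b"".join(self.send_blocks[first_block_addr + i * STORE_BLOCK_BYTES]
                        for i in range(n_blocks))
        return data[:packet_len]


class StoreBuffer:
    """Hypothetical model of a store buffer that drains each block when it fills."""

    def __init__(self, send_memory):
        self.send_memory = send_memory
        self.blocks = {}        # block base address -> {offset: 8-byte store unit}

    def write_unit(self, addr, data):
        assert addr % STORE_UNIT_BYTES == 0 and len(data) == STORE_UNIT_BYTES
        base = addr - addr % STORE_BLOCK_BYTES
        units = self.blocks.setdefault(base, {})
        units[addr - base] = data
        if len(units) == UNITS_PER_BLOCK:   # fill detection
            payload = b"".join(units[off] for off in sorted(units))
            del self.blocks[base]
            self.send_memory.posted_write(base, payload)  # drain the filled block


def issue_packet_writes(packet, base_addr, store_buffer, send_memory, rng):
    """Write a packet one store unit at a time, in a shuffled order.

    Returns the egress-eligibility flag observed after each write.
    """
    padded_len = -(-len(packet) // STORE_BLOCK_BYTES) * STORE_BLOCK_BYTES
    padded = packet.ljust(padded_len, b"\0")  # zero padding is an assumption
    writes = [(base_addr + off, padded[off:off + STORE_UNIT_BYTES])
              for off in range(0, padded_len, STORE_UNIT_BYTES)]
    rng.shuffle(writes)  # stand-in for out-of-order execution
    ready_log = []
    for addr, data in writes:
        store_buffer.write_unit(addr, data)
        ready_log.append(send_memory.packet_ready(base_addr, len(packet)))
    return ready_log


if __name__ == "__main__":
    memory = PioSendMemory()
    buffer = StoreBuffer(memory)
    log = issue_packet_writes(bytes(range(150)), 0x1000, buffer, memory,
                              random.Random(7))
    print("drain order:", [hex(a) for a in memory.drain_order])
    print("eligible for egress only after final write:",
          log[-1] and not any(log[:-1]))
```

In this model no fence is needed between writes: a block is drained only when all of its store units are present, and the packet becomes eligible for egress only after its last send block arrives, regardless of the order in which the blocks were drained.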
Opening claim text (preview).
The invention claimed is:
1. A method comprising: receiving sequences of Programmed Input/Output (PIO) write instructions to write packet data associated with respective packets stored in memory to a PIO send memory on a network adaptor or fabric interface, the PIO send memory partitioned into a plurality of send contexts; executing the sequences of PIO write instructions as an instruction thread on a processor that supports out of order execution, wherein execution of PIO write instructions cause data to be written to store units in a store buffer, the store units grouped into store blocks, wherein a portion of the PIO write instructions are executed out of order resulting in data being written to store units in different store blocks prior to the store blocks being filled; detecting when store blocks are filled; and in response to detecting a store block is filled, draining the data in the store block via a posted write to a buffer in the PIO send memory.
2. The method of claim 1, wherein the memory employs 64-Byte (64B) cache lines, each store block comprises 64 Bytes of data, and the posted write comprises a 64B PCIe (Peripheral Component Interconnect Express) posted write.
3. The method of claim 1, wherein the processor comprises a 64-bit processor, and each store unit comprises 64-bits of data that is written from a 64-bit data register in the processor to a store unit using a single instruction.
4. The method of claim 1, wherein the processor employs write-combining, and wherein execution of out of order PIO write instructions results in data being written to store units within a store block in a non-sequential order.
5. The method of claim 1, wherein the PIO send memory is partitioned into a plurality of send contexts, each send context organized as a sequence of send blocks, the method further comprising: receiving a sequence of PIO write instructions for writing data for a packet to a plurality of sequential send blocks in a sequential order; and writing the data for the packet to the sequential send blocks in a non-sequential order.
6. The method of claim 5, further comprising: detecting that all of the plurality of sequential send blocks have been filled with the packet data; and enabling data in the plurality of send blocks to be egressed once all of the plurality of send blocks are filled.
7. A method comprising: receiving sequences of Programmed Input/Output (PIO) write instructions to write packet data associated with packets stored in memory to a PIO send memory on a network adaptor or fabric interface, the PIO write instructions defining locations in memory containing packet data and memory-mapped addresses of send blocks in the PIO send memory to which the packet data are to be written; executing the sequences of PIO write instructions as an instruction thread on a processor that supports out of order execution, wherein execution of PIO write instructions cause data to be written to store blocks in a store buffer, wherein a portion of the PIO write instructions are executed out of order resulting in data being written to store blocks in a different order than an order in which the PIO write instructions are received; detecting when store blocks are filled; in response to detecting a store block is filled, draining the data in the store block, using a posted write instruction to write the data to the send block.
8. The method of claim 7, wherein the PIO write instruction comprises a 512-bit write instruction, and each of a memory cache line, store block, and send block has a size of 64 Bytes.
9. The method of claim 8, wherein a posted write comprises a 64-Byte (64B) PCIe (Peripheral Component Interconnect Express) posted write.
10. The method of claim 7, further comprising: partitioning the PIO send memory into a plurality of send contexts; and employing a First-in, First-out (FIFO) storage scheme associated with the plurality of send contexts under which data for a given packet is stored in one or more sequential send blocks, wherein PIO write instructions for writing packet data for multiple packets to the same send context are sequentially grouped in an original FIFO order, and wherein the packet data for the multiple packets are enabled to be written to send blocks in a different order than the original FIFO order.
11. The method of claim 10, further comprising: detecting that all of the one or more sequential send blocks have been filled with the packet data for a given packet; and enabling data for the given packet to be egressed once all of the plurality of send blocks are filled.
12. An apparatus, comprising: a processor, having a plurality of processor cores supporting out of order execution and including a memory interface and at least one store buffer; and a transmit engine operatively coupled to the processor and including a Programmed Input/Output (PIO) send memory, wherein the processor includes circuitry to, receive sequences of Programmed Input/Output (PIO) write instructions to write packet data associated with packets stored in a memory accessed via the memory interface to the PIO send memory; execute the sequences of PIO write instructions as an instruction thread on a processor core, wherein execution of PIO write instructions cause data to be written to store units in a store buffer, the store units grouped into store blocks, wherein a portion of the PIO write instructions are executed out of order resulting in data being written to store units in different store blocks prior to the store blocks being filled; detect when store blocks are filled; and in response to detecting a store block is filled, drain the data in the store block via a posted write sent the transmit engine to be written to a buffer in the PIO send memory.
13. The apparatus of claim 12, wherein the transmit engine is embedded in a host fabric interface (HFI) and wherein the processor is coupled to the HFI via a PCIe (Peripheral Component Interconnect Express) interface.
14. The apparatus of claim 12, wherein the memory employs 64-Byte (64B) cache lines, each store block comprises 64 Bytes of data, and the posted write comprises a 64B (Peripheral Component Interconnect Express) PCIe posted write.
15. The apparatus of claim 12, wherein the processor comprises a 64-bit processor, and each store unit comprises 64-bits of data that is written from a 64-bit data register in the processor to a store unit using a single instruction.
16. The apparatus of claim 12, wherein the processor employs write-combining, and wherein execution of out of order PIO write instructions results in data being written to store units within a store block in a non-sequential order.
17. The apparatus of claim 12, wherein the PIO send memory is partitioned into a plurality of send contexts, each send context organized as a sequence of send blocks, and wherein the processor includes further circuitry to: receive a sequence of PIO write instructions for writing data for a packet to a plurality of sequential send blocks in a sequential order; and write the data for the packet to the sequential send blocks in a non-sequential order.
18. The apparatus of claim 17, wherein the transmit engine includes circuitry to: detect that all of the plurality of sequential send blocks for a send context have been filled with packet data; and enable data in the plurality of send blocks to be egressed once all of the plurality of send blocks are filled.
19. A processor, comprising: a plurality of processor cores supporting out of o
from multiple instruction streams, e.g. multistreaming · CPC title
Bidirectional FIFO, i.e. system allowing data transfer in two directions · CPC title
Maintaining memory consistency · CPC title
Physics · mapped topic
Monitoring of intermediate fill level, i.e. with additional means for monitoring the fill level, e.g. half full flag, almost empty flag · CPC title