Sending packets using optimized PIO write sequences without SFENCES

US10073796B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10073796-B2
Application number: US-201715449401-A
Country: US
Kind code: B2
Filing date: Mar 3, 2017
Priority date: Jun 26, 2014
Publication date: Sep 11, 2018
Grant date: Sep 11, 2018

Abstract

Official abstract text for this publication.

Method and apparatus for sending packets using optimized PIO write sequences without sfences. Sequences of Programmed Input/Output (PIO) write instructions to write packet data to a PIO send memory are received at a processor supporting out of order execution. The PIO write instructions are received in an original order and executed out of order, with each PIO write instruction writing a store unit of data to a store buffer or a store block of data to the store buffer. Logic is provided for the store buffer to detect when store blocks are filled, resulting in the data in those store blocks being drained via PCIe posted writes that are written to send blocks in the PIO send memory at addresses defined by the PIO write instructions. Logic is employed for detecting the fill size of packets and when a packet's send blocks have been filled, enabling the packet data to be eligible for egress.
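The fill-detect-and-drain behavior described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware logic: the class `StoreBuffer`, the helper `send_out_of_order`, and the use of a shuffled list to stand in for out-of-order execution are all hypothetical. The 8-byte store unit and 64-byte store block sizes follow claims 2 and 3; the 512-bit write variant of claim 8 is not modeled.

```python
import random

STORE_UNIT = 8    # bytes per store unit (one 64-bit write), as in claim 3
STORE_BLOCK = 64  # bytes per store block, as in claim 2
UNITS_PER_BLOCK = STORE_BLOCK // STORE_UNIT


class StoreBuffer:
    """Toy store buffer (hypothetical) that drains a store block once it is full."""

    def __init__(self, send_memory):
        self.send_memory = send_memory  # bytearray standing in for PIO send memory
        self.blocks = {}                # block base address -> {unit index: 8 bytes}
        self.drain_order = []           # block base addresses, in the order drained

    def write(self, addr, data):
        """Handle one 8-byte PIO write to the memory-mapped address `addr`."""
        assert len(data) == STORE_UNIT and addr % STORE_UNIT == 0
        base = addr - addr % STORE_BLOCK
        units = self.blocks.setdefault(base, {})
        units[(addr - base) // STORE_UNIT] = data
        if len(units) == UNITS_PER_BLOCK:  # fill detection
            self._drain(base, self.blocks.pop(base))

    def _drain(self, base, units):
        """Model a 64B posted write: copy the whole block into send memory."""
        block = b"".join(units[i] for i in range(UNITS_PER_BLOCK))
        self.send_memory[base:base + STORE_BLOCK] = block
        self.drain_order.append(base)


def send_out_of_order(packet, seed=0):
    """Issue the packet's 8-byte writes in a shuffled order and return the model."""
    assert len(packet) % STORE_BLOCK == 0
    buf = StoreBuffer(bytearray(len(packet)))
    writes = [(off, packet[off:off + STORE_UNIT])
              for off in range(0, len(packet), STORE_UNIT)]
    random.Random(seed).shuffle(writes)  # stand-in for out-of-order execution
    for addr, data in writes:
        buf.write(addr, data)
    return buf
```

Calling `send_out_of_order(bytes(range(256)))` leaves the full packet in `send_memory` even though the individual writes arrived shuffled, and `drain_order` records the possibly non-sequential order in which the four store blocks filled. No fence is needed in the model because a block is forwarded only when every one of its store units is present.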

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: receiving sequences of Programmed Input/Output (PIO) write instructions to write packet data associated with respective packets stored in memory to a PIO send memory on a network adaptor or fabric interface, the PIO send memory partitioned into a plurality of send contexts; executing the sequences of PIO write instructions as an instruction thread on a processor that supports out of order execution, wherein execution of PIO write instructions cause data to be written to store units in a store buffer, the store units grouped into store blocks, wherein a portion of the PIO write instructions are executed out of order resulting in data being written to store units in different store blocks prior to the store blocks being filled; detecting when store blocks are filled; and in response to detecting a store block is filled, draining the data in the store block via a posted write to a buffer in the PIO send memory.

2. The method of claim 1, wherein the memory employs 64-Byte (64B) cache lines, each store block comprises 64 Bytes of data, and the posted write comprises a 64B PCIe (Peripheral Component Interconnect Express) posted write.

3. The method of claim 1, wherein the processor comprises a 64-bit processor, and each store unit comprises 64-bits of data that is written from a 64-bit data register in the processor to a store unit using a single instruction.

4. The method of claim 1, wherein the processor employs write-combining, and wherein execution of out of order PIO write instructions results in data being written to store units within a store block in a non-sequential order.

5. The method of claim 1, wherein the PIO send memory is partitioned into a plurality of send contexts, each send context organized as a sequence of send blocks, the method further comprising: receiving a sequence of PIO write instructions for writing data for a packet to a plurality of sequential send blocks in a sequential order; and writing the data for the packet to the sequential send blocks in a non-sequential order.

6. The method of claim 5, further comprising: detecting that all of the plurality of sequential send blocks have been filled with the packet data; and enabling data in the plurality of send blocks to be egressed once all of the plurality of send blocks are filled.

7. A method comprising: receiving sequences of Programmed Input/Output (PIO) write instructions to write packet data associated with packets stored in memory to a PIO send memory on a network adaptor or fabric interface, the PIO write instructions defining locations in memory containing packet data and memory-mapped addresses of send blocks in the PIO send memory to which the packet data are to be written; executing the sequences of PIO write instructions as an instruction thread on a processor that supports out of order execution, wherein execution of PIO write instructions cause data to be written to store blocks in a store buffer, wherein a portion of the PIO write instructions are executed out of order resulting in data being written to store blocks in a different order than an order in which the PIO write instructions are received; detecting when store blocks are filled; in response to detecting a store block is filled, draining the data in the store block, using a posted write instruction to write the data to the send block.

8. The method of claim 7, wherein the PIO write instruction comprises a 512-bit write instruction, and each of a memory cache line, store block, and send block has a size of 64 Bytes.

9. The method of claim 8, wherein a posted write comprises a 64-Byte (64B) PCIe (Peripheral Component Interconnect Express) posted write.

10. The method of claim 7, further comprising: partitioning the PIO send memory into a plurality of send contexts; and employing a First-in, First-out (FIFO) storage scheme associated with the plurality of send contexts under which data for a given packet is stored in one or more sequential send blocks, wherein PIO write instructions for writing packet data for multiple packets to the same send context are sequentially grouped in an original FIFO order, and wherein the packet data for the multiple packets are enabled to be written to send blocks in a different order than the original FIFO order.

11. The method of claim 10, further comprising: detecting that all of the one or more sequential send blocks have been filled with the packet data for a given packet; and enabling data for the given packet to be egressed once all of the plurality of send blocks are filled.

12. An apparatus, comprising: a processor, having a plurality of processor cores supporting out of order execution and including a memory interface and at least one store buffer; and a transmit engine operatively coupled to the processor and including a Programmed Input/Output (PIO) send memory, wherein the processor includes circuitry to, receive sequences of Programmed Input/Output (PIO) write instructions to write packet data associated with packets stored in a memory accessed via the memory interface to the PIO send memory; execute the sequences of PIO write instructions as an instruction thread on a processor core, wherein execution of PIO write instructions cause data to be written to store units in a store buffer, the store units grouped into store blocks, wherein a portion of the PIO write instructions are executed out of order resulting in data being written to store units in different store blocks prior to the store blocks being filled; detect when store blocks are filled; and in response to detecting a store block is filled, drain the data in the store block via a posted write sent the transmit engine to be written to a buffer in the PIO send memory.

13. The apparatus of claim 12, wherein the transmit engine is embedded in a host fabric interface (HFI) and wherein the processor is coupled to the HFI via a PCIe (Peripheral Component Interconnect Express) interface.

14. The apparatus of claim 12, wherein the memory employs 64-Byte (64B) cache lines, each store block comprises 64 Bytes of data, and the posted write comprises a 64B (Peripheral Component Interconnect Express) PCIe posted write.

15. The apparatus of claim 12, wherein the processor comprises a 64-bit processor, and each store unit comprises 64-bits of data that is written from a 64-bit data register in the processor to a store unit using a single instruction.

16. The apparatus of claim 12, wherein the processor employs write-combining, and wherein execution of out of order PIO write instructions results in data being written to store units within a store block in a non-sequential order.

17. The apparatus of claim 12, wherein the PIO send memory is partitioned into a plurality of send contexts, each send context organized as a sequence of send blocks, and wherein the processor includes further circuitry to: receive a sequence of PIO write instructions for writing data for a packet to a plurality of sequential send blocks in a sequential order; and write the data for the packet to the sequential send blocks in a non-sequential order.

18. The apparatus of claim 17, wherein the transmit engine includes circuitry to: detect that all of the plurality of sequential send blocks for a send context have been filled with packet data; and enable data in the plurality of send blocks to be egressed once all of the plurality of send blocks are filled.

19. A processor, comprising: a plurality of processor cores supporting out of o
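The per-packet fill detection recited in claims 6, 11, and 18 can be sketched as simple bookkeeping on the send-context side. This is a hedged illustration only: the class `SendContext` and its methods are hypothetical names, and the sketch assumes the number of send blocks per packet is already known when the packet is registered, which the claims shown here do not spell out.

```python
class SendContext:
    """Toy send context (hypothetical) tracking which packets are fully written."""

    def __init__(self):
        self.block_to_packet = {}  # send block index -> packet id
        self.remaining = {}        # packet id -> number of unfilled send blocks
        self.eligible = []         # packet ids, in the order they became complete

    def add_packet(self, packet_id, first_block, num_blocks):
        """Reserve `num_blocks` sequential send blocks for one packet."""
        for block in range(first_block, first_block + num_blocks):
            assert block not in self.block_to_packet, "send block already reserved"
            self.block_to_packet[block] = packet_id
        self.remaining[packet_id] = num_blocks

    def block_filled(self, block):
        """Record a posted write landing in `block`; True if its packet is now complete."""
        packet_id = self.block_to_packet.pop(block)
        self.remaining[packet_id] -= 1
        if self.remaining[packet_id] == 0:
            del self.remaining[packet_id]
            self.eligible.append(packet_id)  # all send blocks filled: eligible for egress
            return True
        return False
```

For example, if packet "A" occupies send blocks 0 to 2 and packet "B" occupies blocks 3 to 4, and posted writes land in the order 4, 1, 3, 0, 2, then "B" becomes eligible before "A" even though "A" was queued first. The sketch tracks eligibility only; the order in which eligible packets actually launch is left out.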

Assignees

Intel Corp

Inventors

Classifications

  • from multiple instruction streams, e.g. multistreaming · CPC title

  • Bidirectional FIFO, i.e. system allowing data transfer in two directions · CPC title

  • Maintaining memory consistency · CPC title

  • Physics · mapped topic

  • Monitoring of intermediate fill level, i.e. with additional means for monitoring the fill level, e.g. half full flag, almost empty flag · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10073796B2 cover?
Method and apparatus for sending packets using optimized PIO write sequences without sfences. Sequences of Programmed Input/Output (PIO) write instructions to write packet data to a PIO send memory are received at a processor supporting out of order execution. The PIO write instructions are received in an original order and executed out of order, with each PIO write instruction writing a store …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F13/1673. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 11, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).