Multi-core communication acceleration using hardware queue device

US10929323B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10929323-B2
Application number  US-201916601137-A
Country             US
Kind code           B2
Filing date         Oct 14, 2019
Priority date       Jan 4, 2016
Publication date    Feb 23, 2021
Grant date          Feb 23, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate in which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.
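
The abstract's resource management idea (controlling the rate at which cores may submit requests so that they neither stall nor have requests dropped) can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not say how the rate is controlled, so the credit counter used here is an assumed mechanism, and the names `CreditLimitedBuffer`, `submit`, and `process_one` are hypothetical.

```python
from collections import deque


class CreditLimitedBuffer:
    """Toy model of one request buffer guarded by a credit pool.

    Assumption: rate control is modeled as credits. Each accepted
    request consumes one credit, and the credit is returned once the
    queue management device has processed that request.
    """

    def __init__(self, credits):
        self.credits = credits   # requests the core may still submit
        self.pending = deque()   # requests waiting to be processed

    def submit(self, request):
        """Core side: accept the request only if a credit is available."""
        if self.credits == 0:
            return False         # caller backs off instead of stalling
        self.credits -= 1
        self.pending.append(request)
        return True

    def process_one(self):
        """Device side: consume the oldest request and return its credit."""
        if not self.pending:
            return None
        request = self.pending.popleft()
        self.credits += 1
        return request


buf = CreditLimitedBuffer(credits=2)
accepted = [buf.submit(r) for r in ("a", "b", "c")]  # third is refused
first = buf.process_one()                            # frees one credit
retry = buf.submit("c")                              # now accepted
```

In this model a refused submission is reported to the caller immediately, so the core can do other work and retry later rather than blocking on a full buffer.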

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a source register to store data to be enqueued in a queue management device (QMD); a decoder to decode an enqueue instruction, the enqueue instruction comprising a first operand to identify the source register and a second operand to identify a specific buffer of a plurality of buffers of the QMD; and an execution unit to execute the decoded enqueue instruction and to cause the data in the source register to be stored as an enqueue request in the specific buffer of the QMD; wherein the QMD is to select the enqueue request from the specific buffer out of the plurality of buffers in accordance to a scheduling policy and to process the enqueue request by storing the data into an internal storage unit of the QMD.

2. The apparatus of claim 1, wherein responsive to an execution of the decoded enqueue instruction by the execution unit, a copy of the data is stored in a cache communicatively coupled to and shared by the apparatus and one or more processor cores.

3. The apparatus of claim 1, wherein at least some of the plurality of buffers of the QMD are first in, first out (FIFO) buffers.

4. The apparatus of claim 1, wherein the scheduling policy is a Round Robin policy.

5. The apparatus of claim 1, wherein the scheduling policy is a Weighted Round Robin policy.

6. The apparatus of claim 1, wherein the scheduling policy is a preemptive priority policy.

7. The apparatus of claim 1, wherein the internal storage unit of the QMD is configurable to support data of varying lengths and sizes.

8. The apparatus of claim 1, wherein the data is stored into the internal storage unit with a metadata tag to indicate how the data should be handled by the QMD.

9. The apparatus of claim 1, wherein the QMD is to limit a number of enqueue requests that may be submitted by the apparatus in accordance to a resource policy.

10. An apparatus comprising: a destination register; a decoder to decode a dequeue instruction, the dequeue instruction comprising a first operand to identify the destination register and a second operand to identify a specific buffer of a plurality of buffers of a queue management device (QMD); and an execution unit to execute the decoded dequeue instruction and to cause a dequeue request to be stored in the specific buffer of the QMD; wherein the QMD is to select the dequeue request from the specific buffer out of the plurality of buffers in accordance to a scheduling policy, the QMD further to process the dequeue request by retrieving data associated with the dequeue request from an internal storage unit of the QMD and storing the retrieved data in the destination register.

11. The apparatus of claim 10, wherein at least some of the plurality of buffers of the QMD are first in, first out (FIFO) buffers.

12. The apparatus of claim 10, wherein the scheduling policy is a Round Robin policy.

13. The apparatus of claim 10, wherein the scheduling policy is a Weighted Round Robin policy.

14. The apparatus of claim 10, wherein the scheduling policy is a preemptive priority policy.

15. The apparatus of claim 10, wherein the internal storage unit is configurable to support data of varying lengths and sizes.

16. The apparatus of claim 10, wherein the data retrieved from the internal storage unit is associated with a metadata tag to indicate how the retrieved data should be handled by the QMD.

17. The apparatus of claim 10, wherein the QMD is to limit a number of dequeue requests that may be submitted by the apparatus in accordance to a resource policy.

18. A system comprising: a plurality processor cores; a queue management device (QMD) to store and process enqueue and dequeue requests from the plurality of processor cores; and a first processor core of the plurality of processor cores comprising: a source register to store data to be enqueued in the QMD; a decoder to decode an enqueue instruction, the enqueue instruction comprising a first operand to identify the source register and a second operand to identify a specific buffer of a plurality of buffers of the QMD; and an execution unit to execute the decoded enqueue instruction and to cause the data in the source register to be stored as an enqueue request in the specific buffer of the QMD; wherein the QMD is to select the enqueue request from the specific buffer out of the plurality of buffers in accordance to a scheduling policy and to process the enqueue request by storing the data into an internal storage unit of the QMD.

19. The system of claim 18, further comprising a last level cache (LLC) communicatively coupled to and shared by the plurality of processor cores.

20. The system of claim 19, wherein responsive to an execution of the decoded enqueue instruction by the execution unit of the first processor core, a copy of the data is stored in the LLC.

21. The system of claim 18, wherein the decoder of the first processor core is further to decode a dequeue instruction comprising a third operand to identify a destination register of the first processor core and a fourth operand to identify a second specific buffer of the plurality of buffers of the QMD, wherein the execution unit is further to execute the decoded dequeue instruction and to cause a dequeue request to be stored in the second specific buffer of the QMD, and wherein the QMD is to select the dequeue request from the second specific buffer out of the plurality of buffers in accordance to the scheduling policy, and to process the dequeue request by retrieving data associated with the dequeue request from the internal storage unit of the QMD and storing the retrieved data in the destination register.

22. The system of claim 18, wherein at least some of the plurality of buffers of the QMD are first in, first out (FIFO) buffers.

23. The system of claim 18, wherein the scheduling policy is a Round Robin policy.

24. The system of claim 18, wherein the scheduling policy is a Weighted Round Robin policy.

25. The system of claim 18, wherein the scheduling policy is a preemptive priority policy.
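
The request flow recited in the claims (an enqueue or dequeue request lands in a specific buffer, the QMD selects a buffer according to a scheduling policy, and the request is then served from the internal storage unit) can be sketched in software. This is an illustrative model, not the claimed hardware: the class and method names are hypothetical, the internal storage unit is assumed to be a single FIFO, registers are modeled as a dictionary, and only the Round Robin and Weighted Round Robin policies are covered (a preemptive priority policy is not modeled).

```python
from collections import deque


class QueueManagementDevice:
    """Software model of a QMD with several request buffers.

    Assumptions: one shared FIFO stands in for the internal storage
    unit, and integer weights select between Round Robin (all weights
    1) and Weighted Round Robin.
    """

    def __init__(self, num_buffers, weights=None):
        self.buffers = [deque() for _ in range(num_buffers)]
        self.storage = deque()
        weights = weights or [1] * num_buffers
        # A buffer with weight w appears w times in the service order.
        self._order = [i for i, w in enumerate(weights) for _ in range(w)]
        self._pos = 0

    def enqueue(self, buffer_id, data):
        """Model of the enqueue instruction: data plus target buffer."""
        self.buffers[buffer_id].append(("enq", data))

    def dequeue(self, buffer_id, registers, dest):
        """Model of the dequeue instruction: destination plus buffer."""
        self.buffers[buffer_id].append(("deq", (registers, dest)))

    def step(self):
        """Select one pending request by policy and process it.

        Returns the index of the buffer served, or None if all buffers
        are empty.
        """
        for _ in range(len(self._order)):
            idx = self._order[self._pos]
            self._pos = (self._pos + 1) % len(self._order)
            if not self.buffers[idx]:
                continue  # skip empty buffers
            kind, payload = self.buffers[idx].popleft()
            if kind == "enq":
                self.storage.append(payload)
            else:
                registers, dest = payload
                # An empty storage unit yields None in this model.
                registers[dest] = (
                    self.storage.popleft() if self.storage else None
                )
            return idx
        return None


qmd = QueueManagementDevice(num_buffers=2)
qmd.enqueue(0, "a")
qmd.enqueue(0, "b")
qmd.enqueue(1, "c")
served = [qmd.step() for _ in range(3)]  # buffers alternate: 0, 1, 0
regs = {}
qmd.dequeue(1, regs, "r1")
qmd.step()                               # regs["r1"] receives "a"
```

Passing `weights=[2, 1]` makes the first buffer eligible twice per round, which is one simple way to realize a weighted policy; real hardware would likely use a different arbitration circuit.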

Assignees

Intel Corp

Inventors

Classifications

  • G06F13/37 · Primary

    using a physical-position-dependent priority, e.g. daisy chain, round robin or token passing · CPC title

  • Addressing variable-length words or parts of words · CPC title

  • Data transfer between cache memory and other subsystems, e.g. storage devices or host systems · CPC title

  • Speculative instruction execution · CPC title

  • Using a specific cache allocation policy other than replacement policy · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10929323B2 cover?
Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardwar…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F13/37. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 23, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).