Runtime address disambiguation in acceleration hardware

US10474375B2 · US · B2

Patent metadata
Publication number: US-10474375-B2
Application number: US-201615396049-A
Country: US
Kind code: B2
Filing date: Dec 30, 2016
Priority date: Dec 30, 2016
Publication date: Nov 12, 2019
Grant date: Nov 12, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An integrated circuit includes a processor to execute instructions and to interact with memory, and acceleration hardware, to execute a sub-program corresponding to instructions. A set of input queues includes a store address queue to receive, from the acceleration hardware, a first address of the memory, the first address associated with a store operation and a store data queue to receive, from the acceleration hardware, first data to be stored at the first address of the memory. The set of input queues also includes a completion queue to buffer response data for a load operation. A disambiguator circuit, coupled to the set of input queues and the memory, is to, responsive to determining the load operation, which succeeds the store operation, has an address conflict with the first address, copy the first data from the store data queue into the completion queue for the load operation.
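The forwarding behavior in the abstract can be illustrated with a small software model. This is a sketch under stated assumptions, not the patent's implementation:

- The class and method names (`MemoryOrderingSketch`, `enqueue_store`, `issue_load`, `drain_stores`) are hypothetical.
- Each store's address and data are assumed to arrive together.
- Every pending store is assumed to semantically precede the load being issued.

When a load's address matches a pending store, the data of the youngest matching store is copied into the completion queue. Memory is not read in that case.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MemoryOrderingSketch:
    """Toy model of the queues named in the abstract (all names hypothetical)."""
    memory: dict = field(default_factory=dict)
    store_address_queue: deque = field(default_factory=deque)  # pending store addresses
    store_data_queue: deque = field(default_factory=deque)     # pending store data, same order
    completion_queue: deque = field(default_factory=deque)     # response data for loads

    def enqueue_store(self, address, data):
        # Assumption: the acceleration hardware supplies address and data together.
        self.store_address_queue.append(address)
        self.store_data_queue.append(data)

    def issue_load(self, address):
        # Disambiguation: search pending stores, youngest first, for the same address.
        for addr, data in zip(reversed(self.store_address_queue),
                              reversed(self.store_data_queue)):
            if addr == address:
                # Address conflict: copy the store data into the completion queue.
                self.completion_queue.append(data)
                return
        # No conflict: the load reads memory without waiting for pending stores.
        self.completion_queue.append(self.memory.get(address, 0))

    def drain_stores(self):
        # Commit pending stores to memory in program order.
        while self.store_address_queue:
            addr = self.store_address_queue.popleft()
            self.memory[addr] = self.store_data_queue.popleft()


if __name__ == "__main__":
    m = MemoryOrderingSketch(memory={0x40: 7})
    m.enqueue_store(0x40, 99)        # pending store to 0x40
    m.issue_load(0x40)               # conflict: 99 is forwarded; memory still holds 7
    m.issue_load(0x80)               # no conflict: read from memory (default 0)
    print(list(m.completion_queue))  # [99, 0]
    m.drain_stores()
    print(m.memory[0x40])            # 99
```

In this model the load that follows the store sees 99 even though memory still holds 7. That is the ordering hazard the disambiguator resolves without stalling the load until the store commits.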

First claim

Opening claim text (preview).

What is claimed is:

1. An integrated circuit comprising: a processor to execute instructions of a program and to interact with a memory; acceleration hardware to execute a sub-program corresponding to the instructions; a set of input queues coupled to the acceleration hardware and to the memory, the set of input queues comprising: a store address queue to receive, directly from the acceleration hardware, a first address of the memory, the first address associated with a store operation, and a store data queue to receive, directly from the acceleration hardware, first data to be stored at the first address of the memory; a completion queue to buffer response data for a load operation; and a disambiguator circuit coupled to the set of input queues and the completion queue, wherein the load operation comprises an indicator field that when set to a first value causes a query of the disambiguator circuit and when set to a second value does not cause the query of the disambiguator circuit, and the disambiguator circuit is to, responsive to the indicator field of the load operation being set to the first value, determine the load operation that succeeds the store operation has an address conflict with the first address, copy the first data from the store data queue into the completion queue for the load operation, wherein to succeed the store operation, the load operation is to semantically follow the store operation according to a semantical order of the sub-program.

2. The integrated circuit of claim 1, further comprising an operations queue coupled to the set of input queues, the operations queue to buffer address arguments for the store operation and the load operation, comprising: a channel of the set of input queues at which to retrieve the first address for the store operation; and a second channel of the set of input queues at which to retrieve the first address for the load operation.

3. The integrated circuit of claim 1, further comprising a scheduler circuit coupled to the set of input queues and to a memory interface, the scheduler circuit to: schedule issuance of the store operation upon receipt of the first address; and trigger generation of a dependency token to indicate, to the load operation, a dependency on the first data stored by the store operation.

4. The integrated circuit of claim 3, further comprising an execution circuit coupled to the scheduler circuit, to the set of input queues, and to the memory, the execution circuit to, upon detecting reception of the first data in the store data queue, issue the store operation to the memory with the dependency token, to store the first data at the first address.

5. The integrated circuit of claim 3, wherein the disambiguator circuit comprises a content-addressable memory (CAM), and the scheduler circuit is further to store an entry in the CAM, the entry comprising the first address and a pointer into the store data queue to a location at which to receive the first data.

6. The integrated circuit of claim 5, wherein the disambiguator circuit is further to stall in response to the CAM filling up with entries for scheduled store operations.

7. The integrated circuit of claim 5, wherein the scheduler circuit is further to, upon receipt of the first data in the store data queue, store the first data in the CAM in association with the entry, and wherein the disambiguator circuit is to retain the first data in a disambiguator queue of the disambiguator circuit, the first data to be forwarded to a subsequent memory operation that succeeds the load operation.

8. The integrated circuit of claim 5, wherein the set of input queues further comprises a load address queue to receive, from the acceleration hardware, the first address for the load operation, and wherein the scheduler circuit is further to: detect, based on a search of the CAM, an address conflict between the load operation and the store operation; retrieve the pointer from the CAM; and annotate an indexed slot of the completion queue with the location in the pointer, to schedule issuance of the load operation upon receipt of the data at the location of the store data queue.

9. The integrated circuit of claim 1, wherein one of the set of input queues and the completion queue is a ring buffer.

10. A memory ordering circuit comprising: a memory interface coupled to a memory, the memory to store data corresponding to instructions being executed for a program; an operations queue coupled to the memory interface, the operations queue to buffer memory operations corresponding to the instructions; a set of input queues coupled to the memory interface and to acceleration hardware, which is to execute a sub-program corresponding to the instructions, the set of input queues comprising: a store address queue to receive, from the acceleration hardware, a first address of the memory for a store operation of the memory operations, and a store data queue to receive, from the acceleration hardware, first data to be stored at the first address in completion of the store operation; a disambiguator circuit coupled to the set of input queues and to the acceleration hardware, the disambiguator circuit including a disambiguator queue, wherein each of the memory operations comprises an indicator field that when set to a first value causes a respective query of the disambiguator circuit and when set to a second value does not cause the respective query of the disambiguator circuit; and an operations manager circuit coupled to the set of input queues and the disambiguator circuit, the operations manager circuit to: schedule the store operation, which resides in the operations queue, to issue to the memory upon receipt of the first address; and store an entry in the disambiguator queue, the entry comprising the first address and a pointer into the store data queue to a location at which to receive the first data for the store operation, wherein locations within the store data queue correspond to out-of-order receipt of data from the acceleration hardware.

11. The memory ordering circuit of claim 10, wherein the disambiguator circuit comprises one of a disambiguation content-addressable memory (CAM) or a counting Bloom filter.

12. The memory ordering circuit of claim 10, wherein the set of input queues further comprises a load address queue to receive a second address, from the acceleration hardware, for a succeeding load operation, wherein the operation manager is further to: detect, based on a search requested of the disambiguator circuit, no address conflict between the store operation and the succeeding load operation; and issue the load operation to the memory without waiting for completion of the store operation.

13. The memory ordering circuit of claim 10, wherein the set of input queues further comprises a load address queue to receive the first address, from the acceleration hardware, for a load operation that succeeds the store operation, further comprising: a completion queue coupled to the operations manager circuit and to the memory, the completion queue to enqueue data received for completion of the load operation; and a dependency queue coupled to the acceleration hardware and to receive, from the acceleration hardware, a dependency token associated with the first address for the load operation, the dependency token indicating a dependency on the first data to be stored by the store operation.

14. The memory ordering circuit of claim 13, wherein the completion queue is a ring buffer.

15. The memory ordering circuit of claim 13, wherein the operations manager circuit is furth…
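The dependent claims add several details to this flow:

- A CAM entry pairs a store address with a pointer into the store data queue (claim 5).
- Store data may arrive out of order (claim 10).
- A per-operation indicator field decides whether the disambiguator is queried (claim 1).
- A conflicting load's completion-queue slot is annotated with that pointer (claim 8).
- The disambiguator stalls when the CAM is full (claim 6).
- A non-conflicting load issues without waiting for the store (claim 12).

The sketch below models those steps under loudly stated assumptions:

- `Disambiguator`, `CamFull`, `schedule_store`, `receive_store_data`, `issue_load` and `commit_store` are hypothetical names.
- A Python dict stands in for the CAM and keeps only the youngest store per address.
- Raising an exception stands in for a hardware stall.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class CamFull(Exception):
    """Hypothetical stand-in for the stall of claim 6: no free CAM entry."""


@dataclass
class Disambiguator:
    capacity: int                                   # number of CAM entries (assumed)
    memory: Dict[int, int] = field(default_factory=dict)
    # Store data queue modeled as indexed slots; data may fill them out of order.
    store_data_slots: List[Optional[int]] = field(default_factory=list)
    # CAM stand-in: address -> pointer (slot index) into store_data_slots.
    cam: Dict[int, int] = field(default_factory=dict)
    # Completion queue entries: {"pointer": slot or None, "data": value or None}.
    completion: List[dict] = field(default_factory=list)

    def schedule_store(self, address: int) -> int:
        """Store address arrives: reserve a data slot and record a CAM entry."""
        if address not in self.cam and len(self.cam) >= self.capacity:
            raise CamFull(hex(address))
        slot = len(self.store_data_slots)
        self.store_data_slots.append(None)
        self.cam[address] = slot  # simplification: youngest store per address only
        return slot

    def receive_store_data(self, slot: int, data: int) -> None:
        """Store data arrives (possibly out of order) into its reserved slot."""
        self.store_data_slots[slot] = data
        # Forward to any load whose completion entry was annotated with this slot.
        for entry in self.completion:
            if entry["pointer"] == slot and entry["data"] is None:
                entry["data"] = data

    def issue_load(self, address: int, query_disambiguator: bool = True) -> dict:
        """Issue a load; the flag models the claimed per-operation indicator field."""
        pointer = self.cam.get(address) if query_disambiguator else None
        if pointer is None:
            # No conflict (or lookup skipped): read memory without waiting on stores.
            entry = {"pointer": None, "data": self.memory.get(address, 0)}
        else:
            # Conflict: annotate the entry with the slot; data may still be pending.
            entry = {"pointer": pointer, "data": self.store_data_slots[pointer]}
        self.completion.append(entry)
        return entry

    def commit_store(self, address: int, slot: int) -> bool:
        """Write a store to memory once its data has arrived; free its CAM entry."""
        data = self.store_data_slots[slot]
        if data is None:
            return False  # data not yet received; the store cannot complete
        self.memory[address] = data
        if self.cam.get(address) == slot:
            del self.cam[address]
        return True


if __name__ == "__main__":
    d = Disambiguator(capacity=2)
    s0 = d.schedule_store(0x10)        # address arrives before data
    pending = d.issue_load(0x10)       # conflicting load: waits on slot s0
    d.receive_store_data(s0, 42)       # data arrives; forwarded to the load
    print(pending["data"])             # 42
    print(d.issue_load(0x20)["data"])  # 0: no conflict, served from memory
```

With the indicator cleared, `issue_load(0x10, query_disambiguator=False)` skips the lookup and reads memory directly. In this model that is only correct when the load is known not to alias a pending store; the claims describe the field but not how it is set.

Claim 11 also allows a counting Bloom filter in place of the CAM. Such a filter can report false conflicts but, if its counters are decremented correctly, never misses a true one. It would therefore force some unnecessary waits, but not incorrect forwarding.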

Assignees

Intel Corp

Inventors

Classifications

  • Accessing, addressing or allocating within memory systems or architectures (digital input from, or digital output to record carriers, e.g. to disk storage units, G06F3/06) · CPC title

  • G06F9/3826 · Primary

    Bypassing or forwarding of data results, e.g. locally between pipeline stages or within a pipeline stage · CPC title

  • LOAD or STORE instructions; Clear instruction · CPC title

  • Maintaining memory consistency · CPC title

  • for access to memory bus (G06F13/28 takes precedence) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10474375B2 cover?
An integrated circuit includes a processor to execute instructions and to interact with memory, and acceleration hardware, to execute a sub-program corresponding to instructions. A set of input queues includes a store address queue to receive, from the acceleration hardware, a first address of the memory, the first address associated with a store operation and a store data queue to receive, fro…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3826. Mapped technology areas include Physics.
When was this patent published?
Published Nov 12, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).