Execution pipeline data forwarding

US9569214B2 · US · B2

Patent metadata
Publication number: US-9569214-B2
Application number: US-201213728765-A
Country: US
Kind code: B2
Filing date: Dec 27, 2012
Priority date: Dec 27, 2012
Publication date: Feb 14, 2017
Grant date: Feb 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, in an execution pipeline having a plurality of execution subunits, a method of using a bypass network to directly forward data from a producing execution subunit to a consuming execution subunit is provided. The method includes producing output data with the producing execution subunit, consuming input data with the consuming execution subunit, for one or more intervening operations whose input is the output data from the producing execution subunit and whose output is the input data to the consuming execution subunit, evaluating those one or more intervening operations to determine whether their execution would compose an identity function, and if the one or more intervening operations would compose such an identity function, controlling the bypass network to forward the producing execution subunit's output data directly to the consuming execution subunit.
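
The evaluation step can be pictured with a minimal Python sketch. It assumes the only identity pattern recognized is a store followed by a load of the same address, which is the example the claims name; the `Op` record, `composes_identity`, and `choose_path` are hypothetical names invented for illustration and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Op:
    # Hypothetical record for one intervening operation.
    kind: str   # "store" or "load"; any other kind is treated as not an identity
    addr: int   # memory address the operation reads or writes


def composes_identity(ops: Sequence[Op]) -> bool:
    """Return True when ops is exactly a store followed by a load of the same address."""
    if len(ops) != 2:
        return False
    store, load = ops
    return store.kind == "store" and load.kind == "load" and store.addr == load.addr


def choose_path(ops: Sequence[Op]) -> str:
    """Pick the route the producer's output takes to reach the consumer."""
    return "bypass" if composes_identity(ops) else "load_store"


if __name__ == "__main__":
    print(choose_path([Op("store", 0x40), Op("load", 0x40)]))  # bypass
    print(choose_path([Op("store", 0x40), Op("load", 0x44)]))  # load_store
```

A store and a load to the same address return the value that was written, so the pair leaves the data unchanged and the producer's output can go straight to the consumer. Real forwarding logic would work on decoded instructions in hardware; this sketch only mirrors the decision.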

First claim

Opening claim text (preview).

The invention claimed is:

1. In an execution pipeline having a plurality of execution subunits, a method of using a bypass network to directly forward data from a producing execution subunit to a consuming execution subunit, the method comprising: producing output data with the producing execution subunit; consuming input data with the consuming execution subunit; for one or more intervening operations whose input is the output data from the producing execution subunit and whose output is the input data to the consuming execution subunit, evaluating those one or more intervening operations to determine whether their execution would compose an identity function in which the input to the one or more intervening operations is equal to the output of the one or more intervening operations; and if the one or more intervening operations would compose such an identity function, controlling the bypass network to forward the producing execution subunit's output data directly to the consuming execution subunit.

2. The method of claim 1, where the evaluating of the one or more intervening operations includes determining whether the one or more intervening operations include a store instruction followed by a load instruction having a common memory address.

3. The method of claim 2, where the load instruction immediately follows the store instruction in a next cycle of the execution pipeline.

4. The method of claim 1, where the output data is forwarded directly to the consuming execution subunit at a point in time prior to the output data being available to a load/store portion of the execution pipeline that is to send the output data to the consuming execution subunit.

5. The method of claim 4, where the load/store portion of the execution pipeline includes one or more of a store queue and a memory location in a cache, and the output data is forwarded directly to the consuming execution subunit at a point in time prior to the output data being available to a load/store portion of the execution pipeline that is to send the output data to the consuming execution subunit.

6. The method of claim 4, further comprising: overriding control by a scheduler that would otherwise control the execution pipeline to stall execution by the consuming execution subunit until the output data becomes available for use by the load/store portion of the execution pipeline to send to the consuming execution subunit.

7. The method of claim 4, further comprising: controlling the bypass network to send the output data received from the producing execution subunit to the load/store portion and the consuming execution subunit in parallel.

8. The method of claim 1, where the bypass network includes one or more multiplexers operatively coupled to one or more of the execution subunits, and where controlling the bypass network includes providing an input to a select line of the one or more multiplexers to select the consuming execution subunit as a destination to forward the output data.

9. A micro-processing and memory system comprising: an execution pipeline with a plurality of execution subunits, the execution subunits including a producing execution subunit configured to produce output data and a consuming execution subunit configured to consume input data; a bypass network operatively coupled with the execution pipeline, the bypass network being configured to forward data produced by one execution subunit directly to another execution subunit; and forwarding logic configured to, for one or more intervening operations whose input is the output data from the producing execution subunit and whose output is the input data to the consuming execution subunit, (i) evaluate those one or more intervening operations to determine whether their execution would compose an identity function in which the input to the one or more intervening operations is equal to the output of the one or more intervening operations; and (ii) if the one or more intervening operations would compose such an identity function, control the bypass network to forward the producing execution subunit's output data directly to the consuming execution subunit.

10. The micro-processing and memory system of claim 9, where the evaluation performed by the forwarding logic includes determining whether the one or more intervening operations include a store instruction followed by a load instruction having a common memory address.

11. The micro-processing and memory system of claim 10, where the load instruction immediately follows the store instruction in a next cycle of the execution pipeline.

12. The micro-processing and memory system of claim 9, where the output data is forwarded directly to the consuming execution subunit prior to the output data being available to a load/store portion of the execution pipeline that is to send the output data to the consuming execution subunit.

13. The micro-processing and memory system of claim 12, where the load/store portion of the execution pipeline includes one or more of a store queue and a memory location in a cache, and where the output data is forwarded directly to the consuming execution subunit at a point in time prior to it being available in any of such store queue or memory location.

14. The micro-processing and memory system of claim 12, where the forwarding logic is configured to control the bypass network to send the output data received from the producing execution subunit to the load/store portion and the consuming execution subunit in parallel.

15. The micro-processing and memory system of claim 9, where the bypass network includes one or more multiplexers operatively coupled to one or more of the execution subunits, and the forwarding logic is operatively coupled to a select line of the one or more multiplexers to select the consuming execution subunit as a destination to forward the output data.

16. A micro-processing and memory system comprising: an execution pipeline with a plurality of execution subunits, the execution subunits including a producing execution subunit configured to produce output data and a consuming execution subunit configured to consume input data; a bypass network operatively coupled with the execution pipeline, the bypass network being configured to send data produced by one execution subunit directly to another execution subunit; and forwarding logic configured to, for two or more intervening operations whose input is the output data from the producing execution subunit and whose output is the input data to the consuming execution subunit, (i) detect whether those two or more intervening operations include a store instruction followed by a load instruction having a common memory address; and (ii) if the two or more intervening operations do include such a store and load instruction, control the bypass network to forward the producing execution subunit's output data directly to the consuming execution subunit.

17. The micro-processing and memory system of claim 16, where the load instruction immediately follows the store instruction in a next cycle of the execution pipeline.

18. The micro-processing and memory system of claim 16, where the output data is forwarded directly to the consuming execution subunit at a time prior to the output data being available to a load/store portion of the execution pipeline that includes one or more of a store queue, and a memory location of a cache, and where the output data is forwarded directly to the consuming execution subunit at a point in time prior to it being available in any of such store queue or memory lo
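
The multiplexer select line and the parallel send to the load/store portion described in the claims can be sketched as a toy Python model. Everything here is an assumption made for illustration: the `LoadStore` class, its one-`tick()` visibility delay, the `mux` and `dispatch` helpers, and the placement of the multiplexer at the consumer's operand input are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

FROM_PRODUCER = 0    # select value: operand comes straight from the producer
FROM_LOAD_STORE = 1  # select value: operand comes from the load/store portion


@dataclass
class LoadStore:
    """Toy load/store portion: a store becomes loadable only after tick()."""
    pending: Dict[int, int] = field(default_factory=dict)
    visible: Dict[int, int] = field(default_factory=dict)

    def store(self, addr: int, value: int) -> None:
        self.pending[addr] = value

    def tick(self) -> None:
        # Assumed one-step delay before stored data can be read back.
        self.visible.update(self.pending)
        self.pending.clear()

    def load(self, addr: int) -> Optional[int]:
        return self.visible.get(addr)


def mux(select: int, from_producer: Optional[int],
        from_load_store: Optional[int]) -> Optional[int]:
    """Two-input multiplexer driven by a select line."""
    return from_producer if select == FROM_PRODUCER else from_load_store


def dispatch(value: int, addr: int, identity: bool, lsu: LoadStore) -> Optional[int]:
    """Return the consumer's operand this cycle, or None if it must wait."""
    lsu.store(addr, value)  # the store is still sent to the load/store portion
    select = FROM_PRODUCER if identity else FROM_LOAD_STORE
    return mux(select, value, lsu.load(addr))


if __name__ == "__main__":
    lsu = LoadStore()
    print(dispatch(7, 0x10, True, lsu))   # 7: forwarded before the store is visible
    print(dispatch(9, 0x20, False, lsu))  # None: consumer waits for the load/store path
```

In the forwarded case the consumer receives the value while the store is still pending, and the load/store portion receives the same value in parallel, so a later `tick()` makes it readable there as well.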

Assignees

Nvidia Corp

Inventors

Classifications

  • G06F9/3826 (Primary)

    Bypassing or forwarding of data results, e.g. locally between pipeline stages or within a pipeline stage

  • Pipeline control instructions, e.g. multicycle NOP

  • Maintaining memory consistency

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3826. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 14, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).