Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines

US9940134B2 · US · B2

Patent metadata
Publication number: US-9940134-B2
Application number: US-201213475708-A
Country: US
Kind code: B2
Filing date: May 18, 2012
Priority date: May 20, 2011
Publication date: Apr 10, 2018
Grant date: Apr 10, 2018

Abstract

A method for decentralized resource allocation in an integrated circuit. The method includes receiving a plurality of requests from a plurality of resource consumers of a plurality of partitionable engines to access a plurality of resources, wherein the resources are spread across the plurality of engines and are accessed via a global interconnect structure. At each resource, a number of requests for access to said each resource are added. At said each resource, the number of requests are compared against a threshold limiter. At said each resource, a subsequent request that is received that exceeds the threshold limiter is canceled. Subsequently, requests that are not canceled within a current clock cycle are implemented.
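
The per-resource check in the abstract can be illustrated with a single-cycle Python model. This is a minimal sketch under stated assumptions, not the patented circuit. Requests are hypothetical `(consumer, resource)` pairs processed in list order, and each resource's threshold is a caller-supplied limit, for example its number of ports. A sequential loop stands in for hardware that counts all requests at once. The names `arbitrate_cycle`, `rf_seg0`, `mem_frag1`, `eu0` and `agu0` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A request is modeled as (consumer, resource). Names are hypothetical.
Request = Tuple[str, str]


def arbitrate_cycle(
    requests: List[Request], threshold: Dict[str, int]
) -> Tuple[List[Request], List[Request]]:
    """Decide one clock cycle's requests, resource by resource.

    Each resource keeps its own running count of requests (the adder) and
    compares it with its own limit (the threshold limiter). Requests are
    assumed to arrive in list order; no global arbiter is consulted.
    """
    count: Dict[str, int] = defaultdict(int)
    granted: List[Request] = []
    canceled: List[Request] = []
    for consumer, resource in requests:
        count[resource] += 1  # add this request to the resource's tally
        if count[resource] > threshold[resource]:
            canceled.append((consumer, resource))  # exceeds the limiter
        else:
            granted.append((consumer, resource))  # implemented this cycle
    return granted, canceled


if __name__ == "__main__":
    reqs = [("eu0", "rf_seg0"), ("eu1", "rf_seg0"), ("agu0", "rf_seg0"), ("eu2", "mem_frag1")]
    granted, canceled = arbitrate_cycle(reqs, {"rf_seg0": 2, "mem_frag1": 1})
    print("granted:", granted)    # eu0 and eu1 on rf_seg0, eu2 on mem_frag1
    print("canceled:", canceled)  # agu0, the third request to rf_seg0
```

Each resource decides using only its own tally and limit. This local decision, with no central arbiter, is what makes the allocation decentralized.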

Claims

What is claimed is:

1. A method for decentralized resource allocation in an integrated circuit, comprising: receiving a plurality of requests from one or more resource consumers of a plurality of partitionable engines to access a plurality of resources in a given cycle, wherein the resources are spread across the plurality of partitionable engines and are accessed via a global interconnect having a finite number of buses accessible each clock cycle, wherein the resources comprise at least one of register file segments and memory fragments of each of the partitionable engines, and read/write ports into the memory fragments and the register file segments of each of the partitionable engines, and wherein the resource consumers comprise at least one of execution units or address calculation units of each of the partitionable engines and wherein each of a plurality of thread schedulers are operable to identify requested resources and contend for one or more bus of said global interconnect to schedule the plurality of resources for transfer through the global interconnect to said one or more resource consumers, and wherein the plurality of resources are transferred to the one or more resource consumers by: at each resource, adding a number of requests for access to the each resource using an adder, wherein the requests for access are made using the plurality of thread schedulers; at the each resource, comparing the number of requests against a threshold limiter; at the each resource, canceling one or more requests that exceeds the threshold limiter, wherein canceled requests are queued and given priority in a subsequent cycle; at the each resource, implementing requests that are not canceled within a current clock cycle, wherein a sum at an output of the adder represents a port number for accessing a resource corresponding to a respective request.

2. The method of claim 1, wherein the adder is a parallel adder.

3. The method of claim 1, wherein a request is not implemented if one of a set of related requests to implement an action is canceled, wherein the set of related requests includes a request for a read/write port and a request for a bus of the global interconnect structure.

4. The method of claim 1, wherein the global interconnect is a dedicated point-to-point bus.

5. The method of claim 1, wherein a single resource and resource consumer pair is operable to utilize one of the finite number of buses each clock cycle.

6. The method of claim 1, wherein the global interconnect comprises a routing matrix operable to allow the one or more resource consumers to access data from any storage location within the plurality of resources.

7. The method of claim 1, wherein each of the plurality of thread schedulers is further operable to monitor a readiness of instructions and select the one or more resource consumers for executing the instructions.

8. In a microprocessor, a method for decentralized resource allocation, comprising: receiving a plurality of requests from one or more resource consumers of a plurality of partitionable engines to access a plurality of resources in a given cycle, wherein the resources are spread across the plurality of partitionable engines and are accessed via a global interconnect having a finite number of buses accessible each clock cycle, wherein the resources comprise at least one of register file segments and memory fragments of each of the partitionable engines, and wherein the resource consumers comprise at least one of execution units or address calculation units of each of the partitionable engines and wherein each of a plurality of thread schedulers are operable to identify requested resources and contend for one or more bus of said global interconnect to schedule the plurality of resources for transfer through the global interconnect to said one or more resource consumers, and wherein the plurality of resources are transferred to the one or more resource consumers by: at each resource, adding a number of requests for access to the each resource using an adder, wherein the requests for access are made using the plurality of thread schedulers; at the each resource, comparing the number of requests against a threshold limiter; at the each resource, canceling one or more requests that exceeds the threshold limiter, wherein canceled requests are queued and given priority in a subsequent cycle; at the each resource, implementing requests that are not canceled within a current clock cycle, wherein a sum at an output of the adder represents a port number for accessing a resource corresponding to a respective request.

9. The method of claim 8, wherein the resources further comprise read/write ports into the memory fragments and the register file segments of each of the partitionable engines.

10. The method of claim 8, wherein the adder is a parallel adder.

11. The method of claim 8, wherein a request is not implemented if one of a set of related requests to implement an action is canceled, wherein the set of related requests includes a request for a read/write port and a request for a bus of the global interconnect structure.

12. The method of claim 8, wherein the global interconnect is a dedicated point-to-point bus.

13. The method of claim 8, wherein a single resource and resource consumer pair is operable to utilize one of the finite number of buses each clock cycle.

14. The method of claim 8, wherein the global interconnect comprises a routing matrix operable to allow the one or more resource consumers to access data from any storage location within the plurality of resources.

15. The method of claim 8, wherein each of the plurality of thread schedulers is further operable to monitor a readiness of instructions and select the one or more resource consumers for executing the instructions.

16. A microprocessor, comprising: a plurality of resources having data for supporting the execution of multiple code sequences; one or more resource consumers of a plurality of partitionable engines to access the plurality of resources in a given cycle wherein the resources are spread across the plurality of partitionable engines; and a global interconnect having a finite number of buses accessible each clock cycle for coupling the one or more resource consumers with the plurality of resources to access the data and execute the multiple code sequences, wherein the resources comprise at least one of register file segments and memory fragments of each of the partitionable engines, and wherein the resource consumers comprise at least one of execution units or address calculation units of each of the partitionable engines and wherein each of a plurality of thread schedulers are operable to identify requested resources and contend for one or more bus of said global interconnect to schedule the plurality of resources for transfer through the global interconnect to said one or more resource consumers, and wherein the plurality of resources are transferred to the one or more resource consumers by: at each resource, adding a number of requests for access to the each resource using an adder, wherein the requests for access are made using the plurality of thread schedulers; at the each resource, comparing the number of requests against a threshold limiter; at the each resource, canceling one or more requests that exceeds the threshold limiter, wherein canceled requests are queued and given priority in a subsequent cycle; at the each resource, implementing requests that are not canceled within a current clock cycle, wherein a sum at an output of the adder represents a port number for acc…
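
Claims 1 and 3 add three details to the abstract's scheme:

- the adder's sum doubles as a port number;
- a read/write port request and its related bus request succeed or fail together;
- canceled requests are queued and given priority in the next cycle.

The Python sketch below models these behaviors under assumptions that are not from the patent:

- there is one shared pool of `num_buses` buses;
- port and bus numbers are the count of requests ahead of a given request, starting from 0;
- every request is counted by the adders even when a related request later cancels it, mimicking adders that sum all requests at once.

`Action`, `schedule_cycle`, `run` and the resource names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Action:
    """A consumer operation needing one port on `resource` plus one bus.

    Hypothetical model; field names are illustrative, not from the patent.
    """
    consumer: str
    resource: str


Grant = Tuple[Action, int, int]  # (action, port number, bus number)


def schedule_cycle(
    retry: List[Action], new: List[Action], ports: Dict[str, int], num_buses: int
) -> Tuple[List[Grant], List[Action]]:
    """Run one clock cycle and return (grants, actions to retry next cycle)."""
    ordered = retry + new  # queued (previously canceled) actions get priority
    port_sum: Dict[str, int] = {}  # per-resource adder
    bus_sum = 0  # adder for the shared bus pool
    grants: List[Grant] = []
    canceled: List[Action] = []
    for action in ordered:
        # The adder sum ahead of this request serves as its port / bus number.
        port = port_sum.get(action.resource, 0)
        bus = bus_sum
        # Every request is counted, even one that is canceled below.
        port_sum[action.resource] = port + 1
        bus_sum += 1
        # Threshold limiters: both related requests must fit, or neither runs.
        if port < ports[action.resource] and bus < num_buses:
            grants.append((action, port, bus))
        else:
            canceled.append(action)
    return grants, canceled


def run(
    cycles: List[List[Action]], ports: Dict[str, int], num_buses: int
) -> Tuple[List[List[Grant]], List[Action]]:
    """Feed one list of new actions per cycle; return per-cycle grants and leftovers."""
    retry: List[Action] = []
    log: List[List[Grant]] = []
    for new in cycles:
        grants, retry = schedule_cycle(retry, new, ports, num_buses)
        log.append(grants)
    return log, retry


if __name__ == "__main__":
    eu0, eu1, eu2 = (Action(f"eu{i}", "rf0") for i in range(3))
    agu0 = Action("agu0", "rf1")
    log, leftover = run([[eu0, eu1, agu0], [eu2], []], {"rf0": 1, "rf1": 1}, num_buses=2)
    for cycle, grants in enumerate(log):
        print(cycle, [(a.consumer, port, bus) for a, port, bus in grants])
    print("still queued:", leftover)
```

In this example, `eu1` is canceled in cycle 0 and then wins `rf0`'s single port in cycle 1 ahead of the newer `eu2` request, because queued requests are placed first. In cycle 0, `agu0` has a free port on `rf1` but is still canceled because the bus pool is exhausted. This illustrates the paired port-and-bus cancellation of claim 3.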

Classifications (CPC)

  • Information transfer, e.g. on bus (G06F13/14 takes precedence)

  • Using buffers

  • Dependency mechanisms, e.g. register scoreboarding

  • G06F9/06 (primary): using stored programs, i.e. using an internal store of processing equipment to receive or retain programs

  • G06F9/3824 (primary): operand accessing

Frequently asked questions

Who is the assignee on this patent?
Abdallah Mohammad, Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/06. Mapped technology areas include Physics.
When was this patent published?
Publication date: April 10, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.