Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
US-2016210145-A1 · Jul 21, 2016 · US
US9940134B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9940134-B2 |
| Application number | US-201213475708-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 18, 2012 |
| Priority date | May 20, 2011 |
| Publication date | Apr 10, 2018 |
| Grant date | Apr 10, 2018 |
Official abstract text for this publication.
A method for decentralized resource allocation in an integrated circuit. The method includes receiving a plurality of requests from a plurality of resource consumers of a plurality of partitionable engines to access a plurality of resources, wherein the resources are spread across the plurality of engines and are accessed via a global interconnect structure. At each resource, a number of requests for access to said each resource are added. At said each resource, the number of requests are compared against a threshold limiter. At said each resource, a subsequent request that is received that exceeds the threshold limiter is canceled. Subsequently, requests that are not canceled within a current clock cycle are implemented.
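The flow described in the abstract (count the requests at each resource, compare the count against a limit, cancel the excess, implement the rest) can be sketched in software as follows. This is only an illustrative model, not the patented hardware: the function name `arbitrate_cycle`, the tuple layout of a request, and the use of a single shared `threshold` for every resource are assumptions made for the example.

```python
from collections import defaultdict


def arbitrate_cycle(requests, threshold):
    """Model one clock cycle of per-resource request limiting.

    requests:  list of (consumer_id, resource_id) tuples in arrival order
               (hypothetical layout, chosen for this sketch).
    threshold: maximum number of requests each resource accepts per cycle
               (assumed identical for all resources).
    Returns (granted, canceled), both lists of the original tuples.
    """
    counts = defaultdict(int)  # one running count ("adder") per resource
    granted, canceled = [], []
    for consumer, resource in requests:
        counts[resource] += 1  # add this request to the resource's total
        if counts[resource] > threshold:
            # The request pushes the total past the limiter: cancel it.
            canceled.append((consumer, resource))
        else:
            # Within the limit: it is implemented in the current cycle.
            granted.append((consumer, resource))
    return granted, canceled


# Three consumers contend for "rf0", which accepts two requests per cycle.
reqs = [("alu0", "rf0"), ("alu1", "rf0"), ("agu0", "rf0"), ("alu0", "mem1")]
granted, canceled = arbitrate_cycle(reqs, threshold=2)
print(granted)   # [('alu0', 'rf0'), ('alu1', 'rf0'), ('alu0', 'mem1')]
print(canceled)  # [('agu0', 'rf0')]
```

Each resource decides on its own count alone, so no central arbiter is needed in this model, which is the sense in which the allocation is decentralized.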
Opening claim text (preview).
What is claimed is:

1. A method for decentralized resource allocation in an integrated circuit, comprising: receiving a plurality of requests from one or more resource consumers of a plurality of partitionable engines to access a plurality of resources in a given cycle, wherein the resources are spread across the plurality of partitionable engines and are accessed via a global interconnect having a finite number of buses accessible each clock cycle, wherein the resources comprise at least one of register file segments and memory fragments of each of the partitionable engines, and read/write ports into the memory fragments and the register file segments of each of the partitionable engines, and wherein the resource consumers comprise at least one of execution units or address calculation units of each of the partitionable engines and wherein each of a plurality of thread schedulers are operable to identify requested resources and contend for one or more bus of said global interconnect to schedule the plurality of resources for transfer through the global interconnect to said one or more resource consumers, and wherein the plurality of resources are transferred to the one or more resource consumers by: at each resource, adding a number of requests for access to the each resource using an adder, wherein the requests for access are made using the plurality of thread schedulers; at the each resource, comparing the number of requests against a threshold limiter; at the each resource, canceling one or more requests that exceeds the threshold limiter, wherein canceled requests are queued and given priority in a subsequent cycle; at the each resource, implementing requests that are not canceled within a current clock cycle, wherein a sum at an output of the adder represents a port number for accessing a resource corresponding to a respective request.

2. The method of claim 1, wherein the adder is a parallel adder.

3. The method of claim 1, wherein a request is not implemented if one of a set of related requests to implement an action is canceled, wherein the set of related requests includes a request for a read/write port and a request for a bus of the global interconnect structure.

4. The method of claim 1, wherein the global interconnect is a dedicated point-to-point bus.

5. The method of claim 1, wherein a single resource and resource consumer pair is operable to utilize one of the finite number of buses each clock cycle.

6. The method of claim 1, wherein the global interconnect comprises a routing matrix operable to allow the one or more resource consumers to access data from any storage location within the plurality of resources.

7. The method of claim 1, wherein each of the plurality of thread schedulers is further operable to monitor a readiness of instructions and select the one or more resource consumers for executing the instructions.

8. In a microprocessor, a method for decentralized resource allocation, comprising: receiving a plurality of requests from one or more resource consumers of a plurality of partitionable engines to access a plurality of resources in a given cycle, wherein the resources are spread across the plurality of partitionable engines and are accessed via a global interconnect having a finite number of buses accessible each clock cycle, wherein the resources comprise at least one of register file segments and memory fragments of each of the partitionable engines, and wherein the resource consumers comprise at least one of execution units or address calculation units of each of the partitionable engines and wherein each of a plurality of thread schedulers are operable to identify requested resources and contend for one or more bus of said global interconnect to schedule the plurality of resources for transfer through the global interconnect to said one or more resource consumers, and wherein the plurality of resources are transferred to the one or more resource consumers by: at each resource, adding a number of requests for access to the each resource using an adder, wherein the requests for access are made using the plurality of thread schedulers; at the each resource, comparing the number of requests against a threshold limiter; at the each resource, canceling one or more requests that exceeds the threshold limiter, wherein canceled requests are queued and given priority in a subsequent cycle; at the each resource, implementing requests that are not canceled within a current clock cycle, wherein a sum at an output of the adder represents a port number for accessing a resource corresponding to a respective request.

9. The method of claim 8, wherein the resources further comprise read/write ports into the memory fragments and the register file segments of each of the partitionable engines.

10. The method of claim 8, wherein the adder is a parallel adder.

11. The method of claim 8, wherein a request is not implemented if one of a set of related requests to implement an action is canceled, wherein the set of related requests includes a request for a read/write port and a request for a bus of the global interconnect structure.

12. The method of claim 8, wherein the global interconnect is a dedicated point-to-point bus.

13. The method of claim 8, wherein a single resource and resource consumer pair is operable to utilize one of the finite number of buses each clock cycle.

14. The method of claim 8, wherein the global interconnect comprises a routing matrix operable to allow the one or more resource consumers to access data from any storage location within the plurality of resources.

15. The method of claim 8, wherein each of the plurality of thread schedulers is further operable to monitor a readiness of instructions and select the one or more resource consumers for executing the instructions.

16. A microprocessor, comprising: a plurality of resources having data for supporting the execution of multiple code sequences; one or more resource consumers of a plurality of partitionable engines to access the plurality of resources in a given cycle wherein the resources are spread across the plurality of partitionable engines; and a global interconnect having a finite number of buses accessible each clock cycle for coupling the one or more resource consumers with the plurality of resources to access the data and execute the multiple code sequences, wherein the resources comprise at least one of register file segments and memory fragments of each of the partitionable engines, and wherein the resource consumers comprise at least one of execution units or address calculation units of each of the partitionable engines and wherein each of a plurality of thread schedulers are operable to identify requested resources and contend for one or more bus of said global interconnect to schedule the plurality of resources for transfer through the global interconnect to said one or more resource consumers, and wherein the plurality of resources are transferred to the one or more resource consumers by: at each resource, adding a number of requests for access to the each resource using an adder, wherein the requests for access are made using the plurality of thread schedulers; at the each resource, comparing the number of requests against a threshold limiter; at the each resource, canceling one or more requests that exceeds the threshold limiter, wherein canceled requests are queued and given priority in a subsequent cycle; at the each resource, implementing requests that are not canceled within a current clock cycle, wherein a sum at an output of the adder represents a port number for acc
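Two details recited in the independent claims go beyond the abstract: canceled requests are queued and given priority in a subsequent cycle, and the sum at the adder output serves as the port number for the corresponding request. The sketch below models both for a single resource. It is an interpretation for illustration only; the function name `allocate_ports`, the zero-based port indexing, and the choice to serve carried-over requests strictly before new ones are assumptions, not details taken from the claims.

```python
def allocate_ports(new_requests, carried_over, num_ports):
    """Arbitrate one cycle at a single resource (illustrative model).

    new_requests: request identifiers arriving this cycle.
    carried_over: requests canceled in the previous cycle; they are
                  placed first so they receive priority (assumed policy).
    num_ports:    the threshold, i.e. ports available this cycle.
    Returns (grants, queue): grants maps request -> port number,
    queue lists the canceled requests to carry into the next cycle.
    """
    ordered = list(carried_over) + list(new_requests)
    grants = {}
    queue = []
    running_sum = 0  # models the adder output as requests accumulate
    for req in ordered:
        running_sum += 1
        if running_sum <= num_ports:
            # The running sum doubles as the port selector
            # (shifted to a zero-based index, an assumption here).
            grants[req] = running_sum - 1
        else:
            queue.append(req)  # canceled now, retried next cycle
    return grants, queue


# Cycle 1: three requests, two ports. "C" is canceled and queued.
grants1, queue1 = allocate_ports(["A", "B", "C"], [], num_ports=2)
print(grants1, queue1)  # {'A': 0, 'B': 1} ['C']

# Cycle 2: "C" is served ahead of the new requests "D" and "E".
grants2, queue2 = allocate_ports(["D", "E"], queue1, num_ports=2)
print(grants2, queue2)  # {'C': 0, 'D': 1} ['E']
```

Deriving the port number from the running count means no separate port-assignment step is needed in this model: a request that survives the limiter already knows which port it uses.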
Information transfer, e.g. on bus (G06F13/14 takes precedence) · CPC title
using buffers · CPC title
Dependency mechanisms, e.g. register scoreboarding · CPC title
using stored programs, i.e. using an internal store of processing equipment to receive or retain programs · CPC title
Operand accessing · CPC title