State-based queue protocol
US-2021303551-A1 · Sep 30, 2021 · US
US11748174B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11748174-B2 |
| Application number | US-201916590490-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 2, 2019 |
| Priority date | Oct 2, 2019 |
| Publication date | Sep 5, 2023 |
| Grant date | Sep 5, 2023 |
Official abstract text for this publication.
Methods and apparatus for arbitration and access to hardware request ring structures in a concurrent environment. A request ring mechanism is provided including an arbiter, ring overflow guard, request ring, and request ring metadata, each of which is implemented in shared virtual memory (SVM) on a computing platform including a multi-core processor coupled to an offload device having one or more SVM-capable accelerators. Worker threads request to access the request ring to provide job descriptors to be processed by the accelerator(s). A lockless arbiter returns either an index of a slot in which to write a descriptor or information indicating the ring is full to each worker thread. The scheme enables worker threads to write descriptors to slots in the request ring corresponding to the returned indexes without contention from other worker threads. The ring overflow guard prevents valid descriptors from being overwritten before they are taken off the ring by the accelerator(s). The request ring metadata is used to indicate a valid/invalid status of the ring entries.
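The arbiter and ring overflow guard described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `AtomicCounter`, `RequestRingArbiter`, `request_slot`, `complete`, and `RING_FULL` are hypothetical, the threshold is assumed to equal the ring size, and the index is assumed to wrap with a modulo. Python has no portable atomic fetch-and-add, so a `threading.Lock` stands in for the hardware bus lock held around a single increment or decrement; in the described scheme that step is a hardware atomic, not a software lock.

```python
import threading

RING_FULL = -1  # hypothetical "ring is full" indicia returned to a worker


class AtomicCounter:
    """Stand-in for a hardware atomic counter.

    The lock only emulates the bus lock held around one update;
    it is an emulation aid, not part of the described scheme.
    """

    def __init__(self, value=0):
        self._value = value
        self._bus = threading.Lock()

    def add(self, delta):
        # "lock the bus; update the counter; unlock the bus"
        with self._bus:
            self._value += delta
            return self._value


class RequestRingArbiter:
    def __init__(self, ring_size):
        self.ring_size = ring_size          # assumed threshold for the guard
        self.in_flight = AtomicCounter(0)   # ring overflow guard counter
        self.next_slot = AtomicCounter(-1)  # arbiter counter; first add gives 0

    def request_slot(self):
        # Guard: count this request, and back it out if the ring is full.
        if self.in_flight.add(1) > self.ring_size:
            self.in_flight.add(-1)
            return RING_FULL
        # Arbiter: each caller receives a distinct counter value, so no two
        # workers are handed the same slot while it is still in use.
        return self.next_slot.add(1) % self.ring_size

    def complete(self):
        # Called when a response has been processed, freeing one slot.
        self.in_flight.add(-1)
```

A worker would call `request_slot()`, write its descriptor to the returned index unless it received `RING_FULL`, and the response path would call `complete()` once the accelerator's result has been handled.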
Opening claim text (preview).
What is claimed is:
1. A method comprising: implementing a request ring in a portion of shared virtual memory (SVM) on a computing platform including a multi-core processor and one or more SVM-capable accelerators, the multi-core processor including a bus to which a plurality of processor cores are coupled; receiving a plurality of requests to access the request ring from a plurality of worker threads concurrently executing on cores of the multi-core processor cores; for a request that is received from a worker thread, assigning, via a lockless arbiter without using software locks, an index of an available slot on the request ring and returning the index to the worker thread making the request; implementing an atomic counter in a ring overflow guard; in response to receiving requests from worker threads to access the request ring, for each request, locking the bus, incrementing the atomic counter; unlocking the bus; detecting, via the ring overflow guard, whether the request ring is full by determining whether a value of the atomic counter is greater than a threshold; and when the value of the atomic counter is greater than the threshold, locking the bus; decrementing the atomic counter; unlocking the bus; and returning indicia to the worker thread that the request ring is full; otherwise, when the value of the atomic counter is not greater than the threshold, returning an index of an available slot to the worker thread.
2. The method of claim 1, wherein the lockless arbiter is implemented using an atomic counter and the plurality of processor cores are coupled to a bus, further, comprising: in response to receiving a request to access the request ring from a worker thread, locking the bus; incrementing the atomic counter and saving a counter value on a stack; unlocking the bus; and returning the counter value as the index returned to the worker thread making the request.
3. The method of claim 1, further comprising: in conjunction with processing a response that has been returned by the off-load device, locking the bus; decrementing the atomic counter; and unlocking the bus.
4. The method of claim 1, further comprising implementing request ring metadata to track a current status of each slot on the request ring, wherein the request ring metadata comprises a respective flag for each index on the request ring indicating whether a descriptor stored in the slot at that index is valid or invalid.
5. The method of claim 4, further comprising: in conjunction with writing a descriptor to an available slot in the request ring, flipping a value of the flag in the request ring metadata associated with the index corresponding to the slot to which the descriptor is written.
6. The method of claim 4, further comprising: polling, via a polling controller, the request ring metadata to identify indexes of one or more valid slots; and for each of the indexes that are identified, taking a descriptor off the request ring from a slot at the index; and flipping the value of the bit in the request ring metadata associated with the index to switch a status of the slot from valid to invalid.
7. The method of claim 6, wherein the polling controller is implemented on the offload device and the request ring and request ring metadata are implemented in SVM, further comprising employing Direct Memory Access (DMA) to enable the polling controller to access the request ring and the request ring metadata.
8. The method of claim 6, wherein the lockless arbiter is implemented in an arbiter block including a ring overflow guard, further comprising: using the ring overflow guard to detect when the ring is full; and providing indicia from the ring overflow guard to the lockless arbiter when the ring is full.
9. A computer system, comprising: a multi-core processor, including: a plurality of processor cores, an interconnect fabric, communicatively coupled to each processor core; a memory controller, communicatively coupled to the interconnect fabric and having one or more memory channels; an input-output (IO) interface, communicatively coupled to the interconnect fabric; system memory comprising one or more memory devices, each communicatively coupled to at least one memory channel of the memory controller; and an offload device including one or more shared virtual memory (SVM)-capable accelerators having one or more functional units, the offload device having an IO interface coupled to the IO interface on the multicore processor via an IO link; wherein the computer system is configured to, implement a portion of system memory as SVM; implement a request ring in the SVM, the request ring including a plurality of slots in which descriptors are stored, each slot having an associated index; concurrently execute a plurality of worker threads on one or more of the processor cores; receive a plurality of requests to access the request ring from the plurality of worker threads; implement a lockless arbiter via execution of software on a processor core that does not employ any software locks; for a request that is received from a worker thread, assign, via the lockless arbiter, an index of an available slot on the request ring and return the index to the worker thread; and write, via the worker thread, a descriptor in a slot of the request ring corresponding to the index.
10. The computer system of claim 9, wherein the lockless arbiter is implemented using an atomic counter, wherein the interconnect fabric includes a bus to which the plurality of processor cores are coupled, and wherein the computer system is further configured to: in response to receiving a request to access the request ring from a worker thread, lock the bus; increment the atomic counter and save a counter value on a stack; unlock the bus; and return the counter value as the index returned to the worker thread making the request.
11. The computer system of claim 9, further configured to: implement a ring overflow guard; in response to receiving a request from a worker thread to access the request ring, detect, via the ring overflow guard, whether the request ring is full; and when the request ring is determined to be full, return indicia to the worker thread indicating the request ring is full, otherwise return an index of an available slot to the worker thread.
12. The computer system of claim 11, wherein the ring overflow guard includes an atomic counter, wherein the interconnect fabric includes a bus to which the plurality of processor cores are coupled, and wherein the computer system is further configured to: for each request to access the request ring received from a worker thread, lock the bus; increment the atomic counter; unlock the bus; determine whether a value of the atomic counter is greater than a threshold; when the value of the atomic counter is greater than the threshold, lock the bus; decrement the atomic counter; unlock the bus; and return indicia to the worker thread that the request ring is full.
13. The computer system of claim 12, wherein the ring overflow guard is further configured to: use an accelerator to process a descriptor, the accelerator producing a response; write the response into a portion of the SVM; in conjunction with processing the response, lock the bus; decrement the atomic counter; and unlock the bus.
14. The computer system of claim 9, further configured to implement request ring metadata to track a current status of each slot on the request ring, wherein the request ring metadata comprises a respective flag for each index o
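The request ring metadata of claims 4 to 6 (one valid/invalid flag per slot, flipped by the writer and flipped back by the polling controller) might be modeled along these lines. This is a single-threaded sketch under stated assumptions: the class `RequestRing` and its methods `write` and `poll` are hypothetical names, descriptors are plain Python objects, and the flags are a list of booleans rather than bits in shared virtual memory.

```python
class RequestRing:
    """Toy model of a request ring plus its per-slot metadata flags."""

    def __init__(self, size):
        self.slots = [None] * size   # descriptor storage, one entry per index
        self.valid = [False] * size  # metadata: True means the slot is valid

    def write(self, index, descriptor):
        # Worker side: store the descriptor first, then flip the flag so the
        # polling side never sees a valid flag for a half-written slot.
        self.slots[index] = descriptor
        self.valid[index] = True

    def poll(self):
        # Polling-controller side: scan the metadata for valid slots, take
        # each descriptor off the ring, and flip its flag back to invalid.
        taken = []
        for index, is_valid in enumerate(self.valid):
            if is_valid:
                taken.append(self.slots[index])
                self.valid[index] = False
        return taken
```

The index passed to `write` would be the one handed out by the arbiter. On real hardware the descriptor store has to become visible before the flag flip, which needs an appropriate memory ordering guarantee that this sketch does not model.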
Mutual exclusion algorithms · CPC title
to service a request · CPC title
Access to shared memory · CPC title
using burst mode transfer, e.g. direct memory access {DMA}, cycle steal (G06F13/32 takes precedence) · CPC title