Efficient work unit processing in a multicore system

US10540288B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10540288-B2
Application number  US-201815949692-A
Country             US
Kind code           B2
Filing date         Apr 10, 2018
Priority date       Feb 2, 2018
Publication date    Jan 21, 2020
Grant date          Jan 21, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are described in which a system having multiple processing units processes a series of work units in a processing pipeline, where some or all of the work units access or manipulate data stored in non-coherent memory. In one example, this disclosure describes a method that includes identifying, prior to completing processing of a first work unit with a processing unit of a processor having multiple processing units, a second work unit that is expected to be processed by the processing unit after the first work unit. The method also includes processing the first work unit, and prefetching, from non-coherent memory, data associated with the second work unit into a second cache segment of the buffer cache, wherein prefetching the data associated with the second work unit occurs concurrently with at least a portion of the processing of the first work unit by the processing unit.
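The overlap described in the abstract can be illustrated with a small simulation. The sketch below is a minimal, sequential model and not the patented implementation: `run_pipeline`, the two-element `segments` list, and the dictionary standing in for non-coherent memory are hypothetical names chosen for illustration. In hardware the prefetch would run concurrently with processing; here the order of log entries only shows that the prefetch for the next work unit is issued before the current work unit finishes.

```python
from collections import deque


def run_pipeline(work_queue, memory, handler):
    """Process work units in queue order using two alternating cache segments.

    `memory` is a dict standing in for non-coherent memory (an assumption),
    keyed by work unit identifier. Returns a log of (event, work_unit, segment).
    """
    queue = deque(work_queue)
    segments = [None, None]  # two segments of a hypothetical L1 buffer cache
    active = 0
    log = []
    if not queue:
        return log

    # Load data for the very first work unit into the active segment.
    segments[active] = memory[queue[0]]
    log.append(("prefetch", queue[0], active))

    while queue:
        current = queue.popleft()
        standby = 1 - active

        # Identify the next work unit by its queue position and prefetch its
        # data into the standby segment before the current one completes.
        if queue:
            upcoming = queue[0]
            segments[standby] = memory[upcoming]
            log.append(("prefetch", upcoming, standby))

        # Process the current work unit out of the active segment.
        segments[active] = handler(segments[active])
        log.append(("process", current, active))

        # Flush: write the modified data back to memory, then free the segment.
        memory[current] = segments[active]
        segments[active] = None
        log.append(("flush", current, active))

        # The standby segment becomes the active one for the next work unit.
        active = standby

    return log
```

With three work units, the first and third share one segment while the second uses the other, which mirrors the alternation between the first and second cache segments in the first claim.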

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising:
  identifying, prior to completing processing of a first work unit with a processing unit of a processor having multiple processing units, a second work unit that is expected to be processed by the processing unit after the first work unit, each of the first work unit and the second work unit associated with one or more stream fragments, and each of the first work unit and the second work unit specifying a work unit handler for processing the one or more stream fragments;
  processing, by the processing unit, the first work unit, wherein processing the first work unit includes accessing first work unit data associated with the first work unit and stored within a first cache segment of a level one (L1) buffer cache for the processing unit and generating, from the first work unit data, modified first work unit data;
  prefetching, from non-coherent memory, second work unit data associated with the second work unit into a second cache segment of the L1 buffer cache, wherein prefetching the second work unit data associated with the second work unit occurs concurrently with at least a portion of the processing of the first work unit by the processing unit;
  flushing, by the processing unit and after processing the first work unit, the first cache segment of the L1 buffer cache, wherein flushing the first cache segment includes storing, in the non-coherent memory, the modified first work unit data;
  generating, by the processing unit, a message indicating that the modified first work unit data can be accessed from the non-coherent memory;
  processing, by the processing unit, the second work unit, wherein processing the second work unit includes accessing the second work unit data associated with the second work unit prefetched into the second cache segment of the L1 buffer cache and generating, from the second work unit data, modified second work unit data;
  identifying, by the processing unit and prior to completing processing of the second work unit, a third work unit that is expected to be processed by the processing unit after the second work unit; and
  prefetching, by the processing unit and from the non-coherent memory, third work unit data associated with the third work unit into the first cache segment of the L1 buffer cache, wherein prefetching the third work unit data associated with the third work unit occurs concurrently with at least a portion of the processing of the second work unit by the processing unit and concurrently with at least a portion of the flushing the first cache segment.

2. The method of claim 1, wherein each of flushing the first cache segment, prefetching third work unit data associated with the third work unit, and processing the second work unit occur concurrently.

3. The method of claim 1, wherein generating the message indicating that the modified first work unit data generated by the first work unit can be accessed from the non-coherent memory occurs prior to completion of the flushing of the first cache segment, the method further comprising: delivering, by the processing unit to a second processing unit, the message, wherein delivering the message is gated by completion of the flushing of the first cache segment.

4. The method of claim 1, wherein generating the message indicating that the modified first work unit data generated by the first work unit can be accessed from the non-coherent memory transfers ownership of at least a portion of non-coherent memory.

5. The method of claim 1, wherein the message specifies lines of data associated with the third work unit to prefetch.

6. The method of claim 1, wherein prefetching second work unit data associated with the second work unit includes masking invalid addresses.

7. The method of claim 1, wherein at least one of the first work unit and the second work unit includes an identifier of a subsequent work unit for further processing the one or more stream fragments upon completion of the work unit.

8. The method of claim 1, wherein at least one of the first work unit and the second work unit includes one or more fields to store input or output arguments for processing the one or more stream fragments.

9. The method of claim 1, wherein at least one of the first work unit and the second work unit includes one or more fields to store auxiliary variables to be used when processing the stream fragment.

10. The method of claim 1, further comprising, prior to processing the first work unit and the second work unit, storing the first work unit and the second work unit in a work unit queue associated with the processing unit, and wherein identifying the second work unit that is expected to be processed by the processing unit after the first work unit comprises identifying the second work based on a position of the second work unit in the work unit queue.

11. The method of claim 1, further comprising: prefetching, from coherent memory, information including at least one of: header information and state information.

12. The method of claim 1, wherein each of the first work unit and the second work unit specify one of the processing units for executing the work unit handler.

13. The method of claim 1, wherein the first cache segment includes a first plurality of logically associated cache lines within the L1 buffer cache, and wherein the second cache segment includes a second plurality of logically associated cache lines within the L1 buffer cache.

14. A device comprising:
  a plurality of processing units, each of the processing units configured to execute one or more of a plurality of work unit handlers (WU handlers) for processing stream fragments, and wherein each of the processing units include a level one (L1) buffer cache;
  a memory to store the stream fragments;
  a plurality of queues configured to hold work units, each of the work units associated with one or more stream fragments, and wherein each of the work units identifies one of the WU handlers for processing the one or more stream fragments; and
  a load store unit configured to:
    identify, prior to completion of processing of a first work unit by a first processing unit of the plurality of processing units, a second work unit that is expected to be processed by the first processing unit after the first work unit, wherein the first processing unit processes the first work unit by accessing first work unit data associated with the first work unit in an active segment of the L1 buffer cache included within the first processing unit and generating, from the first work unit data, modified first work unit data,
    prefetch, from the memory, second work unit data associated with the second work unit into a standby cache segment of the L1 buffer cache included within the first processing unit, wherein prefetching the second work unit data associated with the second work unit occurs concurrently with at least a portion of the processing of the first work unit by the first processing unit,
    flush, after the processing of the first work unit is complete, the active cache segment of the buffer cache, wherein flushing the active cache segment includes storing, in the memory, the modified first work unit data, and
    generate a message indicating that the modified first work unit data processed by the first work unit can be accessed from the memory.

15. The device of claim 14, wherein the memory is non-coherent memory, and wherein the first processing unit is configured to: process the second work unit, wherein processing the second work unit includes accessing the second work unit data associated with the second work unit prefetched into the standby cac
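Two ideas from the claims lend themselves to a short illustration: the fields a work unit may carry (claims 7 to 9) and the rule in claim 3 that a message may be generated before the flush finishes but is only delivered once the flush completes. The following is a minimal sketch under stated assumptions, not the layout or mechanism used by the patent holder. `WorkUnit`, `FlushGatedMailbox`, and every field and method name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class WorkUnit:
    """Hypothetical work unit layout, loosely following claims 7 to 9."""
    wu_id: str
    handler: Callable[[bytes], bytes]   # work unit handler to run
    fragments: List[str]                # identifiers of stream fragments
    next_wu_id: Optional[str] = None    # subsequent work unit (claim 7)
    args: Dict[str, object] = field(default_factory=dict)  # input/output arguments (claim 8)
    aux: Dict[str, object] = field(default_factory=dict)   # auxiliary variables (claim 9)


class FlushGatedMailbox:
    """Holds generated messages until the matching flush has completed."""

    def __init__(self) -> None:
        self._held: Dict[str, List[str]] = {}  # messages waiting on a flush
        self._flushed: Set[str] = set()        # work units whose flush is done
        self.delivered: List[str] = []         # messages visible to the receiver

    def generate(self, wu_id: str, message: str) -> None:
        # A message may be generated early; deliver it only if the flush
        # for this work unit has already completed, otherwise hold it.
        if wu_id in self._flushed:
            self.delivered.append(message)
        else:
            self._held.setdefault(wu_id, []).append(message)

    def flush_complete(self, wu_id: str) -> None:
        # Completion of the flush releases any held messages for this work unit.
        self._flushed.add(wu_id)
        self.delivered.extend(self._held.pop(wu_id, []))
```

Gating delivery on flush completion means a receiving processing unit never sees the message before the modified data is actually in memory, which matters when the memory is non-coherent and no hardware keeps caches in sync.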

Assignees

Fungible Inc

Inventors

Classifications

  • Overlapped cache accessing, e.g. pipeline (G06F12/0846 takes precedence)

  • Networked environment

  • with main memory updating (G06F12/0806 takes precedence)

  • Details of cache specific to multiprocessor cache arrangements

  • using clearing, invalidating or resetting means

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10540288B2 cover?
Techniques are described in which a system having multiple processing units processes a series of work units in a processing pipeline, where some or all of the work units access or manipulate data stored in non-coherent memory. In one example, this disclosure describes a method that includes identifying, prior to completing processing of a first work unit with a processing unit of a processor h…
Who is the assignee on this patent?
Fungible Inc
What technology area does this patent fall under?
Primary CPC classification G06F12/0804. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 21, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).