Concurrent store and load operations

US9448936B2 · US · B2

Patent metadata
Publication number: US-9448936-B2
Application number: US-201414154122-A
Country: US
Kind code: B2
Filing date: Jan 13, 2014
Priority date: Jan 13, 2014
Publication date: Sep 20, 2016
Grant date: Sep 20, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Systems, processors, and methods for efficiently handling concurrent store and load operations within a processor. A processor comprises a load-store unit (LSU) with a banked level-one (L1) data cache. When a store operation is ready to write data to the L1 data cache, the store operation will skip the write to any banks that have a conflict with a concurrent load operation. A partial write of the store operation will be performed to those banks of the L1 data cache that do not have a conflict with a concurrent load operation. For every attempt to write the store operation, a corresponding store mask will be updated to indicate which portions of the store operation were successfully written to the L1 data cache.
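The partial-write behavior described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the names `StoreEntry`, `attempt_store`, and `load_banks`, the eight-bank line, and the use of one integer per bank are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StoreEntry:
    """Hypothetical store-queue entry: one value per targeted bank."""
    data: Dict[int, int]   # bank index -> value the store wants to write
    written: int = 0       # store mask: bit b is set once bank b has been written

    @property
    def pending(self) -> int:
        """Bitmask of targeted banks that have not been written yet."""
        targeted = 0
        for bank in self.data:
            targeted |= 1 << bank
        return targeted & ~self.written

    @property
    def done(self) -> bool:
        return self.pending == 0


def attempt_store(cache_line: List[int], store: StoreEntry, load_banks: int) -> int:
    """Model one clock cycle of a store attempt.

    Writes every pending bank that the concurrent load does not use,
    skips the conflicting banks, updates the store mask, and returns
    the bitmask of banks written in this cycle.
    """
    writable = store.pending & ~load_banks
    for bank in range(len(cache_line)):
        if (writable >> bank) & 1:
            cache_line[bank] = store.data[bank]
    store.written |= writable
    return writable


# Usage: a store targets banks 0-3 of an 8-bank line.
line = [0] * 8
store = StoreEntry(data={0: 10, 1: 11, 2: 12, 3: 13})

# Cycle 1: a concurrent load reads banks 2 and 3, so only banks 0 and 1 are written.
first = attempt_store(line, store, load_banks=0b1100)

# Cycle 2: a load reads bank 0 only, so the remaining banks 2 and 3 are written.
second = attempt_store(line, store, load_banks=0b0001)
```

In this model the load always wins the conflicting banks and the store retries with whatever the mask still marks as pending, which mirrors the "skip the write to any banks that have a conflict" wording of the abstract.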

First claim

What is claimed is:

1. A processor comprising: a cache comprising a plurality of banks, wherein each bank of the plurality of banks can be accessed independently of other banks; wherein the processor is configured to: detect that a first store operation of data conflicts with a first load operation in a first clock cycle in at least one bank of the cache; perform a first partial write of a first portion of the data to the cache in the first clock cycle, wherein the first portion of the data is less than all of the data; perform the first load operation by reading from the cache in the first clock cycle; delay a second partial write of a second portion of the data to the cache until a subsequent clock cycle, wherein the second portion of the data conflicts with the first load operation; maintain a first mask for each portion of a plurality of portions of the first store operation; and update the first mask to indicate the first portion of the first store operation has been written to the cache in the first clock cycle.

2. The processor as recited in claim 1, wherein the cache comprises a plurality of cache lines, wherein the first store operation targets a first cache line, wherein the first load operation targets a second cache line, and wherein the first cache line is different than the second cache line.

3. The processor as recited in claim 2, further comprising a store queue configured to buffer store operations that target locations in the cache, wherein the first store operation is buffered in the store queue until the first store operation is written in its entirety to the cache.

4. The processor as recited in claim 3, wherein prior to performing the second partial write of the second portion of the first store operation to the first cache line, the processor is configured to: detect that a second load operation is scheduled to read data from the first cache line; merge the second portion of the first store operation in the store queue with the first portion of the first store operation from the first cache line; and provide the merged data for the second load operation.

5. The processor as recited in claim 3, wherein prior to performing the second partial write of the second portion of the first store operation to the first cache line, the processor is further configured to: detect that a first snoop operation targets the first cache line; merge the second portion of the first store operation in the store queue with the first portion of the first store operation from the first cache line; and provide the merged data for the first snoop operation.

6. The processor as recited in claim 3, wherein prior to performing the second partial write of the second portion of the first store operation to the first cache line, the processor is further configured to: evict the first cache line from the cache; merge the second portion of the first store operation in the store queue with the first portion of the first store operation from the first cache line; and write back the merged data to a higher level cache.

7. A load-store unit (LSU) comprising: a load queue; a store queue; and a cache, wherein the cache comprises a plurality of cache lines, and wherein each cache line of the plurality of cache lines comprises a plurality of banks; wherein the LSU is configured to: detect a conflict for access to the cache between a first store operation of data and a first load operation in a first clock cycle; responsive to detecting the conflict for access to the cache between the first store operation and the first operation in the first clock cycle: write a first portion of the data to the cache in the first clock cycle, wherein the first portion of the data is less than all of the data; perform the first load operation in the first clock cycle; and delay a second partial write of a second portion of the data to the cache until a subsequent clock cycle; wherein the first portion of the first store operation targets one or more first banks of the cache, wherein the second portion of the first store operation targets one or more second banks of the cache, and wherein the first load operation targets the one or more second banks of the cache; and wherein the LSU comprises a store mask, and wherein the LSU is further configured to update the store mask to indicate the first portion of the first store operation has been written to the cache in the first clock cycle.

8. The LSU as recited in claim 7, wherein the LSU is further configured to utilize the store mask to determine which portions of the first store operation to write to the cache on a subsequent clock cycle.

9. The LSU as recited in claim 7, wherein the LSU is further configured to: perform a second load operation in a second clock cycle, wherein the second load operation targets at least one of the one or more first banks of the cache, and wherein the second clock cycle is subsequent to the first clock cycle; and write the second portion of the first store operation to the one or more second banks of the cache in the second clock cycle responsive to determining the second load operation does not target any of the one or more second banks of the cache.

10. The LSU as recited in claim 7, wherein the first store operation targets a first cache line of the cache, wherein the first load operation targets a second cache line of the cache, and wherein the first cache line is different than the second cache line.

11. The LSU as recited in claim 9, wherein the first and second portions of the first load operation are the first load operation in its entirety.

12. A method comprising: maintaining a first mask for a first store operation of data, wherein the first store operation is stored in a store queue, wherein the first store operation targets a location of a first cache line of a cache, and wherein the first mask indicates which portions of the first store operation have been written to the first cache line; writing only a first portion of the data to the first cache line in a first clock cycle responsive to detecting a conflict with a first load operation in the first clock cycle for one or more other portions of the first store operation, wherein the first portion of the data is less than all of the data; and updating the first mask to indicate that the first portion has been written to the first cache line.

13. The method as recited in claim 12, further comprising writing the one or more other portions of the first store operation to the first cache line in a subsequent clock cycle responsive to determining there are no conflicts with concurrent load operations in the subsequent clock cycle.

14. The method as recited in claim 12, further comprising: writing a second portion of the first store operation to the first cache line in a second clock cycle responsive to determining there are no conflicts between the second portion of the first store operation and any concurrent load operations, wherein the second clock cycle is subsequent to the first clock cycle; updating the first mask to indicate that the second portion has been written to the first cache line; and delaying writing of a third portion of the first store operation to the first cache line in the second clock cycle responsive to determining there is a conflict between the third portion of the first store operation and one or more concurrent load operations during the second clock cycle.

15. The method as recited in claim 14, further comprising: detecting a second load operation targeting the first cache line prior to writing the first store operation in its
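Claims 4 through 6 describe merging the not-yet-written portion of a store (still held in the store queue) with the portion already in the cache line, so that a later load, snoop, or eviction sees the complete store. A minimal sketch of that merge follows, assuming one integer value per bank and a bitmask store mask; the function name `merge_line` and its parameters are hypothetical and are not taken from the patent.

```python
from typing import Dict, List


def merge_line(cache_line: List[int], store_data: Dict[int, int], written_mask: int) -> List[int]:
    """Return the cache line as it would look once the store completes.

    Banks whose bit is set in written_mask are already in the cache and
    are taken from cache_line. Banks the store targets but has not yet
    written are taken from store_data (the store-queue copy).
    The cache line itself is left unmodified.
    """
    merged = list(cache_line)
    for bank, value in store_data.items():
        if not (written_mask >> bank) & 1:
            merged[bank] = value
    return merged


# Usage: banks 0 and 1 were written in an earlier cycle, banks 2 and 3 are still pending.
partial_line = [10, 11, 0, 0, 0, 0, 0, 0]
queued_store = {0: 10, 1: 11, 2: 12, 3: 13}
merged = merge_line(partial_line, queued_store, written_mask=0b0011)
```

The same merged view would serve all three cases in this model: it is returned to the load, supplied to the snoop, or written back on eviction.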

Assignees

Apple Inc

Inventors

Classifications

  • Multiple simultaneous or quasi-simultaneous cache accessing · CPC title

  • Cache consistency protocols · CPC title

  • using clearing, invalidating or resetting means · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9448936B2 cover?
Systems, processors, and methods for efficiently handling concurrent store and load operations within a processor. A processor comprises a load-store unit (LSU) with a banked level-one (L1) data cache. When a store operation is ready to write data to the L1 data cache, the store operation will skip the write to any banks that have a conflict with a concurrent load operation. A partial write of …
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G06F12/0844. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 20, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).