Memory load to load fusing

US10372452B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10372452-B2
Application number  US-201715615811-A
Country             US
Kind code           B2
Filing date         Jun 6, 2017
Priority date       Mar 14, 2017
Publication date    Aug 6, 2019
Grant date          Aug 6, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and a method to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension and/or endian operations are performed on the first data read from the cache in response to the first memory load instruction, and, in parallel, a memory-load address-forwarded result is selected based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for a second of the two immediately consecutive memory load instructions. Second data stored in the cache is read by the LSU in response to the second memory load instruction based on the selected memory-load address-forwarded result.
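
The data flow described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the 16-byte line size, the little-endian 4-byte fast path, and all function names (`read_line`, `align_path`, `forward_path`, `cascaded_loads`) are hypothetical and chosen for illustration. In hardware the two paths would operate in parallel; the model simply calls them one after the other.

```python
# Assumption: a 16-byte cache line. The abstract does not give a line size.
LINE_BYTES = 16


def read_line(cache, addr):
    """Return the raw line that contains addr and the offset of addr in it."""
    offset = addr % LINE_BYTES
    base = addr - offset
    return cache[base:base + LINE_BYTES], offset


def align_path(line, offset, size, signed, big_endian):
    """General path: alignment, endian handling and sign extension."""
    raw = line[offset:offset + size]
    order = "big" if big_endian else "little"
    return int.from_bytes(raw, order, signed=signed)


def forward_path(line, offset):
    """Fast path (assumed form): pick the 4-byte-aligned word at offset and
    use it directly as the next address, with no sign extension."""
    start = 4 * (offset // 4)
    return int.from_bytes(line[start:start + 4], "little")


def cascaded_loads(cache, addr):
    """Model two dependent loads: the second address comes from the first."""
    line, offset = read_line(cache, addr)
    # Hardware would run these two steps in parallel; the model runs them in turn.
    first_value = align_path(line, offset, 4, signed=False, big_endian=False)
    next_addr = forward_path(line, offset)
    # The second load starts from the forwarded address.
    line2, offset2 = read_line(cache, next_addr)
    second_value = align_path(line2, offset2, 4, signed=False, big_endian=False)
    return first_value, next_addr, second_value


def demo_cache():
    """Build a 64-byte toy cache: a pointer at address 8 that refers to
    a data word stored at address 36."""
    cache = bytearray(64)
    cache[8:12] = (36).to_bytes(4, "little")
    cache[36:40] = (0xDEADBEEF).to_bytes(4, "little")
    return bytes(cache)
```

Calling `cascaded_loads(demo_cache(), 8)` follows the pointer stored at address 8 and returns the first loaded value, the forwarded address, and the second loaded value. The point of the model is that `forward_path` does not wait for the output of `align_path`, which mirrors the latency saving the abstract describes.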

First claim

Opening claim text (preview).

What is claimed is:

1. A method to cascade execution of instructions of a central processing unit (CPU), comprising: reading one of a first data and first instruction stored in a first cache in response to a first memory load instruction of two consecutive memory load instructions; and performing in parallel, one or more of alignment, sign extension, and endian operations on the first data read from the first cache in response to the first memory load instruction, and selecting a memory-load address-forwarded result based on a corrected alignment of the one of the first data and the first instruction read in response to the first memory load instruction to provide a next address for a second memory load instruction of the two consecutive memory load instructions; and reading the corresponding one of a second data and a second instruction in response to the second memory load instruction based on the selected memory-load address-forwarded result.

2. The method of claim 1, wherein the first memory load instruction comprises a byte-aligned memory address, and wherein the first memory load instruction comprises no sign extension.

3. The method of claim 2, wherein the first memory load instruction comprises a 4 byte aligned memory address.

4. The method of claim 2, wherein the second memory load instruction is dependent upon the first memory load instruction to produce an address for the second memory load instruction.

5. The method of claim 1, wherein the reading of the one of the first data and the first instruction occurs during a first execution cycle, and wherein the alignment, sign extension and/or endian operations on the one of the first data and the first instruction, and the selecting of the memory-load address-forwarded result occurs in a second execution cycle that is immediately subsequent to the first execution cycle.

6. The method of claim 1, further comprising looking up a translation lookaside buffer and a cache tag array based on the second memory load instruction to determine a stored location of the second data.

7. The method of claim 6, wherein the stored location of the corresponding one of the second data and the second instruction is one of the first cache and a second cache.

8. The method of claim 7, further comprising reading one of the first data and the first instruction stored in a respective one of a data cache and an instruction cache.

9. The method of claim 1, further comprising reading the first data stored in the first cache by one of a LSU and a data prefetching unit.

10. The method of claim 1, wherein the first instruction is a direct branch control transfer instruction.

11. A central processing unit (CPU), comprising: a load data alignment logic circuit to perform one or more of alignment sign extension and endian operations on one of a first data and a first instruction received from a cache in response to a first memory load instruction of two consecutive memory load instructions; and a selector logic circuit in parallel to the load data alignment logic circuit, the selector logic circuit to temporally perform in parallel with the load data alignment logic circuit a selection of a memory-load address-forwarded result based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for a second memory load instruction of the two consecutive memory load instructions, the selected memory-load address-forwarded result being used to read second data from the cache in response to the second memory load instruction.

12. The CPU of claim 11, wherein the first memory load instruction comprises a byte-aligned memory address, and wherein the first memory load instruction comprises no sign extension.

13. The CPU of claim 12, wherein the first memory load instruction comprises a 4 byte aligned memory address.

14. The CPU of claim 12, wherein the second memory load instruction is dependent upon the first memory load instruction to produce an address for the second memory load instruction.

15. The CPU of claim 11, wherein the CPU reads the one of the first data and the first instruction from the cache occurs during a first execution cycle of the CPU, and wherein the alignment, sign extension and/or endian operations performed by the load data alignment logic circuit and the selection of the memory-load address-forwarded result performed by the selector logic circuit occurs in a second execution cycle of the CPU that is immediately subsequent to the first execution cycle of the CPU.

16. The CPU of claim 11, further comprising a translation lookaside buffer and a cache tag array that determine a stored location of the corresponding one of the second data and the second instruction based on the second memory load instruction.

17. The CPU of claim 16, wherein the stored location of the corresponding one of the second data and the second instruction is one of the first cache and a second cache.

18. The CPU of claim 17, further comprising the cache, wherein the cache is one of a data cache and an instruction cache.

19. The CPU of claim 11, wherein the load data alignment logic circuit and the selector logic circuit are part of one of a load store unit (LSU) and a data prefetching unit.

20. The CPU of claim 11, wherein the first instruction is a direct branch control transfer instruction.
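
One possible reading of dependent claims 2 to 4 (and their CPU counterparts 12 to 14) is as a set of properties that a pair of loads has in those embodiments: a 4-byte-aligned first address, no sign extension on the first load, and a second load whose address is produced by the first. The predicate below is an interpretation for illustration only. The `Load` record, its field names, and the register names in the usage note are hypothetical and do not appear in the patent text.

```python
from dataclasses import dataclass


@dataclass
class Load:
    """Hypothetical description of one memory load instruction."""
    dest_reg: str      # register written by the load
    base_reg: str      # register that supplies the load address
    address: int       # resolved byte address of the load
    sign_extend: bool  # True if the loaded value is sign-extended


def matches_claims_2_to_4(first, second):
    """Return True if a pair of loads has the properties named in claims 2-4."""
    aligned = first.address % 4 == 0                # claim 3: 4 byte aligned
    no_sign_ext = not first.sign_extend             # claim 2: no sign extension
    dependent = second.base_reg == first.dest_reg   # claim 4: address dependency
    return aligned and no_sign_ext and dependent
```

For example, a load into `x1` from the aligned address `0x1000` followed by a load that uses `x1` as its base register satisfies the predicate, while the same pair with a first address of `0x1002` does not.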

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • Indirect addressing · CPC title

  • LOAD or STORE instructions; Clear instruction · CPC title

  • with dedicated cache, e.g. instruction or stack · CPC title

  • Instruction code · CPC title

  • Electrical coupling · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10372452B2 cover?
A system and a method to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension and/or endian operations are performed on the fir…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/30043. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 6, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).