US10372452B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10372452-B2 |
| Application number | US-201715615811-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 6, 2017 |
| Priority date | Mar 14, 2017 |
| Publication date | Aug 6, 2019 |
| Grant date | Aug 6, 2019 |
Official abstract text for this publication.
A system and a method to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension and/or endian operations are performed on the first data read from the cache in response to the first memory load instruction, and, in parallel, a memory-load address-forwarded result is selected based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for a second of the two immediately consecutive memory load instructions. Second data stored in the cache is read by the LSU in response to the second memory load instruction based on the selected memory-load address-forwarded result.
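The abstract's two parallel paths can be illustrated with a small software model. This is a minimal sketch, not the patented circuit: the names `CACHE`, `raw_read`, `align_and_extend`, `select_forwarded_address`, and `cascaded_loads` are hypothetical, the 8-byte read width and little-endian layout are assumptions, and Python runs the two paths one after the other where hardware would run them in the same cycle.

```python
import struct

# Toy "cache": 64 bytes, little-endian (an assumption for this sketch).
CACHE = bytearray(64)
struct.pack_into("<I", CACHE, 8, 32)           # address 8 holds a pointer to address 32
struct.pack_into("<I", CACHE, 32, 0xDEADBEEF)  # address 32 holds the final data


def raw_read(addr, width=8):
    """Read the aligned chunk of raw bytes that contains addr."""
    base = addr & ~(width - 1)
    return base, bytes(CACHE[base:base + width])


def align_and_extend(raw, offset, size, signed):
    """Alignment / sign-extension path: yields the architectural load result."""
    return int.from_bytes(raw[offset:offset + size], "little", signed=signed)


def select_forwarded_address(raw, offset, size):
    """Selector path: picks the address bytes from the raw data, no sign extension."""
    return int.from_bytes(raw[offset:offset + size], "little", signed=False)


def cascaded_loads(addr1, size=4):
    """Model two consecutive loads where the second uses the first's result as its address."""
    base1, raw1 = raw_read(addr1)
    offset1 = addr1 - base1
    # Conceptually these two steps happen in parallel in the same cycle.
    result1 = align_and_extend(raw1, offset1, size, signed=False)
    next_addr = select_forwarded_address(raw1, offset1, size)
    # The second load starts from the forwarded address rather than from result1.
    base2, raw2 = raw_read(next_addr)
    result2 = align_and_extend(raw2, next_addr - base2, size, signed=False)
    return result1, next_addr, result2


if __name__ == "__main__":
    print(cascaded_loads(8))
```

For an aligned load with no sign extension, the forwarded address and the architectural result of the first load are the same value, which is why the second read can start without waiting for the alignment path.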
Opening claim text (preview).
What is claimed is:

1. A method to cascade execution of instructions of a central processing unit (CPU), comprising: reading one of a first data and first instruction stored in a first cache in response to a first memory load instruction of two consecutive memory load instructions; and performing in parallel, one or more of alignment, sign extension, and endian operations on the first data read from the first cache in response to the first memory load instruction, and selecting a memory-load address-forwarded result based on a corrected alignment of the one of the first data and the first instruction read in response to the first memory load instruction to provide a next address for a second memory load instruction of the two consecutive memory load instructions; and reading the corresponding one of a second data and a second instruction in response to the second memory load instruction based on the selected memory-load address-forwarded result.

2. The method of claim 1, wherein the first memory load instruction comprises a byte-aligned memory address, and wherein the first memory load instruction comprises no sign extension.

3. The method of claim 2, wherein the first memory load instruction comprises a 4 byte aligned memory address.

4. The method of claim 2, wherein the second memory load instruction is dependent upon the first memory load instruction to produce an address for the second memory load instruction.

5. The method of claim 1, wherein the reading of the one of the first data and the first instruction occurs during a first execution cycle, and wherein the alignment, sign extension and/or endian operations on the one of the first data and the first instruction, and the selecting of the memory-load address-forwarded result occurs in a second execution cycle that is immediately subsequent to the first execution cycle.

6. The method of claim 1, further comprising looking up a translation lookaside buffer and a cache tag array based on the second memory load instruction to determine a stored location of the second data.

7. The method of claim 6, wherein the stored location of the corresponding one of the second data and the second instruction is one of the first cache and a second cache.

8. The method of claim 7, further comprising reading one of the first data and the first instruction stored in a respective one of a data cache and an instruction cache.

9. The method of claim 1, further comprising reading the first data stored in the first cache by one of a LSU and a data prefetching unit.

10. The method of claim 1, wherein the first instruction is a direct branch control transfer instruction.

11. A central processing unit (CPU), comprising: a load data alignment logic circuit to perform one or more of alignment sign extension and endian operations on one of a first data and a first instruction received from a cache in response to a first memory load instruction of two consecutive memory load instructions; and a selector logic circuit in parallel to the load data alignment logic circuit, the selector logic circuit to temporally perform in parallel with the load data alignment logic circuit a selection of a memory-load address-forwarded result based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for a second memory load instruction of the two consecutive memory load instructions, the selected memory-load address-forwarded result being used to read second data from the cache in response to the second memory load instruction.

12. The CPU of claim 11, wherein the first memory load instruction comprises a byte-aligned memory address, and wherein the first memory load instruction comprises no sign extension.

13. The CPU of claim 12, wherein the first memory load instruction comprises a 4 byte aligned memory address.

14. The CPU of claim 12, wherein the second memory load instruction is dependent upon the first memory load instruction to produce an address for the second memory load instruction.

15. The CPU of claim 11, wherein the CPU reads the one of the first data and the first instruction from the cache occurs during a first execution cycle of the CPU, and wherein the alignment, sign extension and/or endian operations performed by the load data alignment logic circuit and the selection of the memory-load address-forwarded result performed by the selector logic circuit occurs in a second execution cycle of the CPU that is immediately subsequent to the first execution cycle of the CPU.

16. The CPU of claim 11, further comprising a translation lookaside buffer and a cache tag array that determine a stored location of the corresponding one of the second data and the second instruction based on the second memory load instruction.

17. The CPU of claim 16, wherein the stored location of the corresponding one of the second data and the second instruction is one of the first cache and a second cache.

18. The CPU of claim 17, further comprising the cache, wherein the cache is one of a data cache and an instruction cache.

19. The CPU of claim 11, wherein the load data alignment logic circuit and the selector logic circuit are part of one of a load store unit (LSU) and a data prefetching unit.

20. The CPU of claim 11, wherein the first instruction is a direct branch control transfer instruction.
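The conditions recited in dependent claims 2 through 4 (and mirrored in claims 12 through 14) can be read together as a simple predicate over a pair of loads. The sketch below is one illustrative interpretation only, not a statement of claim scope: the `Load` record, its field names, and the `may_cascade` function are hypothetical, and treating the three conditions as a joint check is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Load:
    """Hypothetical description of a memory load instruction."""
    dest: str                      # register that receives the loaded value
    base: str                      # register that supplies the load address
    address: Optional[int] = None  # effective address, if already known
    sign_extend: bool = False      # whether the load sign-extends its result


def may_cascade(first: Load, second: Load, alignment: int = 4) -> bool:
    """Return True when the pair meets the conditions named in claims 2 to 4.

    Assumption: the first load's address is known and must be a multiple of
    `alignment` bytes, the first load does no sign extension, and the second
    load takes its address from the first load's destination register.
    """
    if first.address is None:
        return False
    aligned = first.address % alignment == 0
    dependent = second.base == first.dest
    return aligned and not first.sign_extend and dependent


if __name__ == "__main__":
    first = Load(dest="r1", base="r0", address=0x100)
    second = Load(dest="r2", base="r1")
    print(may_cascade(first, second))
```

A pair that fails any of the three checks would, under this reading, simply follow the ordinary load path with its full alignment latency.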
Indirect addressing · CPC title
LOAD or STORE instructions; Clear instruction · CPC title
with dedicated cache, e.g. instruction or stack · CPC title
Instruction code · CPC title
Electrical coupling · CPC title