US2016188337A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016188337-A1 |
| Application number | US-201414583651-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 27, 2014 |
| Priority date | Dec 27, 2014 |
| Publication date | Jun 30, 2016 |
| Grant date | — |
Official abstract text for this publication.
Methods and apparatuses relating to a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache. In one embodiment, a hardware processor includes a decoder to decode a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements, and an execution unit to execute the prefetch instruction to generate system memory addresses of the other elements of the multidimensional block of elements, and load the multidimensional block of elements into the cache from the system memory addresses.
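The address generation described in the abstract can be illustrated in software. The following is a minimal sketch, assuming a two-dimensional block, a row stride given in bytes, and boundaries expressed as a row count and a per-row element count. The function names `block_addresses` and `cache_lines`, their parameters, and the 64-byte line size are hypothetical choices for illustration; the publication text shown here does not specify an operand encoding.

```python
def block_addresses(base, row_stride, rows, cols, elem_size):
    """Return the byte address of every element in a rows x cols block.

    base       -- address of the block's first element (assumed operand)
    row_stride -- bytes between the starts of consecutive rows (assumed)
    rows, cols -- block boundaries (assumed encoding)
    elem_size  -- bytes per element (assumed)
    """
    return [base + r * row_stride + c * elem_size
            for r in range(rows)
            for c in range(cols)]


def cache_lines(addresses, line_size=64):
    """Collapse element addresses to the distinct cache-line base addresses,
    preserving first-seen order. line_size=64 is an assumption."""
    lines = []
    for addr in addresses:
        line = addr - (addr % line_size)
        if line not in lines:
            lines.append(line)
    return lines


# A 2 x 3 block of 4-byte elements inside an array whose rows are 256 bytes apart.
addrs = block_addresses(0x1000, 256, 2, 3, 4)
lines = cache_lines(addrs)
```

In this sketch the six element addresses fall into two cache lines, one per row, so a single instruction carrying the base address, stride, and boundaries would stand in for a separate prefetch per row.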
Opening claim text (preview).
What is claimed is:

1. A hardware processor comprising: a decoder to decode a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements; and an execution unit to execute the prefetch instruction to: generate system memory addresses of the other elements of the multidimensional block of elements; and load the multidimensional block of elements into the cache from the system memory addresses.
2. The hardware processor of claim 1, further comprising a prefetch unit to generate the system memory addresses of the other elements of the multidimensional block of elements from a state machine.
3. The hardware processor of claim 2, wherein the prefetch unit further comprises an adder to generate the system memory addresses of the other elements of the multidimensional block of elements.
4. The hardware processor of claim 2, wherein the prefetch unit further comprises an address generation unit to generate the system memory addresses of the other elements of the multidimensional block of elements.
5. The hardware processor of claim 1, wherein the at least one operand of the instruction is to indicate a level of the cache to load the multidimensional block of elements.
6. The hardware processor of claim 1, wherein the stride comprises a first stride in a first dimension and a different, second stride in a second dimension.
7. The hardware processor of claim 1, wherein the execution unit is to load the multidimensional block of elements into a victim cache.
8. The hardware processor of claim 1, wherein the execution unit is to replace a speculative prefetch data set in the cache with the multidimensional block of elements.
9. A method comprising: decoding, with a decode unit, a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements; and executing with an execution unit the prefetch instruction to: generate system memory addresses of the other elements of the multidimensional block of elements; and load the multidimensional block of elements into the cache from the system memory addresses.
10. The method of claim 9, further comprising providing a prefetch unit to generate the system memory addresses of the other elements of the multidimensional block of elements from a state machine.
11. The method of claim 10, wherein the prefetch unit further comprises an adder to generate the system memory addresses of the other elements of the multidimensional block of elements.
12. The method of claim 10, wherein the prefetch unit further comprises an address generation unit to generate the system memory addresses of the other elements of the multidimensional block of elements.
13. The method of claim 9, wherein the at least one operand of the instruction is to indicate a level of the cache to load the multidimensional block of elements.
14. The method of claim 9, wherein the stride comprises a first stride in a first dimension and a different, second stride in a second dimension.
15. The method of claim 9, wherein the execution unit is to load the multidimensional block of elements into a victim cache.
16. The method of claim 9, wherein the execution unit is to replace a speculative prefetch data set in the cache with the multidimensional block of elements.
17. An apparatus comprising: a set of one or more processors; and a set of one or more data storage devices that stores code, that when executed by the set of processors causes the set of one or more processors to perform the following: decoding, with a decode unit, a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements; and executing with an execution unit the prefetch instruction to: generate system memory addresses of the other elements of the multidimensional block of elements; and load the multidimensional block of elements into the cache from the system memory addresses.
18. The apparatus of claim 17, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: further comprising providing a prefetch unit to generate the system memory addresses of the other elements of the multidimensional block of elements from a state machine.
19. The apparatus of claim 18, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: wherein the prefetch unit further comprises an adder to generate the system memory addresses of the other elements of the multidimensional block of elements.
20. The apparatus of claim 18, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: wherein the prefetch unit further comprises an address generation unit to generate the system memory addresses of the other elements of the multidimensional block of elements.
21. The apparatus of claim 17, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: wherein the at least one operand of the instruction is to indicate a level of the cache to load the multidimensional block of elements.
22. The apparatus of claim 17, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: wherein the stride comprises a first stride in a first dimension and a different, second stride in a second dimension.
23. The apparatus of claim 17, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: wherein the execution unit is to load the multidimensional block of elements into a victim cache.
24. The apparatus of claim 17, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: wherein the execution unit is to replace a speculative prefetch data set in the cache with the multidimensional block of elements.
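Several dependent claims recite a prefetch unit that generates addresses from a state machine using an adder, with a first stride in one dimension and a different second stride in another. One way to picture that arrangement in software is sketched below. It is an illustration under stated assumptions, not the claimed hardware: the class name `PrefetchStateMachine`, its field names, and the choice of byte-valued strides and element counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrefetchStateMachine:
    """Software model of an address-stepping state machine (hypothetical)."""
    base: int       # address of the first element of the block
    stride_x: int   # bytes between neighbours in the first dimension
    stride_y: int   # bytes between neighbours in the second dimension
    count_x: int    # elements per row (boundary in the first dimension)
    count_y: int    # number of rows (boundary in the second dimension)

    def addresses(self):
        """Yield each element address, using only additions to advance."""
        row_addr = self.base
        for _ in range(self.count_y):
            addr = row_addr
            for _ in range(self.count_x):
                yield addr
                addr += self.stride_x   # adder step along the first dimension
            row_addr += self.stride_y   # adder step to the next row


# Two rows of three elements: 8 bytes apart within a row, 1000 bytes between rows.
fsm = PrefetchStateMachine(base=100, stride_x=8, stride_y=1000,
                           count_x=3, count_y=2)
generated = list(fsm.addresses())
```

The state here is just the current row address and the current element address, and each step is a single addition. That is why an adder is enough in this model and no multiplier is needed.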
Prefetch instructions; cache control instructions · CPC title
Prefetching based on access pattern detection, e.g. stride based prefetch · CPC title
using stride · CPC title
with dedicated cache, e.g. instruction or stack · CPC title
Addressing or accessing the instruction operand or the result {; Formation of operand address; Addressing modes (address translation G06F12/00)} · CPC title