Hardware apparatuses and methods to prefetch a multidimensional block of elements from a multidimensional array

US2016188337A1 · US · A1

Patent metadata
Publication number: US-2016188337-A1
Application number: US-201414583651-A
Country: US
Kind code: A1
Filing date: Dec 27, 2014
Priority date: Dec 27, 2014
Publication date: Jun 30, 2016
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatuses relating to a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache. In one embodiment, a hardware processor includes a decoder to decode a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements, and an execution unit to execute the prefetch instruction to generate system memory addresses of the other elements of the multidimensional block of elements, and load the multidimensional block of elements into the cache from the system memory addresses.
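The address generation described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not specify how the operands are encoded, so the function name `block_addresses` and its parameters are hypothetical. Here the "boundaries" are taken to be a width and height in elements, and the "stride" is taken to be the byte distance between the starts of consecutive rows of the enclosing array.

```python
from typing import List


def block_addresses(base: int, stride: int, width: int, height: int,
                    elem_size: int = 4) -> List[int]:
    """Return the byte address of every element in a width x height block.

    Assumptions (not taken from the patent text): `base` is the address of
    the block's first element, `stride` is the byte distance between the
    starts of consecutive rows, and the boundaries are given as `width`
    and `height`, both counted in elements of `elem_size` bytes.
    """
    if width <= 0 or height <= 0 or elem_size <= 0:
        raise ValueError("width, height and elem_size must be positive")
    addresses = []
    for row in range(height):
        row_start = base + row * stride          # first element of this row
        for col in range(width):
            addresses.append(row_start + col * elem_size)
    return addresses


# A block 3 elements wide and 2 rows high, with 4-byte elements,
# inside an array whose rows start 64 bytes apart.
print([hex(a) for a in block_addresses(0x1000, 64, width=3, height=2)])
# prints ['0x1000', '0x1004', '0x1008', '0x1040', '0x1044', '0x1048']
```

In this model a single call stands in for one prefetch instruction: the caller supplies one element address plus the stride and boundaries, and the remaining addresses are derived from them rather than being listed one by one.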

First claim

Opening claim text (preview).

What is claimed is:

1. A hardware processor comprising: a decoder to decode a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements; and an execution unit to execute the prefetch instruction to: generate system memory addresses of the other elements of the multidimensional block of elements; and load the multidimensional block of elements into the cache from the system memory addresses.

2. The hardware processor of claim 1, further comprising a prefetch unit to generate the system memory addresses of the other elements of the multidimensional block of elements from a state machine.

3. The hardware processor of claim 2, wherein the prefetch unit further comprises an adder to generate the system memory addresses of the other elements of the multidimensional block of elements.

4. The hardware processor of claim 2, wherein the prefetch unit further comprises an address generation unit to generate the system memory addresses of the other elements of the multidimensional block of elements.

5. The hardware processor of claim 1, wherein the at least one operand of the instruction is to indicate a level of the cache to load the multidimensional block of elements.

6. The hardware processor of claim 1, wherein the stride comprises a first stride in a first dimension and a different, second stride in a second dimension.

7. The hardware processor of claim 1, wherein the execution unit is to load the multidimensional block of elements into a victim cache.

8. The hardware processor of claim 1, wherein the execution unit is to replace a speculative prefetch data set in the cache with the multidimensional block of elements.

9. A method comprising: decoding, with a decode unit, a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements; and executing with an execution unit the prefetch instruction to: generate system memory addresses of the other elements of the multidimensional block of elements; and load the multidimensional block of elements into the cache from the system memory addresses.

10. The method of claim 9, further comprising providing a prefetch unit to generate the system memory addresses of the other elements of the multidimensional block of elements from a state machine.

11. The method of claim 10, wherein the prefetch unit further comprises an adder to generate the system memory addresses of the other elements of the multidimensional block of elements.

12. The method of claim 10, wherein the prefetch unit further comprises an address generation unit to generate the system memory addresses of the other elements of the multidimensional block of elements.

13. The method of claim 9, wherein the at least one operand of the instruction is to indicate a level of the cache to load the multidimensional block of elements.

14. The method of claim 9, wherein the stride comprises a first stride in a first dimension and a different, second stride in a second dimension.

15. The method of claim 9, wherein the execution unit is to load the multidimensional block of elements into a victim cache.

16. The method of claim 9, wherein the execution unit is to replace a speculative prefetch data set in the cache with the multidimensional block of elements.

17. An apparatus comprising: a set of one or more processors; and a set of one or more data storage devices that stores code, that when executed by the set of processors causes the set of one or more processors to perform the following: decoding, with a decode unit, a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements; and executing with an execution unit the prefetch instruction to: generate system memory addresses of the other elements of the multidimensional block of elements; and load the multidimensional block of elements into the cache from the system memory addresses.

18. The apparatus of claim 17, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: further comprising providing a prefetch unit to generate the system memory addresses of the other elements of the multidimensional block of elements from a state machine.

19. The apparatus of claim 18, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: wherein the prefetch unit further comprises an adder to generate the system memory addresses of the other elements of the multidimensional block of elements.

20. The apparatus of claim 18, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: wherein the prefetch unit further comprises an address generation unit to generate the system memory addresses of the other elements of the multidimensional block of elements.

21. The apparatus of claim 17, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: wherein the at least one operand of the instruction is to indicate a level of the cache to load the multidimensional block of elements.

22. The apparatus of claim 17, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: wherein the stride comprises a first stride in a first dimension and a different, second stride in a second dimension.

23. The apparatus of claim 17, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: wherein the execution unit is to load the multidimensional block of elements into a victim cache.

24. The apparatus of claim 17, wherein the set of data storage devices further stores code, that when executed by the set of processors causes the set of processors to perform the following: wherein the execution unit is to replace a speculative prefetch data set in the cache with the multidimensional block of elements.
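Several dependent claims mention a prefetch unit that generates addresses from a state machine using an adder, a separate stride per dimension, and an operand that selects the cache level. The following is a loose software analogy of those ideas, not the claimed hardware: the class `BlockPrefetchState`, the function `prefetch_block`, and the dict used as a stand-in for the cache are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class BlockPrefetchState:
    """Hypothetical register state of a unit walking a 2-D block."""
    base: int        # address of the first element of the block
    stride_x: int    # bytes between neighbouring elements in a row
    stride_y: int    # bytes between the starts of consecutive rows
    width: int       # boundary in the first dimension, in elements
    height: int      # boundary in the second dimension, in elements

    def addresses(self) -> Iterator[int]:
        """Yield each element address using only one addition per step."""
        row_start = self.base
        for _ in range(self.height):
            addr = row_start
            for _ in range(self.width):
                yield addr
                addr += self.stride_x       # adder: next element in the row
            row_start += self.stride_y      # adder: next row of the block


def prefetch_block(state: BlockPrefetchState, cache_level: int,
                   cache: Dict[int, List[int]]) -> None:
    """Record every generated address under the requested cache level.

    `cache` is a plain dict standing in for a cache hierarchy; real
    hardware would fetch whole cache lines, not individual addresses.
    """
    cache.setdefault(cache_level, []).extend(state.addresses())


cache: Dict[int, List[int]] = {}
state = BlockPrefetchState(base=0x2000, stride_x=8, stride_y=256,
                           width=2, height=2)
prefetch_block(state, cache_level=2, cache=cache)
print(cache)
# prints {2: [8192, 8200, 8448, 8456]}
```

The two running sums (`addr` and `row_start`) play the role of the state: each new address comes from adding a stride to the previous one, so no multiplication is needed once the walk has started.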

Assignees

Intel Corp

Inventors

Classifications

  • Prefetch instructions; cache control instructions · CPC title

  • Prefetching based on access pattern detection, e.g. stride based prefetch · CPC title

  • using stride · CPC title

  • with dedicated cache, e.g. instruction or stack · CPC title

  • Addressing or accessing the instruction operand or the result {; Formation of operand address; Addressing modes (address translation G06F12/00)} · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016188337A1 cover?
Methods and apparatuses relating to a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache. In one embodiment, a hardware processor includes a decoder to decode a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30047. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 30, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).