No-locality hint vector memory access processors, methods, systems, and instructions
US-2016019184-A1 · Jan 21, 2016 · US
US2017262211A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017262211-A1 |
| Application number | US-201615065111-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 9, 2016 |
| Priority date | Mar 9, 2016 |
| Publication date | Sep 14, 2017 |
| Grant date | — |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Methods, systems and computer program products for accelerating sorting of data are provided herein. A computer-implemented method includes retrieving a plurality of cache lines of data from an input buffer, wherein each cache line comprises a plurality of elements, scattering the plurality of elements of each retrieved cache line into a plurality of bins, wherein said scattering comprises using one or more vector instructions, forming a bin cache line in a corresponding one of the plurality of bins, wherein the bin cache line comprises a group of the plurality of elements which were scattered to the corresponding one of the plurality of bins, writing the bin cache line from the corresponding one of the plurality of bins to a memory, and loading the bin cache line from the memory to the input buffer.
Opening claim text (preview).
1 . A computer-implemented method, comprising: retrieving a plurality of cache lines of data from an input buffer, wherein each cache line comprises a plurality of elements; scattering the plurality of elements of each retrieved cache line into a plurality of bins, wherein said scattering comprises using one or more vector instructions; forming a bin cache line in a corresponding one of the plurality of bins, wherein the bin cache line comprises a group of the plurality of elements which were scattered to the corresponding one of the plurality of bins; writing the bin cache line from the corresponding one of the plurality of bins to a memory; and loading the bin cache line from the memory to the input buffer; wherein the steps are carried out by at least one computing device. 2 . The computer-implemented method of claim 1 , wherein the one or more vector instructions comprise single instruction multiple data (SIMD) instructions. 3 . The computer-implemented method of claim 1 , wherein said writing the bin cache line to the memory comprises using one or more predicated instructions. 4 . The computer-implemented method of claim 3 , wherein the one or more predicated instructions are predicated based on the occupied size of the bin relative to the size of the bin cache line. 5 . The computer-implemented method of claim 1 , wherein said loading the bin cache line from the memory to the input buffer is performed using one or more predicated instructions. 6 . The computer-implemented method of claim 5 , further comprising maintaining (i) a read pointer and (ii) a write pointer corresponding to cache line aligned positions in the bin, wherein the one or more predicated instructions are predicated based on the location in the memory of the read pointer relative to the location in the memory of the write pointer. 7 . The computer-implemented method of claim 1 , wherein (i) said writing and (ii) said loading are performed without using data-dependent branches. 8 . The computer-implemented method of claim 1 , wherein said loading the bin cache line from the memory to the input buffer is performed when the input buffer is empty. 9 . The computer-implemented method of claim 1 , further comprising maintaining (i) at least one read pointer and (ii) at least one write pointer for each of the plurality of bins. 10 . The computer-implemented method of claim 9 , wherein a given one of the read pointers corresponds to the location in the memory from where the bin cache line is loaded to the input buffer. 11 . The computer-implemented method of claim 10 , wherein the given read pointer is aligned within boundaries of the bin. 12 . The computer-implemented method of claim 9 , wherein a given one of the write pointers corresponds to a location in the memory wherein the bin cache line is written. 13 . The computer-implemented method of claim 12 , wherein the given write pointer is aligned within boundaries of the bin. 14 . The computer-implemented method of claim 9 , wherein the loading of the bin cache line from the memory to the input buffer is performed when the location in the memory of a given read pointer corresponding to the bin cache line is lower than the location in the memory of a given write pointer corresponding to the bin cache line. 15 . The computer-implemented method of claim 1 , wherein said boundaries of the bin cache lines are offset from boundaries of their corresponding bins. 16 . The computer-implemented method of claim 1 , wherein said writing the bin cache line to the memory is performed when the occupied size of the bin is greater than the size of the bin cache line. 17 . A computer program product, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a device to cause the device to: retrieve a plurality of cache lines of data from an input buffer, wherein each cache line comprises a plurality of elements; scatter the plurality of elements of each retrieved cache line into a plurality of bins, wherein said scattering comprises using one or more vector instructions; form a bin cache line in a corresponding one of the plurality of bins, wherein the bin cache line comprises a group of the plurality of elements which were scattered to the corresponding one of the plurality of bins; write the bin cache line from the corresponding one of the plurality of bins to a memory; and load the bin cache line from the memory to the input buffer. 18 . A system comprising: a memory; and at least one processor coupled to the memory and configured for: retrieving a plurality of cache lines of data from an input buffer, wherein each cache line comprises a plurality of elements; scattering the plurality of elements of each retrieved cache line into a plurality of bins, wherein said scattering comprises using one or more vector instructions; forming a bin cache line in a corresponding one of the plurality of bins, wherein the bin cache line comprises a group of the plurality of elements which were scattered to the corresponding one of the plurality of bins; writing the bin cache line from the corresponding one of the plurality of bins to a memory; and loading the bin cache line from the memory to the input buffer. 19 . The system of claim 18 , wherein (i) said writing the bin cache line to the memory and (ii) said loading the bin cache line from the memory to the input buffer comprise using one or more predicated instructions. 20 . A computer-implemented method, comprising: retrieving a plurality of cache lines of data from an input buffer, wherein each cache line comprises a plurality of elements; scattering the plurality of elements of each retrieved cache line into a plurality of bins, wherein said scattering comprises using one or more vector instructions; forming a bin cache line in a corresponding one of the plurality of bins, wherein the bin cache line comprises a group of the plurality of elements which were scattered to the corresponding one of the plurality of bins; writing the bin cache line from the corresponding one of the plurality of bins to a memory using a first set of one or more predicated instructions, wherein the first set of predicated instructions are predicated based on the occupied size of the bin being greater than the size of the bin cache line; maintaining (i) a read pointer and (ii) a write pointer corresponding to cache line aligned positions in the bin; and loading the bin cache line from the memory to the input buffer using a second set of one or more predicated instructions, wherein the second set of predicated instructions are predicated based on the location in the memory of the read pointer being lower than the location in the memory of the write pointer; wherein the steps are carried out by at least one computing device.
Prefetching based on hints or prefetch instructions · CPC title
with prefetch · CPC title
with main memory updating (G06F12/0806 takes precedence) · CPC title
to perform conditional operations, e.g. using predicates or guards · CPC title
Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.