In-Memory/Register Vector Radix Sort

US2017262211A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2017262211-A1
Application numberUS-201615065111-A
CountryUS
Kind codeA1
Filing dateMar 9, 2016
Priority dateMar 9, 2016
Publication dateSep 14, 2017
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems and computer program products for accelerating sorting of data are provided herein. A computer-implemented method includes retrieving a plurality of cache lines of data from an input buffer, wherein each cache line comprises a plurality of elements, scattering the plurality of elements of each retrieved cache line into a plurality of bins, wherein said scattering comprises using one or more vector instructions, forming a bin cache line in a corresponding one of the plurality of bins, wherein the bin cache line comprises a group of the plurality of elements which were scattered to the corresponding one of the plurality of bins, writing the bin cache line from the corresponding one of the plurality of bins to a memory, and loading the bin cache line from the memory to the input buffer.

First claim

Opening claim text (preview).

1 . A computer-implemented method, comprising: retrieving a plurality of cache lines of data from an input buffer, wherein each cache line comprises a plurality of elements; scattering the plurality of elements of each retrieved cache line into a plurality of bins, wherein said scattering comprises using one or more vector instructions; forming a bin cache line in a corresponding one of the plurality of bins, wherein the bin cache line comprises a group of the plurality of elements which were scattered to the corresponding one of the plurality of bins; writing the bin cache line from the corresponding one of the plurality of bins to a memory; and loading the bin cache line from the memory to the input buffer; wherein the steps are carried out by at least one computing device. 2 . The computer-implemented method of claim 1 , wherein the one or more vector instructions comprise single instruction multiple data (SIMD) instructions. 3 . The computer-implemented method of claim 1 , wherein said writing the bin cache line to the memory comprises using one or more predicated instructions. 4 . The computer-implemented method of claim 3 , wherein the one or more predicated instructions are predicated based on the occupied size of the bin relative to the size of the bin cache line. 5 . The computer-implemented method of claim 1 , wherein said loading the bin cache line from the memory to the input buffer is performed using one or more predicated instructions. 6 . The computer-implemented method of claim 5 , further comprising maintaining (i) a read pointer and (ii) a write pointer corresponding to cache line aligned positions in the bin, wherein the one or more predicated instructions are predicated based on the location in the memory of the read pointer relative to the location in the memory of the write pointer. 7 . The computer-implemented method of claim 1 , wherein (i) said writing and (ii) said loading are performed without using data-dependent branches. 8 . The computer-implemented method of claim 1 , wherein said loading the bin cache line from the memory to the input buffer is performed when the input buffer is empty. 9 . The computer-implemented method of claim 1 , further comprising maintaining (i) at least one read pointer and (ii) at least one write pointer for each of the plurality of bins. 10 . The computer-implemented method of claim 9 , wherein a given one of the read pointers corresponds to the location in the memory from where the bin cache line is loaded to the input buffer. 11 . The computer-implemented method of claim 10 , wherein the given read pointer is aligned within boundaries of the bin. 12 . The computer-implemented method of claim 9 , wherein a given one of the write pointers corresponds to a location in the memory wherein the bin cache line is written. 13 . The computer-implemented method of claim 12 , wherein the given write pointer is aligned within boundaries of the bin. 14 . The computer-implemented method of claim 9 , wherein the loading of the bin cache line from the memory to the input buffer is performed when the location in the memory of a given read pointer corresponding to the bin cache line is lower than the location in the memory of a given write pointer corresponding to the bin cache line. 15 . The computer-implemented method of claim 1 , wherein said boundaries of the bin cache lines are offset from boundaries of their corresponding bins. 16 . The computer-implemented method of claim 1 , wherein said writing the bin cache line to the memory is performed when the occupied size of the bin is greater than the size of the bin cache line. 17 . A computer program product, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a device to cause the device to: retrieve a plurality of cache lines of data from an input buffer, wherein each cache line comprises a plurality of elements; scatter the plurality of elements of each retrieved cache line into a plurality of bins, wherein said scattering comprises using one or more vector instructions; form a bin cache line in a corresponding one of the plurality of bins, wherein the bin cache line comprises a group of the plurality of elements which were scattered to the corresponding one of the plurality of bins; write the bin cache line from the corresponding one of the plurality of bins to a memory; and load the bin cache line from the memory to the input buffer. 18 . A system comprising: a memory; and at least one processor coupled to the memory and configured for: retrieving a plurality of cache lines of data from an input buffer, wherein each cache line comprises a plurality of elements; scattering the plurality of elements of each retrieved cache line into a plurality of bins, wherein said scattering comprises using one or more vector instructions; forming a bin cache line in a corresponding one of the plurality of bins, wherein the bin cache line comprises a group of the plurality of elements which were scattered to the corresponding one of the plurality of bins; writing the bin cache line from the corresponding one of the plurality of bins to a memory; and loading the bin cache line from the memory to the input buffer. 19 . The system of claim 18 , wherein (i) said writing the bin cache line to the memory and (ii) said loading the bin cache line from the memory to the input buffer comprise using one or more predicated instructions. 20 . A computer-implemented method, comprising: retrieving a plurality of cache lines of data from an input buffer, wherein each cache line comprises a plurality of elements; scattering the plurality of elements of each retrieved cache line into a plurality of bins, wherein said scattering comprises using one or more vector instructions; forming a bin cache line in a corresponding one of the plurality of bins, wherein the bin cache line comprises a group of the plurality of elements which were scattered to the corresponding one of the plurality of bins; writing the bin cache line from the corresponding one of the plurality of bins to a memory using a first set of one or more predicated instructions, wherein the first set of predicated instructions are predicated based on the occupied size of the bin being greater than the size of the bin cache line; maintaining (i) a read pointer and (ii) a write pointer corresponding to cache line aligned positions in the bin; and loading the bin cache line from the memory to the input buffer using a second set of one or more predicated instructions, wherein the second set of predicated instructions are predicated based on the location in the memory of the read pointer being lower than the location in the memory of the write pointer; wherein the steps are carried out by at least one computing device.

Assignees

Inventors

Classifications

  • Prefetching based on hints or prefetch instructions · CPC title

  • with prefetch · CPC title

  • with main memory updating (G06F12/0806 takes precedence) · CPC title

  • to perform conditional operations, e.g. using predicates or guards · CPC title

  • Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017262211A1 cover?
Methods, systems and computer program products for accelerating sorting of data are provided herein. A computer-implemented method includes retrieving a plurality of cache lines of data from an input buffer, wherein each cache line comprises a plurality of elements, scattering the plurality of elements of each retrieved cache line into a plurality of bins, wherein said scattering comprises usin…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F9/30072. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Sep 14 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).