Parallelized in-place radix sorting

US9824111B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9824111-B2
Application numberUS-201414582337-A
CountryUS
Kind codeB2
Filing dateDec 24, 2014
Priority dateJan 29, 2014
Publication dateNov 21, 2017
Grant dateNov 21, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for sorting a data set. A data storage is divided into a plurality of buckets that is each associated with a respective key value. A plurality of stripes is identified in each bucket. At least one data stripe set is defined that has one stripe within each respective bucket. An in-place partial bucket radix sort is performed on data items contained within one data stripe set with a first processor using an initial radix. Incorrectly sorted data items are then grouped in each bucket into a respective incorrect data item group within each bucket. A radix sort is then performed using the initial radix on the items within the respective incorrect data item group. A first level sorted output is produced.

First claim

Opening claim text (preview).

What is claimed is: 1. A data set sorting apparatus, comprising: a multiple processor computing apparatus; a memory coupled to the multiple processor computing apparatus; a data sorting processor, coupled to the multiple processor computing apparatus and the memory, the data sorting processor configured to: divide a plurality of data items stored in the memory into a plurality of buckets, each bucket being associated with a respective key value; identify, within each bucket of the plurality of buckets, a respective plurality of stripes; define a plurality of data stripe sets, each data stripe set in the plurality of data stripe sets comprising one respective stripe within each respective bucket of the plurality of buckets, the plurality of data stripe sets comprising at least a first data stripe set and a second data stripe set that is separate from the first data stripe set; perform, with a first processor within the multiple processor computing apparatus, a first in-place partial bucket radix sort using an initial radix value on data items contained within the first data stripe set; perform, with a second processor within the multiple processor computing apparatus while performing the first in-place partial bucket radix sort, a second in-place partial bucket radix sort using the initial radix value on data items contained within the second data stripe set, the second processor being different from the first processor; group, in each bucket after performing the first in-place partial bucket radix sort and the second in-place partial bucket radix sort, incorrectly sorted data items into a respective incorrect data item group within each bucket, the incorrectly sorted data items comprising data items within a first bucket but having a respective radix value associated with a different bucket; perform a radix sort using the initial radix value on the items within the respective incorrect data item group; and produce a first level sorted output comprising the plurality of data items within the memory sorted according to the initial radix value. 2. The apparatus of claim 1 , wherein the data sorting apparatus further configured to divide the memory by defining a plurality of data pointers into the memory, each data pointer indicating a separation between two adjacent buckets, and wherein the data sorting apparatus further configured to define the plurality of data stripe sets by further: defining a respective stripe head pointer for each respective stripe, each respective stripe head pointer indicating a first element of each respective stripe; and defining a respective stripe tail pointer for each respective stripe, each respective stripe tail pointer indicating a last element of each respective stripe. 3. The apparatus of claim 1 , wherein each stripe in a particular data stripe set contains an equal number of data items. 4. The apparatus of claim 1 , wherein at least one stripe within a data stripe set contains a first number of data elements unequal to a second number of data elements within stripes of the data stripe set that are other than the at least one stripe. 5. The apparatus of claim 4 , wherein the first number of data elements and the second number of data elements are based upon value distribution characteristics of the plurality of data items. 6. The apparatus of claim 1 , wherein the data sorting apparatus is further configured to: divide, subsequent to performing the radix sort using the initial radix value on the items within the respective incorrect data item group, at least one bucket within the plurality of buckets into a plurality of second level buckets, each second level bucket being associated with a respective second level key value; identify, within at least one second level bucket, a plurality of respective second level stripes; define at least one second level data stripe set comprising one respective second level stripe within each second level bucket of a respective at least one bucket; and perform, with the first processor within the multiple processor computing apparatus, a partial bucket radix sort on the first level sorted output using a second level radix on data items contained within the at least one second level data stripe set. 7. The apparatus of claim 6 , wherein the data sorting apparatus is further configured to select the at least one second level bucket based upon a number of data elements in the at least one second level bucket relative to a number of data elements in other second level buckets within the plurality of buckets. 8. The apparatus of claim 1 , wherein at least a portion of the first data stripe is stored a local cache memory of the first processor while the first processor performs the first in-place partial bucket radix sort, and wherein at least a portion of the second data stripe is stored a local cache memory of the second processor while the second processor performs the second in-place partial bucket radix sort. 9. A computer program product for sorting a data set, the computer program product comprising: a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising: dividing a plurality of data items stored in a memory into a plurality of buckets, each bucket being associated with a respective key value; identifying, within each bucket of the plurality of buckets, a respective plurality of stripes; defining a plurality of data stripe sets, each data stripe set in the plurality of data stripe sets comprising one respective stripe within each respective bucket of the plurality of buckets, the plurality of data stripe sets comprising at least a first data stripe set and a second data stripe set that is separate from the first data stripe set; performing, with a first processor, a first in-place partial bucket radix sort using an initial radix value on data items contained within one data stripe set within the first data stripe set; performing, with a second processor that is different from the first processor while performing the first in-place partial bucket radix sort, a second in-place partial bucket radix sort using the initial radix value on data items contained within the second data stripe set, the second processor being different from the first processor; grouping, in each bucket after performing the first in-place partial bucket radix sort and the second in-place partial bucket radix sort, incorrectly sorted data items into a respective incorrect data item group within each bucket, the incorrectly sorted data items comprising data items within a first bucket but having a respective radix value associated with a different bucket; performing a radix sort using the initial radix value on the items within the respective incorrect data item group; and producing a first level sorted output comprising the plurality of data items within the memory sorted according to the initial radix value. 10. The computer program product according to claim 9 , wherein the dividing comprises defining a plurality of data pointers into the memory, each data pointer indicating a separation between two adjacent buckets, and wherein the defining the plurality of data stripe sets comprises: defining a respective stripe head pointer for each respective stripe, each respective stripe head pointer indicating a first element of each respective stripe; and defining a respective stripe tail pointer for each respective stripe, each respective stripe tail pointer indicating a last element of each respective stripe. 11. The computer program product according to claim 9 , wherein the method further comprises: dividing, subsequent to perfor

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9824111B2 cover?
Systems and methods for sorting a data set. A data storage is divided into a plurality of buckets that is each associated with a respective key value. A plurality of stripes is identified in each bucket. At least one data stripe set is defined that has one stripe within each respective bucket. An in-place partial bucket radix sort is performed on data items contained within one data stripe set …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/23. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 21 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).