Parallelized in-place radix sorting

US9823896B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9823896-B2
Application numberUS-201514615599-A
CountryUS
Kind codeB2
Filing dateFeb 6, 2015
Priority dateJan 29, 2014
Publication dateNov 21, 2017
Grant dateNov 21, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for sorting a data set. Data items each having a first portion and a second portion is stored. The first and second portions are stored separately and each has a separate set of keys. The first portion has a pointer indicating the second portion. At least some of the first set of keys for each data item is stored in a local memory of a first processor. At least one data stripe set is defined with one stripe within each bucket. An in-place partial bucket radix sort is performed on data items within one data stripe set with a first processor using an initial key. Incorrectly sorted data items are grouped into respective incorrect data item groups within each bucket. A radix sort is then performed using the initial radix on the incorrect data item groups. A first level sorted output is produced.

First claim

Opening claim text (preview).

What is claimed is: 1. A data set sorting apparatus, comprising: a multiple processor computing apparatus, each processor in the multiple processor computing apparatus having a respective local memory; an external memory coupled to the multiple processor computing apparatus; a data sorting processor, coupled to the multiple processor computing apparatus and the external memory, the data sorting processor configured to: divide a memory containing a plurality of data items into a plurality of buckets for an in-place radix sort, each bucket being associated with a respective key value; identify, within each bucket of the plurality of buckets, a respective plurality of stripes; define a plurality of stripe sets, each data stripe set comprising one respective stripe within each respective bucket of the plurality of buckets, the plurality of stripe sets comprising at least a first stripe set and a second stripe set that is separate from the first stripe set; store a respective first portion of each data item within the first stripe set into a data storage comprising a local memory of a first processor in the multiple processor computer apparatus; store a respective first portion of each data item within the second stripe set into a data storage comprising a local memory of a second processor in the multiple processor computer apparatus; store a respective second portion of the each data item in the plurality of data items into the external memory, each respective second portion comprising a respective second set of keys and is stored separately from the respective first portion of the each data item, and each respective first portion comprising a respective first set of keys and a pointer indicating the respective second portion; perform, with the first processor, an in-place partial bucket radix sort using an initial key on data items contained within the first stripe set; perform, with the second processor, an in-place partial bucket radix sort using the initial key on data items contained within the second stripe set; group, in each bucket after performing the in-place partial bucket radix sort, incorrectly sorted data items into a respective incorrect data item group within each bucket, the incorrectly sorted data items comprising data items within a first bucket but having a respective key value associated with a different bucket; perform a radix sort using the initial key on the items within the respective incorrect data item group; and produce a first level sorted output comprising the plurality of data items within the data storage sorted according to the initial key. 2. The data sorting apparatus of claim 1 , the data sorting processor further configured to perform, subsequent to production of the first level sorted output, a subsequent in-place radix sort using at least one key within the respective second set of keys within each data item of the first stripe set and the second stripe set. 3. The data sorting apparatus of claim 2 , the data sorting processor further configured to perform the subsequent in-place radix sort with the first processor, and wherein the respective second set of keys within each data item of the first stripe set and the second stripe set are stored within memory external to the first processor and the second processor. 4. The data sorting apparatus of claim 2 , the data sorting processor further configured to: swap, prior to performance of the subsequent in-place radix sort and for each data item within the first stripe set and the second stripe set, at least a portion of the respective first set of keys with at least a portion of the respective second set of keys such that the at least a portion of the respective second set of keys is stored in the local memory of the first processor; and swap, subsequent to performance of the subsequent in-place radix sort and for each data item within the first stripe set and the second stripe set, the at least a portion of the respective first set of keys with the at least a portion of the respective second set of keys such that the at least a portion of the respective second set of keys is stored in the memory external to the first processor and the second processor. 5. The data sorting apparatus of claim 1 , wherein the data sorting apparatus is further configured to define the plurality of buckets by being configured to divide the data storage by defining a plurality of data pointers into the data storage, each data pointer indicating a separation between two adjacent buckets, and wherein the data sorting apparatus further configured to identify each data stripe in each one data stripe set by further: defining a respective stripe head pointer for each respective stripe, each respective stripe head pointer indicating a first element of each respective stripe; and defining a respective stripe tail pointer for each respective stripe, each respective stripe tail pointer indicating a last element of each respective stripe. 6. The data sorting apparatus of claim 1 , wherein data sorting apparatus further configured to: divide, subsequent to performing the radix sort using the initial key on the items within the respective incorrect data item group, at least one bucket within the plurality of buckets into a plurality of second level buckets, each second level bucket being associated with a respective second level key value; identify, within at least one second level bucket, a plurality of respective second level stripes; define at least one second level data stripe set comprising one respective second level stripe within each second level bucket of a respective at least one bucket; and perform, with the first processor within the multiple processor computing apparatus, a partial bucket radix sort on the first level sorted output using a second level key on data items contained within the at least one second level data stripe set. 7. The data sorting apparatus of claim 6 , wherein the data sorting apparatus is further configured to select the at least one second level bucket based upon a number of data elements in the at least one second level bucket relative to a number of data elements in other second level buckets within the plurality of buckets. 8. A computer program product for sorting a data set, the computer program product comprising: a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising: dividing a memory containing a plurality of data items into a plurality of buckets for an in-place radix sort, each bucket being associated with a respective key value; identifying, within each bucket of the plurality of buckets, a respective plurality of stripes; defining a plurality of stripe sets, each data stripe set comprising one respective stripe within each respective bucket of the plurality of buckets, the plurality of stripe sets comprising at least a first stripe set and a second stripe set that is separate from the first stripe set; storing a respective first portion of each data item within the first stripe set into a data storage comprising a local memory of a first processor; storing a respective first portion of each data item within the second stripe set into a data storage comprising a local memory of a second processor; storing a respective second portion of the each data item in the plurality of data items into the external memory, each respective second portion comprising a respective second set of keys and is stored separately from the respective first portion of the each data item, and each respective first portion comprising a respective first set of keys and a pointer indicating the respective second portion; performing, with the first processor, an in

Assignees

Inventors

Classifications

  • G06F7/24Primary

    Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers {sorting methods in general}(G06F7/36 takes precedence) · CPC title

  • Clustering or classification · CPC title

  • Indexing structures · CPC title

  • Tablespace storage structures; Management thereof · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9823896B2 cover?
Systems and methods for sorting a data set. Data items each having a first portion and a second portion is stored. The first and second portions are stored separately and each has a separate set of keys. The first portion has a pointer indicating the second portion. At least some of the first set of keys for each data item is stored in a local memory of a first processor. At least one data stri…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F7/24. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 21 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).