Sorting multiple records of data using ranges of key values

US9213782B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9213782-B2
Application number  US-201414188746-A
Country             US
Kind code           B2
Filing date         Feb 25, 2014
Priority date       Jun 23, 2010
Publication date    Dec 15, 2015
Grant date          Dec 15, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system for sorting data of an input file containing multiple records associated with multiple tables of a database. The multiple records include key values. The key values are segmented into ranges of key values for each table. Each range of key values for each table is a segment having a segment value. Multiple key values are selected for the multiple records. A block number, which contains a unique permutation of the segment values of the segments, is generated. The segment values denote the ranges of key values encompassing the multiple key values in each record. A sort key value for each record is ascertained, based on the generated block number for each record, and added to each record. The multiple records are sorted according to the sort key values in the multiple records. The sorted multiple records are stored in an output file.
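The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the record layout (a dict with a `keys` tuple holding one key value per table), the cut points in `cuts_per_table`, and all function names are assumptions made for the example. As a simplification, the block number itself (in plain lexicographic order) is used as the sort key.

```python
from bisect import bisect_right

def segment_value(key, cuts):
    """Map a key value to the index of the key range (segment) containing it.

    `cuts` is a sorted list of range boundaries (hypothetical layout):
    cuts [100, 200] gives segments <100, 100..199, >=200.
    """
    return bisect_right(cuts, key)

def block_number(keys, cuts_per_table):
    """Combine one segment value per table into a tuple (the block number)."""
    return tuple(segment_value(k, c) for k, c in zip(keys, cuts_per_table))

def sort_records(records, cuts_per_table):
    """Add a sort key to each record, then sort the records by that key.

    Simplification: the sort key here is the block number itself.
    """
    keyed = [dict(r, sort_key=block_number(r["keys"], cuts_per_table))
             for r in records]
    keyed.sort(key=lambda r: r["sort_key"])
    return keyed

if __name__ == "__main__":
    cuts = [[100, 200], [50]]          # table 1: 3 segments, table 2: 2 segments
    recs = [{"id": "a", "keys": (250, 10)},
            {"id": "b", "keys": (50, 70)},
            {"id": "c", "keys": (150, 10)},
            {"id": "d", "keys": (50, 10)}]
    for r in sort_records(recs, cuts):
        print(r["id"], r["sort_key"])
```

Records that fall into the same combination of key ranges end up adjacent in the output, which is the property the later steps build on.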

First claim

Opening claim text (preview).

What is claimed is:

1. A method for sorting data of an input file stored on a first tangible storage device, said input file comprising multiple records associated with multiple tables of a database, each record of the multiple records comprising a plurality of key values, said method comprising:
    segmenting, by a processor of a computer system, the plurality of key values in the multiple records associated with each table into ranges of key values for each table, each range of key values for each table denoted as a segment having an associated segment value;
    said processor generating, for each record of the multiple records, a block number denoting a unique permutation of the segment values of the segments, said segment values respectively denoting the ranges of key values encompassing multiple key values selected for each record in association with the tables of the multiple tables;
    said processor ascertaining, for each record of the multiple records, a sort key value based on the generated block number for each record of the multiple records;
    said processor sorting the multiple records according to the sort key values after adding the sort key value to each record of the multiple records; and
    said processor storing the sorted multiple records in an output file on a second tangible storage device;
    wherein the generated block numbers collectively constitute multiple block numbers, wherein the method further comprises sequencing the block numbers of the multiple block numbers in a block sequence such that the segment value differs in only one position within the unique permutation of the segment values in each pair of successive blocks in the block sequence, and wherein said ascertaining the sort key value for each record of the multiple records comprises:
    converting the generated block number for each record of the multiple records to an ordinal value denoting a sequential position of the generated block number within the block sequence;
    determining an intra-block key position, within the unique permutation of the segment values of the generated block for each record of the multiple records, as being said only one position at which the segment value differs from the segment value in the block immediately preceding the generated block in the block sequence;
    determining an intra-block key value as being the key value of the multiple key values of each record of the multiple records at the segment associated with the intra-block key position; and
    generating the sort key value for each record of the multiple records from a combination of the ordinal value and the intra-block key value.

2. The method of claim 1, wherein said segmenting comprises:
    providing a dedicated buffer pool for each table of the multiple tables;
    computing a number of storable records for each table as equal to a size of the dedicated buffer pool for each table divided by a record length of each record of each table;
    computing a total number of segments for each table as equal to a total number of records of each table divided by the computed number of storable records for each table;
    after said computing the total number of segments for each table, selecting the range of key values for the segments of each table in a manner that uniformly distributes the plurality of key values among the segments of each table; and
    computing a segment value for each segment for each table as being equal to a product divided by a divisor, said computed segment value rounded down to a next lower integer if the computed segment value is not an integer, said product being a product of the calculated number of segments for each table and a difference between a highest key value in each segment and a lowest key value from the ranges of key values, said divisor being an increment between successive key values plus a difference between a highest key value from the ranges of key values and the lowest key value from the ranges of key values.

3. The method of claim 1, wherein the selected multiple key values include all key values that satisfy a condition for determining that the data in the multiple tables are readable sequentially.

4. The method of claim 3, wherein the condition is:
    a total number of records in the input file / (number of segments 1 × number of segments 2 × . . . × number of segments n) > max(number of records in table i / number of segments i) / coefficient;
    wherein n is a total number of tables of the multiple tables;
    wherein the multiple tables are denoted as table 1, table 2, . . . , table n;
    wherein said number of segments i denotes the total number of segments associated with each table i (i=1, 2, . . . , n);
    wherein max(number of records in table i / number of segments i) denotes a maximum value of (number of records in table i / number of segments i) over i=1, 2, . . . , n;
    wherein coefficient is a number of pages from a page read immediately before a currently read page is handled as page-sequential; and
    wherein page denotes a unit of storing data on the first tangible storage device.

5. The method of claim 1, wherein the method comprises: said processor deleting the sort key value for each record of the multiple records during or after said sorting, which results in the sort key value for each record of the multiple records not being included in the sorted multiple records in the output file.

6. The method of claim 1, wherein each key value of the multiple key values of each record of the multiple records is associated with a different table of the multiple tables.

7. A computer program product, comprising a computer readable hardware storage device having a computer readable program code stored therein, said program code configured to be executed by a processor of a computer system to implement a method for sorting data of an input file stored on a first tangible storage device, said input file comprising multiple records associated with multiple tables of a database, each record of the multiple records comprising a plurality of key values, said method comprising:
    said processor segmenting the plurality of key values in the multiple records associated with each table into ranges of key values for each table, each range of key values for each table denoted as a segment having an associated segment value;
    said processor generating, for each record of the multiple records, a block number denoting a unique permutation of the segment values of the segments, said segment values respectively denoting the ranges of key values encompassing multiple key values selected for each record in association with the tables of the multiple tables;
    said processor ascertaining, for each record of the multiple records, a sort key value based on the generated block number for each record of the multiple records;
    said processor sorting the multiple records according to the sort key values after adding the sort key value to each record of the multiple records; and
    said processor storing the sorted multiple records in an output file on a second tangible storage device;
    wherein the generated block numbers collectively constitute multiple block numbers, wherein the method further comprises sequencing the block numbers of the multiple block numbers in a block sequence such that the segment value differs in only one position within the unique permutation of the segment values in each pair of successive blocks in the block sequence, and wherein said ascertaining the sort key value for each record of the multiple records comprises:
    converting the generated block number for each record of the multiple records to an ordinal value denoting a sequential position of the generated block number within the block sequence;
    determining an intra-block key position, within the unique permutation of the segment values
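The block sequence and sort key construction recited in claim 1 can be illustrated with a short Python sketch. It is one possible reading, not the patent's own implementation: the sequence is built as a reflected (back-and-forth) enumeration, which is one way to make successive blocks differ in exactly one position. The function names, the cut-point layout, and the choice of the last position for the very first block (which has no predecessor) are all assumptions made for the example.

```python
from bisect import bisect_right

def block_sequence(segment_counts):
    """Enumerate all blocks so that successive blocks differ in one position.

    Assumed construction: a reflected enumeration in which each added
    position runs forward, then backward, then forward again.
    """
    seq = [()]
    for n in segment_counts:
        nxt = []
        for i, prefix in enumerate(seq):
            digits = range(n) if i % 2 == 0 else range(n - 1, -1, -1)
            nxt.extend(prefix + (d,) for d in digits)
        seq = nxt
    return seq

def build_lookup(segment_counts):
    """Map each block to (ordinal, intra-block key position)."""
    seq = block_sequence(segment_counts)
    lookup = {}
    for ordinal, block in enumerate(seq):
        if ordinal == 0:
            pos = len(block) - 1  # assumption: first block has no predecessor
        else:
            prev = seq[ordinal - 1]
            pos = next(i for i, (a, b) in enumerate(zip(prev, block)) if a != b)
        lookup[block] = (ordinal, pos)
    return lookup

def sort_key(keys, cuts_per_table, lookup):
    """Combine the block's ordinal with the key value at the changed position."""
    block = tuple(bisect_right(c, k) for k, c in zip(keys, cuts_per_table))
    ordinal, pos = lookup[block]
    return (ordinal, keys[pos])

if __name__ == "__main__":
    cuts = [[100, 200], [50]]                  # hypothetical range boundaries
    lookup = build_lookup([len(c) + 1 for c in cuts])
    print(block_sequence([3, 2]))
    print(sort_key((150, 10), cuts, lookup))
```

Sorting records by this tuple groups them by block in sequence order and, within a block, by the key of the table whose segment has just changed. The claim excerpt does not say how the ordinal and the intra-block key value are combined, so the tuple here is only one plausible choice.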

Assignees

Inventors

Classifications

  • Indexing; Data structures therefor; Storage structures (for retrieval from the web G06F16/951) · CPC title

  • Management thereof · CPC title

  • Unary operations; Data partitioning operations · CPC title

  • G06F7/08 · Primary

    Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry (by merging two or more sets of carriers in ordered sequence G06F7/16) · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9213782B2 cover?
A method and system for sorting data of an input file containing multiple records associated with multiple tables of a database. The multiple records include key values. The key values are segmented into ranges of key values for each table. Each range of key values for each table is a segment having a segment value. Multiple key values are selected for the multiple records. A block number, whic…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F7/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 15, 2015 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).