Block partitioning for efficient record processing in parallel computing environment

US9384238B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9384238-B2
Application numberUS-201313871847-A
CountryUS
Kind codeB2
Filing dateApr 26, 2013
Priority dateApr 26, 2013
Publication dateJul 5, 2016
Grant dateJul 5, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method is disclosed for efficiently processing a large number of records. In the method, a computer system may obtain a plurality of records and count the number of records thereof corresponding to each block of a plurality of blocks. The computer system may also identify a plurality of partitions corresponding to selected blocks of the plurality of blocks. Each partition of the plurality of partitions may be substantially uniform in processing time. The computer system may then distribute a workload associated with a block or partition to each node of a plurality of nodes contained within the computer system. Each node may then process the block or partition in parallel such that each node completes the processing within a selected period of time.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method for efficiently processing a large number of records, the method comprising: obtaining, by a computer system, a plurality of records; generating a plurality of tuples, each tuple of the plurality of tuples identifying a block key of a plurality of block keys and a corresponding record of the plurality of records; grouping portions of the plurality of records into a plurality of blocks according to the plurality of block keys; counting, by the computer system, a number of records of the plurality of records corresponding to each block of the plurality of blocks; identifying, by the computer system, a plurality of partitions corresponding to selected blocks of the plurality of blocks, each partition of the plurality of partitions being substantially uniform in processing time such that each partition of the plurality of partitions completes processing within a selected period of time, the identifying comprises: identifying, by the computer system, the plurality of partitions corresponding to the selected blocks of the plurality of blocks that contain a disproportionally large number of records of the plurality of records, such that the disproportionally large number of records of the plurality of records is determined by calculating a maximum size of an unpartitioned block of the plurality of blocks by: determining a comparison time to compare two records of the plurality of records; and determining a threshold time allowed for a processor of the computer system to complete the processing; distributing, by the computer system, a workload associated with a block of the plurality of blocks or a partition of the plurality of partitions to each node of a plurality of nodes contained within the computer system; and processing, in parallel by each node of the plurality of nodes, the workload associated with the block of the plurality of blocks or the partition of the plurality of partitions. 2. The method of claim 1 , wherein the processing comprises comparing records corresponding to the block of the plurality of blocks or the partition of the plurality of partitions. 3. The method of claim 1 , wherein the processing further comprises searching for links between records corresponding to the block of the plurality of blocks or the of the plurality of partitions. 4. The method of claim 1 , wherein each record of the plurality of records comprises a customer profile. 5. The method of claim 1 , wherein the identifying the plurality of partitions corresponding to the selected blocks of the plurality of blocks further comprises identifying, by the computer system, partitions of the plurality of partitions corresponding to the selected blocks of the plurality of blocks that contain over a threshold number of records of the plurality of records. 6. The method of claim 1 , wherein the plurality of nodes comprises a cluster connected via a local area network. 7. The method of claim 6 , wherein the cluster comprises at least one hundred nodes. 8. The method of claim 7 , wherein the plurality of records comprises at least five hundred million records. 9. The method of claim 1 , wherein: each record of the plurality of records comprises a customer profile; and the plurality of records comprises at least five hundred million records. 10. The method of claim 1 , wherein: the processing comprises comparing records corresponding to the block of the plurality of blocks or the partition of the plurality of partitions; the processing further comprises searching for links between records corresponding to the block of the plurality of blocks or the partition of the plurality of partitions; each record of the plurality of records comprises a customer profile; the plurality of nodes comprises a cluster connected via a local area network; the cluster comprises at least one hundred nodes; and the plurality of records comprises at least five hundred million records. 11. A computer-implemented method for efficiently comparing a large number of records, the method comprising: obtaining, by a computer system, a plurality of records; generating a plurality of tuples, each tuple of the plurality of tuples identifying a block key of a plurality of block keys and a corresponding record of the plurality of records; grouping portions of the plurality of records into a plurality of blocks according to the plurality of block keys; counting, by the computer system, a number of records of the plurality of records corresponding to each block of the plurality of blocks; identifying, by the computer system, a plurality of partitions corresponding to selected blocks of the plurality of blocks, the selected blocks of the plurality of blocks contain a disproportionally large number of records of the plurality of records, such that the disproportionally large number of records of the plurality of records is determined by calculating a maximum size of an unpartitioned block of the plurality of blocks by: determining a comparison time to compare two records of the plurality of records; and determining a threshold time allowed for a processor of the computer system to complete processing; distributing, by the computer system, a workload associated with a block of the plurality of blocks or a partition of the plurality of partitions to each node of a plurality of nodes contained within the computer system; comparing, in parallel by each node of the plurality of nodes, records corresponding to the block of the plurality of blocks or the partition of the plurality of partitions; and linking, by the computer system in view of the comparing, one or more selected records of the plurality of records to one or more other records of the plurality of records. 12. The method of claim 11 , wherein each record of the plurality of records comprises a customer profile. 13. The method of claim 11 , wherein the identifying comprises identifying, by the computer system, partitions of the plurality of partitions corresponding to the selected blocks of the plurality of blocks that contain over a threshold number of records of the plurality of records. 14. The method of claim 11 , wherein the plurality of nodes comprises a cluster connected via a local area network. 15. The method of claim 14 , wherein the cluster comprises at least one hundred nodes. 16. The method of claim 15 , wherein the plurality of records comprises at least five hundred million records. 17. The method of claim 11 , wherein each record of the plurality of records corresponds to a customer. 18. The method of claim 17 , wherein the linking comprises linking one or more of the selected records of the plurality of records corresponding to a particular customer or household to one or more other records of the plurality of records that also correspond to the particular customer or household. 19. The method of claim 11 , wherein: each record of the plurality of records comprises a customer profile; the identifying comprises identifying, by the computer system, partitions of the plurality of partitions corresponding to the selected blocks of the plurality of blocks that contain over a threshold number of records of the plurality of records; the plurality of nodes comprises a cluster connected via a local area network; the cluster comprises at least one hundred nodes; the plurality of records comprises at least five hundred million records; each record of the plurality of records corresponds to a customer; and the linking comprises linking one or more

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9384238B2 cover?
A computer-implemented method is disclosed for efficiently processing a large number of records. In the method, a computer system may obtain a plurality of records and count the number of records thereof corresponding to each block of a plurality of blocks. The computer system may also identify a plurality of partitions corresponding to selected blocks of the plurality of blocks. Each partition…
Who is the assignee on this patent?
Wal Mart Stores Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2453. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jul 05 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).