Data processing method and related apparatus
US-2024152807-A1 · May 9, 2024 · US
US9384238B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9384238-B2 |
| Application number | US-201313871847-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 26, 2013 |
| Priority date | Apr 26, 2013 |
| Publication date | Jul 5, 2016 |
| Grant date | Jul 5, 2016 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A computer-implemented method is disclosed for efficiently processing a large number of records. In the method, a computer system may obtain a plurality of records and count the number of records thereof corresponding to each block of a plurality of blocks. The computer system may also identify a plurality of partitions corresponding to selected blocks of the plurality of blocks. Each partition of the plurality of partitions may be substantially uniform in processing time. The computer system may then distribute a workload associated with a block or partition to each node of a plurality of nodes contained within the computer system. Each node may then process the block or partition in parallel such that each node completes the processing within a selected period of time.
Opening claim text (preview).
What is claimed is: 1. A computer-implemented method for efficiently processing a large number of records, the method comprising: obtaining, by a computer system, a plurality of records; generating a plurality of tuples, each tuple of the plurality of tuples identifying a block key of a plurality of block keys and a corresponding record of the plurality of records; grouping portions of the plurality of records into a plurality of blocks according to the plurality of block keys; counting, by the computer system, a number of records of the plurality of records corresponding to each block of the plurality of blocks; identifying, by the computer system, a plurality of partitions corresponding to selected blocks of the plurality of blocks, each partition of the plurality of partitions being substantially uniform in processing time such that each partition of the plurality of partitions completes processing within a selected period of time, the identifying comprises: identifying, by the computer system, the plurality of partitions corresponding to the selected blocks of the plurality of blocks that contain a disproportionally large number of records of the plurality of records, such that the disproportionally large number of records of the plurality of records is determined by calculating a maximum size of an unpartitioned block of the plurality of blocks by: determining a comparison time to compare two records of the plurality of records; and determining a threshold time allowed for a processor of the computer system to complete the processing; distributing, by the computer system, a workload associated with a block of the plurality of blocks or a partition of the plurality of partitions to each node of a plurality of nodes contained within the computer system; and processing, in parallel by each node of the plurality of nodes, the workload associated with the block of the plurality of blocks or the partition of the plurality of partitions. 2. The method of claim 1 , wherein the processing comprises comparing records corresponding to the block of the plurality of blocks or the partition of the plurality of partitions. 3. The method of claim 1 , wherein the processing further comprises searching for links between records corresponding to the block of the plurality of blocks or the of the plurality of partitions. 4. The method of claim 1 , wherein each record of the plurality of records comprises a customer profile. 5. The method of claim 1 , wherein the identifying the plurality of partitions corresponding to the selected blocks of the plurality of blocks further comprises identifying, by the computer system, partitions of the plurality of partitions corresponding to the selected blocks of the plurality of blocks that contain over a threshold number of records of the plurality of records. 6. The method of claim 1 , wherein the plurality of nodes comprises a cluster connected via a local area network. 7. The method of claim 6 , wherein the cluster comprises at least one hundred nodes. 8. The method of claim 7 , wherein the plurality of records comprises at least five hundred million records. 9. The method of claim 1 , wherein: each record of the plurality of records comprises a customer profile; and the plurality of records comprises at least five hundred million records. 10. The method of claim 1 , wherein: the processing comprises comparing records corresponding to the block of the plurality of blocks or the partition of the plurality of partitions; the processing further comprises searching for links between records corresponding to the block of the plurality of blocks or the partition of the plurality of partitions; each record of the plurality of records comprises a customer profile; the plurality of nodes comprises a cluster connected via a local area network; the cluster comprises at least one hundred nodes; and the plurality of records comprises at least five hundred million records. 11. A computer-implemented method for efficiently comparing a large number of records, the method comprising: obtaining, by a computer system, a plurality of records; generating a plurality of tuples, each tuple of the plurality of tuples identifying a block key of a plurality of block keys and a corresponding record of the plurality of records; grouping portions of the plurality of records into a plurality of blocks according to the plurality of block keys; counting, by the computer system, a number of records of the plurality of records corresponding to each block of the plurality of blocks; identifying, by the computer system, a plurality of partitions corresponding to selected blocks of the plurality of blocks, the selected blocks of the plurality of blocks contain a disproportionally large number of records of the plurality of records, such that the disproportionally large number of records of the plurality of records is determined by calculating a maximum size of an unpartitioned block of the plurality of blocks by: determining a comparison time to compare two records of the plurality of records; and determining a threshold time allowed for a processor of the computer system to complete processing; distributing, by the computer system, a workload associated with a block of the plurality of blocks or a partition of the plurality of partitions to each node of a plurality of nodes contained within the computer system; comparing, in parallel by each node of the plurality of nodes, records corresponding to the block of the plurality of blocks or the partition of the plurality of partitions; and linking, by the computer system in view of the comparing, one or more selected records of the plurality of records to one or more other records of the plurality of records. 12. The method of claim 11 , wherein each record of the plurality of records comprises a customer profile. 13. The method of claim 11 , wherein the identifying comprises identifying, by the computer system, partitions of the plurality of partitions corresponding to the selected blocks of the plurality of blocks that contain over a threshold number of records of the plurality of records. 14. The method of claim 11 , wherein the plurality of nodes comprises a cluster connected via a local area network. 15. The method of claim 14 , wherein the cluster comprises at least one hundred nodes. 16. The method of claim 15 , wherein the plurality of records comprises at least five hundred million records. 17. The method of claim 11 , wherein each record of the plurality of records corresponds to a customer. 18. The method of claim 17 , wherein the linking comprises linking one or more of the selected records of the plurality of records corresponding to a particular customer or household to one or more other records of the plurality of records that also correspond to the particular customer or household. 19. The method of claim 11 , wherein: each record of the plurality of records comprises a customer profile; the identifying comprises identifying, by the computer system, partitions of the plurality of partitions corresponding to the selected blocks of the plurality of blocks that contain over a threshold number of records of the plurality of records; the plurality of nodes comprises a cluster connected via a local area network; the cluster comprises at least one hundred nodes; the plurality of records comprises at least five hundred million records; each record of the plurality of records corresponds to a customer; and the linking comprises linking one or more
Query optimisation · CPC title
Physics · mapped topic
Related publications grouped by family.
Answers are generated from the same data shown on this page.