Load balancing for distributed processing of deterministically assigned data using statistical analysis of block data

US11074516B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11074516-B2
Application numberUS-201815881191-A
CountryUS
Kind codeB2
Filing dateJan 26, 2018
Priority dateJan 26, 2018
Publication dateJul 27, 2021
Grant dateJul 27, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Dynamic generation and implementation of assignment mappings of data items in large data files to distributed processors to achieve objectives such as reduced overall processing time like. Any appropriate key (e.g., character string) can be identified or obtained for each data item in a data file and the file can be segmented into sequential data blocks, where each data block includes a set of data items. The data items in each of a first plurality of the blocks (e.g., sampled block set) may be initially sorted into one of a plurality of key ranges of a search space (each corresponding to a different respective processor) and analyses conducted on the data items totals in each key range. The key range boundaries can be adjusted by accounting for uncertainty in the sample estimates to more evenly distribute data items from all blocks sent to each processor and thereby achieve the objective.

First claim

Opening claim text (preview).

We claim: 1. A method for use in managing loads among a plurality of parallel processors, including: receiving, at a processor executing code to provide a mapping engine, a data file that includes a plurality of data items, wherein the processor is communicatively linked via a communication network to the parallel processors; with the mapping engine, determining a unique key for each data item of the plurality of data items; with the mapping engine, sampling a plurality of first blocks of the data items, wherein each first block includes a different subset of the plurality of data items; with the mapping engine, sorting the unique keys in each first block into one of a plurality of key ranges in a sort space, wherein each key range includes first and second boundaries, and wherein each key range corresponds to a different respective one of the parallel processors; conducting, by the processor using the mapping engine, an analysis on the unique keys in each key range to determine, for each key range, a probability of its corresponding one of the parallel processors completing execution of all data items in the key range last among all of the parallel processors; with the mapping engine, adjusting one or more of the first and second boundaries of one or more of the key ranges such that the probabilities of the parallel processors approach equalization; and sending the data items in the data file associated with unique keys in each key range to the different respective ones of the parallel processors for execution after the adjusting based on a mapping generated by the mapping engine that indicates which different respective one of the parallel processors is to process each of the plurality of data items, wherein the parallel processors execute the respective data items in parallel. 2. The method of claim 1 , wherein the conducting includes: summing, for each key range, the unique keys in the key range among all of the first blocks to obtain a sum of unique keys in each key range among all of the first blocks; and analyzing the sums. 3. The method of claim 2 , further including: determining, for each key range, a mean number of unique keys per first block in the key range; determining, for each key range, a standard deviation of the unique keys per first block in the key range; and manipulating the mean and standard deviation for each key range, wherein the adjusting is based on the manipulating. 4. The method of claim 3 , wherein the manipulating includes: obtaining, for each different respective parallel processor, a measure of uncertainty in the mean number of unique keys per first block (“standard error”) in its corresponding key range based on a) the standard deviation of the unique keys per first block in the corresponding key range and b) a total number of the plurality of first blocks, wherein the probability for each parallel processor is based on a) the mean number of unique keys per first block in its corresponding key range and b) the “standard error” in its corresponding key range. 5. The method of claim 4 , wherein the adjusting includes adjusting the one or more of the first and second boundaries of one or more of the key ranges such that for any of the key ranges, a sum of: a) the mean number of unique keys per first block for the key range, and b) a product of i) the standard error for the key range and ii) a constant, is equal to the same value. 6. The method of claim 1 , wherein the adjusting includes adjusting the one or more of the first and second boundaries of the one or more of the key ranges so as to encompass fewer data items in the data file when the probability of the one or more parallel processors corresponding to the one or more of the key ranges is greater than the inverse of a total number of the parallel processors, and wherein the adjusting includes adjusting the one or more of the first and second boundaries of the one or more of the key ranges so as to encompass more data items in the data file when the probability of the one or more parallel processors corresponding to the one or more of the key ranges is less than the inverse of a total number of the parallel processors. 7. The method of claim 1 , wherein the conducting includes determining that each key range has a different total number of unique keys compared to the other key ranges and wherein the adjusting includes adjusting the one or more of the first and second boundaries of the one or more of the key ranges such that that the total number of unique keys in each key ranges approaches equalization. 8. The method of claim 1 , further including: receiving a request for the one of the plurality of parallel processors that processed a particular one of the plurality of data items identified by a particular unique key; and using the particular unique key as an index into the mapping to identify the particular parallel processor. 9. The method of claim 1 , wherein each unique key is an alphanumeric string, and wherein each of the first and second boundaries of each of the key ranges is an alphanumeric string. 10. A method of implementing an assignment plan to map each of a plurality of data items in a data file to one of a plurality of parallel processors, comprising: with a mapping engine running on a computer system communicatively coupled to the parallel processors, sorting unique keys of data items in each of a plurality of first blocks of the data file into one of a plurality of key ranges in a sort space, wherein each key range corresponds to a different respective one of the parallel processors; with the mapping engine, determining, for each key range, a mean number of unique keys per first block in the key range; with the mapping engine, determining, for each key range, a standard deviation of the unique keys per first block in the key range; with the mapping engine, obtaining, for each different respective parallel processor, a measure of uncertainty in the mean number of unique keys per first block (“standard error”) based on a) the standard deviation of the unique keys per first block in the corresponding key range and b) a total number of the plurality of first blocks; generating, by the mapping engine based on the a) mean number of unique keys per first block and b) the standard error for each different respective parallel processor, a mapping that indicates which different respective one of the plurality of parallel processors is to process each of the plurality of data items; and sending the data items in the data file associated with unique keys in each key range to the different respective ones of the parallel processors for execution based on the generated mapping, wherein the parallel processors execute the respective data items in parallel. 11. The method of claim 10 , further including: using, for each different respective parallel processor, a) the mean number of unique keys per first block and b) the standard error for each different respective parallel processor, to determine a probability of the parallel processor completing execution of the data items in its respective key range last among all of the parallel processors; adjusting one or more of the first and second boundaries of one or more of the key ranges such that the probabilities of the parallel processors approach equalization upon: re-sorting the unique keys of data items in each of the plurality of first blocks of the data file into one of a plurality of key ranges the sort space after the adjusting; re-determining, for each key range, the mean number of unique keys per first block in the key range; and re-determining, for each key range, the standard deviation of the unique keys per first block in the key range.

Assignees

Inventors

Classifications

  • G06N7/01Primary

    Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Details of de-fragmentation performed by the file system (saving storage space on storage systems G06F3/0608; management of blocks in storage devices G06F3/064) · CPC title

  • Techniques for rebalancing the load in a distributed system · CPC title

  • using fuzzy logic (computing arrangements based on biological models G06N3/00; computing arrangements using knowledge-based models G06N5/00) · CPC title

  • Management thereof · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11074516B2 cover?
Dynamic generation and implementation of assignment mappings of data items in large data files to distributed processors to achieve objectives such as reduced overall processing time like. Any appropriate key (e.g., character string) can be identified or obtained for each data item in a data file and the file can be segmented into sequential data blocks, where each data block includes a set of …
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06N7/01. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jul 27 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).