Migrating workloads between a plurality of execution environments
US-10853148-B1 · Dec 1, 2020 · US
US12236264B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12236264-B2 |
| Application number | US-202117163386-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 30, 2021 |
| Priority date | Jan 30, 2021 |
| Publication date | Feb 25, 2025 |
| Grant date | Feb 25, 2025 |
Abstract
Systems, devices, and techniques are disclosed for data shards for distributed processing. Data sets of data for users may be received. The data sets may belong to separate groups. User identifiers in the data sets may be hashed to generate hashed identifiers for the data sets. The user identifiers in the data sets may be replaced with the hashed identifiers. The data sets may be split to generate shards. The data sets may be split into the same number of shards. Merged shards may be generated by merging the shards using a separate running process for each of the merged shards. The merged shards may be generated using shards from more than one of the two or more data sets. An operation may be performed on all of the merged shards.
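The hashing, replacement, and splitting steps in the abstract can be illustrated with a short sketch. The abstract does not name a hash function or a shard boundary rule, so SHA-256 and a split on the numeric range of the first eight hex digits are assumptions here. The range split loosely follows the "alphanumeric range" criterion in claim 4. The record layout (dicts with a `user_id` key) and the helper names are hypothetical. Because shard membership depends only on the hashed identifier, a user who appears in several data sets lands in the same shard index in each of them.

```python
import hashlib
from typing import Dict, List

# Hypothetical record layout: each record is a dict with a "user_id" key.
Record = Dict[str, object]


def hash_user_id(user_id: str) -> str:
    """Return a hex SHA-256 digest of a user identifier (hash choice is an assumption)."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


def shard_index(hashed_id: str, num_shards: int) -> int:
    """Assign a shard by the range that the first 8 hex digits of the hash fall into."""
    prefix = int(hashed_id[:8], 16)  # value in [0, 16**8)
    return prefix * num_shards // 16**8  # contiguous, equal-width ranges


def hash_and_shard(data_set: List[Record], num_shards: int) -> List[List[Record]]:
    """Replace each user_id with its hash, then split into num_shards range-based shards."""
    shards: List[List[Record]] = [[] for _ in range(num_shards)]
    for record in data_set:
        hashed = hash_user_id(str(record["user_id"]))
        new_record = dict(record, user_id=hashed)  # identifier replaced by its hash
        shards[shard_index(hashed, num_shards)].append(new_record)
    return shards
```

Calling `hash_and_shard` on every data set with the same `num_shards` yields shard lists of equal length. The i-th shards of all data sets then form one set of equivalent shards.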
Claims (preview)
The invention claimed is:
1. A computer-implemented method comprising: receiving two or more data sets of data for users wherein each of the two or more data sets belongs to a separate one of two or more groups; hashing user identifiers in the two or more data sets to generate hashed identifiers for the two or more data sets; replacing the user identifiers in the two or more data sets with the hashed identifiers; splitting each of the two or more data sets to generate shards, wherein each of the two or more data sets is split into the same number of shards by splitting each of the two more data sets such that the hashed identifiers that are common to two or more of the two or more data sets are in equivalent shards from the two or more of the two or more data sets; generating merged shards by merging the shards using a separate running process for each of the merged shards, wherein each of the merged shards is generated using shards from more than one of the two or more data sets; performing an operation on all of the merged shards wherein performing an operation on all of the merged shards comprises joining the merged shards into a merged data set; and training a machine learning system using the merged data set.
2. The computer-implemented method of claim 1, wherein performing an operation on each of the merged shards comprises performing non-negative matrix factorization on the merged shards.
3. The computer-implemented method of claim 1, wherein equivalent shards of the two or more data sets comprise shards assigned data from separate data sets based on the same criteria.
4. The computer-implemented method of claim 3, wherein the criteria comprises an alphanumeric range that a hashed identifier for the data falls into.
5. The computer-implemented method of claim 1, wherein generating merged shards by merging the shards using a separate running process for each of the merged shards, wherein each of the merged shards is generated using shards from more than one of the two or more data sets comprises merging a first set of equivalent shards from the shards on a first processor and merging a second set of equivalent shards from the shards on a second processor in parallel.
6. The computer-implemented method of claim 5, wherein merging a first set of equivalent shards from the shards on the first processor further comprises: joining the data in the equivalent shards; sorting the data in the equivalent shards by hashed identifier; and merging data for any duplicate hashed identifiers.
7. A computer-implemented system for localization of matrix factorization models trained with global data comprising: one or more storage devices; and two or more processors that receive two or more data sets of data for users wherein each of the two or more data sets belongs to a separate one of two or more groups with a first of the two or processors, hash user identifiers in the two or more data sets to generate hashed identifiers for the two or more data sets with the first of the two or processors, replace the user identifiers in the two or more data sets with the hashed identifiers with the first of the two or processors, split each of the two or more data sets to generate shards, wherein each of the two or more data sets is split into the same number of shards with the first of the two or processors by splitting each of the two more data sets such that the hashed identifiers that are common to two or more of the two or more data sets are in equivalent shards from the two or more of the two or more data sets, generate merged shards by merging the shards using a separate running process on each of the two or more processors for each of the merged shards, wherein each of the merged shards is generated using shards from more than one of the two or more data sets, perform an operation on all of the merged shards with the first of the two or processors wherein performing an operation on all of the merged shards comprises joining the merged shards into a merged data set and, training a machine learning system using the merged data set.
8. The computer-implemented system of claim 7, wherein the first of the two or more processors performs an operation on each of the merged shards by performing non-negative matrix factorization on the merged shards.
9. The computer-implemented system of claim 7, wherein equivalent shards of the two or more data sets comprise shards assigned data from separate data sets based on the same criteria.
10. The computer-implemented system of claim 9, wherein the criteria comprises an alphanumeric range that a hashed identifier for the data falls into.
11. The computer-implemented system of claim 7, wherein the two or more processors generate merged shards by merging the shards using a separate running process on each of the two or more processors for each of the merged shards, wherein each of the merged shards is generated using shards from more than one of the two or more data sets by merging a first set of equivalent shards from the shards on a first processor and merging a second set of equivalent shards from the shards on a second processor in parallel.
12. The computer-implemented system of claim 11, wherein the first of the two or more processors merges a first set of equivalent shards from the shards on the first processor further by joining the data in the equivalent shards, sorting the data in the equivalent shards by hashed identifier, and merging data for any duplicate hashed identifiers.
13. A system comprising: one or more computers and one or more non-transitory storage devices storing instructions which are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving two or more data sets of data for users wherein each of the two or more data sets belongs to a separate one of two or more groups; hashing user identifiers in the two or more data sets to generate hashed identifiers for the two or more data sets; replacing the user identifiers in the two or more data sets with the hashed identifiers; splitting each of the two or more data sets to generate shards, wherein each of the two or more data sets is split into the same number of shards by splitting each of the two more data sets such that the hashed identifiers that are common to two or more of the two or more data sets are in equivalent shards from the two or more of the two or more data sets; generating merged shards by merging the shards using a separate running process for each of the merged shards, wherein each of the merged shards is generated using shards from more than one of the two or more data sets; performing an operation on all of the merged shards wherein performing an operation on all of the merged shards comprises joining the merged shards into a merged data set; and training a machine learning system using the merged data set.
14. The system of claim 13, wherein the instructions that cause the one or more computers to perform operations comprising generating merged shards by merging the shards using a separate running process for each of the merged shards, wherein each of the merged shards is generated using shards from more than one of the two or more data sets further cause the one or more computers to perform operations comprising merging a first set of equivalent shards from the shards on a first processor and merging a second set of equivalent shards from the shards on a second processor in parallel.
15. The system of claim 14, wherein the instructions that cause the one or more computers to perform operations co
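The merge described in claims 5, 6, 11 and 12 runs one merge task per set of equivalent shards. Each task joins the shards, sorts the records by hashed identifier, and merges duplicate identifiers. Afterwards, the merged shards are joined into one merged data set (claims 1, 7 and 13). The sketch below illustrates that flow under stated assumptions. Records are hypothetical dicts keyed by `user_id`. Duplicates are combined with `dict.update`, so later data sets win on conflicting fields. The executor is passed in as a parameter so callers choose how tasks run. The claims do not specify a combining rule, and the sketch stops before the machine learning training step.

```python
from concurrent.futures import Executor
from typing import Dict, List, Sequence

# Hypothetical record layout: dicts whose "user_id" already holds a hashed identifier.
Record = Dict[str, object]


def merge_equivalent_shards(equivalent: Sequence[List[Record]]) -> List[Record]:
    """Merge one set of equivalent shards: join, sort by hashed id, merge duplicates."""
    joined = [record for shard in equivalent for record in shard]  # join
    joined.sort(key=lambda r: str(r["user_id"]))  # sort by hashed identifier
    merged: List[Record] = []
    for record in joined:
        if merged and merged[-1]["user_id"] == record["user_id"]:
            merged[-1].update(record)  # duplicate id: combine fields (assumed rule)
        else:
            merged.append(dict(record))  # copy so input shards are not mutated
    return merged


def merge_all(
    shards_per_data_set: Sequence[List[List[Record]]], executor: Executor
) -> List[List[Record]]:
    """Submit shard i of every data set as one task; return one merged shard per index."""
    num_shards = len(shards_per_data_set[0])
    if any(len(shards) != num_shards for shards in shards_per_data_set):
        raise ValueError("every data set must be split into the same number of shards")
    equivalent_sets = [
        [shards[i] for shards in shards_per_data_set] for i in range(num_shards)
    ]
    return list(executor.map(merge_equivalent_shards, equivalent_sets))


def join_merged_shards(merged_shards: List[List[Record]]) -> List[Record]:
    """Concatenate the merged shards into a single merged data set."""
    return [record for shard in merged_shards for record in shard]
```

To give each merge its own running process, as the claims describe, pass a `concurrent.futures.ProcessPoolExecutor` as the executor. Create it inside an `if __name__ == "__main__":` guard, because platforms that spawn workers re-import the main module. The worker function is defined at module level so it can be pickled.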
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Machine learning · CPC title
using a secondary processor, e.g. coprocessor (peripheral processor G06F13/12) · CPC title