Who is the assignee on this patent?

Salesforce Com Inc, Salesforce Inc

What technology area does this patent fall under?

Primary CPC classification G06F9/4881. Mapped technology areas include Physics.

When was this patent published?

Publication date Feb 25, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Data shards for distributed processing

US12236264B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12236264-B2
Application number	US-202117163386-A
Country	US
Kind code	B2
Filing date	Jan 30, 2021
Priority date	Jan 30, 2021
Publication date	Feb 25, 2025
Grant date	Feb 25, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, devices, and techniques are disclosed for data shards for distributed processing. Data sets of data for users may be received. The data sets may belong to separate groups. User identifiers in the data sets may be hashed to generate hashed identifiers for the data sets. The user identifiers in the data sets may be replaced with the hashed identifiers. The data sets may be split to generate shards. The data sets may be split into the same number of shards. Merged shards may be generated by merging the shards using a separate running process for each of the merged shards. The merged shards may be generated using shards from more than one of the two or more data sets. An operation may be performed on all of the merged shards.
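The hashing and splitting steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the choice of SHA-256, the modulo-based shard assignment, the dictionary layout, and all function and variable names (`hash_user_id`, `replace_ids`, `split_into_shards`, `group_a`, `group_b`) are assumptions made for illustration only.

```python
import hashlib


def hash_user_id(user_id: str) -> str:
    """Return a hex digest for a user identifier (SHA-256 is an assumption)."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


def replace_ids(data_set: dict) -> dict:
    """Replace each raw user identifier key with its hashed identifier."""
    return {hash_user_id(uid): record for uid, record in data_set.items()}


def split_into_shards(hashed_set: dict, num_shards: int) -> list:
    """Split a hashed data set into num_shards shards based on the hash value."""
    shards = [{} for _ in range(num_shards)]
    for hashed_id, record in hashed_set.items():
        # Assumed criterion: the hash interpreted as an integer, modulo the shard count.
        index = int(hashed_id, 16) % num_shards
        shards[index][hashed_id] = record
    return shards


# Two hypothetical data sets from separate groups that share the user "alice".
group_a = {"alice": {"clicks": 3}, "bob": {"clicks": 1}}
group_b = {"alice": {"purchases": 2}, "carol": {"purchases": 5}}

NUM_SHARDS = 4  # every data set is split into the same number of shards
shards_a = split_into_shards(replace_ids(group_a), NUM_SHARDS)
shards_b = split_into_shards(replace_ids(group_b), NUM_SHARDS)

# The shared user ends up at the same shard index in both data sets.
alice = hash_user_id("alice")
index_a = next(i for i, shard in enumerate(shards_a) if alice in shard)
index_b = next(i for i, shard in enumerate(shards_b) if alice in shard)
```

Because both data sets apply the same hash and the same assignment rule, a hashed identifier that appears in both lands at the same shard index in each, so shard `i` of one data set only ever needs to be merged with shard `i` of the others.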

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method comprising: receiving two or more data sets of data for users wherein each of the two or more data sets belongs to a separate one of two or more groups; hashing user identifiers in the two or more data sets to generate hashed identifiers for the two or more data sets; replacing the user identifiers in the two or more data sets with the hashed identifiers; splitting each of the two or more data sets to generate shards, wherein each of the two or more data sets is split into the same number of shards by splitting each of the two more data sets such that the hashed identifiers that are common to two or more of the two or more data sets are in equivalent shards from the two or more of the two or more data sets; generating merged shards by merging the shards using a separate running process for each of the merged shards, wherein each of the merged shards is generated using shards from more than one of the two or more data sets; performing an operation on all of the merged shards wherein performing an operation on all of the merged shards comprises joining the merged shards into a merged data set; and training a machine learning system using the merged data set.

2. The computer-implemented method of claim 1, wherein performing an operation on each of the merged shards comprises performing non-negative matrix factorization on the merged shards.

3. The computer-implemented method of claim 1, wherein equivalent shards of the two or more data sets comprise shards assigned data from separate data sets based on the same criteria.

4. The computer-implemented method of claim 3, wherein the criteria comprises an alphanumeric range that a hashed identifier for the data falls into.

5. The computer-implemented method of claim 1, wherein generating merged shards by merging the shards using a separate running process for each of the merged shards, wherein each of the merged shards is generated using shards from more than one of the two or more data sets comprises merging a first set of equivalent shards from the shards on a first processor and merging a second set of equivalent shards from the shards on a second processor in parallel.

6. The computer-implemented method of claim 5, wherein merging a first set of equivalent shards from the shards on the first processor further comprises: joining the data in the equivalent shards; sorting the data in the equivalent shards by hashed identifier; and merging data for any duplicate hashed identifiers.

7. A computer-implemented system for localization of matrix factorization models trained with global data comprising: one or more storage devices; and two or more processors that receive two or more data sets of data for users wherein each of the two or more data sets belongs to a separate one of two or more groups with a first of the two or processors, hash user identifiers in the two or more data sets to generate hashed identifiers for the two or more data sets with the first of the two or processors, replace the user identifiers in the two or more data sets with the hashed identifiers with the first of the two or processors, split each of the two or more data sets to generate shards, wherein each of the two or more data sets is split into the same number of shards with the first of the two or processors by splitting each of the two more data sets such that the hashed identifiers that are common to two or more of the two or more data sets are in equivalent shards from the two or more of the two or more data sets, generate merged shards by merging the shards using a separate running process on each of the two or more processors for each of the merged shards, wherein each of the merged shards is generated using shards from more than one of the two or more data sets, perform an operation on all of the merged shards with the first of the two or processors wherein performing an operation on all of the merged shards comprises joining the merged shards into a merged data set and, training a machine learning system using the merged data set.

8. The computer-implemented system of claim 7, wherein the first of the two or more processors performs an operation on each of the merged shards by performing non-negative matrix factorization on the merged shards.

9. The computer-implemented system of claim 7, wherein equivalent shards of the two or more data sets comprise shards assigned data from separate data sets based on the same criteria.

10. The computer-implemented system of claim 9, wherein the criteria comprises an alphanumeric range that a hashed identifier for the data falls into.

11. The computer-implemented system of claim 7, wherein the two or more processors generate merged shards by merging the shards using a separate running process on each of the two or more processors for each of the merged shards, wherein each of the merged shards is generated using shards from more than one of the two or more data sets by merging a first set of equivalent shards from the shards on a first processor and merging a second set of equivalent shards from the shards on a second processor in parallel.

12. The computer-implemented system of claim 11, wherein the first of the two or more processors merges a first set of equivalent shards from the shards on the first processor further by joining the data in the equivalent shards, sorting the data in the equivalent shards by hashed identifier, and merging data for any duplicate hashed identifiers.

13. A system comprising: one or more computers and one or more non-transitory storage devices storing instructions which are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving two or more data sets of data for users wherein each of the two or more data sets belongs to a separate one of two or more groups; hashing user identifiers in the two or more data sets to generate hashed identifiers for the two or more data sets; replacing the user identifiers in the two or more data sets with the hashed identifiers; splitting each of the two or more data sets to generate shards, wherein each of the two or more data sets is split into the same number of shards by splitting each of the two more data sets such that the hashed identifiers that are common to two or more of the two or more data sets are in equivalent shards from the two or more of the two or more data sets; generating merged shards by merging the shards using a separate running process for each of the merged shards, wherein each of the merged shards is generated using shards from more than one of the two or more data sets; performing an operation on all of the merged shards wherein performing an operation on all of the merged shards comprises joining the merged shards into a merged data set; and training a machine learning system using the merged data set.

14. The system of claim 13, wherein the instructions that cause the one or more computers to perform operations comprising generating merged shards by merging the shards using a separate running process for each of the merged shards, wherein each of the merged shards is generated using shards from more than one of the two or more data sets further cause the one or more computers to perform operations comprising merging a first set of equivalent shards from the shards on a first processor and merging a second set of equivalent shards from the shards on a second processor in parallel.

15. The system of claim 14, wherein the instructions that cause the one or more computers to perform operations co
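Claims 4 through 6 describe assigning data to shards by the alphanumeric range a hashed identifier falls into, then merging equivalent shards by joining, sorting by hashed identifier, and merging duplicates. The sketch below is one possible reading of those steps, not the claimed implementation. The row layout `(hashed_id, fields)`, the use of the first hexadecimal character as the range criterion, the short sample identifiers, and the names `shard_index`, `merge_equivalent_shards`, and `merge_all` are all hypothetical.

```python
HEX_DIGITS = "0123456789abcdef"


def shard_index(hashed_id: str, num_shards: int) -> int:
    """Pick a shard from the range the first hex character falls into (assumed criterion)."""
    position = HEX_DIGITS.index(hashed_id[0])
    return position * num_shards // len(HEX_DIGITS)


def merge_equivalent_shards(shards) -> list:
    """Join rows from equivalent shards, sort by hashed id, and merge duplicates."""
    joined = [row for shard in shards for row in shard]  # join
    joined.sort(key=lambda row: row[0])                  # sort by hashed identifier
    merged = []
    for hashed_id, fields in joined:
        if merged and merged[-1][0] == hashed_id:
            merged[-1][1].update(fields)                 # merge duplicate identifier
        else:
            merged.append((hashed_id, dict(fields)))     # copy so inputs stay unchanged
    return merged


def merge_all(shard_sets, map_fn=map) -> list:
    """Merge shard i of every data set, then join the merged shards into one data set.

    map_fn defaults to the built-in map; a process pool's map could be passed
    instead so that each merged shard is produced by a separate process.
    """
    equivalent = list(zip(*shard_sets))  # group shard i from each data set
    merged_shards = list(map_fn(merge_equivalent_shards, equivalent))
    return [row for shard in merged_shards for row in shard]


# Hypothetical pre-sharded data sets with two shards each: ranges 0-7 and 8-f.
set_a = [[("0a1f", {"clicks": 3})], [("9c3e", {"clicks": 1})]]
set_b = [[("0a1f", {"purchases": 2})], [("f2b7", {"purchases": 5})]]

merged_data_set = merge_all([set_a, set_b])
```

To approximate the "separate running process" language, `concurrent.futures.ProcessPoolExecutor().map` could be passed as `map_fn` from inside an `if __name__ == "__main__":` guard; the serial default is used here only to keep the sketch simple.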

Classifications

G06F9/30036: Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
G06F9/3851: from multiple instruction streams, e.g. multistreaming
G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06N20/00: Machine learning
G06F9/3877: using a secondary processor, e.g. coprocessor (peripheral processor G06F13/12)

Patent family

Related publications grouped by family.

View patent family 82611419

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12236264B2 cover?: Systems, devices, and techniques are disclosed for data shards for distributed processing. Data sets of data for users may be received. The data sets may belong to separate groups. User identifiers in the data sets may be hashed to generate hashed identifiers for the data sets. The user identifiers in the data sets may be replaced with the hashed identifiers. The data sets may be split to gener…