Distributed data processing framework

US11210271B1 · US · B1

Patent metadata
Field                Value
Publication number   US-11210271-B1
Application number   US-202016998909-A
Country              US
Kind code            B1
Filing date          Aug 20, 2020
Priority date        Aug 20, 2020
Publication date     Dec 28, 2021
Grant date           Dec 28, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one aspect, there is provided a system. The system may store instructions that result in operations when executed by the at least one data processor. The operations may include receiving raw transactional data, collating, and reading the raw transactional data from the plurality of data sources. The operations may further include randomly sampling the raw transactional data. The operations may further include transforming the raw transactional data into at least one resilient distributed dataset. The operations may further include mapping the at least one resilient distributed dataset with a corresponding unique key. The operations may further include aggregating the at least one resilient distributed dataset on a key field. The operations may further include iterating over a lookup table. The operations may further include aggregating the data lines corresponding to the unique key associated with the at least one resilient distributed dataset. The operations may further include appending in-memory data lines serially to form a consumer level data string.
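The keyed aggregation and string assembly described in the abstract can be illustrated with a minimal sketch. This is plain Python standing in for the distributed steps, not Spark code and not the patented implementation. The record fields, the `(table, consumer_id)` key, the `|` separator, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical raw transactional records (illustrative data only).
raw_records = [
    {"consumer_id": "C1", "table": "cards", "amount": 20.0},
    {"consumer_id": "C2", "table": "cards", "amount": 5.0},
    {"consumer_id": "C1", "table": "cards", "amount": 30.0},
    {"consumer_id": "C1", "table": "loans", "amount": 100.0},
]

# Hypothetical lookup table: consumer profile mapped to the keys of different tables.
lookup_table = {
    "C1": {"cards": "C1", "loans": "C1"},
    "C2": {"cards": "C2"},
}


def map_to_key(record):
    """Pair a record with a unique key (assumed here: table name plus consumer id)."""
    return (record["table"], record["consumer_id"]), record["amount"]


def aggregate_by_key(pairs):
    """Sum the values that share a key, analogous to a reduce-by-key step."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def build_consumer_string(consumer_id, lookup, totals):
    """Iterate over the lookup table and append matching data lines serially."""
    lines = []
    for table_name, table_key in lookup[consumer_id].items():
        key = (table_name, table_key)
        if key in totals:
            lines.append(f"{table_name}:{totals[key]:.2f}")
    return "|".join(lines)


totals = aggregate_by_key(map_to_key(r) for r in raw_records)
print(build_consumer_string("C1", lookup_table, totals))  # cards:50.00|loans:100.00
```

In a distributed setting the per-key sum would run across partitions, but the shape of the result is the same: one aggregated data line per key, joined into one string per consumer.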

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: at least one data processor; and at least one memory storing instructions which, when executed by the at least one data processor, result in operations comprising: receiving, from a plurality of data sources, raw transactional data; collating and reading the raw transactional data from the plurality of data sources; randomly sampling, responsive to collating and reading, the raw transactional data; performing, responsive to the randomly sampling, a check of consistency and integrity of the raw transactional data; transforming the raw transactional data into at least one resilient distributed dataset; mapping the at least one resilient distributed dataset with a corresponding unique key; aggregating the at least one resilient distributed dataset on a key field; receiving a request for an entity; iterating over a lookup table, the lookup table including consumer profile information mapped to keys of different tables; aggregating, responsive to the iterating, data lines corresponding to the unique key associated with the at least one resilient distributed dataset; and appending in-memory data lines serially to form a consumer level data string.

2. The system of claim 1, wherein the raw transactional data satisfies a size threshold.

3. The system of claim 1, wherein the receiving occurs at a frequency, the frequency including one of daily, weekly, and monthly.

4. The system of claim 1, wherein performing the check comprises sending an error message if the check fails.

5. The system of claim 1, wherein the resilient distributed datasets are stored in a file system.

6. The system of claim 5, wherein the resilient distributed datasets are stored across multiple memory partitions in the file system.

7. The system of claim 1, wherein the transforming comprises: performing a consistency check on the raw transactional data; cleaning, responsive to the consistency check, data fields of the raw transactional data to remove unwanted and bad data fields; aggregating, responsive to the cleaning, the raw transactional data; grouping, responsive to the aggregating, the aggregated raw transactional data on one or more levels so that transactional activity of the aggregated raw transactional data over a time period is represented by a single data line; and outputting, responsive to the grouping, the single data line.

8. The system of claim 1, wherein the transforming is performed by a plurality of worker nodes.

9. The system of claim 8, wherein the transforming is allocated among the plurality of worker nodes by a driver.

10. The system of claim 1, wherein the receiving, the collating, and/or the reading is performed by a program driver.

11. A method comprising: receiving, by at least one processor from a plurality of data sources, raw transactional data; collating and reading, by the at least one processor, the raw transactional data from the plurality of data sources; randomly sampling, by the at least one processor and responsive to collating and reading, the raw transactional data; performing, by the at least one processor and responsive to the randomly sampling, a check of consistency and integrity of the raw transactional data; transforming, by the at least one processor, the raw transactional data into at least one resilient distributed dataset; mapping, by the at least one processor, the at least one resilient distributed dataset with a corresponding unique key; aggregating, by the at least one processor, the at least one resilient distributed dataset on a key field; receiving, by the at least one processor, a request for an entity; iterating, by the at least one processor, over a lookup table, the lookup table including consumer profile information mapped to keys of different tables; aggregating, by the at least one processor and responsive to the iterating, data lines corresponding to the unique key associated with the at least one resilient distributed dataset; and appending in-memory data lines serially to form a consumer level data string.

12. The method of claim 11, wherein the raw transactional data satisfies a size threshold.

13. The method of claim 11, wherein the receiving occurs at a frequency, the frequency including one of daily, weekly, and monthly.

14. The method of claim 11, wherein performing the check comprises sending an error message if the check fails.

15. The method of claim 11, wherein the resilient distributed datasets are stored in a file system.

16. The method of claim 15, wherein the resilient distributed datasets are stored across multiple memory partitions in the file system.

17. The method of claim 11, wherein the transforming comprises: performing a consistency check on the raw transactional data; cleaning, responsive to the consistency check, data fields of the raw transactional data to remove unwanted and bad data fields; aggregating, responsive to the cleaning, the raw transactional data; grouping, responsive to the aggregating, the aggregated raw transactional data on one or more levels so that transactional activity of the aggregated raw transactional data over a time period is represented by a single data line; and outputting, responsive to the grouping, the single data line.

18. The method of claim 11, wherein the transforming is performed by a plurality of worker nodes.

19. The method of claim 18, wherein the transforming is allocated among the plurality of worker nodes by a driver.

20. A non-transitory computer readable medium storing instructions which, when executed by at least one processor, cause operations comprising: receiving, from a plurality of data sources, raw transactional data; collating and reading the raw transactional data from the plurality of data sources; randomly sampling, responsive to collating and reading, the raw transactional data; performing, responsive to the randomly sampling, a check of consistency and integrity of the raw transactional data; transforming the raw transactional data into at least one resilient distributed dataset; mapping the at least one resilient distributed dataset with a corresponding unique key; aggregating the at least one resilient distributed dataset on a key field; receiving a request for an entity; iterating over a lookup table, the lookup table including consumer profile information mapped to keys of different tables; aggregating, responsive to the iterating, data lines corresponding to the unique key associated with the at least one resilient distributed dataset; and appending in-memory data lines serially to form a consumer level data string.
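The claims recite randomly sampling the raw data, checking its consistency and integrity, and sending an error message if the check fails. A minimal sketch of that step follows, assuming a hypothetical three-field schema and using a raised exception as a stand-in for the error message. The claims do not specify what the check tests, so the rules in `check_record` are illustrative only.

```python
import random

# Hypothetical schema; the patent text does not name the required fields.
REQUIRED_FIELDS = ("consumer_id", "date", "amount")


def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if "amount" in record and not isinstance(record["amount"], (int, float)):
        problems.append("amount is not numeric")
    return problems


def sample_and_check(records, sample_size, seed=0):
    """Randomly sample records and raise an error if any sampled record fails the check."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    sample = rng.sample(records, min(sample_size, len(records)))
    for record in sample:
        problems = check_record(record)
        if problems:
            # Stand-in for "sending an error message if the check fails".
            raise ValueError(f"consistency check failed: {problems}")
    return sample


good_records = [
    {"consumer_id": "C1", "date": "2020-08-01", "amount": 20.0},
    {"consumer_id": "C2", "date": "2020-08-02", "amount": 5.0},
    {"consumer_id": "C1", "date": "2020-08-03", "amount": 30.0},
]
checked = sample_and_check(good_records, sample_size=2)
print(len(checked), "sampled records passed the check")
```

Checking a sample rather than every record trades completeness for speed, which is a plausible reason to sample before the heavier transformation step, although the claims do not state the motivation.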

Assignees

Fair Isaac Corp

Inventors

Classifications

  • Data format conversion from or to a database · CPC title

  • Append-only file systems, e.g. using logs or journals to store data · CPC title

  • Marketing; Price estimation or determination; Fundraising · CPC title

  • Transactional file systems · CPC title

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11210271B1 cover?
In one aspect, there is provided a system. The system may store instructions that result in operations when executed by the at least one data processor. The operations may include receiving raw transactional data, collating, and reading the raw transactional data from the plurality of data sources. The operations may further include randomly sampling the raw transactional data. The operations m…
Who is the assignee on this patent?
Fair Isaac Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/1805. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 28, 2021 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).