Enforcing data security constraints in a data pipeline

US12079352B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12079352-B2
Application number  US-202117226014-A
Country             US
Kind code           B2
Filing date         Apr 8, 2021
Priority date       Dec 18, 2020
Publication date    Sep 3, 2024
Grant date          Sep 3, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method enforces data security constraints in a data pipeline. The data pipeline takes one or more source datasets as input and performs one or more data transformations on them. The method includes using data defining one or more data security constraints to configure the data pipeline to perform a data transformation on a restricted subset of entries of the source datasets. The restriction is defined by the data defining one or more data security constraints. The method further includes performing the data transformation according to the configuration to produce one or more transformed datasets. The method further includes using the data defining one or more data security constraints to perform a verification on one or more of the transformed datasets to ensure that entries in the one or more of the transformed datasets are restricted as defined by the one or more data security constraints.
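
The three steps in the abstract (restrict the input, transform it, verify the output) can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names `restrict`, `run_stage`, and `verify`, the sample rows, and the `eu_only` constraint are all hypothetical, and a data security constraint is assumed here to be a simple row-level predicate.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Constraint = Callable[[Row], bool]  # assumption: a constraint is a row predicate


def restrict(rows: List[Row], constraint: Constraint) -> List[Row]:
    """Keep only the entries the constraint accepts."""
    return [row for row in rows if constraint(row)]


def run_stage(rows: List[Row], constraint: Constraint,
              transform: Callable[[Row], Row]) -> List[Row]:
    """Apply the transformation to the restricted subset only."""
    return [transform(row) for row in restrict(rows, constraint)]


def verify(rows: List[Row], constraint: Constraint) -> bool:
    """Re-check the output: every transformed entry must still satisfy the constraint."""
    return all(constraint(row) for row in rows)


# Hypothetical source dataset.
source = [
    {"patient_id": 1, "region": "EU", "age": 34},
    {"patient_id": 2, "region": "US", "age": 51},
    {"patient_id": 3, "region": "EU", "age": 47},
]


def eu_only(row: Row) -> bool:
    """Hypothetical constraint: only EU entries may enter the pipeline."""
    return row["region"] == "EU"


def add_age_band(row: Row) -> Row:
    """Hypothetical transformation: add a coarse age band to each row."""
    return {**row, "age_band": f"{(row['age'] // 10) * 10}s"}


transformed = run_stage(source, eu_only, add_age_band)
verified = verify(transformed, eu_only)
```

The same constraint object drives both the configuration of the stage and the later verification, which mirrors the abstract's wording that "the data defining one or more data security constraints" is used in both places.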

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for enforcing data security constraints in a data pipeline, wherein the data pipeline takes one or more source datasets as input and performs one or more data transformations on them, the method comprising:
    within a first stage of the data pipeline, generate a first transformed dataset by performing a first data transformation on a first subset of entries of the one or more source datasets, wherein the first subset is defined according to one or more first data security constraints, wherein the one or more first data security constraints are associated with one or more columns or rows, and wherein an entry is accepted into or rejected from the first transformed dataset based on a comparison between the entry and the one or more first data security constraints;
    within a second stage of the data pipeline, generate a second transformed dataset by performing a second data transformation on a second subset of entries of the one or more source datasets;
    validate the second transformed dataset according to a pattern or constraint specified by the first transformed dataset, wherein the validating comprises comparing entries of the second transformed dataset against the first transformed dataset and filtering out any fields of the second transformed dataset that fail to conform to the pattern or constraint specified by the first transformed dataset, the first transformed dataset specifying a previously unknown or undefined criteria; and
    providing an alert if any fields of the second transformed dataset fail to conform to the pattern.

2. The method of claim 1, further comprising, prior to the first stage: obtaining, from a user, data defining the first data security constraints to be applied to the first data transformation.

3. The method of claim 1, wherein the one or more first data security constraints defines one or more conditions based on which the entry or a different entry in the one or more source datasets is either accepted or rejected for inclusion in the first subset of entries according to the one or more first data security constraints.

4. The method of claim 3, wherein the one or more first data security constraints defines one or more acceptable values for entries of a certain type, and wherein the entry is accepted or rejected based on whether the entry matches the one or more acceptable values.

5. The method of claim 1, wherein the first data transformation is a pre-existent data transformation of the data pipeline.

6. The method of claim 1, wherein the second subset of entries of the one or more source datasets is defined according to one or more second data security constraints; and the validating of the second transformed dataset is based on the one or more second data security constraints.

7. The method of claim 6, further comprising validating the first transformed dataset; and in response to the first transformed dataset being successfully validated, refraining from using the first data security constraints to perform the validation on the second transformed dataset.

8. The method of claim 7, further comprising: communicating the second transformed dataset to an external entity if the validation of the second transformed dataset is successful.

9. The method of claim 1, wherein the second transformed dataset is released if the validation of the second transformed dataset is successful.

10. The method of claim 1, further comprising: preventing communication of the second transformed dataset if the validation fails.

11. The method of claim 1, further comprising: communicating the second transformed dataset to an external entity if the validation succeeds.

12. The method of claim 1, wherein the second subset of entries reference at least a portion of the first subset of entries as a foreign key.

13. The method of claim 1, further comprising receiving an update to the first transformed dataset; and validating the second transformed dataset or a third transformed dataset according to a second pattern of the updated first transformed dataset.

14. A data processing system configured to enforce data security constraints in a data pipeline, wherein the data pipeline takes one or more source datasets as input and performs one or more data transformations on them, the data processing system including one or more processors and instructions that, when executed by the one or more processors, cause the data processing system to perform:
    within a first stage of the data pipeline, generate a first transformed dataset by performing a first data transformation on a first subset of entries of the one or more source datasets, wherein the first subset is defined according to one or more first data security constraints, wherein the one or more first data security constraints are associated with one or more columns or rows, and wherein an entry is accepted into or rejected from the first transformed dataset based on a comparison between the entry and the one or more first data security constraints;
    within a second stage of the data pipeline, generate a second transformed dataset by performing a second data transformation on a second subset of entries of the one or more source datasets;
    validate the second transformed dataset according to a pattern or constraint specified by the first transformed dataset, wherein the validating comprises comparing entries of the second transformed dataset against the first transformed dataset and filtering out any fields of the second transformed dataset that fail to conform to the pattern or constraint specified by the first transformed dataset, the first transformed dataset specifying a previously unknown or undefined criteria; and
    providing an alert if any fields of the second transformed dataset fail to conform to the pattern.

15. The data processing system of claim 14, wherein the instructions further cause the data processing system to perform: prior to the first stage: obtaining, from a user, data defining the first data security constraints to be applied to the first data transformation.

16. The data processing system of claim 14, wherein the one or more first data security constraints defines one or more conditions based on which the entry or a different entry in the one or more source datasets is either accepted or rejected for inclusion in the first subset of entries according to the one or more first data security constraints.

17. The data processing system of claim 16, wherein the one or more first data security constraints defines one or more acceptable values for entries of a certain type, and wherein the entry is accepted or rejected based on whether the entry matches the one or more acceptable values.

18. A non-transitory computer readable medium comprising instructions that, when executed, cause one or more processors to perform:
    within a first stage of the data pipeline, generate a first transformed dataset by performing a first data transformation on a first subset of entries of the one or more source datasets, wherein the first subset is defined according to one or more first data security constraints, wherein the one or more first data security constraints are associated with one or more columns or rows, and wherein an entry is accepted into or rejected from the first transformed dataset based on a comparison between the entry and the one or more first data security constraints;
    within a second stage of the data pipeline, generate a second transformed dataset by performing a second data transformation on a second subset of entries of t
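
The two-stage flow recited in claim 1 can be illustrated with a short Python sketch. It is one possible reading, not the claimed implementation: the names `first_stage`, `second_stage`, and `validate_against`, the customer and order data, and the region constraint are hypothetical. The sketch assumes the "pattern or constraint specified by the first transformed dataset" is the set of customer IDs that survive stage one (compare the foreign-key relationship in claim 12), and it filters whole rows where the claim speaks of fields.

```python
from typing import Any, Dict, List, Set, Tuple

Row = Dict[str, Any]


def first_stage(customers: List[Row], allowed_regions: Set[str]) -> List[Row]:
    """Stage 1: accept or reject each entry against the constraint, then transform."""
    return [
        {"customer_id": c["customer_id"], "region": c["region"]}
        for c in customers
        if c["region"] in allowed_regions
    ]


def second_stage(orders: List[Row]) -> List[Row]:
    """Stage 2: an independent transformation (here, cents to currency units)."""
    return [
        {
            "order_id": o["order_id"],
            "customer_id": o["customer_id"],
            "amount": o["amount_cents"] / 100,
        }
        for o in orders
    ]


def validate_against(second: List[Row], first: List[Row]) -> Tuple[List[Row], List[str]]:
    """Compare stage-2 rows with stage-1 output; drop non-conforming rows and collect alerts."""
    # The allowed IDs are only known once stage 1 has run.
    allowed_ids = {row["customer_id"] for row in first}
    kept: List[Row] = []
    alerts: List[str] = []
    for row in second:
        if row["customer_id"] in allowed_ids:
            kept.append(row)
        else:
            alerts.append(
                f"order {row['order_id']} references customer "
                f"{row['customer_id']}, which is absent from the first transformed dataset"
            )
    return kept, alerts


# Hypothetical source datasets.
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
    {"customer_id": 3, "region": "EU"},
]
orders = [
    {"order_id": 100, "customer_id": 1, "amount_cents": 1250},
    {"order_id": 101, "customer_id": 2, "amount_cents": 900},
    {"order_id": 102, "customer_id": 3, "amount_cents": 4000},
]

first = first_stage(customers, allowed_regions={"EU"})
second = second_stage(orders)
kept, alerts = validate_against(second, first)

# In the spirit of claims 10 and 11: only release the dataset if validation raised no alerts.
release_allowed = not alerts
```

In this sketch the validation criterion is derived from the first stage's output at run time rather than written down in advance, which is one way to read the claim's "previously unknown or undefined criteria".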

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • Data format conversion from or to a database · CPC title

  • where protection concerns the structure of data, e.g. records, types, queries · CPC title

  • G06F21/604 · Primary

    Tools and structures for managing or administering access control systems · CPC title

  • Protecting personal data, e.g. for financial or medical purposes · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12079352B2 cover?
A computer-implemented method enforces data security constraints in a data pipeline. The data pipeline takes one or more source datasets as input and performs one or more data transformations on them. The method includes using data defining one or more data security constraints to configure the data pipeline to perform a data transformation on a restricted subset of entries of the source datase…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/604. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 3, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).