Approaches for managing object data

US11494336B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11494336-B2
Application number: US-202017067492-A
Country: US
Kind code: B2
Filing date: Oct 9, 2020
Priority date: Oct 11, 2019
Publication date: Nov 8, 2022
Grant date: Nov 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided for determining multiple fragments of data to be imported, the multiple fragments of data corresponding to different instances of data obtained from one or more external data sources, the different instances of data each corresponding to duplicate content. The multiple fragments of data that each correspond to different instances of duplicate content can be ingested. The multiple fragments of data can be de-duplicated to determine one or more corresponding object data source records (DSRs). The one or more object DSRs can be imported within a data platform system.
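
The de-duplication step summarized in the abstract can be illustrated with a short sketch. The claims state that each ingested fragment can carry a hash value derived from its content and that fragments sharing a hash value map to the same object DSR. Everything else here is an assumption made for illustration: the use of SHA-256, the hypothetical `content_hash` and `deduplicate` helpers, the fragment identifiers, and the premise that fragments arriving in different formats have already been normalized to comparable bytes.

```python
import hashlib
from collections import defaultdict


def content_hash(content: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever hash the platform uses.
    return hashlib.sha256(content).hexdigest()


def deduplicate(fragments):
    """Group fragment ids by content hash.

    `fragments` is an iterable of (fragment_id, content_bytes) pairs.
    Each resulting group plays the role of one object DSR in this sketch.
    """
    groups = defaultdict(list)
    for fragment_id, content in fragments:
        groups[content_hash(content)].append(fragment_id)
    return dict(groups)


# Hypothetical fragments: two copies of the same content from different
# sources, plus one unrelated document.
fragments = [
    ("email-attachment", b"quarterly report"),
    ("file-share-copy", b"quarterly report"),
    ("other-doc", b"meeting notes"),
]

object_dsrs = deduplicate(fragments)
for digest, fragment_ids in object_dsrs.items():
    print(digest[:12], fragment_ids)
```

In this sketch the three fragments collapse into two groups, one per distinct content hash, which is the sense in which duplicate instances yield a single record to import.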

First claim

Opening claim text (preview).

The invention claimed is:

1. A system comprising: one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the system to perform: determining multiple fragments of data to be imported, the multiple fragments of data corresponding to different instances of data obtained from one or more external data sources, the different instances of data each corresponding to duplicate content, wherein at least a portion of the multiple fragments have different formats; ingesting the multiple fragments of data that each correspond to different instances of duplicate content; de-duplicating the multiple fragments of data to determine one or more corresponding object data source records (DSRs); importing the one or more object DSRs within a data platform system; determining that access control information associated with a first fragment of the multiple fragments of data has been modified into modified access control information, wherein the first fragment is associated with a particular object DSR; determining whether a second fragment is associated with the access control information and the particular object DSR; and selectively creating a new object DSR within the data platform system based on the modified access control information and based on the determination of whether the second fragment is associated with the access control information and the particular object DSR.

2. The system of claim 1, wherein each ingested fragment is associated with a corresponding hash value, and wherein the hash value is determined based on content associated with the ingested fragment.

3. The system of claim 1, wherein de-duplicating the multiple fragments of data further causes the system to perform: de-duplicating the multiple fragments of data based on their respective hash values.

4. The system of claim 3, wherein fragments having a same first hash value are associated with a first object DSR, and wherein fragments having a same second hash value are associated with a second object DSR.

5. The system of claim 1, wherein de-duplicating the multiple fragments of data further causes the system to perform: de-duplicating the multiple fragments of data based on their respective hash values and other information associated with the fragments.

6. The system of claim 5, wherein the fragments are de-duplicated based on their respective hash values and access control identifiers associated with the fragments.

7. The system of claim 6, wherein fragments having a same first hash value and a first access control identifier are associated with a first object DSR, and wherein fragments having the same first hash value and a second access control identifier are associated with a second object DSR.

8. The system of claim 1, wherein the single object DSR supports a property associated with an object managed by the data platform system.

9. The system of claim 1, further comprising: determining a modification of a de-duplicated first fragment that has been imported into the data platform system as a first object DSR; and applying one or more rules for managing one or more data source records associated with the de-duplicated fragment and the first object DSR in the data platform system.

10. The system of claim 1, further comprising: enforcing a set of invariants that manage relationships between de-duplicated fragments and corresponding object DSRs; and generating an error log entry when an invariant is breached.

11. A computer-implemented method, comprising: determining, by a computing system, multiple fragments of data to be imported, the multiple fragments of data corresponding to different instances of data obtained from one or more external data sources, the different instances of data each corresponding to duplicate content, wherein at least a portion of the multiple fragments have different formats; ingesting, by the computing system, the multiple fragments of data that each correspond to different instances of duplicate content; de-duplicating, by the computing system, the multiple fragments of data to determine one or more corresponding object data source records (DSRs); importing, by the computing system, the one or more object DSRs within a data platform system; determining, by the computing system, that access control information associated with a first fragment of the multiple fragments of data has been modified into modified access control information, wherein the first fragment is associated with a particular object DSR; determining, by the computing system, whether a second fragment is associated with the access control information and the particular object DSR; and selectively creating, by the computing system, a new object DSR within the data platform system based on the modified access control information and based on the determination of whether the second fragment is associated with the access control information and the particular object DSR.

12. The computer-implemented method of claim 11, wherein each ingested fragment is associated with a corresponding hash value, and wherein the hash value is determined based on content associated with the ingested fragment.

13. The computer-implemented method of claim 11, wherein de-duplicating the multiple fragments of data further comprises: de-duplicating, by the computing system, the multiple fragments of data based on their respective hash values.

14. The computer-implemented method of claim 11, wherein fragments having a same first hash value are associated with a first object DSR, and wherein fragments having a same second hash value are associated with a second object DSR.

15. The computer-implemented method of claim 11, wherein de-duplicating the multiple fragments of data further comprises: de-duplicating, by the computing system, the multiple fragments of data based on their respective hash values and other information associated with the fragments.

16. A non-transitory computer readable medium comprising instructions that, when executed, cause one or more processors of a computing system to perform: determining multiple fragments of data to be imported, the multiple fragments of data corresponding to different instances of data obtained from one or more external data sources, the different instances of data each corresponding to duplicate content, wherein at least a portion of the multiple fragments have different formats; ingesting the multiple fragments of data that each correspond to different instances of duplicate content; de-duplicating the multiple fragments of data to determine one or more corresponding object data source records (DSRs); importing the one or more object DSRs within a data platform system; determining that access control information associated with a first fragment of the multiple fragments of data has been modified into modified access control information, wherein the first fragment is associated with a particular object DSR; determining whether a second fragment is associated with the access control information and the particular object DSR; and selectively creating a new object DSR within the data platform system based on the modified access control information and based on the determination of whether the second fragment is associated with the access control information and the particular object DSR.

17. The non-transitory computer readable medium of claim 16, wherein each ingested fragment is associated with a corresponding hash value, and wherein the hash value is determined based on content associated with t…
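
Claims 6 and 7 key de-duplication on both a hash value and an access control identifier, and claim 1 describes selectively creating a new object DSR when a fragment's access control information changes. The following sketch shows one possible reading of that behavior, not the patented implementation. The `Platform` and `ObjectDSR` classes, the `ingest` and `change_acl` methods, the SHA-256 hash, and the policy of re-keying a DSR in place when no other fragment still depends on it are all assumptions introduced for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ObjectDSR:
    content_hash: str
    acl_id: str
    fragment_ids: set = field(default_factory=set)


class Platform:
    """Hypothetical in-memory stand-in for the data platform system."""

    def __init__(self):
        self.dsrs = {}          # (content_hash, acl_id) -> ObjectDSR
        self.fragment_key = {}  # fragment_id -> (content_hash, acl_id)

    def ingest(self, fragment_id, content, acl_id):
        # Same hash and same access control id share one DSR;
        # same hash with a different access control id gets its own DSR.
        key = (hashlib.sha256(content).hexdigest(), acl_id)
        dsr = self.dsrs.setdefault(key, ObjectDSR(*key))
        dsr.fragment_ids.add(fragment_id)
        self.fragment_key[fragment_id] = key
        return dsr

    def change_acl(self, fragment_id, new_acl_id):
        """Apply an access control change; return True if a new DSR was created."""
        old_key = self.fragment_key[fragment_id]
        new_key = (old_key[0], new_acl_id)
        if new_key == old_key:
            return False
        old_dsr = self.dsrs[old_key]
        # Does a second fragment still rely on the old DSR and its access control?
        others = old_dsr.fragment_ids - {fragment_id}
        old_dsr.fragment_ids.discard(fragment_id)
        created = False
        if new_key in self.dsrs:
            # A DSR for the new access control already exists: join it.
            target = self.dsrs[new_key]
            if not others:
                del self.dsrs[old_key]
        elif others:
            # Other fragments keep the old DSR alive, so a new DSR is needed.
            target = ObjectDSR(*new_key)
            self.dsrs[new_key] = target
            created = True
        else:
            # Assumed policy: no other fragment depends on the old DSR,
            # so re-key it in place instead of creating a new one.
            del self.dsrs[old_key]
            old_dsr.acl_id = new_acl_id
            self.dsrs[new_key] = old_dsr
            target = old_dsr
        target.fragment_ids.add(fragment_id)
        self.fragment_key[fragment_id] = new_key
        return created


platform = Platform()
platform.ingest("a", b"same content", "acl-1")
platform.ingest("b", b"same content", "acl-1")
first_change = platform.change_acl("a", "acl-2")   # "b" still uses acl-1
second_change = platform.change_acl("b", "acl-2")  # joins the existing acl-2 DSR
print(first_change, second_change, len(platform.dsrs))
```

The design choice worth noting is the check on `others`: a new record is only created when some second fragment still needs the original access control, which is one way to interpret the "selectively creating" language of claim 1.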

Assignees

Palantir Technologies Inc

Classifications

  • G06F16/152 (Primary)

    CPC title: using file content signatures, e.g. hash values

  • CPC title: Redundancy elimination performed by the file system (error detection or correction of the data by redundancy in operations G06F11/14)

  • G06F16/215 (Primary)

    CPC title: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11494336B2 cover?
Systems and methods are provided for determining multiple fragments of data to be imported, the multiple fragments of data corresponding to different instances of data obtained from one or more external data sources, the different instances of data each corresponding to duplicate content. The multiple fragments of data that each correspond to different instances of duplicate content can be inge…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/152. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 8, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).