Methods and apparatuses for automated performance tuning of a data modeling platform
US-2019303462-A1 · Oct 3, 2019 · US
US11494336B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11494336-B2 |
| Application number | US-202017067492-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 9, 2020 |
| Priority date | Oct 11, 2019 |
| Publication date | Nov 8, 2022 |
| Grant date | Nov 8, 2022 |
Official abstract text for this publication.
Systems and methods are provided for determining multiple fragments of data to be imported, the multiple fragments of data corresponding to different instances of data obtained from one or more external data sources, the different instances of data each corresponding to duplicate content. The multiple fragments of data that each correspond to different instances of duplicate content can be ingested. The multiple fragments of data can be de-duplicated to determine one or more corresponding object data source records (DSRs). The one or more object DSRs can be imported within a data platform system.
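The de-duplication step in the abstract can be illustrated with a minimal sketch. It assumes that each fragment's content has already been normalized to bytes, so that instances of the same content arriving in different formats hash identically, and that SHA-256 is the content hash. The `Fragment` class, its fields, and the `deduplicate` function are hypothetical names chosen for illustration; the patent text does not specify a hash function or a data layout.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical fragment: one instance of data from an external source."""
    source: str      # name of the external data source (assumed field)
    content: bytes   # content, assumed already normalized across formats


def content_hash(fragment: Fragment) -> str:
    """Hash the fragment's content; SHA-256 is an assumption, not from the patent."""
    return hashlib.sha256(fragment.content).hexdigest()


def deduplicate(fragments):
    """Group fragments by content hash; each group stands for one object DSR."""
    groups = defaultdict(list)
    for fragment in fragments:
        groups[content_hash(fragment)].append(fragment)
    return dict(groups)


if __name__ == "__main__":
    fragments = [
        Fragment("crm_export", b"alice"),
        Fragment("email_archive", b"alice"),  # duplicate content, other source
        Fragment("crm_export", b"bob"),
    ]
    groups = deduplicate(fragments)
    print(len(groups))  # 2 groups, so 2 object DSRs would be imported
```

Grouping by hash means the number of imported records tracks the number of distinct contents rather than the number of ingested fragments.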
Opening claim text (preview).
The invention claimed is:
1. A system comprising: one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the system to perform: determining multiple fragments of data to be imported, the multiple fragments of data corresponding to different instances of data obtained from one or more external data sources, the different instances of data each corresponding to duplicate content, wherein at least a portion of the multiple fragments have different formats; ingesting the multiple fragments of data that each correspond to different instances of duplicate content; de-duplicating the multiple fragments of data to determine one or more corresponding object data source records (DSRs); importing the one or more object DSRs within a data platform system; determining that access control information associated with a first fragment of the multiple fragments of data has been modified into modified access control information, wherein the first fragment is associated with a particular object DSR; determining whether a second fragment is associated with the access control information and the particular object DSR; and selectively creating a new object DSR within the data platform system based on the modified access control information and based on the determination of whether the second fragment is associated with the access control information and the particular object DSR.
2. The system of claim 1, wherein each ingested fragment is associated with a corresponding hash value, and wherein the hash value is determined based on content associated with the ingested fragment.
3. The system of claim 1, wherein de-duplicating the multiple fragments of data further causes the system to perform: de-duplicating the multiple fragments of data based on their respective hash values.
4. The system of claim 3, wherein fragments having a same first hash value are associated with a first object DSR, and wherein fragments having a same second hash value are associated with a second object DSR.
5. The system of claim 1, wherein de-duplicating the multiple fragments of data further causes the system to perform: de-duplicating the multiple fragments of data based on their respective hash values and other information associated with the fragments.
6. The system of claim 5, wherein the fragments are de-duplicated based on their respective hash values and access control identifiers associated with the fragments.
7. The system of claim 6, wherein fragments having a same first hash value and a first access control identifier are associated with a first object DSR, and wherein fragments having the same first hash value and a second access control identifier are associated with a second object DSR.
8. The system of claim 1, wherein the single object DSR supports a property associated with an object managed by the data platform system.
9. The system of claim 1, further comprising: determining a modification of a de-duplicated first fragment that has been imported into the data platform system as a first object DSR; and applying one or more rules for managing one or more data source records associated with the de-duplicated fragment and the first object DSR in the data platform system.
10. The system of claim 1, further comprising: enforcing a set of invariants that manage relationships between de-duplicated fragments and corresponding object DSRs; and generating an error log entry when an invariant is breached.
11. A computer-implemented method, comprising: determining, by a computing system, multiple fragments of data to be imported, the multiple fragments of data corresponding to different instances of data obtained from one or more external data sources, the different instances of data each corresponding to duplicate content, wherein at least a portion of the multiple fragments have different formats; ingesting, by the computing system, the multiple fragments of data that each correspond to different instances of duplicate content; de-duplicating, by the computing system, the multiple fragments of data to determine one or more corresponding object data source records (DSRs); importing, by the computing system, the one or more object DSRs within a data platform system; determining, by the computing system, that access control information associated with a first fragment of the multiple fragments of data has been modified into modified access control information, wherein the first fragment is associated with a particular object DSR; determining, by the computing system, whether a second fragment is associated with the access control information and the particular object DSR; and selectively creating, by the computing system, a new object DSR within the data platform system based on the modified access control information and based on the determination of whether the second fragment is associated with the access control information and the particular object DSR.
12. The computer-implemented method of claim 11, wherein each ingested fragment is associated with a corresponding hash value, and wherein the hash value is determined based on content associated with the ingested fragment.
13. The computer-implemented method of claim 11, wherein de-duplicating the multiple fragments of data further comprises: de-duplicating, by the computing system, the multiple fragments of data based on their respective hash values.
14. The computer-implemented method of claim 11, wherein fragments having a same first hash value are associated with a first object DSR, and wherein fragments having a same second hash value are associated with a second object DSR.
15. The computer-implemented method of claim 11, wherein de-duplicating the multiple fragments of data further comprises: de-duplicating, by the computing system, the multiple fragments of data based on their respective hash values and other information associated with the fragments.
16. A non-transitory computer readable medium comprising instructions that, when executed, cause one or more processors of a computing system to perform: determining multiple fragments of data to be imported, the multiple fragments of data corresponding to different instances of data obtained from one or more external data sources, the different instances of data each corresponding to duplicate content, wherein at least a portion of the multiple fragments have different formats; ingesting the multiple fragments of data that each correspond to different instances of duplicate content; de-duplicating the multiple fragments of data to determine one or more corresponding object data source records (DSRs); importing the one or more object DSRs within a data platform system; determining that access control information associated with a first fragment of the multiple fragments of data has been modified into modified access control information, wherein the first fragment is associated with a particular object DSR; determining whether a second fragment is associated with the access control information and the particular object DSR; and selectively creating a new object DSR within the data platform system based on the modified access control information and based on the determination of whether the second fragment is associated with the access control information and the particular object DSR.
17. The non-transitory computer readable medium of claim 16, wherein each ingested fragment is associated with a corresponding hash value, and wherein the hash value is determined based on content associated with t
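The access-control behavior recited in claims 1, 6, 7, and 10 can be sketched as follows. This is one possible reading under stated assumptions, not the patented implementation: object DSRs are keyed by the pair (content hash, access control identifier); when a fragment's access control identifier changes, the fragment moves to the DSR for the new pair, which is created only if it does not already exist; and the old DSR is kept only if another fragment still refers to it. The claims do not spell out which branch creates a record, so that policy is an assumption. `ObjectDSR`, `DSRStore`, and all method names are hypothetical.

```python
import hashlib
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("dsr_sketch")  # hypothetical logger name


@dataclass
class ObjectDSR:
    """Hypothetical object data source record keyed by (hash, ACL id)."""
    content_hash: str
    acl_id: str
    fragment_ids: set = field(default_factory=set)


class DSRStore:
    def __init__(self):
        self.dsrs = {}          # (content_hash, acl_id) -> ObjectDSR
        self.fragment_key = {}  # fragment_id -> (content_hash, acl_id)

    def _attach(self, fragment_id, key):
        # Create the DSR for this key only if none exists yet.
        dsr = self.dsrs.setdefault(key, ObjectDSR(*key))
        dsr.fragment_ids.add(fragment_id)
        self.fragment_key[fragment_id] = key
        return dsr

    def ingest(self, fragment_id, content: bytes, acl_id: str):
        # SHA-256 is an assumed choice of content hash.
        key = (hashlib.sha256(content).hexdigest(), acl_id)
        return self._attach(fragment_id, key)

    def change_acl(self, fragment_id, new_acl_id: str):
        content_hash, old_acl_id = self.fragment_key[fragment_id]
        old_key = (content_hash, old_acl_id)
        if new_acl_id == old_acl_id:
            return self.dsrs[old_key]
        old_dsr = self.dsrs[old_key]
        old_dsr.fragment_ids.discard(fragment_id)
        # Assumed policy: keep the old DSR only while a second fragment
        # is still associated with the old access control identifier.
        if not old_dsr.fragment_ids:
            del self.dsrs[old_key]
        return self._attach(fragment_id, (content_hash, new_acl_id))

    def check_invariants(self):
        """Log and return breaches of the fragment-to-DSR relationship."""
        violations = []
        for fragment_id, key in self.fragment_key.items():
            dsr = self.dsrs.get(key)
            if dsr is None or fragment_id not in dsr.fragment_ids:
                violations.append(f"fragment {fragment_id!r} has no matching DSR")
        for key, dsr in self.dsrs.items():
            if not dsr.fragment_ids:
                violations.append(f"DSR {key!r} has no fragments")
        for message in violations:
            logger.error(message)
        return violations
```

With two fragments of identical content under the same identifier, changing the identifier on one of them leaves two DSRs under this policy; changing it on the second as well collapses them back into one.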
using file content signatures, e.g. hash values · CPC title
Redundancy elimination performed by the file system (error detection or correction of the data by redundancy in operations G06F11/14) · CPC title
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title