Releasing data storage tracks while maintaining logical corruption protection
US-2022043573-A1 · Feb 10, 2022 · US
US12032585B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12032585-B2 |
| Application number | US-202016986757-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 6, 2020 |
| Priority date | Aug 6, 2020 |
| Publication date | Jul 9, 2024 |
| Grant date | Jul 9, 2024 |
Official abstract text for this publication.
Techniques are disclosed relating to using machine learning techniques to predict storage configurations for historical data. In some embodiments, a computer system stores representations of historical data according to a current set of storage parameters. The representations may include snapshots of historical data in a data repository at different points in time. The computer system may receive queries for historical data specifying points in time from which to retrieve the historical data. In some embodiments, the computer system responds to the queries using the stored representations and determines performance metrics for the responses. In some embodiments, the computer system trains a machine learning model using the performance metrics. Based on output of the trained model, the computer system updates the current set of storage parameters. The updating may affect subsequent storage of representations in the data repository, which may advantageously improve query response times and decrease repository storage size.
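The feedback loop described in the abstract (respond to queries, measure performance, fit a model, update storage parameters) might be sketched as follows. This is only an illustration under stated assumptions: a one-variable least-squares fit stands in for the trained machine learning model, the only storage parameter is a hypothetical `journal_threshold`, and the names `StorageParams`, `fit_line`, and `update_params` do not come from the publication. The sketch further assumes that a larger threshold means fewer stored snapshots but slower query responses.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StorageParams:
    # Hypothetical parameter: number of journals folded into each new snapshot.
    journal_threshold: int


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; a stand-in for the trained model."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


def update_params(params, observations, latency_budget_ms, candidates):
    """Return updated storage parameters from observed query latencies.

    observations: list of (journal_threshold, observed_latency_ms) pairs
    collected while responding to queries (the performance metrics).
    """
    xs = [threshold for threshold, _ in observations]
    ys = [latency for _, latency in observations]
    a, b = fit_line(xs, ys)
    # Assumption: prefer the largest threshold (fewest snapshots) whose
    # predicted latency still fits the latency budget.
    within_budget = [c for c in candidates if a * c + b <= latency_budget_ms]
    if within_budget:
        return StorageParams(journal_threshold=max(within_budget))
    return params  # no candidate satisfies the budget; keep current parameters


observations = [(10, 20.0), (20, 40.0), (30, 60.0)]
updated = update_params(StorageParams(10), observations, 50.0,
                        [5, 10, 15, 20, 25, 30])
```

The updated parameters would then govern how later snapshots and journals are stored, which is the part of the loop the abstract says affects subsequent storage.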
Opening claim text (preview).
What is claimed is:
1. A method, comprising: storing, by a computer system according to a current set of storage parameters, representations of historical data, wherein the representations include snapshots of historical data in a data repository at different points in time; receiving, by the computer system, queries for historical data, wherein the queries specify points in time from which to retrieve the historical data; responding, by the computer system, to the queries using the stored representations, wherein the stored representations include prior snapshots and journals, and wherein the responding includes: reading, in parallel from a snapshot database and a journal database, at least one prior snapshot and one or more journals, wherein the reading in parallel is performed based on a known distribution of points in time of snapshots stored in the snapshot database; generating, based on the historical data included in the stored representations, one or more sets of unlabeled features; training, using the one or more sets of unlabeled features, different instances of a machine learning classifier; and determining, by the computer system, performance metrics for responses to the queries; training, by the computer system using the performance metrics, a machine learning model; and updating, by the computer system based on output of the trained machine learning model, the current set of storage parameters wherein the updating affects subsequent storage of representations in the data repository.
2. The method of claim 1, wherein responding to the queries further includes: comparing output of the different trained instances of the machine learning classifier with a machine learning classifier trained using a set of labeled features, wherein results of the comparing provide performance indicators for the different trained instances.
3. The method of claim 1, wherein the representations of historical data further include journals that indicate changes between ones of the snapshots.
4. The method of claim 3, wherein the storing includes: joining two consecutive snapshots that include records of historical data, wherein the joining is performed based on primary keys of the two consecutive snapshots; determining, based on the joining, one or more changes between the joined snapshots; and generating, based on the determined one or more changes, one or more new journals.
5. The method of claim 4, wherein the generating is performed based on a delta threshold included in the current set of storage parameters that specifies a number of changes to be included in respective journals.
6. The method of claim 4, wherein the storing includes: generating one or more new snapshots based on the one or more new journals, wherein the generating is performed based on a journal threshold included in the current set of storage parameters.
7. The method of claim 6, wherein the journal threshold specifies at least one of: a number of journals to be included in respective snapshots, a total size of one or more journals to be included in respective snapshots, a window of time from which to select one or more journals to be included in respective snapshots.
8. The method of claim 1, wherein the responding that includes reading the at least one prior snapshot and one or more journals includes, for a particular query: reading, from a snapshot database, a prior snapshot that includes historical data at a first point in time that is prior to a particular point in time specified in the particular query, wherein the prior snapshot is a closest prior snapshot to the particular point in time; reading, from a journal database, one or more journals indicating changes between historical data included in the prior snapshot and historical data at the particular point in time; and generating, based on the prior snapshot and the one or more journals, at least one new snapshot at the particular point in time, wherein the generating includes replaying the one or more journals on top of the prior snapshot.
9. The method of claim 1, wherein the performance metrics include an access metric specifying a frequency at which one or more of the snapshots are accessed for responses to the queries.
10. The method of claim 9, wherein the updating the current set of storage parameters that is performed based on the access metric includes adjusting a storage location for one or more of the snapshots.
11. The method of claim 1, wherein the current set of storage parameters specifies multiple different storage locations for the snapshots including at least a first storage location for snapshots associated with a high access frequency and a second storage location for snapshots associated with a low access frequency.
12. A non-transitory computer-readable medium having instructions stored thereon that are executable by a computer device to perform operations comprising: storing, according to a current set of storage parameters, snapshots and journals, wherein the snapshots include records of historical data in a data repository at different points in time, and wherein the journals indicate changes between ones of the snapshots; receiving a query for historical data, wherein the query specifies a particular point in time from which to retrieve the historical data; responding to the query using the stored snapshots and journals, wherein the responding includes: reading, in parallel from a snapshot database and a journal database, at least one prior snapshot and one or more journals, wherein the reading in parallel is performed based on a known distribution of points in time of snapshots stored in the snapshot database; generating, based on the historical data included in the at least one prior snapshot and the one or more journals, one or more sets of unlabeled features; transmitting, to a computing device from which the query was received, the one or more sets of unlabeled features, wherein the computing device is configured to train different instances of a machine learning classifier using the one or more sets of unlabeled features; generating, based on a closest prior snapshot to the particular point in time and one or more journals indicating changes between the historical data included in the closest prior snapshot and historical data at the particular point in time, at least one new snapshot at the particular point in time; and determining performance metrics for the response to the query; training, using the performance metrics, a machine learning model; and updating, based on output of the trained machine learning model, the current set of storage parameters.
13. The non-transitory computer-readable medium of claim 12, wherein the performance metrics include a latency metric indicating a time interval between receiving queries and generating responses using stored snapshots and journals, and wherein the updating that is performed based on the latency metric includes adjusting a number of stored snapshots and a number of stored journals.
14. The non-transitory computer-readable medium of claim 13, wherein the updating includes: adjusting, based on a predicted journal age output by the trained machine learning model, an age threshold included in the current set of storage parameters, wherein the computer device is configured to generate a snapshot from one or more journals based on the one or more journals being older than a particular point in time specified by the age threshold.
15. The non-transitory computer-readable medium of claim 12, wherein the responding to a particular query that specifies a first point in time inc
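The journal generation of claim 4 (joining consecutive snapshots on primary keys) and the point-in-time reconstruction of claim 8 (replaying journals on top of the closest prior snapshot) can be illustrated with a minimal in-memory sketch. It assumes a snapshot is a plain dictionary keyed by primary key and a journal is a list of `(operation, key, value)` tuples; the function names `diff_snapshots`, `replay`, and `snapshot_at` are hypothetical, and the parallel reads from separate snapshot and journal databases recited in the claims are not modeled.

```python
def diff_snapshots(old, new):
    """Join two snapshots on primary key and return a journal of changes."""
    journal = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            journal.append(("insert", key, new[key]))
        elif key not in new:
            journal.append(("delete", key, None))
        elif old[key] != new[key]:
            journal.append(("update", key, new[key]))
    return journal


def replay(snapshot, journals):
    """Apply journals in order on top of a copy of the snapshot."""
    state = dict(snapshot)
    for journal in journals:
        for op, key, value in journal:
            if op == "delete":
                state.pop(key, None)
            else:  # "insert" and "update" both set the record
                state[key] = value
    return state


def snapshot_at(snapshots, journals, t):
    """Rebuild the data as of time t.

    snapshots: dict mapping point in time -> snapshot
    journals: list of (point in time, journal), sorted by time
    Raises ValueError if no snapshot exists at or before t.
    """
    prior = max(ts for ts in snapshots if ts <= t)  # closest prior snapshot
    needed = [j for ts, j in journals if prior < ts <= t]
    return replay(snapshots[prior], needed)


s0 = {1: "a", 2: "b"}
s1 = {1: "a", 2: "c", 3: "d"}
s2 = {2: "c", 3: "d"}
journals = [(1, diff_snapshots(s0, s1)), (2, diff_snapshots(s1, s2))]
rebuilt = snapshot_at({0: s0}, journals, 2)
```

In this sketch only the snapshot at time 0 is stored; the states at times 1 and 2 are recovered from journals, which is the trade-off between stored snapshots and replay work that the claimed storage parameters adjust.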
Machine learning · CPC title
Performance evaluation by tracing or monitoring · CPC title
Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477) · CPC title
Probabilistic graphical models, e.g. probabilistic networks · CPC title
where the computing system component is a storage system, e.g. DASD based or network based (digital input from or digital output to record carriers G06F3/06; digital recording or reproducing G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097) · CPC title