Method and apparatus for smart archiving and analytics

US10360193B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10360193-B2
Application number  US-201715468245-A
Country             US
Kind code           B2
Filing date         Mar 24, 2017
Priority date       Mar 24, 2017
Publication date    Jul 23, 2019
Grant date          Jul 23, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for archiving and analyzing data are disclosed. The system receives event data associated with a process; responsive to receiving the event data, determines process data associated with the process; generates process metadata from the event data and the process data; and stores the event data, the process data, and the process metadata in a data repository organized by the process metadata. Since the process data is determined early on in the data pipeline, the system can significantly reduce the amount of computation required for generating data analytics. The system is also capable of providing analytic results computed against a massive amount of archived data in real-time or near real-time as user requests are initiated. Efficiency of process mining and process optimization is also improved due to enhanced information stored for archived processes.
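
The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the field names (`process_id`, `type`, `start_ts`, `end_ts`), the 60-second duration bucket, and the classes `DatasetEntry` and `Repository` are all hypothetical choices made for illustration. The point it shows is that process data and metadata are derived once, at ingestion, so later analytics can read a pre-organized partition instead of rescanning raw events.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DatasetEntry:
    """One archived record: raw event plus derived process data and metadata."""
    event_data: dict[str, Any]
    process_data: dict[str, Any]
    process_metadata: dict[str, Any]


@dataclass
class Repository:
    """Hypothetical in-memory repository, partitioned by process metadata."""
    partitions: dict[tuple, list[DatasetEntry]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def store(self, entry: DatasetEntry) -> tuple:
        # The partition key is built from metadata only (assumed layout).
        meta = entry.process_metadata
        key = (meta["process_type"], meta["duration_bucket"])
        self.partitions[key].append(entry)
        return key


def determine_process_data(event: dict[str, Any]) -> dict[str, Any]:
    # Derived as soon as the event arrives (hypothetical fields).
    return {
        "process_id": event["process_id"],
        "duration_s": event["end_ts"] - event["start_ts"],
    }


def generate_process_metadata(
    event: dict[str, Any], process_data: dict[str, Any]
) -> dict[str, Any]:
    # The 60-second threshold is an arbitrary illustrative bucket boundary.
    bucket = "short" if process_data["duration_s"] < 60 else "long"
    return {"process_type": event["type"], "duration_bucket": bucket}


def archive(event: dict[str, Any], repo: Repository) -> tuple:
    """Receive event -> process data -> metadata -> store by metadata."""
    process_data = determine_process_data(event)
    metadata = generate_process_metadata(event, process_data)
    return repo.store(DatasetEntry(event, process_data, metadata))


repo = Repository()
key = archive(
    {"process_id": "p1", "type": "etch", "start_ts": 0, "end_ts": 45}, repo
)
print(key)  # ('etch', 'short')
```

In this sketch an analytic query for, say, all short etch processes reads only the `('etch', 'short')` partition, which is the kind of reduced computation the abstract attributes to determining process data early in the pipeline.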

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a data repository for storing data; a data ingestion module having an input coupled to receive event data associated with a process and an output coupled to the data repository, the data ingestion module operable to: receive the event data; and temporarily store the event data; and a process archiving module having an input coupled to the data ingestion module to receive the event data associated with the process and an output coupled to the data repository, the process archiving module operable to: receive the event data; determine process data associated with the process; generate process metadata from the event data and the process data; generate a first dataset entry for the process, the first dataset entry including the process metadata; retrieve an archived data model describing second dataset entries in the data repository; determine a relationship between the first dataset entry and the second dataset entries by applying the archived data model to the first dataset entry; and store the first dataset entry in the data repository based on the relationship between the first dataset entry and the second dataset entries.

2. The system of claim 1, wherein the process archiving module is further operable to generate the process metadata by applying machine learning to the event data and the process data.

3. The system of claim 1, wherein: the process metadata includes a process parameter describing the process; and the process archiving module is further operable to apply the archived data model to the first dataset entry by identifying a cluster of second dataset entries for the first dataset entry using a clustering algorithm, the clustering algorithm being used for clustering based on the process parameter.

4. The system of claim 1, further comprising a real-time analysis module coupled to the data repository to retrieve dataset entries, the real-time analysis module operable to: generate a real-time view associated with one or more analytic outputs using the dataset entries; and store the real-time view in a process storage.

5. The system of claim 1, further comprising a batch analysis module coupled to the data repository to retrieve dataset entries, the batch analysis module operable to: pre-compute a batch view associated with one or more analytic outputs using the dataset entries; and store the batch view in a process storage.

6. The system of claim 1, further comprising an archived data managing module communicatively coupled to the data repository, the archived data managing module operable to: determine a consuming pattern associated with dataset entries in the data repository; determine a process parameter based on the consuming pattern; and cluster the dataset entries in the data repository based on the determined process parameter to generate the archived data model.

7. The system of claim 1, wherein: the process archiving module is further operable to determine a value of a data attribute from the event data associated with the process and generate the first dataset entry, the first dataset entry including the determined value of the data attribute; and the system further comprises: an archived data managing module configured to organize the second dataset entries in the data repository based on the data attribute; an analytic module configured to store a pre-computed view associated with an analytic output in an analytic profile storage to generate a unified view; and a profile enabling module coupled to and controlling the process archiving module, the archived data managing module, and the analytic module, the profile enabling module configured to receive a selected analytic profile, and identify the data attribute and the analytic output associated with the selected analytic profile.

8. A method comprising: receiving event data associated with a process; responsive to receiving the event data, determining process data associated with the process; generating process metadata from the event data and the process data; generating a first dataset entry for the process, the first dataset entry including the process metadata; retrieving an archived data model describing second dataset entries in a data repository; determining a relationship between the first dataset entry and the second dataset entries by applying the archived data model to the first dataset entry; and storing the first dataset entry in the data repository based on the relationship between the first dataset entry and the second dataset entries.

9. The method of claim 8, wherein the first dataset entry includes the event data, the process data, and the process metadata.

10. The method of claim 9, wherein: the process metadata includes a process parameter describing the process; and determining the relationship between the first dataset entry and the second dataset entries by applying the archived data model to the first dataset entry includes identifying a cluster of second dataset entries for the first dataset entry using a clustering algorithm, the clustering algorithm being used for clustering based on the process parameter.

11. The method of claim 10, wherein storing the first dataset entry in the data repository includes: determining a dataset associated with the cluster of second dataset entries; and updating the determined dataset to include the first dataset entry.

12. The method of claim 9, further comprising: receiving a selection of an analytic profile, the analytic profile specifying an analytic output to be provided; and identifying a data attribute associated with the analytic profile; wherein generating the first dataset entry includes: determining a value of the data attribute from the event data associated with the process; and generating the first dataset entry for the process, the first dataset entry including the determined value of the data attribute.

13. The method of claim 12, further comprising: organizing the second dataset entries in the data repository based on the data attribute; and storing a pre-computed view associated with the analytic output in an analytic profile storage to generate a unified view for the analytic output.

14. The method of claim 8, further comprising: determining a consuming pattern associated with dataset entries in the data repository; determining a process parameter based on the consuming pattern; and clustering the dataset entries based on the determined process parameter to generate the archived data model.

15. The method of claim 14, wherein the archived data model includes a first cluster of dataset entries and a second cluster of dataset entries, and the method further comprises: aggregating the first cluster of dataset entries into a first dataset; aggregating the second cluster of dataset entries into a second dataset; and storing the first dataset in a first folder and the second dataset in a second folder of the data repository, the first folder and the second folder being organized based on the archived data model.

16. A system comprising: means for receiving event data associated with a process; means for determining process data associated with the process; means for generating process metadata from the event data and the process data; means for generating a first dataset entry for the process, the first dataset entry including the process metadata; means for retrieving an archived data model describing second dataset entries in a data repository; means for determining a relationship between the first d…
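
The clustering steps recited in claims 3, 10, 11 and 14 can be pictured with a small sketch. The claims do not name a specific clustering algorithm, so the 1-D k-means used here is a stand-in chosen for brevity, and `ArchivedDataModel`, `build_model`, `Repository` and the `duration_s` parameter are hypothetical names. The sketch first clusters archived entries on one process parameter to produce a model, then applies that model to a new ("first") dataset entry to decide which dataset it joins.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ArchivedDataModel:
    """Hypothetical model: one centroid per cluster of archived entries."""
    centroids: list[float]

    def assign(self, value: float) -> int:
        # Index of the centroid nearest to the process-parameter value.
        return min(
            range(len(self.centroids)),
            key=lambda i: abs(self.centroids[i] - value),
        )


def build_model(values: list[float], k: int, iterations: int = 20) -> ArchivedDataModel:
    """Plain 1-D k-means (Lloyd's algorithm) over one process parameter."""
    ordered = sorted(values)
    # Deterministic start: spread initial centroids across the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        model = ArchivedDataModel(centroids)
        groups: list[list[float]] = [[] for _ in range(k)]
        for v in values:
            groups[model.assign(v)].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centroids)
        ]
    return ArchivedDataModel(centroids)


@dataclass
class Repository:
    """Hypothetical repository holding one dataset per cluster."""
    model: ArchivedDataModel
    datasets: dict[int, list[dict]] = field(default_factory=dict)

    def store(self, entry: dict, parameter: str) -> int:
        # Apply the archived data model, then update the matching dataset.
        cluster = self.model.assign(entry["metadata"][parameter])
        self.datasets.setdefault(cluster, []).append(entry)
        return cluster


# Process durations (seconds) of already archived entries: illustrative data.
archived = [10, 12, 11, 300, 310, 305]
model = build_model(archived, k=2)
repo = Repository(model)

first_entry = {"process_id": "p7", "metadata": {"duration_s": 14}}
cluster = repo.store(first_entry, "duration_s")
print(cluster)  # 0
```

Keying each dataset by cluster index loosely mirrors the per-cluster folders of claim 15. A real system would presumably cluster on several parameters and persist the datasets rather than hold them in memory.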

Assignees

Western Digital Tech Inc

Inventors

Classifications

  • G06F16/254 · Primary

    Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

  • G06F16/219 · Primary

    Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477) · CPC title

  • using statistics or function optimisation, e.g. modelling of probability density functions · CPC title

  • Clustering or classification · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10360193B2 cover?
A system and method for archiving and analyzing data are disclosed. The system receives event data associated with a process; responsive to receiving the event data, determines process data associated with the process; generates process metadata from the event data and the process data; and stores the event data, the process data, and the process metadata in a data repository organized by the p…
Who is the assignee on this patent?
Western Digital Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/254. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 23, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).