System and method of offline annotation of future accesses for improving performance of backup storage system
US-9189408-B1 · Nov 17, 2015 · US
US10114878B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10114878-B2 |
| Application number | US-201314108067-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 16, 2013 |
| Priority date | Dec 16, 2013 |
| Publication date | Oct 30, 2018 |
| Grant date | Oct 30, 2018 |
Official abstract text for this publication.
A computer manages methods for utilizing an index to manage access to data in a dataset stored in one or more file locations in an ETL tool by receiving a request to access a dataset associated with one or more file locations, wherein the dataset is stored in the one or more file locations. The computer queries an index for the one or more file locations associated with the dataset, wherein the dataset has another index for data in the dataset. The computer receives the one or more file locations associated with the dataset. The computer determines to cache the request to access the one or more file locations for the dataset until one or more thresholds are met, wherein the cached request is part of a total number of cached requests.
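The deferral mechanism summarized in the abstract can be sketched as follows. This is an illustrative sketch, not the patented implementation. It assumes the two thresholds are a total size and a maximum cached duration, as the claims recite, and that reaching either one triggers a batched access. The names `RequestCache`, `CachedRequest`, `submit`, and `flush_if_due` are hypothetical, the first index is modeled as a plain dictionary, and the claimed check that caching does not disrupt request order is omitted.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CachedRequest:
    dataset: str
    size: int          # space the request occupies in temporary storage (assumed unit: bytes)
    cached_at: float   # time at which the request was cached


@dataclass
class RequestCache:
    index: dict            # hypothetical first index: dataset name -> list of file locations
    max_total_size: int    # first threshold: total size of all cached requests
    max_age: float         # second threshold: seconds a request may remain cached
    pending: list = field(default_factory=list)

    def submit(self, dataset, size, now=None):
        """Cache a request instead of accessing the source system immediately."""
        now = time.monotonic() if now is None else now
        self.pending.append(CachedRequest(dataset, size, now))
        return self.flush_if_due(now)

    def flush_if_due(self, now):
        """Return {file_location: [datasets]} for one batched access, or {} to keep caching."""
        total = sum(r.size for r in self.pending)
        oldest = min((r.cached_at for r in self.pending), default=now)
        if total < self.max_total_size and now - oldest < self.max_age:
            return {}
        # Group the cached requests by the file locations the index reports.
        by_location = defaultdict(list)
        for request in self.pending:
            for location in self.index[request.dataset]:
                by_location[location].append(request)
        # Assumption: access the location that satisfies the most cached requests.
        location = max(by_location, key=lambda loc: len(by_location[loc]))
        served = by_location[location]
        served_ids = {id(r) for r in served}
        # Satisfied requests leave the cache, which reduces its total size.
        self.pending = [r for r in self.pending if id(r) not in served_ids]
        return {location: [r.dataset for r in served]}
```

With an index that maps `employees` and `payroll` to one file location and `orders` to another, three cached requests are satisfied by a single access to the shared location once the oldest request exceeds `max_age`, and only the `orders` request stays cached.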
Opening claim text (preview).
What is claimed is:

1. A method for utilizing an index to manage access to data in a dataset stored in one or more file locations in an Extract Transform Load tool, the method comprising:

receiving, by one or more processors, a request to access a first dataset of a plurality of datasets stored at a source system during an Extract Transform Load process between the source system and an end target system during a period of high I/O requests, wherein the period of high I/O requests is based on a number of I/O requests received for a particular duration;

querying, by one or more processors, a first index for one or more file locations where the first dataset is stored at the source system;

responsive to determining that caching the request does not disrupt an order assigned to the request to access the first dataset of the plurality of datasets stored at the source system, caching, by one or more processors, the request to access the first dataset stored at the source system, wherein the cached access request for the first dataset is one of a plurality of cached requests to access the plurality of datasets stored in a plurality of file locations at the source system;

responsive to determining a total size of the plurality of cached requests in temporary storage does not meet a first threshold level, determining, by one or more processors, whether a duration for which the cached request for the first dataset of the plurality of cached requests has been cached in the temporary storage has met a second threshold level;

responsive to determining that the duration for which the cached request for the first dataset of the plurality of cached requests has been cached has met the second threshold, identifying, by one or more processors, a first file location of the plurality of file locations to access in order to satisfy a portion of the plurality of cached requests for datasets that includes the cached request to access the first dataset; and

accessing, by one or more processors, the first file location at the source system to satisfy the portion of the plurality of cached requests for datasets that includes the cached request to access the first dataset stored at the first file location, wherein accessing the first file location to satisfy the portion of the plurality of cached requests for the datasets stored at the first file location reduces the total size of the plurality of cached requests to access the source system during the Extract Transform Load process.

2. The method of claim 1, further comprising:

prior to receiving a request to access a first dataset, receiving, by one or more processors, data which is to be stored in the first dataset, wherein the data includes employee date of birth, employee salary amount, and employee expertise level;

creating, by one or more processors, a second index using one or more keys representing the data present in the received first dataset, wherein the one or more keys includes employee names; and

identifying, by one or more processors, each field of the second index to store one or more entries, wherein each entry is associated with one or more file locations along with an offset of the data to be stored in the first dataset in the one or more file locations.

3. A computer program product for utilizing an index to manage access to data in a dataset stored in one or more file locations in an Extract Transform Load tool, the computer program product comprising:

one or more computer readable storage media;

program instructions stored on the one or more computer readable storage media, which when executed by one or more processors, to:

receive a request to access a first dataset of a plurality of datasets stored at a source system during an Extract Transform Load process between the source system and an end target system during a period of high I/O requests, wherein the period of high I/O requests is based on a number of I/O requests received for a particular duration;

query a first index for one or more file locations where the first dataset is stored at the source system;

responsive to determining that caching the request does not disrupt an order assigned to the request to access the first dataset of the plurality of datasets stored at the source system, cache the request to access the first dataset stored at the source system, wherein the cached access request for the first dataset is one of a plurality of cached requests to access the plurality of datasets stored in a plurality of file locations at the source system;

responsive to determining a total size of the plurality of cached requests in temporary storage does not meet a first threshold level, determine whether a duration for which the cached request for the first dataset of the plurality of cached requests has been cached in the temporary storage has met a second threshold level;

responsive to determining the duration for which the cached request for the first dataset of the plurality of cached requests has been cached has met the second threshold, identify a first file location of the plurality of file locations to access in order to satisfy a portion of the plurality of cached requests for datasets that includes the cached request to access the first dataset; and

access the first file location at the source system to satisfy the portion of the plurality of cached requests for datasets that includes the cached request to access the first dataset stored at the first file location, wherein accessing the first file location to satisfy the portion of the plurality of cached requests for the datasets stored at the first file location reduces the total size of the plurality of cached requests to access the source system during the Extract Transform Load process.

4. The computer program product of claim 3, further comprising program instructions, stored on the one or more computer readable storage media, which when executed by a processor, to:

prior to receiving a request to access a first dataset, receive data which is to be stored in the first dataset, wherein the data includes employee date of birth, employee salary amount, and employee expertise level;

create a second index using one or more keys representing the data present in the received first dataset, wherein the one or more keys includes employee names; and

identify, by one or more processors, each field of the second index to store one or more entries, wherein each entry is associated with one or more file locations along with an offset of the data to be stored in the first dataset in the one or more file locations.

5. A computer system for utilizing an index to manage access to data in a dataset stored in one or more file locations in an ETL tool, the computer system comprising:

one or more computer processors;

one or more computer readable storage media;

program instructions stored on the one or more computer readable storage media, for execution by at least one of the one or more computer processors, which when executed, to:

receive a request to access a first dataset of a plurality of datasets stored at a source system during an Extract Transform Load process between the source system and an end target system during a period of high I/O requests, wherein the period of high I/O requests is based on a number of I/O requests received for a particular duration;

query a first index for one or more file locations where the first dataset is stored at the source system;

responsive to determining that caching the request does not disrupt an order assigned to the request to access the first dataset of the plurality of datasets stored at the source system, cache the request to access the first dataset stored at the source system, wherein the cached access request for the first dataset is one of a plurality of cached requests to access the plurality of datasets
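The second index recited in claims 2 and 4 maps keys such as employee names to entries that pair a file location with an offset. A minimal sketch of that structure follows, assuming an in-memory dictionary. The names `Entry` and `build_second_index` and the sample paths are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    file_location: str   # where the record is stored
    offset: int          # position of the record within that file location


def build_second_index(records):
    """Build a key -> entries index from (employee_name, file_location, offset) tuples."""
    index = defaultdict(list)
    for name, file_location, offset in records:
        # One key may have several entries if its data spans multiple file locations.
        index[name].append(Entry(file_location, offset))
    return dict(index)


second_index = build_second_index([
    ("Ada", "/data/part-0", 0),
    ("Grace", "/data/part-0", 128),
    ("Ada", "/data/part-1", 64),
])
```

Looking up `second_index["Ada"]` returns both entries for that key, so a reader can seek directly to each stored offset rather than scanning the file locations.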
Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title
Physics · mapped topic