Efficient data caching management in scalable multi-stage data processing systems

US10452612B2 · US · B2

Patent metadata
Publication number: US-10452612-B2
Application number: US-201715423384-A
Country: US
Kind code: B2
Filing date: Feb 2, 2017
Priority date: Sep 6, 2016
Publication date: Oct 22, 2019
Grant date: Oct 22, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to some example embodiments, a method includes: receiving, by a processor, from a data source, a processing profile comprising input data blocks and a plurality of operations for executing using the input data blocks; executing, by the processor, one or more of the operations of the processing profile to generate a new output data after each of the executed one or more operations; storing, by the processor, the new output data from at least one of the one or more operations as intermediate cache data; and transmitting, by the processor, the new output data from a final operation from among the one or more operations to the data source for display thereby.
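The flow in the abstract (execute operations in order, keep each new output as intermediate cache data, return the final output) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and does not come from the patent: the `run_profile` function, the `(name, function)` representation of an operation, the `input_id` label, and the use of a plain dictionary as the cache.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical representation: an operation is a (name, function) pair.
Operation = Tuple[str, Callable[[Any], Any]]
# Hypothetical cache key: input identifier plus the chain of operation names applied so far.
CacheKey = Tuple[str, Tuple[str, ...]]


def run_profile(input_id: str, input_blocks: Any, operations: List[Operation],
                cache: Dict[CacheKey, Any]) -> Any:
    """Run the operations in order, caching the output of every stage."""
    data = input_blocks
    lineage: Tuple[str, ...] = ()
    for name, fn in operations:
        data = fn(data)                    # new output data after each operation
        lineage = lineage + (name,)
        cache[(input_id, lineage)] = data  # store as intermediate cache data
    return data                            # output of the final operation


cache: Dict[CacheKey, Any] = {}
ops: List[Operation] = [
    ("double", lambda xs: [2 * x for x in xs]),
    ("keep_large", lambda xs: [x for x in xs if x > 2]),
    ("total", lambda xs: sum(xs)),
]
result = run_profile("blocks-0", [1, 2, 3], ops, cache)
print(result)  # 10
```

This sketch caches every stage for simplicity; the abstract only requires storing the output of at least one operation, so a real system could be selective about which stages it keeps.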

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a processor; and a memory coupled to the processor, wherein the memory stores instructions that, when executed by the processor, cause the processor to: receive, from a data source, a processing profile comprising input data blocks and a plurality of operations for executing using the input data blocks; determine whether or not a block of stored intermediate cache data corresponds to an operation from among of the plurality of operations; in response to determining the block of stored intermediate cache data corresponds to the operation from among the plurality of operations, generate a simplified processing profile based on the block of stored intermediate cache data, the simplified processing profile comprising a subset of the plurality of operations of the processing profile without the operation corresponding to the block of stored intermediate cache data; execute the simplified processing profile by generating a new output data after each operation of the simplified processing profile; store the new output data from at least one operation as intermediate cache data; and transmit the new output data from a final operation in the simplified processing profile to the data source for display thereby.

2. The system of claim 1, wherein the instructions further cause the processor to, in response to determining the block of stored intermediate cache data corresponds to the operation from among the plurality of operations, identify a location of the stored intermediate cache data among a plurality of worker nodes.

3. The system of claim 1, wherein generating the simplified processing profile comprises removing the operation corresponding to the block of stored intermediate cache data.

4. The system of claim 1, wherein the instructions further cause the processor to: identify a candidate worker node from among a plurality of worker nodes for storing the new output data according to a load balance calculation of at least one of storage space of each of the worker nodes and input/output bandwidth of each of the worker nodes; and store the new output data at the identified candidate worker node.

5. The system of claim 1, wherein the instructions further cause the processor to: identify whether or not there is sufficient space among a plurality of worker nodes to store the new output data; and in response to determining there is not sufficient space among the plurality of worker nodes, clear a block of pre-stored intermediate cache data having a lower priority level than the new output data.

6. A method comprising: receiving, by a processor, from a data source, a processing profile comprising input data blocks and a plurality of operations for executing using the input data blocks; determining, by the processor, whether or not a block of stored intermediate cache data corresponds to an operation from among of the plurality of operations; in response to determining the block of stored intermediate cache data corresponds to the operation from among the plurality of operations, removing, by the processor, the operation from the processing profile to generate a simplified processing profile, the simplified processing profile comprising a subset of the plurality of operations of the processing profile without the operation corresponding to the block of stored intermediate cache data; executing, by the processor, the simplified processing profile by generating a new output data after each operation of the simplified processing profile; storing, by the processor, the new output data from at least one operation as intermediate cache data; and transmitting, by the processor, the new output data from a final operation in the simplified processing profile to the data source for display thereby.

7. The method of claim 6, further comprising, in response to determining the block of stored intermediate cache data corresponds to the operation from among the plurality of operations, identifying, by the processor, a location of the stored intermediate cache data among a plurality of worker nodes.

8. The method of claim 6, wherein generating the simplified processing profile comprises removing, by the processor, the operation corresponding to the block of stored intermediate cache data.

9. The method of claim 6, further comprising identifying, by the processor, a candidate worker node from among a plurality of worker nodes for storing the new output data according to a load balance calculation of at least one of storage space of each of the worker nodes and input/output bandwidth of each of the worker nodes.

10. The method of claim 9, further comprising storing, by the processor, the new output data at the identified candidate worker node.

11. The method of claim 6, further comprising: identifying, by the processor, whether or not there is sufficient space among a plurality of worker nodes to store the new output data; and in response to determining there is not sufficient space among the plurality of worker nodes, clearing, by the processor, a block of pre-stored intermediate cache data having a lower priority level than the new output data.

12. A method comprising: receiving, by a processor, from a data source, a processing profile comprising input data blocks and a plurality of operations for executing using the input data blocks; determining, by the processor, whether or not a block of stored intermediate cache data corresponds to an operation from among of the plurality of operations; and in response to determining the block of stored intermediate cache data corresponds to the operation from among the plurality of operations, removing, by the processor, the operation from the processing profile to generate a simplified processing profile, the simplified processing profile comprising a subset of the plurality of operations of the processing profile without the operation corresponding to the block of stored intermediate cache data; executing, by the processor, one or more of the operations of the processing profile to generate a new output data after each of the executed one or more operations; storing, by the processor, the new output data from at least one of the one or more operations as intermediate cache data; and transmitting, by the processor, the new output data from a final operation from among the one or more operations to the data source for display thereby.

13. The method of claim 12, further comprising, in response to determining the block of stored intermediate cache data corresponds to the operation from among the plurality of operations, identifying, by the processor, a location of the stored intermediate cache data among a plurality of worker nodes.

14. The method of claim 12, wherein the method further comprises executing, by the processor, each of the plurality of operations from the subset.

15. The method of claim 12, further comprising identifying, by the processor, a candidate worker node from among a plurality of worker nodes for storing the new output data according to a load balance calculation of at least one of storage space of each of the worker nodes and input/output bandwidth of each of the worker nodes.

16. The method of claim 15, further comprising storing, by the processor, the new output data at the identified candidate worker node.

17. The method of claim 12, further comprising: identifying, by the processor, whether or not there is sufficient space among a plurality of worker nodes to store the new output data; and in response to determining there is not sufficient space among
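The claimed steps (generating a simplified processing profile from cached results, choosing a worker node by a load balance calculation, and clearing lower-priority cache blocks when space runs out) can be sketched as follows. This is a rough illustration under stated assumptions, not the patented implementation: the `Worker` class, the functions `simplify`, `pick_worker`, and `store`, the longest-cached-prefix matching rule, the space-plus-bandwidth score, and the integer priorities are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Key = Tuple[str, ...]  # hypothetical cache key: chain of operation names


@dataclass
class Worker:
    name: str
    free_space: int        # arbitrary units (assumption)
    free_bandwidth: int    # arbitrary units (assumption)
    blocks: Dict[Key, Tuple[int, int]] = field(default_factory=dict)  # key -> (size, priority)


def simplify(operations: List[str], cached: Set[Key]) -> Tuple[List[str], Optional[Key]]:
    """Remove the longest leading run of operations whose output is already cached."""
    for end in range(len(operations), 0, -1):
        prefix = tuple(operations[:end])
        if prefix in cached:
            return operations[end:], prefix
    return list(operations), None


def pick_worker(workers: List[Worker], size: int) -> Optional[Worker]:
    """Assumed load balance rule: among workers that fit, maximize space + bandwidth."""
    fits = [w for w in workers if w.free_space >= size]
    return max(fits, key=lambda w: w.free_space + w.free_bandwidth) if fits else None


def store(workers: List[Worker], key: Key, size: int, priority: int) -> Optional[str]:
    """Place a block; if nothing fits, clear lower-priority blocks on one worker."""
    worker = pick_worker(workers, size)
    if worker is None:
        for w in workers:
            lower = sorted((k for k in w.blocks if w.blocks[k][1] < priority),
                           key=lambda k: w.blocks[k][1])
            if w.free_space + sum(w.blocks[k][0] for k in lower) < size:
                continue  # evicting here would not free enough space
            for k in lower:
                if w.free_space >= size:
                    break
                w.free_space += w.blocks.pop(k)[0]  # clear lowest priority first
            worker = w
            break
    if worker is None:
        return None  # no placement possible without evicting higher-priority data
    worker.blocks[key] = (size, priority)
    worker.free_space -= size
    return worker.name


workers = [Worker("w1", free_space=4, free_bandwidth=1),
           Worker("w2", free_space=2, free_bandwidth=5)]
print(store(workers, ("load",), 3, priority=1))                   # w1
print(store(workers, ("load", "filter"), 2, priority=2))          # w2
print(store(workers, ("load", "filter", "join"), 3, priority=5))  # w1, after clearing ("load",)
cached = {k for w in workers for k in w.blocks}
print(simplify(["load", "filter", "join", "agg"], cached))
# (['agg'], ('load', 'filter', 'join'))
```

The claims leave the load balance calculation and the priority scheme open, so the scoring and eviction order above are only one possible reading.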

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • by changing the state or mode of one or more devices

  • Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

  • Improving I/O performance

  • with multilevel cache hierarchies

  • G06F16/172 (Primary): Caching, prefetching or hoarding of files


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10452612B2 cover?
It covers caching of intermediate results in multi-stage data processing. A processor receives a processing profile (input data blocks plus a plurality of operations), checks whether stored intermediate cache data already corresponds to one of the operations, generates a simplified profile without that operation, executes it, stores new outputs as intermediate cache data, and transmits the output of the final operation to the data source.
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/172. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 22, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).