Data caching in a large-scale processing environment

US10496545B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10496545-B2
Application number  US-201514950860-A
Country             US
Kind code           B2
Filing date         Nov 24, 2015
Priority date       Nov 24, 2015
Publication date    Dec 3, 2019
Grant date          Dec 3, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and software described herein facilitate an enhanced service architecture for large-scale data processing. In one implementation, a method of providing data to a large-scale data processing architecture includes identifying a data request from a container in a plurality of containers executing on a host system, wherein the plurality of containers each run an instance of a large-scale processing framework. The method further provides identifying a storage repository for the data request, and accessing data associated with the data request from the storage repository. The method also includes caching the data in a portion of a cache memory on the host system allocated to the container, wherein the cache memory comprises a plurality of portions each allocated to one of the plurality of containers.
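The four steps in the abstract (identify the request, identify the repository, access the data, cache it in the requesting container's portion) can be sketched roughly as follows. This is only an illustrative model, not the patented implementation: the class names `CachePortion` and `CacheService`, the `scheme://key` request convention used to pick a repository, and the skip-if-full caching policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CachePortion:
    """Hypothetical model of the cache memory portion allocated to one container."""
    capacity_bytes: int
    entries: Dict[str, bytes] = field(default_factory=dict)

    def used_bytes(self) -> int:
        return sum(len(value) for value in self.entries.values())


class CacheService:
    """Hypothetical host-level cache service shared by all containers."""

    def __init__(self) -> None:
        self.portions: Dict[str, CachePortion] = {}
        # Maps an assumed repository scheme (e.g. "objstore") to a reader function.
        self.repositories: Dict[str, Callable[[str], bytes]] = {}

    def allocate(self, container_id: str, capacity_bytes: int) -> None:
        self.portions[container_id] = CachePortion(capacity_bytes)

    def register_repository(self, scheme: str, reader: Callable[[str], bytes]) -> None:
        self.repositories[scheme] = reader

    def handle_request(self, container_id: str, request: str) -> bytes:
        portion = self.portions[container_id]
        # Serve from the requesting container's own portion when possible.
        if request in portion.entries:
            return portion.entries[request]
        # Identify the storage repository for the request (assumed "scheme://key" form).
        scheme, _, key = request.partition("://")
        reader = self.repositories[scheme]
        # Access the data from that repository.
        data = reader(key)
        # Cache only in the portion allocated to this container, if it fits.
        if portion.used_bytes() + len(data) <= portion.capacity_bytes:
            portion.entries[request] = data
        return data


service = CacheService()
service.register_repository("objstore", lambda key: f"object:{key}".encode())
service.allocate("container-a", 1024)
service.allocate("container-b", 1024)
payload = service.handle_request("container-a", "objstore://bucket/part-0")
```

After this call, the data sits only in the portion for `container-a`; the portion for `container-b` is untouched, which mirrors the per-container partitioning the abstract describes.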

First claim

Opening claim text (preview).

What is claimed is:

1. A service architecture for large-scale data processing, the service architecture comprising: a plurality of containers executing a large-scale processing framework on a host system; a cache service executing on the host system shared by the plurality of containers, the cache service configured to: identify a data request from a container in the plurality of containers in accordance with a first data access format; identify a storage repository for the data request from a plurality of storage repositories, wherein the plurality of storage repositories is accessible using one or more secondary data access formats; access data associated with the data request from the storage repository in accordance with a second data access format associated with the storage repository, wherein the first data access format and second data access format each comprise a file system format or data object storage format; and cache the data in a portion of a cache memory on the host system allocated to the container, wherein the cache memory comprises a plurality of portions each allocated to a different one of the plurality of containers, and wherein each portion of the plurality of portions comprises memory addressable by the cache service and a container associated with the portion.

2. The service architecture of claim 1 wherein the large-scale processing framework comprises a Hadoop processing framework.

3. The service architecture of claim 1 wherein the large-scale processing framework comprises a Spark processing framework.

4. The service architecture of claim 1 wherein the cache service executing on the host system is further configured to allocate the plurality of portions of the cache memory to each container in the plurality of containers.

5. The service architecture of claim 4 wherein the cache service executing on the host system configured to allocate the plurality of portions of the cache memory to each container in the plurality of containers is configured to allocate the plurality of portions of the cache memory to each container in the plurality of containers responsive to an assignment of a job process to the plurality of containers.

6. The service architecture of claim 1 wherein the plurality of portions of the cache memory each allocated to one of the plurality of containers comprises the plurality of portions of the cache memory each allocated to one of the plurality of containers based on a quality of service associated with each container in the plurality of containers.

7. The service architecture of claim 1 wherein the cache service is further configured to: identify a data write from the container to the storage repository; identify second data associated with the data write within the cache memory; and write the data associated with the data write to the storage repository.

8. A method of providing data to a large-scale data processing architecture, the method comprising: identifying a data request in accordance with a first data access format from a container in a plurality of containers executing on a host system, wherein the plurality of containers each run an instance of a large-scale data processing framework; identifying a storage repository for the data request from a plurality of storage repositories, wherein the plurality of storage repositories is accessible using one or more secondary data access formats; accessing data associated with the data request from the storage repository in accordance with a second data access format associated with the storage repository, wherein the first data access format and second data access format each comprise a file system format or data object storage format; and caching the data in a portion of a cache memory on the host system allocated to the container, wherein the cache memory comprises a plurality of portions each allocated to a different one of the plurality of containers, and wherein each portion of the plurality of portions comprises memory addressable by the cache service and a container associated with the portion.

9. The method of claim 8 wherein the large-scale processing framework comprises a Hadoop processing framework.

10. The method of claim 8 wherein the large-scale processing framework comprises a Spark processing framework.

11. The method of claim 8 further comprising allocating the plurality of portions of the cache memory to each container in the plurality of containers.

12. The method of claim 11 wherein allocating the plurality of portions of the cache memory to each container in the plurality of containers comprises allocating the plurality of portions of the cache memory to each container in the plurality of containers responsive to an assignment of a job process to the plurality of containers.

13. The method of claim 11 wherein the plurality of portions of the cache memory each allocated to one of the plurality of containers comprises the plurality of portions of the cache memory each allocated to one of the plurality of containers based on a quality of service associated with each container in the plurality of containers.

14. The method of claim 8 wherein the method further comprises: identifying a data write from the container to the storage repository; identifying second data associated with the data write within the cache memory; and writing the second data associated with the data write to the storage repository.

15. An apparatus to access data for a large scale processing architecture, the apparatus comprising: one or more computer readable media; processing instructions stored on the one or more computer readable media to provide a cache service on a host system that, when executed by a processing system, direct the processing system to: identify a data request in accordance with a first data access format from a container in a plurality of containers executing on the host system, wherein the plurality of containers each run an instance of a large-scale processing framework; identify a storage repository associated with the data request from a plurality of storage repositories, wherein the plurality of storage repositories is accessible using one or more secondary data access formats; access data associated with the data request from the storage repository in accordance with a second data access format associated with the storage repository, wherein the first data access format and second data access format each comprise a file system format or data object storage format; and cache the data in cache memory for the plurality of containers, wherein the cache memory comprises an allocated portion of memory on the host system addressable by the container and the cache service.

16. The apparatus of claim 15 wherein the large-scale processing framework comprises a Hadoop processing framework.

17. The apparatus of claim 15 wherein the large-scale processing framework comprises a Spark processing framework.

18. The apparatus of claim 15 wherein the processing instructions further direct the processing system to allocate the plurality of portions of the cache memory to each container in the plurality of containers based on a quality of service associated with each container in the plurality of containers.

19. The apparatus of claim 15 wherein the processing instructions further direct the processing system to: identify a data write from the container to the storage repository; identify second data associated with the data write within the cache memory; and write the second data associated with the d
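Two of the dependent claims lend themselves to a small illustration: allocating cache portions based on a quality of service per container, and writing cached data back to a storage repository. The sketch below is a loose reading under stated assumptions, not the claimed mechanism. The claims do not say how quality of service maps to portion size, so the proportional split in `allocate_by_qos` is an assumed policy, and both function names and the dictionary-based cache and repository are hypothetical.

```python
from typing import Dict


def allocate_by_qos(total_bytes: int, qos_weights: Dict[str, int]) -> Dict[str, int]:
    """Split total_bytes among containers in proportion to an assumed QoS weight."""
    total_weight = sum(qos_weights.values())
    if total_weight <= 0:
        raise ValueError("at least one container needs a positive QoS weight")
    # Integer division keeps the sum of portions at or below total_bytes.
    return {
        container_id: total_bytes * weight // total_weight
        for container_id, weight in qos_weights.items()
    }


def flush_write(cache_portion: Dict[str, bytes],
                repository: Dict[str, bytes],
                key: str) -> None:
    """Locate the written data in the container's cache portion and persist it."""
    data = cache_portion[key]   # identify the data for the write within the cache
    repository[key] = data      # write that data to the (hypothetical) repository


portion_sizes = allocate_by_qos(1000, {"container-a": 3, "container-b": 1})
repository: Dict[str, bytes] = {}
flush_write({"results/part-0": b"rows"}, repository, "results/part-0")
```

With weights of 3 and 1, `container-a` receives three quarters of the cache under this assumed policy; a real system could just as well use fixed tiers or minimum guarantees.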

Assignees

Hewlett Packard Entpr Dev Lp

Classifications

  • Hit rate improvement · CPC title

  • with a shared cache · CPC title

  • Data transfer between cache memory and other subsystems, e.g. storage devices or host systems · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10496545B2 cover?
A service architecture for large-scale data processing in which a cache service on a host system identifies a data request from one of several containers, each running an instance of a large-scale processing framework, identifies the storage repository for the request, accesses the data, and caches it in a portion of host cache memory allocated to that container.
Who is the assignee on this patent?
Hewlett Packard Entpr Dev Lp
What technology area does this patent fall under?
Primary CPC classification G06F12/0868. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 3, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).