Using virtual machine containers in a virtualized computing platform
US-2016098285-A1 · Apr 7, 2016 · US
US10496545B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10496545-B2 |
| Application number | US-201514950860-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 24, 2015 |
| Priority date | Nov 24, 2015 |
| Publication date | Dec 3, 2019 |
| Grant date | Dec 3, 2019 |
Official abstract text for this publication.
Systems, methods, and software described herein facilitate an enhanced service architecture for large-scale data processing. In one implementation, a method of providing data to a large-scale data processing architecture includes identifying a data request from a container in a plurality of containers executing on a host system, wherein the plurality of containers each run an instance of a large-scale processing framework. The method further provides identifying a storage repository for the data request, and accessing data associated with the data request from the storage repository. The method also includes caching the data in a portion of a cache memory on the host system allocated to the container, wherein the cache memory comprises a plurality of portions each allocated to one of the plurality of containers.
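The four steps in the abstract (identify the request, identify the repository, access the data, cache it in the requesting container's portion) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `CacheService` class, the `allocate_portion` and `handle_request` methods, the `files`/`objects` repository names, and the `name://key` path convention are all hypothetical and do not appear in the publication.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical type: a repository is modeled as a function from key to bytes.
Reader = Callable[[str], bytes]


@dataclass
class CacheService:
    """Illustrative host-level cache service shared by several containers."""

    # Repository name -> reader function (assumed lookup scheme).
    repositories: Dict[str, Reader]
    # Container id -> that container's own portion of the cache.
    portions: Dict[str, Dict[str, bytes]] = field(default_factory=dict)

    def allocate_portion(self, container_id: str) -> None:
        """Reserve a separate cache portion for one container."""
        self.portions.setdefault(container_id, {})

    def handle_request(self, container_id: str, path: str) -> bytes:
        """Serve one data request issued by a container."""
        # Step 1: the request is identified by (container_id, path).
        # Step 2: identify the storage repository from the path prefix.
        repo_name, sep, key = path.partition("://")
        if not sep or repo_name not in self.repositories:
            raise KeyError(f"no repository for {path!r}")
        # Step 3: access the data from that repository.
        data = self.repositories[repo_name](key)
        # Step 4: cache it in the portion allocated to this container only.
        # Raises KeyError if no portion was allocated for the container.
        self.portions[container_id][path] = data
        return data


# Two stand-in repositories and two containers, for illustration only.
service = CacheService(
    repositories={
        "files": lambda key: b"files:" + key.encode(),
        "objects": lambda key: b"objects:" + key.encode(),
    }
)
service.allocate_portion("container-a")
service.allocate_portion("container-b")
service.handle_request("container-a", "files://logs/part-0")
service.handle_request("container-b", "objects://bucket/item")
```

Keeping one dictionary per container mirrors the abstract's statement that the cache memory comprises a plurality of portions, each allocated to one container; data requested by `container-a` never lands in the portion of `container-b`.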
Opening claim text (preview).
What is claimed is:
1. A service architecture for large-scale data processing, the service architecture comprising: a plurality of containers executing a large-scale processing framework on a host system; a cache service executing on the host system shared by the plurality of containers, the cache service configured to: identify a data request from a container in the plurality of containers in accordance with a first data access format; identify a storage repository for the data request from a plurality of storage repositories, wherein the plurality of storage repositories is accessible using one or more secondary data access formats; access data associated with the data request from the storage repository in accordance with a second data access format associated with the storage repository, wherein the first data access format and second data access format each comprise a file system format or data object storage format; and cache the data in a portion of a cache memory on the host system allocated to the container, wherein the cache memory comprises a plurality of portions each allocated to a different one of the plurality of containers, and wherein each portion of the plurality of portions comprises memory addressable by the cache service and a container associated with the portion.
2. The service architecture of claim 1 wherein the large-scale processing framework comprises a Hadoop processing framework.
3. The service architecture of claim 1 wherein the large-scale processing framework comprises a Spark processing framework.
4. The service architecture of claim 1 wherein the cache service executing on the host system is further configured to allocate the plurality of portions of the cache memory to each container in the plurality of containers.
5. The service architecture of claim 4 wherein the cache service executing on the host system configured to allocate the plurality of portions of the cache memory to each container in the plurality of containers is configured to allocate the plurality of portions of the cache memory to each container in the plurality of containers responsive to an assignment of a job process to the plurality of containers.
6. The service architecture of claim 1 wherein the plurality of portions of the cache memory each allocated to one of the plurality of containers comprises the plurality of portions of the cache memory each allocated to one of the plurality of containers based on a quality of service associated with each container in the plurality of containers.
7. The service architecture of claim 1 wherein the cache service is further configured to: identify a data write from the container to the storage repository; identify second data associated with the data write within the cache memory; and write the data associated with the data write to the storage repository.
8. A method of providing data to a large-scale data processing architecture, the method comprising: identifying a data request in accordance with a first data access format from a container in a plurality of containers executing on a host system, wherein the plurality of containers each run an instance of a large-scale data processing framework; identifying a storage repository for the data request from a plurality of storage repositories, wherein the plurality of storage repositories is accessible using one or more secondary data access formats; accessing data associated with the data request from the storage repository in accordance with a second data access format associated with the storage repository, wherein the first data access format and second data access format each comprise a file system format or data object storage format; and caching the data in a portion of a cache memory on the host system allocated to the container, wherein the cache memory comprises a plurality of portions each allocated to a different one of the plurality of containers, and wherein each portion of the plurality of portions comprises memory addressable by the cache service and a container associated with the portion.
9. The method of claim 8 wherein the large-scale processing framework comprises a Hadoop processing framework.
10. The method of claim 8 wherein the large-scale processing framework comprises a Spark processing framework.
11. The method of claim 8 further comprising allocating the plurality of portions of the cache memory to each container in the plurality of containers.
12. The method of claim 11 wherein allocating the plurality of portions of the cache memory to each container in the plurality of containers comprises allocating the plurality of portions of the cache memory to each container in the plurality of containers responsive to an assignment of a job process to the plurality of containers.
13. The method of claim 11 wherein the plurality of portions of the cache memory each allocated to one of the plurality of containers comprises the plurality of portions of the cache memory each allocated to one of the plurality of containers based on a quality of service associated with each container in the plurality of containers.
14. The method of claim 8 wherein the method further comprises: identifying a data write from the container to the storage repository; identifying second data associated with the data write within the cache memory; and writing the second data associated with the data write to the storage repository.
15. An apparatus to access data for a large scale processing architecture, the apparatus comprising: one or more computer readable media; processing instructions stored on the one or more computer readable media to provide a cache service on a host system that, when executed by a processing system, direct the processing system to: identify a data request in accordance with a first data access format from a container in a plurality of containers executing on the host system, wherein the plurality of containers each run an instance of a large-scale processing framework; identify a storage repository associated with the data request from a plurality of storage repositories, wherein the plurality of storage repositories is accessible using one or more secondary data access formats; access data associated with the data request from the storage repository in accordance with a second data access format associated with the storage repository, wherein the first data access format and second data access format each comprise a file system format or data object storage format; and cache the data in cache memory for the plurality of containers, wherein the cache memory comprises an allocated portion of memory on the host system addressable by the container and the cache service.
16. The apparatus of claim 15 wherein the large-scale processing framework comprises a Hadoop processing framework.
17. The apparatus of claim 15 wherein the large-scale processing framework comprises a Spark processing framework.
18. The apparatus of claim 15 wherein the processing instructions further direct the processing system to allocate the plurality of portions of the cache memory to each container in the plurality of containers based on a quality of service associated with each container in the plurality of containers.
19. The apparatus of claim 15 wherein the processing instructions further direct the processing system to: identify a data write from the container to the storage repository; identify second data associated with the data write within the cache memory; and write the second data associated with the d
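Claims 6, 13, and 18 recite allocating cache portions based on a quality of service associated with each container, without stating a formula. One possible reading, sketched below purely as an assumption, is to divide the cache in proportion to an integer QoS weight per container. The `allocate_portions` function, the weight values, and the proportional rule are hypothetical and are not taken from the claims.

```python
from typing import Dict


def allocate_portions(total_bytes: int, qos_weights: Dict[str, int]) -> Dict[str, int]:
    """Split a cache of total_bytes among containers by QoS weight.

    Assumption: a higher weight means a larger share. The claims only
    require that allocation be based on quality of service.
    """
    total_weight = sum(qos_weights.values())
    if total_weight <= 0:
        raise ValueError("QoS weights must sum to a positive value")
    # Integer division, so a few bytes may remain unallocated.
    return {
        container_id: total_bytes * weight // total_weight
        for container_id, weight in qos_weights.items()
    }


# Example: container "a" has three times the QoS weight of container "b".
portions = allocate_portions(1024, {"a": 3, "b": 1})
```

With these inputs, container `a` receives 768 bytes and container `b` receives 256 bytes. Other policies (fixed tiers, minimum guarantees) would satisfy the same claim language.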
Hit rate improvement · CPC title
with a shared cache · CPC title
Data transfer between cache memory and other subsystems, e.g. storage devices or host systems · CPC title