Pluggable storage system for parallel query engines across non-native file systems

US10831709B2 · US · B2

Patent metadata
Publication number: US-10831709-B2
Application number: US-201815961627-A
Country: US
Kind code: B2
Filing date: Apr 24, 2018
Priority date: Feb 25, 2013
Publication date: Nov 10, 2020
Grant date: Nov 10, 2020

Abstract

Official abstract text for this publication.

A method, article of manufacture, and apparatus for managing data. In some embodiments, this includes receiving a query from a client, based on the received query, analyzing a catalog for location information, based on the analysis, determining a first storage system, an associated first file system, an associated first protocol translator, a second storage system, an associated second file system, and an associated second protocol translator, identifying a first data and a second data, wherein the first data is stored on the first storage system, and the second data is stored on the second storage system, running a first job on the first data using the associated first protocol translator, wherein the first job is not a native job of the first file system, and running a second job on the second data using the associated second protocol translator, wherein the second job is not a native job of the second file system.
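
The flow in the abstract (look up locations in a catalog, pick the protocol translator for each storage system, then run a non-native job on each piece of data) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `Location` and `ProtocolTranslator` classes, the storage system names, the file names, and the in-memory dictionaries that stand in for real storage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Location:
    storage_system: str  # hypothetical storage system identifier
    path: str            # path as that storage system knows it


class ProtocolTranslator:
    """Hypothetical adapter that lets a non-native job read from one storage system."""

    def __init__(self, protocol: str, blobs: Dict[str, str]):
        self.protocol = protocol
        self._blobs = blobs  # in-memory stand-in for the real storage system

    def read(self, path: str) -> str:
        return self._blobs[path]


def run_query(
    file_names: List[str],
    catalog: Dict[str, Location],
    translators: Dict[str, ProtocolTranslator],
    job: Callable[[str], int],
) -> Dict[str, int]:
    """Run one job per file, reading each file through its system's translator."""
    results = {}
    for name in file_names:
        location = catalog[name]                          # analyze the catalog
        translator = translators[location.storage_system]  # pick the translator
        results[name] = job(translator.read(location.path))  # run the job
    return results


def word_count(text: str) -> int:
    return len(text.split())


# Assumed example data: two files that live on two different storage systems.
catalog = {
    "/sales/2012": Location("hdfs-cluster", "/data/part-0"),
    "/sales/2013": Location("nfs-filer", "/export/q1.txt"),
}
translators = {
    "hdfs-cluster": ProtocolTranslator("hdfs", {"/data/part-0": "a b c"}),
    "nfs-filer": ProtocolTranslator("nfs", {"/export/q1.txt": "d e"}),
}
results = run_query(["/sales/2012", "/sales/2013"], catalog, translators, word_count)
print(results)
```

The caller names files only by their catalog keys; which storage system and protocol serve each file is resolved inside `run_query`, which is the sense in which the job is "not native" to either file system in this sketch.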

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: receiving, by one or more processors, a query from a client via one or more networks; determining, by one or more processors, a first storage system of a plurality of storage systems, and a second storage system of the plurality of storage systems, wherein: the determining of the first storage system and the second storage system comprises determining the first storage system and the second storage system based at least in part on the query and a catalog, which stores mappings of file names and file locations, for location information; a file is moved from the first storage system to the second storage system based at least in part on a usage level of the file, and in response to the file being moved, the catalog is updated with a new location information for the file; the catalog is associated with a universal namenode that provides a single namespace for accessing a plurality of files stored across a plurality of storage systems; and a first file stored on the first storage system and a second file stored on the second storage system are identified as having a location in the single namespace in a manner in which a location of the first file on the first storage system and location of the second file on the second storage system are transparent to the client; determining by one or more processors, a first data and a second data, wherein the first data is stored on the first storage system, and the second data is stored on the second storage system, and a first portion of the query is performed on the first storage system and a second portion of the query is performed on the second storage system; running, by one or more processors, a first job on the first data; and running, by one or more processors, a second job on the second data.

2. The method of claim 1, wherein the location information stored in connection with the catalog indicates a storage system on which the file is located among the plurality of storage systems.

3. The method of claim 1, wherein the first storage system is different from the second storage system, and a first protocol used in connection with communication with the first storage system is different from a second protocol used in connection with communication with the second storage system.

4. The method of claim 1, further comprising determining a first file system associated with the first storage system, a first protocol translator to use in connection with communication with the first storage system, a second file system associated with the second storage system, and a second protocol translator to use in connection with communication with the second storage system.

5. The method of claim 4, wherein the first job is run using the first protocol translator, and the first job is not a native job of the first file system.

6. The method of claim 4, wherein the first protocol translator is stored on the first storage system.

7. The method claim 4, wherein the second protocol translator is stored on the second storage system.

8. The method of claim 4, further comprising running the first job on the second data.

9. The method of claim 8, further comprising running the second job on the first data.

10. The method of claim 8, wherein the first job is not a native job of the second file system.

11. The method claim 10, wherein the second job is not a native job of the first file system.

12. The method of claim 4, wherein the first protocol translator and the second protocol translator are used by the universal namenode to respectively communicate with the first storage system and the second storage system, and the universal namenode is associated with the plurality of storage systems and is used in connection with processing the query.

13. The method of claim 1, wherein the universal namenode that is associated with the plurality of storage systems.

14. The method of claim 13, wherein the universal namenode serves as a domain that unifies respective domains of the plurality of storage systems, and the query does not specify the respective domains of corresponding ones of the plurality of storage systems associated with data relating to the query.

15. The method of claim 1, wherein the first portion of the query includes running the first job on the first data, and the second portion of the query includes running the second job on the second data.

16. The method of claim 1, wherein the first storage system and the second storage system reside under the universal namenode.

17. The method of claim 1, further comprising: in response to determining that the file is moved from the first storage system to the second storage system, updating an entry in the catalog corresponding to the file to indicate a location of the file as being the second storage system.

18. The method of claim 1, wherein the universal namenode tracks a status of the first job and the second job that are respectively associated with the query.

19. The method of claim 1, wherein a response to the query is provided to the client, and the response to the query is presented as the single namespace corresponding to a namespace of the universal namenode.

20. A system, comprising a processor configured to: receive a query from a client via one or more networks; determine a first storage system of a plurality of storage systems, an associated first file system, and a second storage system of the plurality of storage systems, wherein: to determine of the first storage system and the second storage system comprises determining the first storage system and the second storage system based at least in part on the query and a catalog, which stores mappings of file names and file locations, for location information; a file is moved from the first storage system to the second storage system based at least in part on a usage level of the file, and in response to the file being moved, the catalog is updated with a new location information for the file; the catalog is associated with a universal namenode that provides a single namespace for accessing a plurality of files stored across a plurality of storage systems; and a first file stored on the first storage system and a second file stored on the second storage system are identified as having a location in the single namespace in a manner in which a location of the first file on the first storage system and location of the second file on the second storage system are transparent to the client; determine a first data and a second data, wherein the first data is stored on the first storage system, and the second data is stored on the second storage system; run a first job on the first data; and run a second job on the second data.

21. A computer program product, comprising a non-transitory computer readable medium having program instructions embodied therein for: receiving, by one or more processors, a query from a client via one or more networks; determining, by one or more processors, a first storage system of a plurality of storage systems, and a second storage system of the plurality of storage systems, wherein: the determining of the first storage system and the second storage system comprises determining the first storage system and the second storage system based at least in part on the query and a catalog, which stores mappings of file names and file locations, for location information; a file is moved from the first storage system to the second storage system based at least in part on a u
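
The catalog behaviour recited in claims 1 and 17 (a single namespace that maps file names to storage systems, with the entry updated when a file is moved based on its usage level) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the `UniversalNamenode` class shape, the access-count measure of usage, the `threshold` parameter, and the tier names are all hypothetical, and the claims do not say in which direction usage drives a move.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UniversalNamenode:
    """Hypothetical single-namespace catalog: file name -> storage system."""

    catalog: Dict[str, str] = field(default_factory=dict)
    usage: Dict[str, int] = field(default_factory=dict)  # assumed: access counts

    def add_file(self, name: str, storage_system: str) -> None:
        self.catalog[name] = storage_system
        self.usage[name] = 0

    def locate(self, name: str) -> str:
        """Resolve a name in the single namespace and record one access."""
        self.usage[name] += 1
        return self.catalog[name]

    def rebalance(self, hot_tier: str, cold_tier: str, threshold: int) -> List[str]:
        """Move files between tiers by usage and update the catalog for each move.

        Assumed policy: files with at least `threshold` accesses belong on
        `hot_tier`, all others on `cold_tier`.
        """
        moved = []
        for name, count in self.usage.items():
            target = hot_tier if count >= threshold else cold_tier
            if self.catalog[name] != target:
                # A real system would copy the file's data here before
                # switching the catalog entry to the new location.
                self.catalog[name] = target
                moved.append(name)
        return moved


# Assumed example: two archived files, one of which is read three times.
namenode = UniversalNamenode()
namenode.add_file("/logs/a", "archive")
namenode.add_file("/logs/b", "archive")
for _ in range(3):
    namenode.locate("/logs/a")

moved = namenode.rebalance(hot_tier="fast", cold_tier="archive", threshold=3)
print(moved, namenode.catalog)
```

Because clients only ever call `locate` with a name from the single namespace, a file can change storage systems between two queries without the client's request changing.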

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

  • G06F16/148 (Primary)

    File search processing · CPC title

  • Distributed file systems · CPC title

  • Query rewriting; Transformation · CPC title

  • File systems; File servers · CPC title

  • Access plan code generation and invalidation; Reuse of access plans · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10831709B2 cover?
A method, article of manufacture, and apparatus for managing data. In some embodiments, this includes receiving a query from a client, based on the received query, analyzing a catalog for location information, based on the analysis, determining a first storage system, an associated first file system, an associated first protocol translator, a second storage system, an associated second file sys…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/148. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 10, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).