System and method for providing a modern-era retrospective analysis for research and applications (MERRA) data analytic service

US10339114B2 · US · B2

Patent metadata
Field · Value
Publication number · US-10339114-B2
Application number · US-201514711137-A
Country · US
Kind code · B2
Filing date · May 13, 2015
Priority date · May 13, 2015
Publication date · Jul 2, 2019
Grant date · Jul 2, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system, method and computer-readable storage devices for providing an interface for an analytic service for Modern-Era Retrospective Analysis for Research and Applications (MERRA) datasets. An example system for providing the service includes a data analytics platform of an assemblage of compute and storage nodes that provide a compute-storage fabric upon which high-performance parallel operations are performed over a collection of climate data stored in a distributed file system, a sequencer that transforms the climate data, a desequencer that transforms serialized block compressed sequence files between data formats. The system includes a services library of applications that dynamically create data objects from the data as reduced final results, and a utilities library of software applications that process flat serialized block compressed sequence files. The system also includes a service interface through which a client device can access the climate data via the data analytics platform.
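
The "reduced final results" mentioned in the abstract can be illustrated with a small sketch. It assumes records keyed by a (timestamp, variable name) pair and partitioned across reducers by variable name, as the claims on this page describe. The sample records, the names `RECORDS`, `partition` and `run_order`, the CRC32-based partitioning, and the choice of population variance are all hypothetical choices made for illustration, not the patented implementation, and real records would hold gridded arrays rather than scalars.

```python
import zlib
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical sample records: ((timestamp, variable_name), value).
# Scalars keep the sketch short; real climate records would be gridded arrays.
RECORDS = [
    (("2015-01-01T00:00", "T2M"), 271.0),
    (("2015-01-01T06:00", "T2M"), 273.0),
    (("2015-01-01T12:00", "T2M"), 275.0),
    (("2015-01-01T00:00", "PS"), 1000.0),
    (("2015-01-01T06:00", "PS"), 1002.0),
]

# Population variance is an assumption; the source does not name an estimator.
OPERATIONS = {
    "max": max,
    "min": min,
    "sum": sum,
    "count": len,
    "avg": mean,
    "var": pvariance,
}


def partition(variable, num_reducers):
    """Pick a reducer from the variable name alone, so one variable stays together."""
    return zlib.crc32(variable.encode("utf-8")) % num_reducers


def run_order(records, operation, start, end, num_reducers=2):
    """Filter by time range, group by variable per reducer, then reduce each group."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for (timestamp, variable), value in records:
        if start <= timestamp <= end:  # ISO 8601 strings sort chronologically
            reducers[partition(variable, num_reducers)][variable].append(value)
    reduce_fn = OPERATIONS[operation]
    results = {}
    for groups in reducers:  # on a cluster, each reducer could run in parallel
        for variable, values in groups.items():
            results[variable] = reduce_fn(values)
    return results


print(run_order(RECORDS, "avg", "2015-01-01T00:00", "2015-01-01T12:00"))
```

Partitioning on the variable name alone, rather than on the full composite key, keeps every record for one variable on the same reducer, so each reducer can finish its variables without seeing any other reducer's data.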

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising:
a data analytics platform comprising an assemblage of compute and storage nodes that provide a compute-storage fabric upon which high-performance parallel operations are performed over a collection of climate data stored in a distributed file system;
a hardware sequencer that transforms the climate data encoded in a native model output file format to yield flat serialized block compressed sequence files and loads the flat serialized block compressed sequence files into the distributed file system by a calling application requesting by an order service request to said data analytics platform via a system interface through which a client device can access the climate data via the data analytics platform indicating operation to be performed and specific predetermined parameters that further specify the order service request, wherein the service interface maps the incoming service request to a first order module, which launches an operation as a MapReduce computation on the data analytic platform and returns a session identifier (ID) through the interface to the calling application;
wherein once the order request is launched, the calling application issues status service requests wherein the session ID monitors progress of the order request and the system interface maps a status request to the appropriate call to a services library and receives a status update, which the system interface passes back to the calling application;
a hardware desequencer that transforms the flat serialized block compressed sequence files from the native model output file format to a second climate data file format and moves data stored in the second climate data file format out of the distributed file system and is prepared for retrieval by the calling application as a separate file where the calling application submits a download service request via the system interface mapped to the services library;
a services library comprising a plurality of software applications that dynamically create data objects from the data stored in the second climate data file format as reduced final results; and
a utilities library comprising a plurality of software applications that can process the flat serialized block compressed sequence files whereby the services library returns the data which the system interface relays to the calling application and the calling application sends climate data via the analytics platform through a client device to an end user;
whereby the compute and storage nodes comprise a processor configured as containing multiple cores or processors, a bus, memory controller, cache, including multiple distributed processors located in multiple separate computing devices working together via a communications network sharing resources such as memory and the cache or operating using independent resources configured from of an application specific integrated circuit (ASIC), or a programmable gate array (PGA) including a field PGA utilizing a system bus selected from one of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus connected to storage devices such as a hard disk drive, a magnetic disk drive, an optical disk drive, tape drive, solid-state drive, RAM drive, removable storage devices, and a redundant array of inexpensive disks (RAID), hybrid storage device.

2. The system of claim 1, wherein the parallel operations comprise MapReduce operations, wherein the distributed file system comprises a Hadoop file system, and wherein the climate data comprises Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate data.

3. The system of claim 2, wherein the sequencer transforms native model output file format files to yield sequence files as Hadoop input files.

4. The system of claim 3, wherein the input files are in at least one of Bloom, Sequence, and Map formats.

5. The system of claim 2, wherein the sequencer partitions native MERRA data files by time such that each record in a sequence file contains a timestamp and a MERRA climate variable name which are a composite key of a MapReduce key value pair, and wherein the MERRA climate variable name functions as a value of the key value pair.

6. The system of claim 5, wherein the MERRA climate variable name comprises one to three spatial dimensions.

7. The system of claim 1, wherein the sequencer performs operations further comprising: reading and writing sequence files; dynamically determining a file format for the sequence files at run time; providing access to a main climate variable, associated ancillary variables, and associated metadata contained in sequence files; storing additional metadata for the main climate variable in sequence files; and parsing command line arguments and configuration file settings to control the sequencer.

8. The system of claim 1, wherein the climate data is encoded in one of Network Common Data Format or Hierarchical Data Format.

9. The system of claim 1, wherein the sequence files are encoded in one of Bloom, Sequence, or Map file formats.

10. The system of claim 2, wherein the utilities library further comprises: a sorting application that sorts key value pairs of the sequence files by time and grouped by a main variable field; a comparing application that compares variable name and associated timestamps of the key value pairs, and sorts operations over the <key, value> pairs by comparing variable name and grouping variables by variable name; a partitioning application that partitions results from a mapper based on a variable name across a plurality of reducer applications, enabling parallel execution of the reducer applications; a simplifying application that simplifies sequencing and desequencing operations by abstracting operations on the key value pairs from a main code of a MapReduce software application; and a managing application that manages configuration files required to execute MapReduce software applications.

11. The system of claim 1, wherein the services library comprises operations corresponding to the International Standards Organization Open Archival Information System Reference Model data flow categories for an operational archive, the operations comprising: ingest operations to input data objects to the system; query operations that retrieve metadata relating to data objects in the system; order operations that dynamically create data objects in the system; download operations that retrieve data objects from a system; execute operations that initiate service-definable operations; and status operations that check on the progress of order operations.

12. The system of claim 1, wherein the services library provides an order operation that comprises a GetVariableByCollection_Operation_TimeRange_SpatialExtent_VerticalExtent method that performs operations comprising: a maximum operation that determines the maximum value of a climate variable according to user-specified input parameters; a minimum operation that determines the minimum value of a climate variable according to user-specified input parameters; a sum operation that determines the sum of the values of a climate variable according to user-specified input parameters; a count operation that determines the number of instances of a climate variable according to user-specified input parameters; an average operation that determines the arithmetic mean of a set of climate variables according to user-specified input parameters; a variance operation that determines the variance of the mean for a set of a climate variables according to user-specified input parameters; and a difference operat
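
Claim 1 describes an asynchronous request pattern: an order request returns a session ID, status requests poll progress using that ID, and a download request retrieves the finished result. The following is a minimal in-memory sketch of that pattern only. The class `ToyAnalyticService`, its method names, the state strings, and the `run_pending` helper that stands in for the cluster job completing are all hypothetical and are not taken from the patent or from any real service API.

```python
import uuid


class ToyAnalyticService:
    """In-memory stand-in for an order/status/download interface (hypothetical)."""

    OPERATIONS = {"max": max, "min": min, "sum": sum, "count": len}

    def __init__(self):
        self._sessions = {}

    def order(self, operation, values):
        """Register a job and return a session ID at once, without computing."""
        if operation not in self.OPERATIONS:
            raise ValueError(f"unsupported operation: {operation}")
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {
            "state": "RUNNING",
            "operation": operation,
            "values": list(values),
            "result": None,
        }
        return session_id

    def run_pending(self):
        """Simulate the backend finishing every job that is still running."""
        for session in self._sessions.values():
            if session["state"] == "RUNNING":
                reduce_fn = self.OPERATIONS[session["operation"]]
                session["result"] = reduce_fn(session["values"])
                session["state"] = "COMPLETE"

    def status(self, session_id):
        """Report the current state of the job behind a session ID."""
        return self._sessions[session_id]["state"]

    def download(self, session_id):
        """Return the result, refusing if the job has not completed."""
        session = self._sessions[session_id]
        if session["state"] != "COMPLETE":
            raise RuntimeError("result not ready; poll status first")
        return session["result"]


service = ToyAnalyticService()
session_id = service.order("max", [271.0, 275.0, 273.0])
first_status = service.status(session_id)  # nothing has been computed yet
service.run_pending()  # stands in for the cluster job completing
final_status = service.status(session_id)
result = service.download(session_id)
print(first_status, final_status, result)
```

Returning a session ID immediately lets the calling application stay responsive while a long-running computation proceeds, at the cost of having to poll for completion before downloading.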

Assignees

Inventors

Classifications

  • G06F16/211 · Primary

    Schema design and management · CPC title

  • Distributed file systems · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10339114B2 cover?
A system, method and computer-readable storage devices for providing an interface for an analytic service for Modern-Era Retrospective Analysis for Research and Applications (MERRA) datasets. An example system for providing the service includes a data analytics platform of an assemblage of compute and storage nodes that provide a compute-storage fabric upon which high-performance parallel opera…
Who is the assignee on this patent?
The United States of America as represented by the Administrator of the National Aeronautics and Space Administration (NASA)
What technology area does this patent fall under?
Primary CPC classification G06F16/211. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 2, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).