Method and apparatus for processing exploding data stream

US10061858B2 · US · B2

Patent metadata
Publication number: US-10061858-B2
Application number: US-201514606084-A
Country: US
Kind code: B2
Filing date: Jan 27, 2015
Priority date: Feb 5, 2014
Publication date: Aug 28, 2018
Grant date: Aug 28, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided are a method and apparatus for processing a data stream capable of providing a data flow processing language to support real-time processing of an exploding data stream and providing an environment for executing the data flow processing language in a cluster system. The data flow-based exploding data stream processing method includes receiving a big data real-time processing service described in a real-time data flow language, interpreting the big data real-time processing service to generate a distributed stream processing service, and distributively deploying the distributed stream processing service in a cluster system including multiple nodes and configuring an execution environment for executing the distributed stream processing service in each node.
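The interpretation step in the abstract (service script, then logical plan, then DAG-based distributed service) can be illustrated with a rough sketch. It turns a small service script into logical nodes and links, then checks that the result is a DAG by computing a topological order with Kahn's algorithm. The one-line-per-operator script syntax, the operator names (`load`, `filter`, `aggregate`, `store`), and the helpers `LogicalNode`, `parse_script`, and `to_dag` are all hypothetical. The patent's actual real-time data flow language and its optimizer are not reproduced here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LogicalNode:
    """One logical node per basic operator (hypothetical representation)."""
    name: str
    operator: str
    args: list = field(default_factory=list)


def parse_script(script: str) -> dict:
    """Parse lines of the hypothetical form `name = operator(arg, ...)`."""
    nodes = {}
    for raw in script.strip().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, rhs = [part.strip() for part in line.split("=", 1)]
        operator, arglist = rhs.split("(", 1)
        args = [a.strip() for a in arglist.rstrip(")").split(",") if a.strip()]
        nodes[name] = LogicalNode(name, operator.strip(), args)
    return nodes


def to_dag(nodes: dict) -> tuple:
    """Derive links from arguments that name other nodes, then topologically sort.

    Returns (links, order); raises ValueError if the plan contains a cycle.
    """
    links = [(arg, n.name) for n in nodes.values() for arg in n.args if arg in nodes]
    indegree = {name: 0 for name in nodes}
    children = {name: [] for name in nodes}
    for src, dst in links:
        indegree[dst] += 1
        children[src].append(dst)
    ready = deque(name for name, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in children[current]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("logical plan contains a cycle; it cannot form a DAG")
    return links, order


script = """
src = load(sensors)
hot = filter(src, temp_gt_30)
avg = aggregate(hot, window_10)
out = store(avg, dashboard)
"""
links, order = to_dag(parse_script(script))
print(links)  # [('src', 'hot'), ('hot', 'avg'), ('avg', 'out')]
print(order)  # ['src', 'hot', 'avg', 'out']
```

Arguments that do not name another node (such as `sensors` or `temp_gt_30`) are kept as plain parameters rather than links. A cyclic script such as `a = filter(b)` followed by `b = filter(a)` raises `ValueError`; a real system would presumably reject such a plan before deployment.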

Claims

Full claim text (claims 1 to 10).

What is claimed is:

1. A computer-executable method for data flow-based exploding data stream processing in a cluster system including multiple nodes, the method comprising: receiving a service script described in a real-time data flow language for processing a tuple-based data stream, the real-time data flow language including a plurality of basic operators defining processing of a tuple and a tuple aggregation, and a plurality of field operators defining processing of fields forming the tuple, the service script including instructions corresponding to a type of an input/output source, a tuple unit input/output method, and a window function defining a unit to be processed in the data stream, by using the basic operators or by delivering a value to one of the basic operators; interpreting the service script to configure a logical plan, which includes a plurality of logical nodes corresponding respectively to the plurality of basic operators; optimizing the logical plan to generate a directed acyclic graph (DAG)-based distributed stream processing service; distributively deploying the distributed stream processing service in the cluster system and configuring an execution environment for executing the distributed stream processing service in each of the multiple nodes; and collecting state information of the nodes forming the cluster system and state information regarding a task of the distributed stream processing service processed in the nodes; and redeploying the distributed stream processing service based on the state information.

2. The method of claim 1, wherein the configuring of a logical plan further comprises expressing relationships between the basic operators as links of the logical nodes, and setting the field operators to a relationship in which the field operators are dependent upon the basic operators.

3. The method of claim 2, wherein the converting into a DAG-based distributed stream processing service comprises converting the logical nodes and the links forming the logical plan into the DAG-based distributed stream processing service by using an application programming interface (API) provided from a distributed processing system.

4. The method of claim 1, wherein the configuring of an execution environment comprises distributively deploying tasks forming the distributed stream processing service in the multiple nodes based on resource conditions of the cluster system.

5. The method of claim 4, wherein the distributed stream processing service is a service in which a relationship between external input/output tasks defining a task of receiving a data stream from an external source and providing the received data stream in units of window or a task of outputting a data stream to an external source and stream processing tasks defining a task of receiving and processing a data stream is expressed based on DAG.

6. A data flow-based exploding data stream processing system, comprising: a hardware processor, and a non-transitory medium having program instructions stored thereon, execution of which by the hardware processor causes the processing system to provide functions of: a data flow language converting module configured to interpret a service script described in a real-time data flow language, the real-time data flow language including a plurality of basic operators defining processing of a tuple and a tuple aggregation, and a plurality of field operators defining processing of fields forming the tuple, the service script including instructions corresponding to a type of an input/output source, a tuple unit input/output method, and a window function defining a unit to be processed in the data stream, by using the basic operators or by delivering a value to one of the basic operators; and configure a logical plan that includes a plurality of logical nodes corresponding respectively to the plurality of basic operators; a distributed stream service generating module configured to optimize the logical plan and convert the optimized logical plan into a directed acyclic graph (DAG)-based distributed stream processing service; and a distributed stream processing service managing module configured to distributively deploy the distributed stream processing service in a cluster system including multiple nodes and configure an execution environment for executing the distributed stream processing service in each of the multiple nodes, the distributed stream processing service managing module collecting state information of the nodes forming the cluster system and state information regarding a task of the distributed stream processing service processed in the nodes, and redeploying the distributed stream processing service based on the state information.

7. The system of claim 6, wherein the data flow language converting module further expresses relationships between the basic operators as links of the logical nodes, and sets the field operators to a relationship in which the field operators are dependent upon the basic operators.

8. The system of claim 7, wherein the distributed stream service generating module converts the logical nodes and the links forming the logical plan into the DAG-based distributed stream processing service by using an application programming interface (API) provided from a distributed processing system.

9. The system of claim 6, wherein the distributed stream processing service managing module distributively deploys tasks forming the distributed stream processing service in the multiple nodes based on resource conditions of the cluster system.

10. The system of claim 6, wherein the distributed stream processing service is a service in which a relationship between external input/output tasks defining a task of receiving a data stream from an external source and providing the received data stream in units of window or a task of outputting a data stream to an external source and stream processing tasks defining a task of receiving and processing a data stream is expressed based on DAG.
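Claim 1 separates basic operators (acting on whole tuples or on a tuple aggregation), field operators (acting on individual fields and handed to a basic operator as a value), and a window function that fixes the unit of processing. A minimal sketch of that split might look like the following, assuming tuples are Python dicts and the window is count-based and tumbling (the claims do not name a window type). `Record`, `tumbling_window`, `field_gt`, `field_mean`, `filter_op`, `aggregate_op`, and `run_pipeline` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, float]  # a tuple of named fields (hypothetical representation)


def tumbling_window(stream: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Window function: group the stream into non-overlapping batches of `size` tuples."""
    batch: List[Record] = []
    for record in stream:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, partial window
        yield batch


# Field operators: each works on one field and is passed to a basic operator as a value.
def field_gt(name: str, threshold: float) -> Callable[[Record], bool]:
    return lambda record: record[name] > threshold


def field_mean(name: str) -> Callable[[List[Record]], float]:
    return lambda records: sum(r[name] for r in records) / len(records)


# Basic operators: each works on whole tuples or on a tuple aggregation.
def filter_op(window: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    return [r for r in window if predicate(r)]


def aggregate_op(window: List[Record], key: str,
                 reducer: Callable[[List[Record]], float]) -> Optional[Record]:
    return {key: reducer(window)} if window else None  # skip empty windows


def run_pipeline(stream: Iterable[Record], window_size: int) -> List[Record]:
    """Per window: keep tuples with temp > 20, then emit their mean temperature."""
    results = []
    for window in tumbling_window(stream, window_size):
        kept = filter_op(window, field_gt("temp", 20.0))
        row = aggregate_op(kept, "avg_temp", field_mean("temp"))
        if row is not None:
            results.append(row)
    return results


readings = [{"temp": t} for t in (18.0, 24.0, 28.0, 30.0, 19.0, 36.0, 40.0)]
print(run_pipeline(readings, window_size=3))
# [{'avg_temp': 26.0}, {'avg_temp': 33.0}, {'avg_temp': 40.0}]
```

Flushing the final partial window (here the single reading `40.0`) instead of dropping it is a choice of this sketch, not something the claims specify.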
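Claims 1, 4, 6, and 9 describe deploying tasks according to the cluster's resource conditions and redeploying the service from collected state information. The sketch below assumes a deliberately simple model: each node offers a fixed number of task slots, placement picks the live node with the most free slots, and the only state collected is whether each node is alive. Task-level state (for example, throughput or backlog) is not modeled, and `Node`, `place`, `deploy`, and `redeploy` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A cluster node with a fixed number of task slots (hypothetical resource model)."""
    name: str
    capacity: int
    alive: bool = True
    tasks: List[str] = field(default_factory=list)

    def free(self) -> int:
        return self.capacity - len(self.tasks)


def place(task: str, nodes: List[Node]) -> Node:
    """Put a task on the live node with the most free slots (ties broken by name)."""
    candidates = [n for n in nodes if n.alive and n.free() > 0]
    if not candidates:
        raise RuntimeError(f"no free slot for task {task!r}")
    best = min(candidates, key=lambda n: (-n.free(), n.name))
    best.tasks.append(task)
    return best


def deploy(tasks: List[str], nodes: List[Node]) -> Dict[str, str]:
    """Initial deployment based on current resource conditions."""
    return {task: place(task, nodes).name for task in tasks}


def redeploy(nodes: List[Node], state: Dict[str, bool]) -> Dict[str, str]:
    """Apply collected node state (name -> alive) and move tasks off failed nodes."""
    orphaned: List[str] = []
    for node in nodes:
        node.alive = state.get(node.name, node.alive)
        if not node.alive:
            orphaned.extend(node.tasks)
            node.tasks.clear()
    return {task: place(task, nodes).name for task in orphaned}


cluster = [Node("A", 2), Node("B", 2), Node("C", 3)]
print(deploy(["input", "filter", "aggregate", "output"], cluster))
# {'input': 'C', 'filter': 'A', 'aggregate': 'B', 'output': 'C'}
print(redeploy(cluster, {"C": False}))
# {'input': 'A', 'output': 'B'}
```

Routing orphaned tasks through the same `place` function keeps initial deployment and redeployment consistent; a production scheduler would likely also weigh task-level metrics and move tasks off overloaded, not just failed, nodes.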

Assignees

Electronics & Telecommunications Res Inst

Classifications

  • Physics · mapped topic

  • Data stream processing; Continuous queries · CPC title

  • G06F16/955 (primary)

    using information identifiers, e.g. uniform resource locators [URL] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10061858B2 cover?
Provided are a method and apparatus for processing a data stream capable of providing a data flow processing language to support real-time processing of an exploding data stream and providing an environment for executing the data flow processing language in a cluster system. The data flow-based exploding data stream processing method includes receiving a big data real-time processing service de…
Who is the assignee on this patent?
Electronics & Telecommunications Res Inst
What technology area does this patent fall under?
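Primary CPC classification G06F17/30876, a code from the pre-2019 G06F17/30 scheme; the Classifications section lists G06F16/955 as primary. Mapped technology areas include Physics.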
When was this patent published?
Publication date Aug 28, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).