Distributed stream processing in the cloud

US9641580B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9641580-B2
Application numberUS-201414320706-A
CountryUS
Kind codeB2
Filing dateJul 1, 2014
Priority dateJul 1, 2014
Publication dateMay 2, 2017
Grant dateMay 2, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A low-latency cloud-scale computation environment includes a query language, optimization, scheduling, fault tolerance and fault recovery. An event model can be used to extend a declarative query language so that temporal analysis of event of an event stream can be performed. Extractors and outputters can be used to define and implement functions that extend the capabilities of the event-based query language. A script written in the extended query language can be translated into an optimal parallel continuous execution plan. Execution of the plan can be orchestrated by a streaming job manager which schedules vertices on available computing machines. The streaming job manager can monitor overall job execution. Fault tolerance can be provided by tracking execution progress and data dependencies in each vertex. In the event of a failure, another instance of the failed vertex can be scheduled. An optimal recovery point can be determined based on checkpoints and data dependencies.

First claim

Opening claim text (preview).

What is claimed: 1. A system comprising: at least one processor: a memory connected to the at least one processor; and a cloud-scale query execution platform that supports distributed stream processing comprising a streaming job manager that monitors execution information about streaming jobs executed by a plurality of vertices executing on a plurality of computing devices, the streaming job manager receiving execution progress information and data dependencies for the plurality of vertices, each vertex of the plurality of vertices configured to process events associated with one or more streaming jobs, and each event being labeled with a sequence number used by at least the streaming job manager to describe and track dependencies between input, output and state of a vertex. 2. The system of claim 1 , the streaming job manager scheduling a new vertex in response to detecting a failed vertex of the plurality of vertices, the streaming job manager determining a closest checkpoint from which to resume processing on the new vertex. 3. The system of claim 2 , the streaming job manager calculating a minimum sequence number of event sequence numbers from which the new vertex resumes processing. 4. The system of claim 1 , further comprising a script processor that receives a script written in a declarative query language, the declarative query language supporting distributed stream processing through temporal analysis of input event streams. 5. The system of claim 1 , further comprising a streaming execution plan optimizer that receives a compiled script written in a declarative query language, the declarative query language having a capability to receive user-defined functions to consume event streams. 6. The system of claim 1 , wherein the sequence number is assigned from a monotonically increasing sequence to an event of a plurality of events in an event stream. 7. The system of claim 1 , wherein the execution platform assigns the sequence number to each of the events in an event stream. 8. A method comprising: receiving by a processor of a computing device, execution progress information associated with a plurality of streaming jobs executed by a plurality of vertices executing on a plurality of computing devices, each vertex of the plurality of vertices configured to process events associated with one or more of the plurality of streaming jobs, and each event being assigned a sequence number that describes and tracks dependencies between input, output and state of at least one vertex of the plurality of vertices; in response to detecting a vertex failure among the plurality of vertices, scheduling a new vertex; and determining a closest checkpoint from which to resume processing on the new vertex from the sequence numbers assigned to the events in the streaming jobs. 9. The method of claim 8 , further comprising: performing failure recovery by calculating a minimum sequence number of event sequence numbers from which the new vertex resumes processing. 10. The method of claim 8 , further comprising: receiving a script in a query language extended to support distributed stream processing through temporal analysis of event streams; and generating an optimized streaming execution plan from the script, the script comprising a stream extractor that converts information from a continuous input source into event streams. 11. The method of claim 8 , further comprising: receiving a script in a query language extended to support distributed stream processing through temporal analysis of event streams; and generating an optimized streaming execution plan from the script, the script comprising a stream outputter that performs user-defined actions processing streaming output events. 12. The method of claim 8 , further comprising: receiving the sequence number associated with a last consumed or a last produced event from a vertex of the plurality of vertices. 13. The method of claim 8 , wherein the sequence numbers are monotonically increasing sequence numbers. 14. A computer-readable storage medium comprising computer-readable instructions which when executed cause at least one processor of a computing device to: receive data dependency information associated with a plurality of streaming jobs executed by a plurality of vertices executing on a plurality of computing devices, each vertex of the plurality of vertices configured to process events associated with one or more of the plurality of streaming jobs, and each event being assigned a sequence number that describes and tracks dependencies between input, output and state of at least one vertex of the plurality of vertices; in response to detecting a vertex failure among the plurality of vertices, perform job recovery by scheduling a new vertex; and determine a closest checkpoint from which to resume processing on the new vertex using the sequence numbers assigned to the events in one or more of the plurality of streaming jobs. 15. The computer-readable storage medium of claim 14 , comprising further computer-readable instructions which when executed cause the at least one processor to: calculate a minimum sequence number of event sequence numbers from which the new vertex resumes processing based on stored checkpointing data. 16. The computer-readable storage medium of claim 15 , comprising further computer-readable instructions which when executed cause the at least one processor to: generate an optimized streaming execution plan from a script written in a query language extended to support distributed stream processing through temporal analysis of input event streams. 17. The computer-readable storage medium of claim 14 , comprising further computer-readable instructions which when executed cause the at least one processor to: generate an optimized streaming execution plan from a script written in a query language having a capability to receive user-defined functions to consume event streams. 18. The computer-readable storage medium of claim 14 , comprising further computer-readable instructions which when executed cause the at least one processor to: generate an optimized streaming execution plan from a script written in a query language having a capability to receive user-defined functions to produce event streams. 19. The computer-readable storage medium of claim 14 , comprising further computer-readable instructions which when executed cause the at least one processor to: receive execution progress information comprising last event processed and last event produced from a vertex of the plurality of vertices. 20. The computer-readable storage medium of claim 19 , comprising further computer-readable instructions which when executed cause the at least one processor to: assign a monotonically increasing sequence number to each event in an event stream.

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9641580B2 cover?
A low-latency cloud-scale computation environment includes a query language, optimization, scheduling, fault tolerance and fault recovery. An event model can be used to extend a declarative query language so that temporal analysis of event of an event stream can be performed. Extractors and outputters can be used to define and implement functions that extend the capabilities of the event-based …
Who is the assignee on this patent?
Microsoft Corp, Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification H04L65/60. Mapped technology areas include Electricity.
When was this patent published?
Publication date Tue May 02 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).