Distributed stream processing in the cloud

US11271981B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11271981-B2
Application numberUS-201916249357-A
CountryUS
Kind codeB2
Filing dateJan 16, 2019
Priority dateJul 1, 2014
Publication dateMar 8, 2022
Grant dateMar 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A low-latency cloud-scale computation environment includes a query language, optimization, scheduling, fault tolerance and fault recovery. An event model can be used to extend a declarative query language so that temporal analysis of event of an event stream can be performed. Extractors and outputters can be used to define and implement functions that extend the capabilities of the event-based query language. A script written in the extended query language can be translated into an optimal parallel continuous execution plan. Execution of the plan can be orchestrated by a streaming job manager which schedules vertices on available computing machines. The streaming job manager can monitor overall job execution. Fault tolerance can be provided by tracking execution progress and data dependencies in each vertex. In the event of a failure, another instance of the failed vertex can be scheduled. An optimal recovery point can be determined based on checkpoints and data dependencies.

First claim

Opening claim text (preview).

What is claimed: 1. A computing device, comprising: at least one processor: at least one memory connected to the at least one processor; and a distributed stream processing system that is at least partially stored in the at least one memory and executed by the at least one processor, the distributed stream processing system comprising a streaming job manager configured to monitor execution information about streaming jobs executed by a plurality of vertices executing on a plurality of computing devices, each vertex of the plurality of vertices configured to process events associated with one or more streaming jobs, the plurality of vertices including a user-defined stream extractor vertex configured to consume events of one or more streams, receive execution progress information for the plurality of vertices, detect, based on the monitored execution information, a failed vertex of the plurality of vertices, and restart the failed vertex. 2. The computing device of claim 1 , wherein the stream extractor vertex is configured to continually wait for and perform computations on data received in the one or more event streams. 3. The computing device of claim 2 , wherein the stream extractor vertex is configured to indicate temporal information for an event. 4. The computing device of claim 3 , wherein the temporal information includes at least one of a time the event began, a time the event ended, a time period during which the event was active, or sequence information for the event. 5. The computing device of claim 3 , further comprising: a garbage collection process configured to garbage collect the event based on a sequence number of the event. 6. The computing device of claim 1 , wherein the plurality of vertices includes a stream outputter vertex configured to perform one or more user-defined actions processing one or more streaming output events. 7. A method in a distributed stream processing system implemented in at least one computing device, comprising monitoring execution information about streaming jobs executed by a plurality of vertices executing on a plurality of computing devices, each vertex of the plurality of vertices configured to process events associated with one or more streaming jobs, the plurality of vertices including a stream extractor vertex; enabling a user to define the stream extractor vertex; continually waiting for, by the stream extractor vertex, data received in one or more event streams; detecting, based on the monitored execution information, a failed vertex of the plurality of vertices; and restarting the failed vertex. 8. The method of claim 7 , wherein the plurality of vertices includes a stream extractor vertex, the method further comprising: consuming, by the stream extractor vertex, events of the one or more event streams; and performing computations on the data received in the one or more event streams. 9. The method of claim 7 , further comprising: configuring the stream extractor vertex to indicate temporal information for an event. 10. The method of claim 9 , wherein the temporal information includes at least one of a time the event began, a time the event ended, a time period during which the event was active, or sequence information for the event. 11. The method of claim 9 , further comprising: garbage collecting the event based on a sequence number of the event. 12. The method of claim 7 , wherein the plurality of vertices includes a stream outputter vertex, the method further comprising: performing one or more user-defined actions with the stream outputter vertex to process one or more streaming output events. 13. A computer-readable storage medium having program instructions recorded thereon that, when executed by at least one processing circuit, perform a method for distributed stream processing, the method comprising: monitoring execution information about streaming jobs executed by a plurality of vertices executing on a plurality of computing devices, each vertex of the plurality of vertices configured to process events associated with one or more streaming jobs; and enabling consumption of an output event stream by an application, the consumption of the output event stream defined by a user. 14. The computer-readable storage medium of claim 13 , wherein the plurality of vertices includes a stream extractor vertex, the method further comprising: consuming, by the stream extractor vertex, events of one or more event streams, including continually waiting for and performing computations on data received in the one or more event streams. 15. The computer-readable storage medium of claim 14 , wherein the method further comprises: configuring the stream extractor vertex to indicate temporal information for an event. 16. The computer-readable storage medium of claim 15 , wherein the temporal information includes at least one of a time the event began, a time the event ended, a time period during which the event was active, or sequence information for the event. 17. The computer-readable storage medium of claim 15 , wherein the method further comprises: garbage collecting the event based on a sequence number of the event. 18. The computer-readable storage medium of claim 14 , wherein the method further comprises: enabling a user to define the stream extractor vertex. 19. The computer-readable storage medium of claim 13 , the method further comprising: detecting, based on the monitored execution information, a failed vertex of the plurality of vertices; and restarting the failed vertex. 20. The computer-readable storage medium of claim 13 , wherein the plurality of vertices includes a stream outputter vertex, the method further comprising: performing one or more user-defined actions with the stream outputter vertex to process one or more streaming output events.

Assignees

Inventors

Classifications

  • Restarting or rejuvenating · CPC title

  • H04L65/60Primary

    Network streaming of media packets · CPC title

  • Plan optimisation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11271981B2 cover?
A low-latency cloud-scale computation environment includes a query language, optimization, scheduling, fault tolerance and fault recovery. An event model can be used to extend a declarative query language so that temporal analysis of event of an event stream can be performed. Extractors and outputters can be used to define and implement functions that extend the capabilities of the event-based …
Who is the assignee on this patent?
Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06F11/1438. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Mar 08 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).