Streaming joins in constrained memory environments

US2016357476A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2016357476-A1
Application numberUS-201514732374-A
CountryUS
Kind codeA1
Filing dateJun 5, 2015
Priority dateJun 5, 2015
Publication dateDec 8, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Large amounts of memory can be consumed in streaming joins because events from one stream are held in memory while waiting for matching events from a second stream. Memory needs can be reduced by analyzing the join condition to determine the bounds on the time discrepancy between events in the two streams. When it is determined that an event from one stream must occur prior to the matching event from the other stream, the later-arriving stream data can be ingested with an intentional delay. When it is determined that regardless of input received from a first stream, no output will be produced when there is no input from the second stream, pulling data from the first stream can cease. A multi-stage join plan can be employed so that a less busy stream can be scanned with increasing amounts of intentional delay. Only unmatched data is stored.

First claim

Opening claim text (preview).

What is claimed: 1 . A system comprising: at least one processor: a memory connected to the at least one processor; and at least one program module loaded into the memory, the at least one program module comprising a streaming join module that: analyzes at least one join condition for joining data pulled from a first data stream and data pulled from a second data stream; and generates, based on analysis of the at least one join condition, an order in which pull requests are sent to the first and second data streams based on a time associated with the data from the first data stream and a time associated with the data from the second data stream. 2 . The system of claim 1 , further comprising at least one program module that: generates an allowable time range for joinable data from the analysis of the at least one join condition; generates a lower boundary of the allowable time range, the lower boundary comprising a minimal time variance for the joinable data; pulls data from the first data stream; records in the memory the time associated with the data from the first data stream; and pulls data from the second data stream. 3 . The system of claim 2 , further comprising at least one program module that: determines that the pulled data from the second data stream is potentially joinable with the pulled data from the first data stream by determining that the time associated with the data from the second stream is less than a difference of the recorded time and the minimal time variance. 4 . The system of claim 2 , further comprising at least one program module that: pulls additional data from the first data stream in response to determining that the pulled data from the second data stream is not joinable with the pulled data from the first data stream; and determines that the pulled data from the second data stream is not joinable with the pulled data from the first data stream by determining that the time associated with the data from the second stream is not less than a difference of the recorded time and the minimal time variance. 5 . The system of claim 1 , further comprising at least one program module that: determines from the analysis of the at least one join condition that to produce output, data from the first data stream and data from the second data stream is required. 6 . The system of claim 5 , further comprising at least one program module that: in response to determining that no output can be produced because no potentially joinable data from the second data stream has been pulled, stops pulling data from the first data stream and starts pulling data from the second data stream. 7 . The system of claim 5 , further comprising at least one program module that: in response to determining that no output can be produced because no potentially joinable data from the first data stream has been pulled, stops pulling data from the second data stream and starts pulling data from the first data stream. 8 . The system of claim 1 , further comprising at least one program module that: determines from the analysis of the at least one join condition, a first time interval and a second time interval; and in response to determining that the time associated with the data from the first data stream and the time associated with the data from the second data stream does fall within the first time interval, retains non-matching data from the second data stream. 9 . The system of claim 1 , further comprising at least one program module that: determines from the analysis of the at least one join condition a first time interval and a second time interval; and in response to determining that the time associated with the data from the first data stream and the time associated with recorded data from the second data stream fall within the second time interval, discards non-matching data from the second data stream. 10 . A method comprising: analyzing by a processor of a computing device at least one join condition for joining data pulled from a first data stream and data pulled from a second data stream in a stream processing system; and generating, based on analysis of the at least one join condition, an order in which pull requests are sent to the first and second data streams based on a time associated with the data from the first data stream and a time associated with the data from the second data stream. 11 . The method of claim 10 , further comprising: generating an allowable time range for joinable data from the analysis of the at least one join condition; generating a lower boundary of the allowable time range, the lower boundary comprising a minimal time variance for the joinable data; pulling data from the first data stream; recording in the memory the time associated with the data from the first data stream; and pulling data from the second data stream. 12 . The method of claim 11 , further comprising: determining that the pulled data from the second data stream is potentially joinable with the pulled data from the first data stream by determining that the time associated with the data from the second stream is less than a difference of the recorded time and the minimal time variance. 13 . The method of claim 11 , further comprising: pulling additional data from the first data stream in response to determining that the pulled data from the second data stream is not joinable with the pulled data from the first data stream; and determining that the data pulled from the second data stream is not joinable with the pulled data from the first data stream by determining that the time associated with the data from the second stream is not less than a difference of the recorded time and the minimal time variance. 14 . The method of claim 10 , further comprising: determining from the analysis of the at least one join condition that to produce output, data from the first data stream and data from the second data stream is required. 15 . The method of claim 14 , further comprising: in response to determining that no output can be produced because no potentially joinable data from the second data stream has been pulled, ceasing to pull data from the first data stream and starting to pulls data from the second data stream with an intentional delay. 16 . The method of claim 14 , further comprising: in response to determining that no output can be produced because no potentially joinable data from the first data stream has been pulled, ceasing to pulling data from the second data stream and pulling data from the first data stream. 17 . The method of claim 10 , further comprising: determining from the analysis of the at least one join condition, a first time interval and a second time interval; and in response to determining that a time associated with the data from the first data stream and the time associated with the data from the second data stream fall within the first time interval, retain non-matching data from the second data stream. 18 . The method of claim 17 , further comprising: determining from the analysis of the at least one join condition a first time interval and a second time interval; and in response to determining that a time associated with the data from the first data stream and the time associated with recorded data from the second data stream fall within the second time interval, discarding non-matching data from the second data stream. 19 . A computer-readable storage medium comprising computer-readable instructions which when executed

Assignees

Inventors

Classifications

  • G06F16/21Primary

    Design, administration or maintenance of databases · CPC title

  • Monitoring storage devices or systems · CPC title

  • G06F3/0638Primary

    Organizing or formatting or addressing of data · CPC title

  • Single storage device · CPC title

  • G06F3/0604Primary

    Improving or facilitating administration, e.g. storage management · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016357476A1 cover?
Large amounts of memory can be consumed in streaming joins because events from one stream are held in memory while waiting for matching events from a second stream. Memory needs can be reduced by analyzing the join condition to determine the bounds on the time discrepancy between events in the two streams. When it is determined that an event from one stream must occur prior to the matching even…
Who is the assignee on this patent?
Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/21. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Dec 08 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).