Methods and apparatus for adaptive source filtering and load shedding for data stream processing

US9158837B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9158837-B2
Application numberUS-87009907-A
CountryUS
Kind codeB2
Filing dateOct 10, 2007
Priority dateOct 10, 2007
Publication dateOct 13, 2015
Grant dateOct 13, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are disclosed for adaptive source filtering and load shedding in such data stream processing systems. For example, in one aspect of the invention, a method for use in filtering data in a distributed data stream processing system, wherein a server receives and processes one or more data streams from one or more data sources, comprises the steps of the server periodically re-configuring one or more filters and sending the one or more periodically re-configured filters to the one or more data sources, and the one or more data sources performing data filtering based on the one or more periodically re-configured filters received from the server.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for use in filtering data in a distributed data stream processing system, wherein a server receives and processes one or more data streams from one or more data sources, the method comprising the steps of: the server periodically re-configuring one or more filters and sending the one or more periodically re-configured filters to the one or more data sources such that the one or more data sources are configured to perform data filtering based on the one or more periodically re-configured filters received from the server; wherein the step of the server periodically re-configuring the one or more filters further comprises the steps of: defining a plurality of non-overlapping regions of a data space associated with at least one of the one or more filters; and determining a respective drop threshold for each of the non-overlapping regions within the filter, at least a first region having a different drop threshold than at least a second region, wherein a drop threshold defines a probability that an item will be dropped at a data source when the filter is applied. 2. The method of claim 1 , wherein the step of the server defining non-overlapping regions of the data space for the filter further comprises the step of partitioning the data space of the filter into a given number of regions. 3. The method of claim 2 , wherein the given number of regions are equally sized. 4. The method of claim 2 , wherein, for the given number of regions, a data frequency is the same. 5. The method of claim 2 , wherein, for the given number of regions, a query density is the same. 6. The method of claim 2 , wherein, for the given number of regions, a partition benefit is maximized. 7. The method of claim 1 , wherein the step of the server determining a respective drop threshold for each region within the filter further comprises the steps of: consolidating queries intersecting each filter region into one quality-of-service (QoS) value region; and formulating the drop threshold determination step into an optimization problem. 8. The method of claim 7 , wherein the step of consolidating queries further comprises the steps of: decomposing the QoS value of each query into one or more additive components, each component per intersected filter region; and for each filter region, consolidating the QoS components of different queries that intersect the region into a single QoS value. 9. The method of claim 7 , wherein the step of formulating the optimization problem further comprises the steps of: setting an objective function to an aggregate of the QoS values of the region; and setting constraints such that the total fraction of a load shed using the drop thresholds equals a pre-calculated throttler value. 10. The method of claim 9 , wherein the throttler value is pre-calculated by the steps of: monitoring a system utilization since a last adaptation step; monitoring a fraction of dropped data items; and using the monitoring steps to compute a fraction of the load that can be handled, wherein the computed fraction is the throttler value. 11. An article of manufacture for use in filtering data in a distributed data stream processing system, comprising computer readable storage medium containing one or more programs which, when executed by a computer, performs the steps of claim 1 . 12. Apparatus for use in filtering data in a distributed data stream processing system, wherein the apparatus receives and processes one or more data streams from one or more data sources, the apparatus comprising: a memory; and a processor coupled to the memory and operative to periodically re-configure one or more filters and send the one or more periodically re-configured filters to the one or more data sources such that the one or more data sources are configured to perform data filtering based on the one or more periodically re-configured filters received from the processor, wherein the operation of periodically re-configuring the one or more filters further comprises: defining a plurality of non-overlapping regions of a data space associated with at least one of the one or more filters; and determining a respective drop threshold for each of the non-overlapping regions within the filter, at least a first region having a different drop threshold than at least a second region, wherein a drop threshold defines a probability that an item will be dropped at a data source when the filter is applied. 13. The apparatus of claim 12 , wherein the operation of the server defining non-overlapping regions of the data space for the filter further comprises partitioning the data space of the filter into a given number of regions. 14. The apparatus of claim 12 , wherein the operation of determining a respective drop threshold for each region within the filter further comprises: consolidating queries intersecting each filter region into one quality-of-service (QoS) value region; and formulating the drop threshold determination step into an optimization problem. 15. A query server, wherein the query server receives and processes one or more data streams from one or more data sources, the query server comprising: a memory; and a processor coupled to the memory and operative to periodically re-configure one or more filters and send the one or more periodically re-configured filters to the one or more data sources such that the one or more data sources are configured to perform data filtering based on the one or more periodically re-configured filters received from the processor, wherein the operation of periodically re-configuring the one or more filters further comprises: defining a plurality of non-overlapping regions of a data space associated with at least one of the one or more filters; and determining a respective drop threshold for each of the non-overlapping regions within the filter, at least a first region having a different drop threshold than at least a second region, wherein a drop threshold defines a probability that an item will be dropped at a data source when the filter is applied. 16. The query server of claim 15 , wherein the operation of the server defining non-overlapping regions of the data space for the filter further comprises partitioning the data space of the filter into a given number of regions. 17. The query server of claim 15 , wherein the operation of determining a respective drop threshold for each region within the filter further comprises: consolidating queries intersecting each filter region into one quality-of-service (QoS) value region; and formulating the drop threshold determination step into an optimization problem. 18. The method of claim 1 , wherein the server decides to re-configure a given one of the one or more filters by evaluating a filtering performance associated with the current configuration of the given filter and by evaluating an unfiltered sample of data received from the corresponding data source. 19. The method of claim 1 , wherein the drop threshold for the first region and the drop threshold for the second region are each greater than zero and less than one. 20. The method of claim 1 , wherein the server periodically re-configuring one or more filters comprises a comparison between a rate at which data items arrive at the server and a rate at which the server processes the data items.

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9158837B2 cover?
Techniques are disclosed for adaptive source filtering and load shedding in such data stream processing systems. For example, in one aspect of the invention, a method for use in filtering data in a distributed data stream processing system, wherein a server receives and processes one or more data streams from one or more data sources, comprises the steps of the server periodically re-configurin…
Who is the assignee on this patent?
Gedik Bugra, Wu Kun-Lung, Yu Philip Shi-Lung, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/3331. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Oct 13 2015 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).