Systems and methods for configuring data stream filtering

US11947545B2 · US · B2

Patent metadata
Publication number: US-11947545-B2
Application number: US-202217685223-A
Country: US
Kind code: B2
Filing date: Mar 2, 2022
Priority date: Mar 2, 2022
Publication date: Apr 2, 2024
Grant date: Apr 2, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for configuring data stream filtering are disclosed. In one embodiment, a method for data stream processing comprises receiving an incoming dataset stream at a data stream processing environment, wherein the dataset stream comprises a data stream; configuring with a streaming data filter configuration tool, one or more filter parameters for a data filter that receives the data stream; computing with the streaming data filter configuration tool, one or more filter statistics estimates based on the filter parameters, wherein the filter statistics estimates are computed from sample elements of a representative sample of the data stream retrieved from a representative sample data store; outputting to a workstation user interface the filter statistics estimates; and configuring the data filter to apply the filter parameters to the data stream in response to an instruction from the workstation user interface.
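The estimation step in the abstract (computing filter statistics from a representative sample rather than from the live stream) can be illustrated with a minimal sketch. This is not the patented implementation: the record shape, the function name `estimate_filter_statistics`, the predicate-style filter, and the two statistics returned are all assumptions made for illustration. In particular, "proportion" is interpreted here as the share of sample records that pass the filter.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable


def estimate_filter_statistics(
    sample: Iterable[Dict[str, Any]],
    predicate: Callable[[Dict[str, Any]], bool],
    field: str,
) -> Dict[str, Any]:
    """Apply a candidate filter to a sample and summarize the result.

    Hypothetical helper: `predicate` stands in for the filter parameters,
    and `field` is the record field whose value distribution is reported.
    """
    records = list(sample)
    passed = [r for r in records if predicate(r)]
    # Share of sampled records that satisfy the filter (0.0 for an empty sample).
    proportion = len(passed) / len(records) if records else 0.0
    # Frequency of each value of `field` among the records that passed.
    distribution = Counter(r.get(field) for r in passed)
    return {
        "proportion_passed": proportion,
        "field_distribution": dict(distribution),
    }


# Illustrative sample; real sample elements would come from a sample store.
sample = [
    {"country": "US", "event": "click"},
    {"country": "DE", "event": "view"},
    {"country": "US", "event": "view"},
    {"country": "FR", "event": "click"},
]
stats = estimate_filter_statistics(sample, lambda r: r["country"] == "US", "event")
print(stats)
```

Because the estimate is computed on a sample, a user could review the numbers before the filter is ever applied to the stream, which matches the approve-then-configure flow the abstract describes.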

First claim

Opening claim text (preview).

What is claimed is:

1. A method for data stream processing, the method comprising: receiving, at a data stream processing environment, a data stream from a streaming data source; generating a representative sample of the data stream, the representative sample including: a first set of sample elements comprising real-time sample elements directly sampled from the data stream by a data stream sampling mechanism executed by the data stream processing environment; and a second set of sample elements comprising previously stored sample elements of the data stream, sampled from a data lake data store by a bootstrap data set sampling mechanism executed by the data stream processing environment; receiving a selection of one or more filter parameters for a data filter that receives the data stream; computing one or more filter statistics estimates that represent applying the one or more filter parameters to the data stream, based on applying the one or more filter parameters to the representative sample; outputting to a workstation user interface the one or more filter statistics estimates; and in response to an instruction from the workstation indicating an approval of the one or more filter estimates, filtering the data stream using the data filter by configuring the data filter to apply the one or more filter parameters to produce a filtered data stream output.

2. The method of claim 1, further comprising: maintaining a freshness of the representative sample by applying a time-weighted algorithm to determine which sample elements are maintained in the representative sample.

3. The method of claim 2, further comprising: applying a timestamp to each of the sample elements of the representative sample indicating when each respective sample element was sampled; maintaining sample elements of the representative sample having the timestamp after a threshold time are maintained in the representative sample; and removing sample elements of the representative sample having the timestamp prior to the threshold time based on either a probability parameter or a proportion parameter.

4. The method of claim 1, further comprising: generating an initial representative sample of the data stream by systematically sampling the data stream, and by systematically sampling records of the data stream retrieved from the data lake data store; and storing the initial representative sample of the data stream to a representative sample data store.

5. The method of claim 1, wherein the one or more filter parameters are configured based on inputs received from the workstation user interface, the method further comprising: outputting one or more filter parameter suggestions to the workstation user interface, wherein the one or more filter parameter suggestions are generated based on the one or more filter parameters.

6. The method of claim 5, further comprising: generating the one or more filter parameter suggestions based on identifying values in the representative sample that are semantically similar to the one or more filter parameters.

7. The method of claim 5, further comprising: generating the one or more filter parameter suggestions based on correlating the one or more filter parameters to a log of historical filter sets.

8. A data stream processing environment, the data stream processing environment comprising: a data lake data store storing records of a data stream; a data filter configured to receive the data stream, and filter the data stream to produce a filtered output; a streaming data filter configuration tool coupled to the data lake data store and to the data filter, wherein the streaming data filter configuration tool is configured to: generate a representative sample of the data stream, the representative sample including: a first set of sample elements comprising real-time sample elements directly sampled from the data stream; and a second set of sample elements comprising previously stored sample elements of the data stream, sampled from the data lake data store; input one or more filter parameters for the data filter from a workstation user interface; compute one or more filter statistics estimates that represent applying the one or more filter parameters to the data stream, based on applying the one or more filter parameters to the representative sample; output to the workstation user interface the one or more filter statistics estimates for the one or more filter parameters; and configure the data filter to apply the one or more filter parameters to the data stream in response to an instruction from the workstation user interface.

9. The data stream processing environment of claim 8, further comprising: one or more controllers programmed to execute code to implement at least one of the streaming data filter configuration tool, the data filter, and the data lake data store.

10. The data stream processing environment of claim 8, further comprising: a sample update mechanism configured to maintaining a freshness of the representative sample, wherein the sample update mechanism applies a time-weighted algorithm to determine which sample elements are maintained in the representative sample.

11. The data stream processing environment of claim 10, wherein the sample update mechanism is configured to: apply a timestamp to each of the sample elements of the representative sample indicating when each respective sample element was sampled; wherein sample elements of the representative sample, having the timestamp after a threshold time, are maintained in the representative sample; and wherein sample elements of the representative sample, having the timestamp prior to the threshold time, are removed from the representative sample based on either a probability parameter or a proportion parameter.

12. The data stream processing environment of claim 10, wherein the sample update mechanism is configured to: generate an initial representative sample of the data stream by systematically sampling the data stream, and by systematically sampling records of the data stream retrieved from the data lake data store; and store the initial representative sample of the data stream to a representative sample data store.

13. The data stream processing environment of claim 8, further comprising a filter recommendation mechanism configured to generate one or more filter parameter suggestions based on the one or more filter parameters; wherein the one or more filter parameters are configured based on inputs received from the workstation user interface; and wherein the one or more filter parameter suggestions are output to the workstation user interface.

14. The data stream processing environment of claim 13, wherein the filter recommendation mechanism generates the one or more filter parameter suggestions based on identifying values in the representative sample that are semantically similar to the one or more filter parameters.

15. The data stream processing environment of claim 13, wherein the filter recommendation mechanism generates the one or more filter parameter suggestions based on a correlation of the one or more filter parameters to a log of historical filter sets.

16. The data stream processing environment of claim 8, wherein an output of the data filter is coupled to a profile data store.

17. The data stream processing environment of claim 8, wherein the one or more filter statistics estimates include at least one of a proportion of filtered records and a filtered field frequency distribution.

18. One or more non-trans
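
Claims 3 and 11 describe keeping the representative sample fresh: each element carries a timestamp, elements sampled after a threshold time are kept, and older elements are removed based on a probability or proportion parameter. A minimal sketch of the probability variant follows. The class `SampleElement`, the function `refresh_sample`, and the choice to treat an element stamped exactly at the threshold as recent are hypothetical; the claims do not specify them.

```python
import random
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SampleElement:
    value: Any
    sampled_at: float  # timestamp recorded when the element was sampled


def refresh_sample(
    elements: List[SampleElement],
    threshold_time: float,
    removal_probability: float,
    rng: Optional[random.Random] = None,
) -> List[SampleElement]:
    """Keep recent elements; drop each older element with a fixed probability.

    Assumption: an element stamped exactly at `threshold_time` counts as recent.
    """
    rng = rng or random.Random()
    kept = []
    for element in elements:
        if element.sampled_at >= threshold_time:
            kept.append(element)  # recent: always maintained
        elif rng.random() >= removal_probability:
            kept.append(element)  # old, but survived the random removal
    return kept


elements = [
    SampleElement("old-a", sampled_at=10.0),
    SampleElement("old-b", sampled_at=20.0),
    SampleElement("new-a", sampled_at=110.0),
]
# With probability 1.0 every element older than the threshold is removed.
fresh = refresh_sample(elements, threshold_time=100.0, removal_probability=1.0)
print([e.value for e in fresh])
```

A probability below 1.0 lets some older elements survive each pass, so the sample ages out gradually instead of being cut off at the threshold. The proportion variant named in the claims could instead remove a fixed fraction of the older elements per pass.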

Assignees

Adobe Inc

Inventors

Classifications

  • Data stream processing; Continuous queries

  • Filtering based on additional data, e.g. user or group profiles

  • Temporal data queries

  • Approximate or statistical queries

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11947545B2 cover?
Systems and methods for configuring data stream filtering are disclosed. In one embodiment, a method for data stream processing comprises receiving an incoming dataset stream at a data stream processing environment, wherein the dataset stream comprises a data stream; configuring with a streaming data filter configuration tool, one or more filter parameters for a data filter that receives the da…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/24568. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 2, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).