Detecting loss of data received at a data processing pipeline of an online system during a specified time interval

US2018069823A1 · US · A1

Patent metadata
Field · Value
Publication number · US-2018069823-A1
Application number · US-201615258966-A
Country · US
Kind code · A1
Filing date · Sep 7, 2016
Priority date · Sep 7, 2016
Publication date · Mar 8, 2018
Grant date · (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An online system receives data and processes the data in a data processing pipeline. To detect data loss in the data processing pipeline, the online system determines a time interval during which each item of data is received and associates a set of counters with each time interval. For each time interval, an input counter is incremented for each data item received during the time interval and an output counter is incremented for each data item received during the time interval that was processed by the data processing pipeline. The online system compares an input number from the input counter and an output number from the output counter for each time interval. Based on a difference between the input number and output number for a time interval, the online system determines if a loss of data received during the time interval occurred. Lost data are identified and processed.
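The counting scheme in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class name `IntervalLossDetector`, the 60-second interval length, and the use of receive times in seconds are all assumptions made for illustration.

```python
from collections import defaultdict

INTERVAL_SECONDS = 60  # assumed interval length; the abstract does not specify one


class IntervalLossDetector:
    """Hypothetical helper that keeps one input and one output counter per interval."""

    def __init__(self, interval_seconds=INTERVAL_SECONDS):
        self.interval_seconds = interval_seconds
        self.input_counts = defaultdict(int)
        self.output_counts = defaultdict(int)

    def interval_of(self, received_at):
        """Map a receive time (in seconds) to the index of its time interval."""
        return int(received_at // self.interval_seconds)

    def record_input(self, received_at):
        """Count an item when it arrives at the pipeline input."""
        self.input_counts[self.interval_of(received_at)] += 1

    def record_output(self, received_at):
        """Count an item once the pipeline has finished processing it.

        The counter is keyed by the time the item was received, not by the
        time it finished, so both counters describe the same set of items.
        """
        self.output_counts[self.interval_of(received_at)] += 1

    def lost_count(self, interval):
        """Difference between the input number and the output number."""
        return self.input_counts.get(interval, 0) - self.output_counts.get(interval, 0)

    def intervals_with_loss(self):
        """Intervals whose input number exceeds their output number."""
        return sorted(i for i in self.input_counts if self.lost_count(i) > 0)


detector = IntervalLossDetector()
for t in [5, 30, 61, 90, 119]:   # five items received
    detector.record_input(t)
for t in [5, 30, 61, 119]:       # the item received at t=90 never completes
    detector.record_output(t)

print(detector.intervals_with_loss())  # [1]
print(detector.lost_count(1))          # 1
```

Keying both counters by receive time is the design choice that makes the comparison meaningful: an item that finishes processing after its interval has ended still counts toward the interval in which it arrived.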

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, at an input of a data processing pipeline of an online system, a plurality of data items describing one or more events, the data processing pipeline comprising a plurality of data processing modules; determining, from a plurality of consecutive time intervals, a time interval in which each data item of the plurality of data items was received; for each data item received during a particular time interval of the plurality of time intervals, incrementing an input counter associated with the particular time interval; determining, based on the input counter, an input number of data items that were received at the input of the data processing pipeline during the particular time interval; during the time interval and a predetermined period following the particular time interval, incrementing an output counter for each data item received during the particular time interval that was processed by every data processing module of the data processing pipeline; determining, based on the output counter, an output number of data items that were received at the input of the data processing pipeline during the particular time interval and that were processed by every data processing module of the data processing pipeline; comparing the input number of data items to the output number of data items for the particular time interval; and determining whether a data item received during the particular time interval was lost in the data processing pipeline based at least in part on the comparing.

2. The method of claim 1, wherein each of the plurality of data items comprises information describing a time at which the data item was received at the input of the data processing pipeline.

3. The method of claim 2, further comprising determining whether a data item was received during the particular time interval based at least in part on the information describing the time at which the data item was received at the input of the data processing pipeline.

4. The method of claim 1, further comprising: retrieving stored input information describing each data item received during the particular time interval responsive to determining the data item received during the particular time interval was lost in the data processing pipeline; retrieving stored output information describing each data item received during the particular time interval that was processed by every data processing module of the data processing pipeline; comparing the stored input information to the stored output information; and identifying a data item received during the particular time interval that was not processed by every data processing module of the data processing pipeline based at least in part on the comparing.

5. The method of claim 4, further comprising: sending the identified data item received during the particular time interval that was not processed by every data processing module of the data processing pipeline to one or more of the plurality of data processing modules of the data processing pipeline; and processing the identified data item received during the particular time interval that was not processed by every data processing module of the data processing pipeline by the one or more data processing modules.

6. The method of claim 1, wherein a length of the predetermined period following the particular time interval is based at least in part on an expected length of time for every data processing module of the data processing pipeline to process the data item.

7. The method of claim 1, wherein each of the plurality of data processing modules is associated with one or more selected from a group consisting of: a module input counter storing a number of data items received at the input of the data processing pipeline during the particular time interval and that were received by the data processing module, a module output counter storing a number of data items received at the input of the data processing pipeline during the particular time interval that were processed by the data processing module, and any combination thereof.

8. The method of claim 7, further comprising: for each data processing module, determining a number of data items received during the particular time interval that were processed by the data processing module based on the module input counter associated with the data processing module and the module output counter associated with the data processing module; comparing the number of data items received during the particular time interval that were processed by each data processing module of the data processing pipeline; and determining whether the data item received during the particular time interval was not processed by a data processing module based at least in part on the comparing.

9. The method of claim 8, further comprising: responsive to determining the data item received during the time interval was not processed by a data processing module, identifying the data processing module that did not process the data item based on a difference between an input counter of the identified data processing module and an input counter of an additional data processing module.

10. The method of claim 1, wherein the one or more events comprise interactions between a user of the online system and a content item presented to the user by the online system.

11. The method of claim 10, wherein the one or more events are associated with one or more selected from a group consisting of: accessing the content item by the user, presenting the content item in a display area of a client device associated with the user, purchasing a product associated with the content item by the user, and purchasing a service associated with the content item by the user.

12. A method comprising: receiving, at an input of a data processing pipeline of an online system, an unordered stream of data comprising a plurality of data items describing one or more events; associating a time-stamp with each data item of the plurality of data items, the time-stamp associated with a data item describing a time when data item was received at the input of the data processing pipeline; determining a time interval in which each data item of the plurality of data items was received from a plurality of time intervals based at least in part on the time-stamp associated with each data item; for each data item received during a particular time interval of the plurality of time intervals, incrementing an input counter associated with the particular time interval; determining an input number of data items received during the particular time interval based part on a number of times the input counter was incremented; during a predetermined period of time, incrementing an output counter for each data item received during the particular time interval that was processed by the data processing pipeline; determining an output number of data items received during the particular time interval that were processed by the data processing pipeline based at least in part on a number of times the output counter was incremented; computing a difference between the input number of data items and the output number of data items; and determining a number of data items received during the particular time interval that were not processed by the data processing pipeline based on the difference between the input number of data items and the output number of data items.

13. The method of claim 12, further comprising: retrieving stored input information describing each data item received during the particular time interval; retrieving
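Three of the dependent claims lend themselves to a short illustration: waiting out the predetermined period before comparing (claim 6), identifying which items were lost by comparing stored input and output records (claim 4), and locating the module where items disappeared by comparing module input counters (claim 9). The Python sketch below is one possible reading, not the claimed implementation. All function names, the use of item identifiers as the "stored information", and the assumption that the modules form a single linear chain are hypothetical.

```python
def ready_to_compare(now, interval_index, interval_seconds, grace_seconds):
    """Return True once the interval plus the predetermined period has elapsed.

    grace_seconds stands in for the expected time every module needs to
    process an item (an assumed, configurable value).
    """
    interval_end = (interval_index + 1) * interval_seconds
    return now >= interval_end + grace_seconds


def find_lost_items(stored_input_ids, stored_output_ids):
    """Identify items received in the interval that never reached the output.

    Assumes each stored record can be reduced to a hashable item identifier.
    """
    return set(stored_input_ids) - set(stored_output_ids)


def find_dropping_modules(module_input_counts, final_output_count):
    """Locate modules whose input counter exceeds the next stage's counter.

    module_input_counts is an ordered list of (module_name, input_count)
    pairs for one time interval. Assumes a linear chain in which every item
    a module processes is handed to the next module.
    """
    names = [name for name, _ in module_input_counts]
    counts = [count for _, count in module_input_counts] + [final_output_count]
    losses = {}
    for i, name in enumerate(names):
        dropped = counts[i] - counts[i + 1]
        if dropped > 0:
            losses[name] = dropped
    return losses


# Interval 0 covers seconds 0-59; with a 30-second grace period the
# comparison should not run before second 90.
print(ready_to_compare(now=75, interval_index=0, interval_seconds=60, grace_seconds=30))  # False
print(ready_to_compare(now=90, interval_index=0, interval_seconds=60, grace_seconds=30))  # True

print(find_lost_items(["a", "b", "c"], ["a", "c"]))  # {'b'}

counts = [("parse", 100), ("enrich", 100), ("aggregate", 97)]
print(find_dropping_modules(counts, final_output_count=97))  # {'enrich': 3}
```

Lost items found this way could then be resubmitted to the relevant modules, as claim 5 describes. In a pipeline that branches or merges, the consecutive-counter comparison would need to follow the actual module graph rather than a simple list.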

Assignees

Facebook Inc

Inventors

Classifications

  • Online advertisement · CPC title

  • Peer-to-peer [P2P] networks · CPC title

  • H04L51/32 · Primary

    Electricity · mapped topic

  • using time related information in packets, e.g. by adding timestamps · CPC title

  • User profiles · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018069823A1 cover?
An online system receives data and processes the data in a data processing pipeline. To detect data loss in the data processing pipeline, the online system determines a time interval during which each item of data is received and associates a set of counters with each time interval. For each time interval, an input counter is incremented for each data item received during the time interval and an outp…
Who is the assignee on this patent?
Facebook Inc
What technology area does this patent fall under?
Primary CPC classification G06Q30/0277. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 8, 2018 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).