Incremental parallel processing of data

US9268597B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9268597-B2
Application numberUS-201414231983-A
CountryUS
Kind codeB2
Filing dateApr 1, 2014
Priority dateApr 1, 2014
Publication dateFeb 23, 2016
Grant dateFeb 23, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One example method includes identifying synchronous code including instructions specifying a computing operation to be performed on a set of data; transforming the synchronous code into a pipeline application including one or more pipeline objects; identifying a first input data set on which to execute the pipeline application; executing the pipeline application on a first input data set to produce a first output data set; after executing the pipeline application on the first input data set, identifying a second input data set on which to execute the pipeline application; determining a set of differences between the first input data set and second input data set; and executing the pipeline application on the set of differences to produce a second output data set.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method executed by one or more processors, the method comprising: identifying synchronous code including instructions specifying a computing operation to be performed on a set of data; transforming the synchronous code into a pipeline application including one or more pipeline objects, the pipeline application configured to be executed in parallel across a plurality of computing devices, each of the one or more pipeline objects configured to receive an input data set and produce an output data set; identifying a first input data set on which to execute the pipeline application; executing the pipeline application on a first input data set to produce a first output data set, the executing the pipeline application including executing each of the one or more pipeline objects in an order in which a previous pipeline object provides its output data set to a next pipeline object as its input data set; after executing the pipeline application on the first input data set, identifying a second input data set on which to execute the pipeline application; determining a set of differences between the first input data set and second input data set; and executing the pipeline application on the set of differences to produce a second output data set, the executing the pipeline application on the set of differences including executing each of the one or more pipeline objects includes each previous pipeline object in the order providing differences from its previous output data set to the next pipeline object as its input data set, and the second output data set including differences from the first output data set. 2. The method of claim 1 , further comprising determining a pipeline state in response to executing the pipeline on the first input data set, the pipeline state including a representation of the first input data set and the first output data set. 3. The method of claim 2 , further comprising updating the pipeline state in response to executing the pipeline on the set of differences from the first input data set to generate an updated pipeline state, the updated pipeline state including a representation of the second input data set and the second output data set. 4. The method of claim 1 , further comprising determining a pipeline object state for each of the one or more pipeline objects in response to executing the pipeline on the first input data set, the pipeline object state including a representation of the input data set and the output data set for the pipeline object. 5. The method of claim 4 , further comprising updating the pipeline object state in response to executing the pipeline on the set of differences from the first input data set to generate an updated pipeline object state, the updated pipeline object state including differences from the input data set and the output data set for the pipeline object. 6. The method of claim 1 , wherein identifying the first input data set on which to execute the pipeline comprises: transforming the first input data set into a first set of key value pairs; and storing the first set of key value pairs in a key value store. 7. The method of claim 1 , wherein determining the set of differences between the first input data set and second input data set comprises: transforming the second input data set into a second set of key value pairs; comparing the second set of key value pairs to first set of key value pairs; and identifying key value pairs that have been added or deleted from the second set of key value pairs relative to the first set of key value pairs. 8. The method of claim 1 , wherein determining the set of differences between the first input data set and second input data set comprises: determining a last execution timestamp for the pipeline representing a time at which the pipeline was executed on the first input data set; and identifying a set of items in the second input data set including timestamps after the last execution timestamp. 9. A non-transitory, computer-readable medium storing instructions operable when executed to cause at least one processor to perform operations comprising: identifying synchronous code including instructions specifying a computing operation to be performed on a set of data; transforming the synchronous code into a pipeline application including one or more pipeline objects, the pipeline application configured to be executed in parallel across a plurality of computing devices, each of the one or more pipeline objects configured to receive an input data set and produce an output data set; identifying a first input data set on which to execute the pipeline application; executing the pipeline application on a first input data set to produce a first output data set, the executing the pipeline application including executing each of the one or more pipeline objects in an order in which a previous pipeline object provides its output data set to a next pipeline object as its input data set; after executing the pipeline application on the first input data set, identifying a second input data set on which to execute the pipeline application; determining a set of differences between the first input data set and second input data set; and executing the pipeline application on the set of differences to produce a second output data set, the executing the pipeline application on the set of differences including executing each of the one or more pipeline objects includes each previous pipeline object in the order providing differences from its previous output data set to the next pipeline object as its input data set, and the second output data set including differences from the first output data set. 10. The computer-readable medium of claim 9 , the operations further comprising determining a pipeline state in response to executing the pipeline on the first input data set, the pipeline state including a representation of the first input data set and the first output data set. 11. The computer-readable medium of claim 10 , the operations further comprising updating the pipeline state in response to executing the pipeline on the set of differences from the first input data set to generate an updated pipeline state, the updated pipeline state including a representation of the second input data set and the second output data set. 12. The computer-readable medium of claim 9 , the operations further comprising determining a pipeline object state for each of the one or more pipeline objects in response to executing the pipeline on the first input data set, the pipeline object state including a representation of the input data set and the output data set for the pipeline object. 13. The computer-readable medium of claim 12 , the operations further comprising updating the pipeline object state in response to executing the pipeline on the set of differences from the first input data set to generate an updated pipeline object state, the updated pipeline object state including differences from the input data set and the output data set for the pipeline object. 14. The computer-readable medium of claim 9 , wherein identifying the first input data set on which to execute the pipeline comprises: transforming the first input data set into a first set of key value pairs; and storing the first set of key value pairs in a key value store. 15. The computer-readable medium of claim 9 , wherein determining the set of differences between the first input data set and second input data set comprises: transforming the second input data set into a second set of key value pairs; comparing the second set of k

Assignees

Inventors

Classifications

  • Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mappping at compile time, see G06F8/451) · CPC title

  • Update request formulation · CPC title

  • using instruction pipelines · CPC title

  • where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems (multiprogramming arrangements G06F9/46; allocation of resources G06F9/50) · CPC title

  • Indexing; Data structures therefor; Storage structures · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9268597B2 cover?
One example method includes identifying synchronous code including instructions specifying a computing operation to be performed on a set of data; transforming the synchronous code into a pipeline application including one or more pipeline objects; identifying a first input data set on which to execute the pipeline application; executing the pipeline application on a first input data set to pro…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F8/453. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Feb 23 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).