Pluggable fault detection tests for data pipelines

US10936479B2 · US · B2

Patent metadata
Publication number: US-10936479-B2
Application number: US-201916572404-A
Country: US
Kind code: B2
Filing date: Sep 16, 2019
Priority date: Sep 14, 2015
Publication date: Mar 2, 2021
Grant date: Mar 2, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Discussed herein are embodiments of methods and systems which allow engineers or administrators to create modular plugins which represent the logic for various fault detection tests that can be performed on data pipelines and shared among different software deployments. In some cases, the modular plugins each define a particular test to be executed against data received from the pipeline in addition to one or more configuration points. The configuration points represent configurable arguments, such as variables and/or functions, referenced by the instructions which implement the tests and that can be set according to the specific operation environment of the monitored pipeline.
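The plugin idea in the abstract can be illustrated with a minimal sketch. Everything named here (the `Plugin` class, `row_count_test`, the `min_rows` configuration point, and the status strings) is a hypothetical illustration, not the patent's implementation; it only assumes that a plugin bundles test instructions with configuration points that each deployment can override.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Plugin:
    """Hypothetical plugin: test logic plus named configuration points."""
    name: str
    test: Callable[[Dict[str, Any], Dict[str, Any]], str]
    config_points: Dict[str, Any] = field(default_factory=dict)

    def run(self, test_data: Dict[str, Any],
            settings: Optional[Dict[str, Any]] = None) -> str:
        # Deployment-specific settings override the plugin's defaults.
        config = {**self.config_points, **(settings or {})}
        return self.test(test_data, config)


def row_count_test(test_data: Dict[str, Any], config: Dict[str, Any]) -> str:
    """Report a fault when the transformed row count is below a threshold."""
    return "OK" if test_data["row_count"] >= config["min_rows"] else "FAULT"


# The same plugin can be shared across deployments with different settings.
volume_check = Plugin("volume_check", row_count_test, {"min_rows": 1000})

print(volume_check.run({"row_count": 1500}))                      # OK
print(volume_check.run({"row_count": 1500}, {"min_rows": 2000}))  # FAULT
```

Keeping the threshold outside the test function is what lets one plugin serve pipelines with different expected volumes.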

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for detecting faults related to a data pipeline system, the method comprising: at one or more computing devices comprising one or more processors and memory storing one or more computer programs executed by the one or more processors to perform the method, performing operations comprising: receiving a plugin comprising one or more instructions representing a test to perform on the data pipeline system and one or more configuration points; wherein the data pipeline system is configured to receive source data from one or more data sources and configured to apply one or more transformations to the source data to produce transformed data before storage of the transformed data in one or more data sinks; receiving test data from the data pipeline system; wherein the test data received from the data pipeline system comprises a metric reflecting an amount of the transformed data after the one or more transformations; determining to run the test defined by the plugin on the data pipeline system including executing the one or more instructions of the plugin based on the test data, wherein a result of executing the one or more instructions includes at least a test result status indicator; and causing display of a graphical user interface that presents at least the test result status indicator.

2. A fault detection system for detecting faults related to a data pipeline system, the fault detection system comprising: storage media; one or more processors; and one or more programs stored in the storage media and configured for execution by the one or more processors, the one or more programs comprising instructions for: receiving a) one or more instructions representing a test to perform on data processed by the data pipeline system and b) one or more configuration points, wherein the data pipeline system is configured to receive source data from one or more data sources and configured to apply one or more transformations to the source data to produce transformed data before storage of the transformed data in one or more data sinks; and receiving test data from the data pipeline system; determining to run the test on the data pipeline system including executing the one or more instructions based on one or more settings for the one or more configuration points and the test data, wherein a result of executing the one or more instructions includes at least a test result status indicator; wherein the test result status indicator is based, at least in part, on the result of executing the one or more instructions including determining: (a) whether a sample contains a correct number of columns according to a schema for the transformed data, (b) whether data in each column of the sample adheres to a data type of the column as specified in a schema for the transformed data, (c) whether data in each column of the sample improperly contains NULL values according to a schema for the transformed data, or any combination of (a), (b), or (c); and causing display of a graphical user interface that visibly presents at least the test result status indicator.

3. The fault detection system of claim 2, wherein determining to run the test is performed based on a configuration point of the one or more configuration points that defines a time interval for periodically executing the test.

4. The fault detection system of claim 2, wherein the test is performed by training a classifier based on a historical sample of the transformed data and, after the classifier has been trained, using the classifier to predict a test result status indicator based on the test data.

5. The fault detection system of claim 4, wherein the classifier is implemented using an artificial neural network.

6. The fault detection system of claim 2, wherein the test result status indicator is one of a plurality of test result status indicators that include at least a test result status representing that a fault occurred with the data pipeline system, a test result status representing that a fault has potentially occurred with the data pipeline system, and a test result status representing that no fault has occurred with the data pipeline system.

7. The fault detection system of claim 2, wherein the data pipeline system includes a plurality of pipelines and the graphical user interface displays a plurality of test result status indicators, each test result status indicator of the plurality of test result status indicators relating to a plurality of tests performed on a particular pipeline during a particular time period.

8. The fault detection system of claim 7, wherein each test result status indicator of the plurality of test result status indicators is generated by using a worst test result status indicator among test result status indicators for the plurality of tests performed on the particular pipeline during the particular time period.

9. The fault detection system of claim 8, wherein each particular test result status indicator of the plurality of test result status indicators is displayed as or in relation to a widget which, when selected, causes display of a third graphical user interface that presents the plurality of tests for the particular pipeline during the particular time period.

10. The fault detection system of claim 9, wherein each particular test of the plurality of tests is displayed in the third graphical user interface as or in relation to a widget which, when selected, causes display of a fourth graphical user interface that presents detailed information for the particular test.

11. The fault detection system of claim 10, wherein the detailed information for the particular test is displayed in relation to a widget which, when selected, causes a test result status indicator of the particular test to be treated as though no fault was detected.

12. The fault detection system of claim 2, wherein the one or more configuration points include one or more of: variables referenced by the one or more instructions or functions referenced by the one or more instructions.

13. The fault detection system of claim 2, wherein the one or more instructions perform the test by inspecting log data generated by the data pipeline system for one or more results of the data pipeline system executing one or more checks for faults involving the data pipeline system.

14. The fault detection system of claim 2, wherein the graphical user interface is displayed via a client application.

15. The fault detection system of claim 14, wherein the fault detection system receives the one or more instructions via the client application.

16. The fault detection system of claim 2, wherein the test data comprises a sample of the source data before the one or more transformations.

17. The fault detection system of claim 16, wherein the one or more configuration points specify collection of the sample of the source data from the one or more data sources.

18. The fault detection system of claim 16, wherein the test result status indicator is based, at least in part, on the result of executing the one or more instructions including determining: (a) whether the sample of the source data contains a correct number of columns according to a schema for the source data, (b) whether data in each column of the sample of the source data adheres to a data type of the column as specified in a schema for the source data, (c) whether data in each column of the sample of the source data improperly contains NULL values according to a schema for the source data, or any combination of (a), (b)
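Two ideas from the claims preview lend themselves to a short sketch: the schema checks (column count, column data type, improper NULL values) and the roll-up that reports the worst status among several tests on one pipeline. The code below is an illustration under assumptions, not the claimed implementation: the schema representation, the function names `check_sample` and `worst_status`, and the status labels `OK`, `WARNING`, and `FAULT` are all hypothetical.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

# Hypothetical schema: one (column name, Python type, nullable) entry per column.
Schema = List[Tuple[str, type, bool]]

# Hypothetical status labels, ordered from least to most severe.
SEVERITY: Dict[str, int] = {"OK": 0, "WARNING": 1, "FAULT": 2}


def check_sample(rows: Iterable[Sequence[Any]], schema: Schema) -> str:
    """Return FAULT if any row violates the schema, otherwise OK."""
    for row in rows:
        # (a) correct number of columns
        if len(row) != len(schema):
            return "FAULT"
        for value, (_name, col_type, nullable) in zip(row, schema):
            if value is None:
                # (c) NULL in a column that does not allow it
                if not nullable:
                    return "FAULT"
            # (b) value does not match the column's data type
            elif not isinstance(value, col_type):
                return "FAULT"
    return "OK"


def worst_status(statuses: Iterable[str]) -> str:
    """Roll up several test results into the most severe one."""
    return max(statuses, key=SEVERITY.__getitem__, default="OK")


schema: Schema = [("id", int, False), ("name", str, True)]
print(check_sample([(1, "a"), (2, None)], schema))  # OK
print(check_sample([(None, "a")], schema))          # FAULT
print(worst_status(["OK", "WARNING", "OK"]))        # WARNING
```

Mapping statuses to integers keeps the roll-up a single `max` call and makes it easy to add further severity levels.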

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • for test design, e.g. generating new test cases

  • for test execution, e.g. scheduling of test suites

  • Content or structure details of the error report, e.g. specific table structure, specific error fields

  • Error or fault detection not based on redundancy (power supply failures G06F1/30; network fault management H04L41/06)

  • for test results analysis

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10936479B2 cover?
Discussed herein are embodiments of methods and systems which allow engineers or administrators to create modular plugins which represent the logic for various fault detection tests that can be performed on data pipelines and shared among different software deployments. In some cases, the modular plugins each define a particular test to be executed against data received from the pipeline in add…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/0775. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 2, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).