Processing data errors for a data processing system

US10387236B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10387236-B2
Application number  US-201514842891-A
Country             US
Kind code           B2
Filing date         Sep 2, 2015
Priority date       Sep 29, 2014
Publication date    Aug 20, 2019
Grant date          Aug 20, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Processing data errors in a data processing system, includes a computer receiving one or more patterns and a data set. The one or more patterns describe characteristics of an erroneous data record and are associated with a root cause. The root cause includes a description of a technical deficiency causing the data error in the erroneous data record. Responsive to the computer determining that a first set of data records in the received data set have characteristics that match a first pattern of the one or more patterns, the computer assigns the first set of data records of the received data set having characteristics that match the first pattern to a first error group.
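The grouping step in the abstract can be illustrated with a minimal sketch. It assumes records are plain dictionaries and that a pattern is a predicate paired with a root-cause description. The names `Pattern`, `collect_error_group`, the sample records, and the root-cause text are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class Pattern:
    """Hypothetical pattern: a condition on a record plus its root cause."""
    name: str
    root_cause: str                      # description of the technical deficiency
    matches: Callable[[Record], bool]    # True if the record fits the pattern


def collect_error_group(records: List[Record], pattern: Pattern) -> List[Record]:
    """Return every record whose characteristics match the pattern."""
    return [r for r in records if pattern.matches(r)]


# Illustrative pattern (an assumption): names cut off at exactly 10 characters.
truncated_name = Pattern(
    name="name_truncated_to_10_chars",
    root_cause="target column too narrow in load step",
    matches=lambda r: len(r.get("name", "")) == 10,
)

records = [
    {"id": 1, "name": "Alexandria"},   # 10 characters, matches
    {"id": 2, "name": "Bob"},          # does not match
    {"id": 3, "name": "Christophe"},   # 10 characters, matches
]

first_error_group = collect_error_group(records, truncated_name)
print([r["id"] for r in first_error_group])
```

Records in the same group share one root cause, so a single fix can be applied to the group rather than to each record.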

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing data errors in a data processing system, the method comprising: receiving, by a computer, one or more patterns, wherein the one or more patterns describe characteristics of an erroneous data record and are associated with a root cause, the root cause including a description of a technical deficiency of a hardware component or a software component causing the data error in the erroneous data record; receiving, by the computer, a data set, the data set referring to one or more data tables having columns and rows stored by a single data source system; and responsive to determining, by the computer, that a first set of data records in the received data set have characteristics that match a first pattern of the one or more patterns by determining whether each data record contained within the first set of data records set fulfills conditions described by the first pattern, determining, by the computer, whether all erroneous data records have the same root cause in the single data source system; assigning, by the computer, the first set of data records of the received data set having characteristics that match the first pattern to a first error group representing the first set of data records from the single data source system with the same root cause; and responsive to determining, by the computer, that a second set of data records in the received data set have characteristics that match a second pattern of the one or more patterns, that a percentage of data records of the received data set that do not match the second pattern of the one or more patterns is less than a threshold value, and that the first error group does not include the second set of data records: assigning, by the computer, the second set of data records of the received data set having characteristics that match the second pattern to a second error group, the first error group received from a first data source and the second error group received from the first data source.

2. The method according to claim 1, wherein determining, by the computer, that the first set of data records in the received data set have characteristics that match the first pattern of the one or more patterns further comprises: determining, by the computer, that a percentage of data records of the received data set that do not match the first pattern of the one or more patterns is less than a threshold value; treating errors in the data records that do not match the first pattern individually; and treating errors in the data records that match the first pattern as a group.

3. The method according to claim 1, further comprising: responsive to determining, by the computer, that the root cause associated with the first pattern matches the root cause associated with the second pattern: merging, by the computer, the first error group and the second error group allowing fixes for data records of the first error group and the second error group to be the same.

4. The method according to claim 1, wherein the received data set is processed by one or more processes before being received, wherein the root cause associated with the first pattern causes at least a first failure in a respective at least one Extract Transform and Load (ETL) process of the one or more processes, and wherein the root cause associated with the second pattern causes at least a second failure in a respective at least one process of the one or more processes, further comprises: responsive to determining, by the computer, that the first failure occurs in at least one same process of the one or more processes as the second failure: merging, by the computer, the first error group and the second error group allowing fixes for data records of the first error group and the second error group to be the same.

5. The method according to claim 1, wherein the received data set is an Extract Transform and Load (ETL) processed data set and wherein the root cause is a failure in a data operation of an ETL process during the extract, transformation, or load process.

6. The method according to claim 1, further comprising: receiving, by the computer, a plurality of technical support systems, a technical support system providing technical repairs for an erroneous data record, the plurality of technical support systems having one or more predefined technical tasks; selecting, by the computer, a technical support system from among the plurality of technical support systems having the predefined technical task associated with the root cause and with the first pattern, the root cause a technical deficiency of a hardware system that causes one or more data errors in erroneous data records; sending, by the computer, the first error group to the selected technical support system; and receiving, by the computer, the technical repair of the root cause, thereby fixing the erroneous data records of the first error group.

7. The method according to claim 1, wherein the characteristics of an erroneous data record include a format of the erroneous data record.

8. The method according to claim 1, wherein the one or more patterns are represented by a decision tree having one or more nodes, the one or more nodes including a decision, wherein the decision tree starts from a pattern having a largest coverage and proceeds to patterns having less coverage, one or more patterns represented in the decision tree considering whether all records have data errors.

9. A computer program product for processing data errors in a data processing system, the computer program product comprising one or more non-transitory computer readable storage medium and program instructions stored on at least one of the one or more computer readable storage medium, the program instructions comprising: program instructions to receive, by a computer, one or more patterns, wherein the one or more patterns describe characteristics of an erroneous data record and are associated with a root cause, the root cause including a description of a technical deficiency of a hardware component or a software component causing the data error in the erroneous data record; program instructions to receive, by the computer, a data set, the data set referring to one or more data tables having columns and rows stored by a single data source system; and responsive to determining, by the computer, that a first set of data records in the received data set have characteristics that match a first pattern of the one or more patterns by determining whether each data record contained within the first set of data records fulfills conditions described by the first pattern, determining, by the computer, whether all erroneous data records have the same root cause in the single data source system; program instructions to assign, by the computer, the first set of data records of the received data set having characteristics that match the first pattern to a first error group representing the first set of data records from the single data source system with the same root cause; and responsive to determining, by the computer, that a second set of data records in the received data set have characteristics that match a second pattern of the one or more patterns, that a percentage of data records of the received data set that do not match the second pattern of the one or more patterns is less than a threshold value, and that the first error group does not include the second set of data records: program instructions to assign, by the computer, the second set of data records of the received data set having characteristics that match the second pattern to a second error group.

10. The computer program product according to claim 9, further comprising: re
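The threshold test in claims 1 and 2, the root-cause merge in claim 3, and the largest-coverage-first ordering in claim 8 can be sketched together as follows. This is an illustrative reading and not the patented implementation: the flat sorted list stands in for the decision tree, and the function names, sample records, root-cause strings, and threshold values are all assumptions.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# Hypothetical pattern shape: (name, root cause, predicate)
Pattern = Tuple[str, str, Callable[[Record], bool]]


def non_match_pct(records: List[Record], predicate: Callable[[Record], bool]) -> float:
    """Percentage of records that do NOT satisfy the predicate."""
    if not records:
        return 0.0
    misses = sum(1 for r in records if not predicate(r))
    return 100.0 * misses / len(records)


def assign_groups(records: List[Record], patterns: List[Pattern],
                  threshold_pct: float) -> Dict[str, List[int]]:
    """Map each root cause to the indexes of the records grouped under it."""
    # Largest coverage first, i.e. smallest non-matching share (cf. claim 8).
    ordered = sorted(patterns, key=lambda p: non_match_pct(records, p[2]))
    taken: set = set()
    groups: Dict[str, List[int]] = {}
    for _name, root_cause, predicate in ordered:
        # Skip patterns whose non-matching share reaches the threshold;
        # their records would be handled individually (cf. claims 1 and 2).
        if non_match_pct(records, predicate) >= threshold_pct:
            continue
        # Only records not already placed in an earlier group (cf. claim 1).
        members = [i for i, r in enumerate(records)
                   if predicate(r) and i not in taken]
        if not members:
            continue
        taken.update(members)
        # Patterns sharing a root cause land in one merged group (cf. claim 3).
        groups.setdefault(root_cause, []).extend(members)
    return groups


# Illustrative data: six missing zips, three malformed dates, one clean record.
records = (
    [{"zip": None, "date": "2015-09-02"}] * 6
    + [{"zip": "10001", "date": "02/09/15"}] * 3
    + [{"zip": "10001", "date": "2015-09-02"}]
)
patterns = [
    ("bad_date", "date parser misconfigured", lambda r: "/" in r["date"]),
    ("missing_zip", "load step drops column", lambda r: r["zip"] is None),
]

print(assign_groups(records, patterns, threshold_pct=80.0))
```

Lowering `threshold_pct` to 50.0 in this sketch leaves only the missing-zip group, because 70% of the records do not match the date pattern.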

Assignees

IBM

Inventors

Classifications

  • the processing taking place on a specific hardware platform or in a specific software environment · CPC title

  • G06F11/079 (Primary)

    Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36) · CPC title

  • Storage of error reports, e.g. persistent data storage, storage using memory protection · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10387236B2 cover?
Processing data errors in a data processing system, includes a computer receiving one or more patterns and a data set. The one or more patterns describe characteristics of an erroneous data record and are associated with a root cause. The root cause includes a description of a technical deficiency causing the data error in the erroneous data record. Responsive to the computer determining that a…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F11/079. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 20, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).