Code analysis for providing data privacy in ETL systems

US9716704B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9716704-B2
Application number  US-201615054672-A
Country             US
Kind code           B2
Filing date         Feb 26, 2016
Priority date       Feb 19, 2015
Publication date    Jul 25, 2017
Grant date          Jul 25, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source system. The method determines data exit points at which output data are provided to the target entities and determines at least one non-trusted target entity. The method determines, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, and, if so, modifies the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.
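
The compile-time pass described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the `Target` class, the `trusted` flag, the per-target operator chains, and the `mask(...)` step name are all hypothetical stand-ins chosen for illustration. The sketch only shows the idea of appending a masking step directly before the exit point of each non-trusted target that would receive a sensitive field.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str
    trusted: bool            # hypothetical flag, e.g. derived from target metadata
    fields: tuple[str, ...]  # data fields delivered at this target's exit point


def insert_masking(operators: dict[str, list[str]],
                   targets: list[Target],
                   sensitive: set[str]) -> dict[str, list[str]]:
    """Return a copy of the per-target operator chains in which a mask step is
    appended (placed directly before the exit point) for every non-trusted
    target that would otherwise receive a sensitive field."""
    rewritten = {name: list(chain) for name, chain in operators.items()}
    for target in targets:
        if target.trusted:
            continue  # trusted targets receive the data unchanged
        exposed = sorted(f for f in target.fields if f in sensitive)
        if exposed:
            rewritten[target.name].append("mask(" + ",".join(exposed) + ")")
    return rewritten


# Hypothetical job: two targets fed by two operator chains.
job = {"warehouse": ["join", "filter"], "partner_feed": ["join", "project"]}
targets = [
    Target("warehouse", trusted=True, fields=("name", "ssn")),
    Target("partner_feed", trusted=False, fields=("name", "ssn", "city")),
]
result = insert_masking(job, targets, sensitive={"ssn"})
```

In this sketch only the chain for `partner_feed` gains a trailing mask step; the trusted `warehouse` chain and the original `job` mapping are left unchanged, which mirrors the idea of modifying the job before execution rather than filtering data at run time.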

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for providing data privacy in an information integration system, the method performing during compilation of an information integration job the steps of: receiving information regarding a data flow structure of an information integration job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the at least one source system; determining a set of data exit points at which the output data are provided to the one or more target entities; determining at least one non-trusted target entity of the one or more target entities; determining, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information; and if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, modify the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.

2. The method according to claim 1, wherein the at least one non-trusted target entity is determined based on metadata correlated with the one or more target entities.

3. The method according to claim 1, wherein the at least one non-trusted target entity is determined based on identification data of a user authenticated at one or more of: the information integration system and said target entity.

4. The method according to claim 1, wherein each field of data provided by the at least one source system is analyzed regarding sensitive information in order to determine whether output data derived from said field of data is classified as sensitive information.

5. The method according to claim 1, wherein classifying a field of output data provided by the at least one source system as sensitive information is done based on analysis of metadata provided in association with the field of output data.

6. The method according to claim 1, wherein classifying a field of output data provided by the at least one source system as sensitive information is done based on one or more of: text analytics and data classification algorithms marking a field of data as sensitive information based on data classification.

7. The method according to claim 1, wherein a field of output data is derived based on a combination of at least two fields of data and the field of output data resulting from said combination is identified as sensitive depending on the combined fields of data.

8. The method according to claim 1, wherein a field of data provided by an operator is identified as sensitive information based on at least one of: a type of operator and a functionality of the operator.

9. The method according to claim 1, wherein after determining a field of data is classified as sensitive information, analyzing the information integration job in order to determine at least one further field of data comprising identical data, and marking said further field of data as sensitive information.

10. The method according to claim 9, wherein a target entity to which said further field of data is provided is classified as a non-trusted target entity based on at least one of: metadata provided to the target entity and identification data of a user authenticated at the information integration system.

11. The method according to claim 1, further comprising parameterizing a target entity information associated with a target entity, determining, at compile time, sensitive data is received by the target entity, and disabling the parameterization.

12. The method according to claim 1, wherein if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, tagging the information integration job in order to disable execution of said information integration job.

13. The method according to claim 1, wherein the masking operator is configured to mask the sensitive information by at least one of: removing said sensitive information and replacing said sensitive information by a dummy value.
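
Two of the dependent claims lend themselves to a short illustration: marking derived fields as sensitive (claim 7) and the two masking modes (claim 13). The Python sketch below is an assumption-laden example, not the claimed method. The function names, the lineage mapping, the `XXXX` dummy value, and in particular the rule "a derived field is sensitive if any of its inputs is sensitive" are hypothetical; the claims leave the exact classification rule open.

```python
from __future__ import annotations


def propagate_sensitivity(source_sensitive: set[str],
                          derived: dict[str, list[str]]) -> set[str]:
    """Mark a derived field as sensitive when any of its inputs is sensitive.

    `derived` maps each derived field to the fields it is computed from.
    The any-input rule is an assumption made for this sketch.
    """
    sensitive = set(source_sensitive)
    changed = True
    while changed:  # repeat until no further field gets marked
        changed = False
        for out_field, inputs in derived.items():
            if out_field not in sensitive and any(i in sensitive for i in inputs):
                sensitive.add(out_field)
                changed = True
    return sensitive


def mask_record(record: dict, sensitive: set[str], mode: str = "replace",
                dummy: str = "XXXX") -> dict:
    """Mask sensitive fields by removing them or replacing them with a dummy value."""
    if mode == "remove":
        return {k: v for k, v in record.items() if k not in sensitive}
    if mode == "replace":
        return {k: (dummy if k in sensitive else v) for k, v in record.items()}
    raise ValueError(f"unknown masking mode: {mode}")


# Hypothetical lineage: "contact" combines "name" and "ssn"; "label" builds on "contact".
lineage = {"contact": ["name", "ssn"], "label": ["contact", "city"], "region": ["city"]}
marked = propagate_sensitivity({"ssn"}, lineage)
row = {"name": "Ada", "contact": "Ada/123-45-6789", "region": "North"}
replaced = mask_record(row, marked)
removed = mask_record(row, marked, mode="remove")
```

Here `contact` and, transitively, `label` are marked because they derive from `ssn`, while `region` is not. The `replace` mode keeps the record shape stable for downstream consumers, whereas `remove` drops the field entirely; which one suits a given target is a design choice the sketch does not settle.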

Assignees

Inventors

Classifications

  • Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

  • wherein the data content is protected, e.g. by encrypting or encapsulating the payload · CPC title

  • Protecting privacy or anonymity, e.g. protecting personally identifiable information [PII] · CPC title

  • H04L63/08 (Primary)

    for authentication of entities (cryptographic mechanisms or cryptographic arrangements for entity authentication H04L9/32) · CPC title

  • to a system of files or objects, e.g. local or distributed file system or database · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9716704B2 cover?
In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H04L63/08. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Jul 25, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).