Who is the assignee on this patent?

Cigna Intellectual Property Inc

What technology area does this patent fall under?

Primary CPC classification G06N20/00. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 15, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

System and method for synthesizing data

US11501205B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11501205-B2
Application number	US-201916536538-A
Country	US
Kind code	B2
Filing date	Aug 9, 2019
Priority date	Dec 12, 2013
Publication date	Nov 15, 2022
Grant date	Nov 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for constructing sets of synthetic data. A single data record is identified from a first set of data. The first set of data comprises a first plurality of data records, each of the data records including multiple items of data describing an entity. Using pattern recognition, the single data record is processed to identify a group of records from within the first set that have corresponding characteristics equivalent to the single data record. The identified group of records comprises a target set of variables and the group of records from the first set that are not identified comprises a control set of variables. The target set of variables and the control set of variables are processed, using probability estimation and optimization constraints, to determine a score for each of the records in the first set. The score describes how similar each of the records in the first set is to the single data record. The records associated with a percentage of the highest scores are identified. The data associated with the single data record is replaced with data associated with the identified records identified, item-by-item.
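
The steps in the abstract (score every record against one chosen record, keep the highest-scoring fraction, then replace the chosen record item by item) can be illustrated with a minimal sketch. This is not the patented method: the score here is a negated, standardized Euclidean distance used as a stand-in for the "probability estimation and optimization constraints" that the abstract names but does not spell out. The function name `synthesize_record` and the parameters `top_fraction` and `seed` are hypothetical, and records are assumed to be lists of numeric fields.

```python
import math
import random
import statistics


def synthesize_record(records, index, top_fraction=0.2, seed=0):
    """Return a synthetic replacement for records[index].

    Assumption: similarity is a negated standardized Euclidean distance,
    a simple stand-in for the scoring step described in the abstract.
    """
    rng = random.Random(seed)
    target = records[index]
    n_fields = len(target)

    # Per-field spread, so that a large-valued field does not dominate the score.
    scales = [statistics.pstdev([r[j] for r in records]) or 1.0
              for j in range(n_fields)]

    def score(record):
        # Higher score means the record is more similar to the target.
        return -math.sqrt(sum(((record[j] - target[j]) / scales[j]) ** 2
                              for j in range(n_fields)))

    # Score every other record and keep the top fraction as donors.
    others = [r for i, r in enumerate(records) if i != index]
    others.sort(key=score, reverse=True)
    keep = max(1, math.ceil(top_fraction * len(others)))
    donors = others[:keep]

    # Item-by-item replacement: each field is taken from a randomly chosen donor.
    return [rng.choice(donors)[j] for j in range(n_fields)]


records = [
    [30, 50000.0],
    [31, 52000.0],
    [29, 49000.0],
    [65, 120000.0],
    [70, 110000.0],
]
synthetic = synthesize_record(records, index=0, top_fraction=0.5)
print(synthetic)
```

Sampling each field independently is the simplest reading of "item-by-item". It does not enforce the correlation-matrix constraint that the first claim adds, so a fuller implementation would need an extra step to keep field correlations intact.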

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: identifying from a first set of data comprising a first plurality of data records, at least one of the plurality of data records including multiple fields to store a variable describing an entity, a single data record, at least one of the variables being associated with personal information; using pattern recognition, processing the single data record to identify a group of records from within the first set that have corresponding variables equivalent to the variables in the single data record, wherein the identified group of records comprises a target set of variables, the target set of variables comprising variables equivalent to the variables in the single data record and the group of records from the first set that are not identified comprises a control set of variables, the control set of variables comprising variables different from the variables in the single data record; processing the target set of variables and the control set of variables, using probability estimation and optimization constraints, to determine a score for the at least one of the plurality of records in the first set that describes a comparison of the at least one of the plurality of records in the first set to the single data record; identifying the records associated with the score that is above a threshold; and replacing the data that is a representative of the personal information and is associated with the single data record with data associated with the records identified as associated with the score above the threshold field by field under constraints of maintaining a correlation matrix of the multiple fields to maintain statistical characteristics of the first set of data and remove the personal information; and building a predictive model using at least the data associated with the records identified as associated with the score that is above the threshold.

2. The computer implemented method of claim 1, further comprising: receiving an original set of data comprising an original plurality of data records, at least one of the original plurality of data records including multiple fields which store a variable describing an entity; identifying a data record in the original plurality of data records comprising a corresponding variable that is a number of standard deviations from a mean of values for that same variable in the original plurality of data records; removing from the original set of data all records in the identifying the data record step to generate a first set of data records comprising a subset of the original plurality of data records.

3. The computer-implemented method of claim 1, further comprising: identifying a second single data record from the first set; performing steps of processing the single data record, processing the target set of variables and the control set of variables, identifying the records, and replacing the data on the second single data record.

4. A system comprising: memory operable to store at least one program; at least one processor communicatively coupled to the memory, in which the at least one program, when executed by the at least one processor, causes the at least one processor to perform a method comprising: identifying from a first set of data comprising a first plurality of data records, at least one of the plurality of data records including multiple fields to store a variable describing an entity, a single data record, at least one of the variables being associated with personal information, respectively; using pattern recognition, processing the single data record to identify a group of records from within the first set that have corresponding variables equivalent to the variables in the single data record, wherein the identified group of records comprises a target set of variables, the target set of variables comprising variables equivalent to the variables in the single data record and the group of records from the first set that are not identified comprises a control set of variables, the control set of variables comprising variables different from the variables in the single data record; processing the target set of variables and the control set of variables, using probability estimation and optimization constraints, to determine a score for the at least one of the plurality of records in the first set that describes a comparison of the at least one of the plurality of records in the first set to the single data record; identifying the records associated with a score that is above a threshold; and replacing the data that is a representative of the personal information and is associated with the single data record with data associated with the records identified as associated with the score that is above the threshold field by field under constraints of maintaining a correlation matrix of the multiple fields to maintain statistical characteristics of the first set of data and remove the personal information; and building a predictive model based on at least the data associated with the records identified as associated with the score that is above the threshold.

5. The system of claim 4, the method further comprising: receiving an original set of data comprising an original plurality of data records, at least one of the original plurality of data records including multiple fields each of which stores a variable describing an entity; identifying a data record in the original plurality of data records comprising a corresponding variable that is a number of standard deviations from a mean of values for that same variable in the original plurality of data records; removing from the original set of data all records in the identifying the data record step to generate a first set of data records comprising a subset of the original plurality of data records.

6. The system of claim 4, the method further comprising: identifying a second single data record from the first set; and performing steps of processing the single data record, processing the target set of variables and the control set of variables, identifying the records, and replacing the data on the second single data record.

7. A non-transitory computer readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, perform a method comprising: identifying from a first set of data comprising a first plurality of data records, at least one of the plurality of data records including multiple fields to store a variable describing an entity, a single data record, at least one of the variables being associated with personal information; using pattern recognition, processing the single data record to identify a group of records from within the first set that have corresponding variables equivalent to the variables in the single data record, wherein the identified group of records comprises a target set of variables, the target set of variables comprising variables equivalent to the variables in the single data record and the group of records from the first set that are not identified comprises a control set of variables, the control set of variables comprising variables different from the variables in the single data record; processing the target set of variables and the control set of variables, using probability estimation and optimization constraints, to determine a score for the at least one of the plurality of data records in the first set that describes a comparison of the at least one of the plurality of data records in the first set to the single data record; identifying the records associated with a score that is above a threshold; and replacing the data that is a representative of the personal information and is associated with the single
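
Claims 2 and 5 describe a preprocessing step: records with a variable lying "a number of standard deviations" from that variable's mean are removed before synthesis. A minimal sketch of that filter follows. The claims do not fix the number of standard deviations, so the `n_std` parameter and its default are assumptions, as are the function name `drop_outliers` and the representation of records as lists of numeric fields.

```python
import statistics


def drop_outliers(records, n_std=3.0):
    """Keep only records whose every field lies within n_std standard
    deviations of that field's mean (n_std is an assumed parameter)."""
    n_fields = len(records[0])
    means = [statistics.fmean([r[j] for r in records]) for j in range(n_fields)]
    stdevs = [statistics.pstdev([r[j] for r in records]) for j in range(n_fields)]

    def is_outlier(record):
        # A field with zero spread cannot flag any record as an outlier.
        return any(
            sd > 0 and abs(record[j] - mean) > n_std * sd
            for j, (mean, sd) in enumerate(zip(means, stdevs))
        )

    return [r for r in records if not is_outlier(r)]


original = [[10.0], [11.0], [9.0], [10.0], [10.0], [100.0]]
first_set = drop_outliers(original, n_std=2.0)
print(first_set)
```

The filtered list plays the role of the "first set of data records" that the independent claims then operate on.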

Assignees

Cigna Intellectual Property Inc

Classifications

G06N20/00 (Primary)
Machine learning
G06N5/047
Pattern matching networks; Rete networks

Patent family

Related publications grouped by family.

View patent family 67988609
