Method and system for generating a unified database from data sets

US10025828B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10025828-B2
Application number  US-201414543414-A
Country             US
Kind code           B2
Filing date         Nov 17, 2014
Priority date       Aug 23, 2013
Publication date    Jul 17, 2018
Grant date          Jul 17, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for generating a unified database includes receiving a structured set of data, where each set is made up of records having fields, aggregating values within a first field of the records, automatically applying a set of rules to the first field values to determine correlations among the first field values, calculating a confidence level regarding a label for the first field, providing the label to the first field, storing the first field values in the first field in the unified database, and receiving more information to increase the confidence level. A system for generating a clinical database and a method for using the database are also described.
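
The labelling step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: this page does not say how the confidence level is computed, so the sketch assumes it is the fraction of aggregated values that fall inside a stored reference range. The reference ranges, the 0.9 threshold, and every function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReferenceRange:
    label: str
    low: float
    high: float


# Hypothetical stored measures; the values are illustrative, not from the patent.
REFERENCES = [
    ReferenceRange("heart_rate_bpm", 40.0, 180.0),
    ReferenceRange("body_temp_celsius", 35.0, 42.0),
]


def aggregate_field(records, field):
    """Collect every non-missing value of one field across all records."""
    return [r[field] for r in records if r.get(field) is not None]


def score_label(values, ref):
    """Assumed confidence measure: share of values inside the reference range."""
    if not values:
        return 0.0
    inside = sum(1 for v in values if ref.low <= v <= ref.high)
    return inside / len(values)


def label_field(records, field, threshold=0.9):
    """Return (label, confidence); label is None when the threshold is not met."""
    values = aggregate_field(records, field)
    best = max(REFERENCES, key=lambda ref: score_label(values, ref))
    confidence = score_label(values, best)
    if confidence >= threshold:
        return best.label, confidence
    # Below threshold: the abstract says more information is received here
    # to raise the confidence level; that step is left out of this sketch.
    return None, confidence


records = [
    {"col_a": 72, "col_b": 36.6},
    {"col_a": 88, "col_b": 37.1},
    {"col_a": 65, "col_b": 38.0},
]
label_a = label_field(records, "col_a")
label_b = label_field(records, "col_b")
```

Returning `None` below the threshold keeps an unlabelled field distinct from a labelled one, so a caller could route it to whatever additional information source is available.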

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for generating a unified clinical database, comprising: receiving a structured set of clinical data, the set comprising one or more records, each record having two or more fields; aggregating values taken from a first field of the records; aggregating values taken from a second field of the records; automatically applying, by a processor, a set of rules to the aggregated first and second field values to calculate statistical correlations between the aggregated first and second field values; calculating a confidence level regarding a label for the first field based on the rules; if the confidence level meets or exceeds a pre-determined threshold, applying the label to the first field; storing the first field values in a first unified field in the unified database; and if the confidence level does not meet or exceed the pre-determined threshold, receiving information regarding the correlations between the aggregated first and second field values and recalculating the confidence level.

2. The method of claim 1, further comprising: calculating a second confidence level regarding a label for the second field; if the second confidence level meets or exceeds a pre-determined threshold, applying the label to the second field; and storing the second field values in a second unified field in the unified database; and if the second confidence level does not meet or exceed the pre-determined threshold, receiving information regarding the correlations between the aggregated first and second field values and recalculating the second confidence level.

3. The method of claim 1, further comprising: calculating statistical distributions of the aggregated first and second field values; and comparing the statistical distributions of the aggregated first and second field values with statistical distributions of stored data measures.

4. The method of claim 3, wherein determining labels for the aggregated first and second field values comprises determining closest data measures having statistical distributions substantially the same as the statistical distributions of the aggregated first and second field values.

5. The method of claim 1, wherein applying the set of rules to the aggregated first and second field values comprises determining a structure of the records.

6. The method of claim 5, wherein the structure of the records of a CHEM-7 test comprises seven fields.

7. A computer-implemented method for generating a unified clinical database, comprising: receiving a structured set of clinical data, the set comprising one or more records, each record having one or more fields; aggregating values taken from a first field of the records; automatically applying, by a processor, one or more rules to the aggregated first field values, wherein applying said rules includes comparing the aggregated first field values to: stored statistics of known clinical measures, if the aggregated first field values comprise numerical values; known text entries from a clinical dictionary, if the aggregated first field values comprise alphabetical and/or alphanumeric values; stored calendar information, if the aggregated first field values comprise date values; and stored information concerning a clinical trial, including trial design information, if the aggregated first field values comprise alphabetical and/or alphanumeric values; calculating a confidence level regarding a label for the first field; if the confidence level meets or exceeds a pre-determined threshold, applying the label to the first field; and storing the first field values in a first unified field in the unified database; and if the confidence level does not meet or exceed the pre-determined threshold, receiving more information and recalculating the confidence level.

8. The method of claim 7, wherein the rules include eligibility criteria comprising inclusion criteria, exclusion criteria, or both.

9. The method of claim 7, wherein the rules are refined based on the received data.

10. The method of claim 7, wherein the stored statistics comprise statistical distributions of known clinical measures.

11. The method of claim 7, wherein if the aggregated first field values comprise date values, and the date values are after the beginning of a clinical trial, then the first field label refers to testing during the clinical trial.

12. The method of claim 7, wherein if the aggregated first field values comprise date values, and the date values are before the beginning of a clinical trial, then the first field label refers to historic data.

13. The method of claim 7, further comprising: aggregating values taken from a second field of the records; automatically applying the one or more rules to the aggregated second field values to determine correlations between the aggregated first and second field values; and determining a label for the second field values.

14. The method of claim 13, wherein said correlations increase the confidence level regarding the label for the first field values.

15. A computer-implemented method for generating a unified database, comprising: receiving a structured set of clinical data, the set comprising one or more records, each record having one or more fields; aggregating values within a first field of the records; automatically applying, by a processor, a set of rules to the first field values to calculate statistical correlations among the first field values, wherein applying the set of rules includes comparing the aggregated first field values to stored statistics of known data measures, if the aggregated first field values comprise numerical values; calculating a confidence level regarding a label for the first field; if the confidence level meets or exceeds a pre-determined threshold, applying the label to the first field; storing the first field values in the first field in the unified database; and receiving more information to increase the confidence level; and if the confidence level does not meet or exceed the pre-determined threshold, receiving more information and recalculating the confidence level.

16. The method of claim 15, wherein applying the set of rules to the first field values comprises using inclusion and exclusion criteria.

17. The method of claim 15, further comprising: calculating a statistical distribution of the first field values; and comparing the statistical distribution of the first field values with statistical distributions of stored data measures.

18. The method of claim 17, further comprising: determining a closest data measure having a statistical distribution substantially the same as the statistical distribution of the first field values.

19. The method of claim 17, further comprising calculating the statistical distributions of the stored data measures.

20. The method of claim 15, wherein applying the set of rules includes comparing the first field values to known text entries from a clinical dictionary, if the aggregated first field values comprise alphabetical and/or alphanumeric values.

21. The method of claim 15, further comprising: aggregating values within a second field of the records; automatically applying the set of rules to the second field values to determine correlations among the second field values and first field values; and determining a label for the second field values.

22. The method of claim 21, wherein said determine
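
The type-dependent comparison in claim 7 and the date rules in claims 11 and 12 can be sketched in Python as follows. This is an illustration under assumptions, not the claimed implementation: the function names are hypothetical, value types are detected with plain `isinstance` checks, and a field whose dates fall on the trial start date or on both sides of it is left unlabelled because the claims do not cover those cases.

```python
from datetime import date

# Comparison source per value kind, paraphrasing claim 7.
COMPARISON_SOURCE = {
    "numeric": "stored statistics of known clinical measures",
    "text": "clinical dictionary and stored clinical trial information",
    "date": "stored calendar information",
}


def value_kind(values):
    """Classify aggregated field values as 'date', 'numeric', or 'text'."""
    if not values:
        raise ValueError("no values to classify")
    if all(isinstance(v, date) for v in values):
        return "date"
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numeric"
    return "text"


def comparison_source(values):
    """Pick what the aggregated values would be compared against."""
    return COMPARISON_SOURCE[value_kind(values)]


def date_field_label(values, trial_start):
    """Apply the date rules of claims 11 and 12; None when neither applies."""
    if all(v > trial_start for v in values):
        return "testing during the clinical trial"
    if all(v < trial_start for v in values):
        return "historic data"
    return None


trial_start = date(2014, 1, 1)
during = date_field_label([date(2014, 3, 1), date(2014, 4, 1)], trial_start)
historic = date_field_label([date(2012, 6, 1), date(2013, 2, 1)], trial_start)
```

In the claimed method these rules feed a confidence level rather than assigning a label outright; the sketch returns the label directly only to keep the example short.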

Assignees

Inventors

Classifications

  • for electronic clinical trials or questionnaires · CPC title

  • for patient-specific data, e.g. for electronic patient records · CPC title

  • G06F16/258 · Primary

    Data format conversion from or to a database · CPC title

  • for data related to laboratory analysis, e.g. patient specimen analysis · CPC title

  • File access structures, e.g. distributed indices (arrangements of input from, or output to, record carriers G06F3/06) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10025828B2 cover?
A method for generating a unified database includes receiving a structured set of data, where each set is made up of records having fields, aggregating values within a first field of the records, automatically applying a set of rules to the first field values to determine correlations among the first field values, calculating a confidence level regarding a label for the first field, providing t…
Who is the assignee on this patent?
Medidata Solutions Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/258. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 17, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).