Feature engineering method, apparatus, and system

US11250951B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11250951-B2
Application number: US-201815942223-A
Country: US
Kind code: B2
Filing date: Mar 30, 2018
Priority date: Mar 30, 2018
Publication date: Feb 15, 2022
Grant date: Feb 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Example implementations described herein are directed to systems and methods for feature preparation that receives patient feature data and determines similarity of pre-stored models with the patient feature data. In an example implementation, a database of the pre-stored models is analyzed to assess similarity indicating that feature preparation of the pre-stored models is compatible with the patient feature data. For similarity indicative of feature preparation to be utilized, the feature preparation is conducted for the patient feature data based on the pre-stored model determined to be similar. The feature preparation retrieves reusable features associate with the similar pre-stored model, where the reusable features comprise pre-calculated features of the model. A machine learning model is generated using results of the feature preparation and patient feature data; and a prediction is provided using the machine learning model.
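
The similarity step in the abstract can be illustrated with a minimal sketch. The publication does not state how similarity is scored, so the Jaccard overlap used here is an assumption, as are the names StoredModel, signature, find_similar_models, the 0.5 threshold, and the sample registry entries. The sketch ranks pre-stored models by overlap of data-source keys and feature names, then either reuses the best match or defers to the user when no model meets the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredModel:
    """Hypothetical registry entry for a pre-stored model."""
    model_id: str
    name: str
    source_keys: frozenset    # keys of the data sources the model was built from
    feature_names: frozenset  # names of its pre-calculated (reusable) features


def signature(source_keys, feature_names):
    """Namespace keys and features so a source and a feature never collide."""
    return (frozenset(("source", k) for k in source_keys)
            | frozenset(("feature", f) for f in feature_names))


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_similar_models(source_keys, feature_names, registry, threshold=0.5):
    """Return (decision, ranked) where ranked is [(score, model), ...].

    decision is "reuse" when the best score meets the threshold (assumed
    value), otherwise "ask_user" so a person can pick a model option.
    """
    query = signature(source_keys, feature_names)
    scored = [(jaccard(query, signature(m.source_keys, m.feature_names)), m)
              for m in registry]
    scored = [pair for pair in scored if pair[0] > 0.0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    decision = "reuse" if scored and scored[0][0] >= threshold else "ask_user"
    return decision, scored


# Illustrative data only; these models and features are made up.
registry = [
    StoredModel("m1", "readmission-v1",
                frozenset({"labs", "admissions"}), frozenset({"age", "hba1c"})),
    StoredModel("m2", "notes-only",
                frozenset({"notes"}), frozenset({"age"})),
]
decision, ranked = find_similar_models(
    ["labs", "admissions"], ["age", "hba1c", "bmi"], registry)
print(decision, [(m.model_id, round(score, 2)) for score, m in ranked])
```

Returning the full ranked list alongside the decision mirrors the "similar model list" in the claims: the caller can show every candidate with its score, whether or not the top match is reused automatically.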

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving patient feature data; determining similarity of pre-stored models with the patient feature data, wherein a database of the pre-stored models is analyzed to assess similarity indicating that feature preparation of the pre-stored models is compatible with the patient feature data; for the determination of the similarity indicative of the feature preparation being compatible with the patient feature data: conducting the feature preparation for the patient feature data based on the pre-stored model determined to be similar, wherein the feature preparation retrieves reusable features associated with the similar pre-stored model, where the reusable features comprise pre-calculated features of the model and are identified from using a patient identifier to which different types of structured and unstructured data are associated; tuning a machine learning model configured to output a prediction probability of a patient condition determine readmission probability using a combination of the pre-stored model determined to be similar and results of the feature preparation and patient feature data; wherein determining similarity of pre-stored models further comprises: searching the database of reusable models and features based on keys of data sources and feature metadata; outputting a similar model list with the pre-stored models based on the search results; and in response to determining that a pre-stored model from the similar model list with the maximum similarity satisfies a threshold, returning the similar model list; wherein returning the similar model list in response to determining that the pre-stored model from the similar model list with the maximum similarity satisfies the threshold further comprises: tuning the similar model to remove reusable features that fail to satisfy a minimum population criteria and a sample data distribution criteria; wherein determining similarity of pre-stored models further comprises: in response to determining that the pre-stored models from the similar model list fail to satisfy the threshold, recommending model options for the user to select, and return a model selection based on a user selected model option.

2. The method of claim 1, wherein the prediction probability is for a future patient condition that is used to form a patient treatment plan.

3. The method of claim 1, further comprising providing a data lineage that identifies one or more data sources of data used for the pre-stored model and machine learning model.

4. The method of claim 1, further comprising a user interface to provide a dataset for tests associated with the machine learning model, wherein the dataset comprises at least one of a patient dataset, a medical dataset, a lab dataset, and a doctor's note dataset.

5. The method of claim 1, further comprising creating non-pre-calculated features for other patient feature data determined not to be similar with pre-stored models; and joining the pre-calculated features and the created non-pre-calculated features with the patient identifier.

6. The method of claim 1, further comprising outputting a similar model list comprising a model name, a model identifier, a reusable data source, a reusable features, a reusable features path, and a similarity score.

7. The method of claim 1, further comprising outputting a similar model list derived from pre-stored models, wherein the similar model list comprises a user selection for training models.

8. A system comprising: a memory; a processor coupled to the memory configured to: receive patient feature data; determine similarity of pre-stored models with the patient feature data, wherein a database of the pre-stored models is to be analyzed to assess similarity indicating that feature preparation of the pre-stored models is compatible with the patient feature data; for the determination of the similarity indicative of the feature preparation being compatible with the patient feature data: conduct the feature preparation for the patient feature data based on the pre-stored model determined to be similar, wherein the feature preparation retrieves reusable features associated with the similar pre-stored model, where the reusable features comprise pre-calculated features of the model and are identified from a patient identifier to which structured and unstructured data are associated; tune a machine learning model configured to determine readmission probability using a combination of the pre-stored model determined to be similar and results of the feature preparation and patient feature data; wherein to determine similarity of pre-stored models further comprises: search the database of reusable models and features based on keys of data sources and feature metadata; output a similar model list with the pre-stored models based on the search results; and in response to determining that a pre-stored model from the similar model list with the maximum similarity satisfies a threshold, return the similar model list; wherein to return the similar model list in response to determining that the pre-stored model from the similar model list with the maximum similarity satisfies the threshold further comprises: tuning the similar model to remove reusable features that fail to satisfy a minimum population criteria and a sample data distribution criteria; wherein to return the similar model list in response to determining that the pre-stored model from the similar model list with the maximum similarity satisfies the threshold further comprises: tuning the similar model to remove reusable features that fail to satisfy a minimum population criteria and a sample data distribution criteria; wherein to determine similarity of pre-stored models further comprises: in response to determining that the pre-stored models from the similar model list fail to satisfy the threshold, recommending model options for the user to select, and return a model selection based on a user selected model option.

9. The system of claim 8, further configured to create non-pre-calculated features for other patient feature data determined not to be similar with pre-stored models; and join the pre-calculated features and the created non-pre-calculated features with the patient identifier.

10. The system of claim 8, wherein to provide the prediction probability includes a user interface to display data lineage that identifies one or more data sources of data used for the pre-stored model and machine learning model.

11. The system of claim 8, wherein the data lineage is based on model metadata of the pre-stored models.

12. A non-transitory computer-readable medium storing instructions for a model management system including a processing device configured to: receive patient feature data; determine similarity of pre-stored models with the patient feature data, wherein a database of the pre-stored models is to be analyzed to assess similarity indicating that feature preparation of the pre-stored models is compatible with the patient feature data; for the determination of the similarity indicative of the feature preparation being compatible with the patient feature data: conduct the feature preparation for the patient feature data based on the pre-stored model determined to be similar, wherein the feature preparation retrieves reusable features associated with the similar pre-stored model that are used to train parameters of the similar pre-stored model, where the reusable features comprise pre-calculated features of the model and are identified from a patient identifier to which structured and unstructured data are associated; tune a mac

Assignees

Hitachi Ltd

Inventors

Classifications

  • for patient-specific data, e.g. for electronic patient records · CPC title

  • Machine learning · CPC title

  • G16H50/20 · Primary

    for computer-aided diagnosis, e.g. based on medical expert systems · CPC title

  • G16H50/70 · Primary

    for mining of medical data, e.g. analysing previous cases of other patients · CPC title

  • for calculating health indices; for individual health risk assessment · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11250951B2 cover?
Example implementations described herein are directed to systems and methods for feature preparation that receives patient feature data and determines similarity of pre-stored models with the patient feature data. In an example implementation, a database of the pre-stored models is analyzed to assess similarity indicating that feature preparation of the pre-stored models is compatible with the …
Who is the assignee on this patent?
Hitachi Ltd
What technology area does this patent fall under?
Primary CPC classification G16H50/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 15, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).