Human-in-the-loop interactive model training

US12191007B2 · US · B2

Patent metadata
Publication number: US-12191007-B2
Application number: US-201716618656-A
Country: US
Kind code: B2
Filing date: Sep 29, 2017
Priority date: Aug 30, 2017
Publication date: Jan 7, 2025
Grant date: Jan 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Example embodiments relate to a method for training a predictive model from data. The method includes defining a multitude of predicates as binary functions operating on time sequences of the features or logical operations on the time sequences of the features. The method also includes iteratively training a boosting model by generating a number of new random predicates, scoring all the new random predicates by weighted information gain with respect to a class label associated with a prediction of the boosting model, selecting a number of the new random predicates with the highest weighted information gain and adding them to the boosting model, computing weights for all the predicates in the boosting model, and removing one or more of the selected new predicates with the highest information gain from the boosting model in response to input from an operator. The method may include repeating the prior steps a plurality of times.
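
The training loop summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how example weights or predicate weights are computed, so the sketch assumes that each example is weighted by the current model's absolute prediction error and that predicate weights come from plain logistic regression fitted by gradient descent. The names `weighted_info_gain`, `fit_weights`, `train`, `propose`, and `review` are hypothetical.

```python
import math


def entropy(pos, neg):
    """Binary entropy (in bits) of a weighted positive/negative split."""
    total = pos + neg
    h = 0.0
    for mass in (pos, neg):
        if mass > 0:
            p = mass / total
            h -= p * math.log2(p)
    return h


def weighted_info_gain(pred_values, labels, example_weights):
    """Information gain of a 0/1 predicate about 0/1 labels, weighting each example."""
    mass = {(v, y): 0.0 for v in (0, 1) for y in (0, 1)}
    for v, y, w in zip(pred_values, labels, example_weights):
        mass[(v, y)] += w
    total = sum(mass.values())
    if total == 0:
        return 0.0
    gain = entropy(mass[(0, 1)] + mass[(1, 1)], mass[(0, 0)] + mass[(1, 0)])
    for v in (0, 1):
        branch = mass[(v, 1)] + mass[(v, 0)]
        gain -= branch / total * entropy(mass[(v, 1)], mass[(v, 0)])
    return gain


def fit_weights(model, rows, labels, steps=300, lr=0.5):
    """Assumed weight step: logistic regression with one weight per predicate."""
    coef, bias, n = [0.0] * len(model), 0.0, len(rows)
    for _ in range(steps):
        grad, grad_b = [0.0] * len(model), 0.0
        for row, y in zip(rows, labels):
            feats = [p(row) for p in model]
            z = bias + sum(c * f for c, f in zip(coef, feats))
            err = 1 / (1 + math.exp(-z)) - y
            grad_b += err
            for j, f in enumerate(feats):
                grad[j] += err * f
        bias -= lr * grad_b / n
        coef = [c - lr * g / n for c, g in zip(coef, grad)]
    return coef, bias


def train(rows, labels, propose, review, rounds=3, n_new=10, n_keep=2):
    """Run the generate / score / select / weigh / review loop for `rounds` iterations."""
    model, coef, bias = [], [], 0.0
    for _ in range(rounds):
        # Assumption: an example's weight is the current absolute prediction error.
        example_weights = []
        for row, y in zip(rows, labels):
            z = bias + sum(c * p(row) for c, p in zip(coef, model))
            example_weights.append(abs(y - 1 / (1 + math.exp(-z))))
        candidates = [propose() for _ in range(n_new)]  # generate new random predicates
        candidates.sort(  # score by weighted information gain, best first
            key=lambda p: weighted_info_gain([p(r) for r in rows], labels, example_weights),
            reverse=True,
        )
        selected = candidates[:n_keep]  # keep the highest-scoring predicates
        model.extend(selected)
        coef, bias = fit_weights(model, rows, labels)  # weights for all predicates
        rejected = review(selected, coef[-len(selected):])  # operator input
        model = [p for p in model if not any(p is r for r in rejected)]
        coef, bias = fit_weights(model, rows, labels)  # refit after removals
    return model, coef, bias
```

Here `propose` is a zero-argument callable that returns a new random predicate, and `review` stands in for the operator: it receives the newly selected predicates with their weights and returns the ones to remove. In an interactive setting `review` would be backed by a user interface rather than a function.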

First claim

Opening claim text (preview).

We claim:

1. A computer-implemented method of training a predictive model from data comprising a multitude of features, each feature associated with a real value and a time component, comprising the steps of executing the following instructions in a processor of the computer:
  a) defining a multitude of predicates as binary functions operating on time sequences of the features or logical operations on the time sequences of the features;
  b) iteratively training a boosting model by performing the following:
    1) Generating a number of new random predicates as binary functions operating on at least one of (i) time sequences of the features or (ii) logical operations on the time sequences of the features;
    2) Scoring all the new random predicates by weighted information gain with respect to a class label associated with a prediction of the boosting model;
    3) Selecting, from the new random predicates, a number of the new random predicates that are the highest with respect to their weighted information gain scores and adding them to the boosting model;
    4) Computing weights for all the predicates in the boosting model;
    5) Removing one or more of the selected number of the new random predicates from the boosting model in response to input from an operator; and
    6) Repeating the performance of steps 1, 2, 3, 4 and 5 a plurality of times and thereby generating a final iteratively trained boosting model.

2. The method of claim 1, further comprising the step of c) evaluating the final iteratively trained boosting model.

3. The method of claim 2, wherein the evaluation step (c) comprises evaluating the final iteratively trained boosting model for at least one of accuracy, complexity, or trustworthiness.

4. The method of claim 1, wherein the data is in a tuple format of the type {X, x_i, t_i} where X is the name of feature, x_i is a real value of the feature and t_i is a time component for the real value x_i, and wherein the predicates are defined as binary functions operating on at least one of (i) sequences of tuples or (ii) logical operations on sequences of the tuples.

5. The method of claim 4, wherein the sequences of tuples are defined by time periods selected from the group consisting of 1 or more days, 1 or more hours, 1 or more minutes, or 1 or more months.

6. The method of claim 1, wherein the data comprises electronic health record data for a multitude of patients.

7. The method of claim 1, wherein the method further comprises the step of dividing the predicates into groups based on understandability, namely a first group of relatively more human understandable predicates and a second group of relatively less human understandable predicates and wherein the new random predicates are selected from the first group.

8. The method of claim 7, wherein the data comprises electronic health record data for a multitude of patients, and wherein the set of predicates are represented in a manner to show the subject matter or source within the electronic health record data of the predicates.

9. The method of claim 8, wherein the predicates comprise an existence predicate returning a result of 0 or 1 depending on whether a feature exists in the electronic health record data for a given patient in the multitude of patients; and a counts predicate returning a result of 0 or 1 depending on the number of counts of a feature in the electronic health record data for a given patient in the multitude of patients relative to a numeric parameter C.

10. The method of claim 1, wherein step b) 5) further comprises the step of graphically representing the predicates currently in the boosting model and providing the operator with the ability to remove one or more of the predicates.

11. The method of claim 10, further comprising the step of graphically representing the weights computed for each of the predicates in step b) 4).

12. The method of claim 1, further comprising the step of graphically representing a set of predicates added to the boosting model after each of the iterations of step b) 6).

13. The method of claim 1, wherein step b) further comprises the step of providing the operator with the ability to define a predicate during model training.

14. The method of claim 1, wherein step b) further comprises the step of removing redundant predicates.

15. The method of claim 1, further comprising the step of ranking the predicates selected in step b) 3).

16. The method of claim 1, further comprising the step of generating statistics of predicates in the boosting model and presenting them to the operator.

17. The method of claim 1, wherein in step b) 5) the one or more predicates are removed which are not causally related to the prediction of the boosting model.

18. A computer-implemented method of training a predictive model from electronic health record data for a multitude of patients, the data comprising a multitude of features, each feature associated with real values and a time component, wherein the data is in a tuple format of the type {X, x_i, t_i} where X is the name of feature, x_i is a real value of the feature and t_i is a time component for the real value x_i, comprising the steps of implementing the following instructions in a processor of the computer:
  a) defining a multitude of predicates as at least one of (i) binary functions operating on sequences of the tuples or (ii) logical operations on the sequences of the tuples;
  b) dividing the multitude of predicates into groups based on understandability, namely a first group of relatively more human understandable predicates and a second group of relatively less human understandable predicates;
  c) iteratively training a boosting model by performing the following:
    1) Generating a number of new random predicates from the first group of predicates as binary functions operating on at least one of (i) sequences of the tuples or (ii) logical operations on the sequences of the tuples;
    2) Scoring all the new random predicates by weighted information gain with respect to a class label associated with a prediction of the boosting model;
    3) Selecting, from the new random predicates, a number of the new random predicates that are the highest with respect to their weighted information gain scores and adding them to the boosting model;
    4) Computing weights for all the predicates in the boosting model;
    5) Removing one or more of the selected number of the new random predicates from the boosting model in response to input from an operator; and
    6) Repeating the performance of steps 1, 2, 3, 4 and 5 a plurality of times and thereby generating a final iteratively trained boosting model.

19. The method of claim 18, further comprising the step d) of evaluating the final iteratively trained boosting model.

20. A workstation for providing operator input into iteratively training a boosting model, wherein the workstation comprises an interface and a processor, and wherein the processor is configured to perform operations comprising:
    1) Generating a number of new random predicates as binary functions operating on at least one of (i) time sequences of input features or (ii) logical operations on the time sequences of the input features;
    2) Scoring all the new random predicates by weighted information gain with respect to a class label associated with a prediction of the boosting model;
    3) Selecting, from the new random predicates, a number of the new random predicates that are the highest with respect to their weighted information gain scores and adding them to the boosting
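
The tuple format of claim 4, the time periods of claim 5, and the existence and counts predicates of claim 9 can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the patented implementation. The claims say only that the counts predicate depends on the count "relative to a numeric parameter C", so the sketch assumes the predicate returns 1 when the count is at least C. The names `Observation`, `exists`, `counts`, `all_of`, and `in_window` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One tuple {X, x_i, t_i}."""
    name: str       # X: name of the feature
    value: float    # x_i: real value of the feature
    time: datetime  # t_i: time component for x_i


def in_window(obs, end: datetime, window: Optional[timedelta]) -> bool:
    """True if obs falls in the period [end - window, end]; no window means all time."""
    return window is None or end - window <= obs.time <= end


def exists(name, end, window=None):
    """Existence predicate: 1 if the feature appears in the record, else 0."""
    def predicate(record):
        return int(any(o.name == name and in_window(o, end, window) for o in record))
    return predicate


def counts(name, c, end, window=None):
    """Counts predicate. Assumption: returns 1 when the count is at least C."""
    def predicate(record):
        n = sum(1 for o in record if o.name == name and in_window(o, end, window))
        return int(n >= c)
    return predicate


def all_of(*predicates):
    """Logical AND over predicates, one example of a logical operation on them."""
    def predicate(record):
        return int(all(p(record) for p in predicates))
    return predicate


# Usage with made-up data for one patient record.
end = datetime(2017, 8, 30)
record = [
    Observation("heart_rate", 110.0, end - timedelta(hours=2)),
    Observation("heart_rate", 95.0, end - timedelta(days=3)),
    Observation("creatinine", 1.4, end - timedelta(days=1)),
]
has_creatinine = exists("creatinine", end)
two_heart_rates_last_day = counts("heart_rate", 2, end, window=timedelta(days=1))
print(has_creatinine(record), two_heart_rates_last_day(record))
```

Each predicate is a closure that maps a patient record to 0 or 1, so predicates can be combined with logical operations and passed directly to a scoring step such as weighted information gain.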

Assignees

Google LLC

Inventors

Classifications

  • Machine learning

  • Knowledge-based neural networks; Logical representations of neural networks

  • Methods or arrangements for processing data by operating upon the order or content of the data handled (logic circuits H03K19/00)

  • Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence

  • Learning methods

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12191007B2 cover?
Example embodiments relate to a method for training a predictive model from data. The method includes defining a multitude of predicates as binary functions operating on time sequences of the features or logical operations on the time sequences of the features. The method also includes iteratively training a boosting model by generating a number of new random predicates, scoring all the new ran…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N5/01. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 7, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).