Unified search for dual domains
US-12105727-B2 · Oct 1, 2024 · US
US9292550B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9292550-B2 |
| Application number | US-201313772852-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 21, 2013 |
| Priority date | Feb 21, 2013 |
| Publication date | Mar 22, 2016 |
| Grant date | Mar 22, 2016 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Systems, methods, and other embodiments associated with feature generation and model selection for generalized linear models are described. In one embodiment, a method includes ordering candidate features in a dataset being considered by a streamwise feature selection process according to an inclusion score that reflects a likelihood that a given candidate feature will be included in the GLM. The ordered candidate features are provided to the streamwise feature selection process for acceptance testing. In one embodiment, the method also includes selecting penalty criterion for use in the acceptance testing that is based on characteristics of the dataset.
Opening claim text (preview).
What is claimed is: 1. A computer-implemented method comprising: identifying a dataset that stores values for a target attribute and input attributes, where the input attributes are under consideration for inclusion in a generalized linear model that predicts a value of the target attribute based on a selection of features, where each feature comprises a combination of one or more of the input attributes; identifying candidate features, where a candidate feature comprises a combination of one or more of the input attributes; computing respective inclusion scores for respective candidate features, based, at least in part on a likelihood that the candidate feature will be selected for inclusion in the generalized linear model; ordering the candidate features according to inclusion score; constructing a set of one or more branches of candidate features, where each branch includes candidate features ordered according to inclusion score from highest inclusion score to lowest inclusion score, where the one or more branches do not include candidate features having an inclusion score below a predetermined minimum score; and providing a branch of candidate features to a streamwise feature selection process configured to construct the generalized linear model by considering candidate features in the branch, in turn, starting with the candidate feature with the highest inclusion score, and including selected candidate features in the generalized linear model. 2. The computer-implemented method of claim 1 , comprising computing the inclusion scores for a branch in a single scan of the input attributes. 3. The computer-implemented method of claim 1 , comprising providing one or more termination criteria to the streamwise feature selection process such that when the termination criteria are met, consideration of candidate features in a branch by the streamwise feature selection process is terminated, where the termination criteria are based, at least in part, on one or more of: a failure to accept a candidate feature having a highest inclusion score; a predetermined number of consecutive failures to accept candidate features having next highest inclusion scores; and when no remaining candidate features have a sufficiently high inclusion score, such that a lower bound on a penalized cost of a candidate model is less than a penalized cost of a last accepted model. 4. The computer-implemented method of claim 1 , comprising: determining that one or more candidate features in a branch have been accepted for inclusion in the generalized linear model; computing updated inclusion scores for remaining candidate features in the branch that have not yet been provided to the streamwise feature selection process; where the updated inclusion scores are computed based, at least in part, on a correlation between respective candidate features and a residual error of a last accepted generalized linear model that includes the one or more accepted candidate features; and providing the remaining candidate features to the streamwise feature selection process in order of updated inclusion scores. 5. The computer-implemented method of claim 1 , further comprising upon termination by the streamwise feature selection process of consideration of candidate features in the branch, identifying candidate features that were accepted by the streamwise selection process, and constructing subsequent branches by: generating one or more branches of candidate features using remaining input attributes that have not been considered for inclusion in a branch, or generating a set of one or more second order branches of new candidate features by combining, for each branch in the set of second order branches, an accepted candidate feature with the input attributes; computing respective inclusion scores for respective new candidate features in the subsequent branch, based, at least in part on a likelihood that the new candidate feature will be selected for inclusion in the generalized linear model; constructing the subsequent branch of new candidate features ordered according to inclusion score; and providing the subsequent branch of candidate features, in order of inclusion score, to the streamwise feature selection process. 6. The computer-implemented method of claim 5 , where candidate features in the second order branches comprise respective candidate features that are the product of an accepted candidate feature and respective input attributes. 7. The computer-implemented method of claim 6 , comprising: upon termination by the streamwise feature selection process of the second order branches, constructing a set of one or more third order branches by: identifying second order candidate features that were accepted by the streamwise selection process; generating branches of respective third order candidate features by combining, for each third order branch, an accepted second order candidate feature with respective input attributes; computing respective inclusion scores for respective third order candidate features in the subsequent branch, based, at least in part on a likelihood that the third order candidate feature will be selected for inclusion in the generalized linear model; constructing the third order branch of new candidate features ordered according to inclusion score; and providing the third order branch of candidate features, in order of inclusion score, to the streamwise feature selection process. 8. The computer-implemented method of claim 5 , where respective inclusion scores are based, at least in part, on a correlation between respective candidate features and i) residual errors a last accepted generalized linear model or ii) last iteration working residuals of a last accepted generalized linear model. 9. The computer-implemented method of claim 6 , further comprising terminating the streamwise feature selection process when all branches meet one or more termination criteria. 10. The computer-implemented method of claim 1 , where the streamwise feature selection process performs acceptance testing to select candidate features by: constructing a candidate generalized linear model that includes a next candidate feature in the branch; comparing the candidate generalized linear models to a last accepted generalized linear model; accepting the candidate generalized linear model when acceptance criteria are met, where the acceptance criteria are based, at least in part, on a penalty criterion; further where the penalty criterion is selected, at least in part, on characteristics of the dataset; and where the streamwise feature selection process uses the penalty criterion to compute a penalized cost of a candidate generalized linear model and accepts the candidate generalized linear model when the penalized cost is less than a penalized cost for the last accepted generalized linear model. 11. The computer-implemented method of claim 10 where the characteristics of the dataset comprise one or more of: a number of rows and a number of input attributes. 12. The computer-implemented method of claim 1 , further comprising: partitioning the dataset into a first partition and a second partition; computing inclusion scores using input attribute values and target attribute values in the first partition; and further where the streamwise feature selection process tests candidate features for inclusion in the generalized linear model using input attribute values and target attribute values in the second partition. 13. The computer-implemented method of claim 1 , comprising, when the target attribute is numeric, calculating an initial inclusion score for a candidate feature
Indexing; Data structures therefor; Storage structures · CPC title
Physics · mapped topic
Related publications grouped by family.
Answers are generated from the same data shown on this page.