Feature selection method, device and apparatus for constructing machine learning model

US11222285B2 · US · B2

Patent metadata
Publication number: US-11222285-B2
Application number: US-202117162939-A
Country: US
Kind code: B2
Filing date: Jan 29, 2021
Priority date: Oct 24, 2018
Publication date: Jan 11, 2022
Grant date: Jan 11, 2022

Abstract

A feature-selection system obtains a training data set and associated features; divides the training data set into a first number of training data subsets; and forms a plurality of feature-selecting data sets. A feature-selecting data set comprises a second number of training data subsets. The system processes, in parallel, each feature-selecting data set, which comprises: computing a first evaluation index for the features based on the feature-selecting data set; obtaining a group of index ranks corresponding to the features based on the first evaluation index; and obtaining a group of importance ranks corresponding to the features based on the feature-selecting data set and a machine-learning model. The system further obtains a group of total ranks by fusing groups of index ranks and groups of importance ranks obtained from processing the plurality of feature-selecting data sets, and selects target features from the features based on the group of total ranks.
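The flow described in the abstract can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the patented implementation. It assumes the k-subset / k-1 arrangement of claim 8, random or time-ordered splitting (claim 6), a mean operation for every fusion step (one of the options in claim 5), and a thread pool for the parallel step. The scorers `abs_corr` and `class_mean_gap` are illustrative stand-ins for an evaluation index from claim 7 and for a trained model's feature importances; every function name here is invented for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def split_into_subsets(n_rows, k, timestamps=None, seed=0):
    """Divide row indices into k subsets, either by time or at random (cf. claim 6)."""
    idx = list(range(n_rows))
    if timestamps is not None:
        idx.sort(key=lambda i: timestamps[i])  # contiguous time slices
    else:
        random.Random(seed).shuffle(idx)
    bounds = [round(j * n_rows / k) for j in range(k + 1)]
    return [idx[bounds[j]:bounds[j + 1]] for j in range(k)]


def feature_selecting_sets(subsets):
    """Form k sets, each made of the k-1 subsets left after dropping one (cf. claim 8)."""
    return [[i for j, s in enumerate(subsets) if j != out for i in s]
            for out in range(len(subsets))]


def to_ranks(scores):
    """Rank 1 = highest score; ties keep feature order."""
    order = sorted(range(len(scores)), key=lambda f: -scores[f])
    ranks = [0] * len(scores)
    for r, f in enumerate(order, start=1):
        ranks[f] = r
    return ranks


def process_set(X, y, rows, index_fn, importance_fn):
    """Index ranks and importance ranks for one feature-selecting data set."""
    Xs, ys = [X[i] for i in rows], [y[i] for i in rows]
    cols = list(zip(*Xs))  # one tuple of values per feature
    index_ranks = to_ranks([index_fn(c, ys) for c in cols])
    importance_ranks = to_ranks(importance_fn(Xs, ys))
    return index_ranks, importance_ranks


def select_features(X, y, k, n_select, index_fn, importance_fn, timestamps=None):
    sets = feature_selecting_sets(split_into_subsets(len(X), k, timestamps))
    with ThreadPoolExecutor() as pool:  # process the k sets in parallel
        results = list(pool.map(
            lambda rows: process_set(X, y, rows, index_fn, importance_fn), sets))
    n_feat = len(X[0])
    # Mean fusion across sets, then average the two total rank groups.
    total_index = [mean(res[0][f] for res in results) for f in range(n_feat)]
    total_imp = [mean(res[1][f] for res in results) for f in range(n_feat)]
    total = [(a + b) / 2 for a, b in zip(total_index, total_imp)]
    return sorted(range(n_feat), key=lambda f: total[f])[:n_select]


# Hypothetical stand-in scorers (not from the patent).
def abs_corr(col, ys):
    mx, my = mean(col), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(col, ys))
    sxx = sum((a - mx) ** 2 for a in col)
    syy = sum((b - my) ** 2 for b in ys)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def class_mean_gap(Xs, ys):
    pos = [r for r, t in zip(Xs, ys) if t == 1]
    neg = [r for r, t in zip(Xs, ys) if t == 0]
    return [abs(mean(r[f] for r in pos) - mean(r[f] for r in neg))
            for f in range(len(Xs[0]))]


y = [i % 2 for i in range(40)]
X = [[y[i], (i % 3) * 0.1] for i in range(40)]  # feature 0 = label, feature 1 = noise
print(select_features(X, y, k=4, n_select=1,
                      index_fn=abs_corr, importance_fn=class_mean_gap))  # prints [0]
```

Rank 1 marks the best feature, so lower fused totals are better. Threads keep the sketch short, but pure-Python scoring is limited by the GIL; a `ProcessPoolExecutor` with module-level (picklable) functions, or a cluster framework, would be needed for real parallel speedups. With scikit-learn installed, `importance_fn` could instead return `RandomForestClassifier().fit(Xs, ys).feature_importances_`.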

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-executed method, comprising: obtaining a training data set and features associated with the training data set; dividing the training data set into a first number of training data subsets; forming a plurality of feature-selecting data sets, wherein a respective feature-selecting data set comprises a second number of training data subsets, the second number being smaller than the first number; processing, by a computer in parallel, each feature-selecting data set, which comprises: computing a first evaluation index for the features based on the feature-selecting data set; obtaining a group of index ranks corresponding to the features based on the first evaluation index; and obtaining a group of importance ranks corresponding to the features based on the feature-selecting data set and a machine-learning model; obtaining a group of total ranks corresponding to the features by fusing groups of index ranks and groups of importance ranks obtained from processing the plurality of feature-selecting data sets; and selecting target features from the features based on the group of total ranks.

2. The method according to claim 1, wherein obtaining the group of total ranks of the features comprises: fusing the groups of index ranks obtained from processing the plurality of feature-selecting data sets to obtain a group of total index ranks; fusing the groups of importance ranks obtained from processing the plurality of feature-selecting data sets to obtain a group of total importance ranks; and fusing the group of total index ranks and the group of total importance ranks to obtain the group of total ranks of the plurality of features.

3. The method according to claim 2, wherein processing each feature-selecting data set further comprises: computing additional evaluation indices for the features based on the feature-selecting data set; and obtaining additional groups of index ranks corresponding to the features based on the additional evaluation indices.

4. The method according to claim 3, wherein fusing the groups of index ranks to obtain the group of total index ranks of the plurality of features comprises: for each evaluation index selected from a set of evaluation indices comprising the additional evaluation indices and the first evaluation index: extracting, from groups of index ranks associated with the set of evaluation indices and the plurality of feature-selecting data sets, a plurality of groups of index ranks associated with the selected evaluation index; and performing a first rank fusion operation to respectively fuse corresponding ranks of the features in the extracted plurality of groups of index ranks to obtain a group of comprehensive index ranks corresponding to the selected evaluation index; and performing a second rank fusion operation to respectively fuse groups of index ranks obtained for the set of evaluation indices to obtain the total index ranks of the features.

5. The method according to claim 2, wherein the first rank fusion operation or the second rank fusion operation comprises one of: a mean operation, a maximum operation, a minimum operation, a weighted average operation and a robust rank aggregation (RRA) operation.

6. The method according to claim 1, wherein dividing the training data set comprises one of: dividing the training data set based on time; and dividing the training data set randomly.

7. The method according to claim 1, wherein the first evaluation index comprises one of: an information value (IV), a Gini coefficient (GINI), an information gain (IG), mutual information (MI), a Relief score, and a sample stability index (PSI).

8. The method according to claim 1, wherein the training data set is divided into k training data subsets, wherein k different feature-selecting data sets are formed, and wherein each feature-selecting data set comprises k-1 training data subsets.

9. A computer system, comprising: a processor; a storage device coupled to the processor and storing instructions, which when executed by the processor cause the processor to perform a method, the method comprising: obtaining a training data set and features associated with the training data set; dividing the training data set into a first number of training data subsets; forming a plurality of feature-selecting data sets, wherein a respective feature-selecting data set comprises a second number of training data subsets, the second number being smaller than the first number; processing, in parallel, each feature-selecting data set, which comprises: computing a first evaluation index for the features based on the feature-selecting data set; obtaining a group of index ranks corresponding to the features based on the first evaluation index; and obtaining a group of importance ranks corresponding to the features based on the feature-selecting data set and a machine-learning model; obtaining a group of total ranks corresponding to the features by fusing groups of index ranks and groups of importance ranks obtained from processing the plurality of feature-selecting data sets; and selecting target features from the features based on the group of total ranks.

10. The computer system according to claim 9, wherein obtaining the group of total ranks of the features comprises: fusing the groups of index ranks obtained from processing the plurality of feature-selecting data sets to obtain a group of total index ranks; fusing the groups of importance ranks obtained from processing the plurality of feature-selecting data sets to obtain a group of total importance ranks; and fusing the group of total index ranks and the group of total importance ranks to obtain the group of total ranks of the plurality of features.

11. The computer system according to claim 10, wherein processing each feature-selecting data set further comprises: computing additional evaluation indices for the features based on the feature-selecting data set; and obtaining additional groups of index ranks corresponding to the features based on the additional evaluation indices.

12. The computer system according to claim 11, wherein fusing the groups of index ranks to obtain the group of total index ranks of the plurality of features comprises: for each evaluation index selected from a set of evaluation indices comprising the additional evaluation indices and the first evaluation index: extracting, from groups of index ranks associated with the set of evaluation indices and the plurality of feature-selecting data sets, a plurality of groups of index ranks associated with the selected evaluation index; and performing a first rank fusion operation to respectively fuse corresponding ranks of the features in the extracted plurality of groups of index ranks to obtain a group of comprehensive index ranks corresponding to the selected evaluation index; and performing a second rank fusion operation to respectively fuse groups of index ranks obtained for the set of evaluation indices to obtain the total index ranks of the features.

13. The computer system according to claim 10, wherein the first rank fusion operation or the second rank fusion operation comprises one of: a mean operation, a maximum operation, a minimum operation, a weighted average operation and a robust rank aggregation (RRA) operation.

14. The computer system according to claim 9, wherein dividing the training data set comprises one of: dividing the training data set based on time; and dividing the training data set randomly.

15. The computer system according to claim 9, wherein the first evaluation index compris
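Claims 2, 4 and 5 describe how rank groups are merged. The sketch below shows the listed fusion operators and the two-level fusion of claim 4 under these assumptions: a rank group is a Python list with one rank per feature (rank 1 = best), lower fused values are better, and fused values are turned back into ranks by sorting in ascending order (the claims do not say how). The RRA branch follows the rho score of Kolde et al. (2012) on ranks normalized by the number of features, without that method's multiple-testing correction. All function names are hypothetical.

```python
from math import comb


def scores_to_ranks(scores):
    """Turn fused values into ranks; rank 1 = lowest (best) fused value."""
    order = sorted(range(len(scores)), key=lambda f: scores[f])
    ranks = [0] * len(scores)
    for r, f in enumerate(order, start=1):
        ranks[f] = r
    return ranks


def rra_rho(normalized_ranks):
    """RRA rho: smallest probability that the k-th of n uniform ranks is <= observed."""
    r = sorted(normalized_ranks)
    n = len(r)
    # P(k-th order statistic of n Uniform(0,1) <= x) = P(Binomial(n, x) >= k)
    return min(
        sum(comb(n, j) * r[k - 1] ** j * (1 - r[k - 1]) ** (n - j)
            for j in range(k, n + 1))
        for k in range(1, n + 1))


def fuse(rank_groups, op="mean", weights=None):
    """Fuse rank groups feature by feature with one of the operators of claim 5."""
    n_feat = len(rank_groups[0])
    per_feature = [[g[f] for g in rank_groups] for f in range(n_feat)]
    if op == "mean":
        return [sum(r) / len(r) for r in per_feature]
    if op == "max":
        return [max(r) for r in per_feature]
    if op == "min":
        return [min(r) for r in per_feature]
    if op == "weighted":
        total = sum(weights)
        return [sum(w * x for w, x in zip(weights, r)) / total for r in per_feature]
    if op == "rra":
        return [rra_rho([x / n_feat for x in r]) for r in per_feature]
    raise ValueError(f"unknown fusion operation: {op}")


def total_index_ranks(index_ranks, first_op="mean", second_op="mean"):
    """index_ranks[e][s]: rank group for evaluation index e on feature-selecting set s.

    First fusion: across sets, per index (comprehensive index ranks).
    Second fusion: across indices (total index ranks), as in claim 4.
    """
    comprehensive = [scores_to_ranks(fuse(groups, first_op)) for groups in index_ranks]
    return fuse(comprehensive, second_op)


# Two evaluation indices x three feature-selecting sets, three features.
index_ranks = [
    [[1, 2, 3], [2, 1, 3], [1, 3, 2]],
    [[2, 1, 3], [2, 1, 3], [1, 2, 3]],
]
importance_ranks = [[1, 2, 3], [1, 3, 2], [2, 1, 3]]

t_index = total_index_ranks(index_ranks)   # [1.5, 1.5, 3.0]
t_importance = fuse(importance_ranks)      # mean over the three sets
print(scores_to_ranks(fuse([t_index, t_importance])))  # claim 2 final fusion; prints [1, 2, 3]
```

Swapping `first_op` or `second_op` for `"max"`, `"min"`, `"weighted"` (with `weights`) or `"rra"` exercises the other operators named in claim 5.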
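Claim 7 names the information value (IV) as one possible evaluation index. A minimal sketch for a binary label follows; it computes IV = sum over bins of (p_event - p_nonevent) * ln(p_event / p_nonevent). The binning scheme (equal-frequency cuts that keep tied values together), the 0.5 smoothing constant and the function name are assumptions; the patent text shown here does not specify them.

```python
from collections import Counter
from math import log


def information_value(values, labels, n_bins=5, eps=0.5):
    """IV of one numeric feature against a binary label (1 = event, 0 = non-event)."""
    n = len(values)
    # Equal-frequency binning by sorted position; tied values share the bin of
    # their first occurrence (an assumption, not specified by the patent).
    first_pos = {}
    for pos, v in enumerate(sorted(values)):
        first_pos.setdefault(v, pos)
    bins = [first_pos[v] * n_bins // n for v in values]

    events = Counter(b for b, t in zip(bins, labels) if t == 1)
    non_events = Counter(b for b, t in zip(bins, labels) if t == 0)
    occupied = sorted(set(bins))
    n_ev = sum(events.values()) + eps * len(occupied)
    n_non = sum(non_events.values()) + eps * len(occupied)

    iv = 0.0
    for b in occupied:
        # eps smooths bins that contain only one class so the log stays finite
        p_ev = (events[b] + eps) / n_ev
        p_non = (non_events[b] + eps) / n_non
        iv += (p_ev - p_non) * log(p_ev / p_non)
    return iv


y = [i % 2 for i in range(40)]
informative = list(y)                # identical to the label
noise = [i % 3 for i in range(40)]   # unrelated to the label
print(information_value(informative, y), information_value(noise, y))
```

Because each bin's term is a product of two factors with the same sign, the result is never negative, and a feature that separates the classes scores far higher than an unrelated one.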

Assignees

Advanced New Technologies Co Ltd

Classifications

  • Ensemble learning · CPC title

  • using ranking · CPC title

  • G06N99/00 · Primary

    Subject matter not provided for in other groups of this subclass · CPC title

  • Indexing structures · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

Frequently asked questions

What does patent US11222285B2 cover?
It covers a method and system for selecting features for a machine learning model. The training data set is divided into subsets, several feature-selecting data sets are formed from them, and each feature-selecting data set is processed in parallel to rank the features both by one or more evaluation indices (for example IV, GINI, IG, MI, Relief or PSI) and by importance from a machine-learning model. The resulting rank groups are fused into total ranks, and the target features are selected from those ranks.
Who is the assignee on this patent?
Advanced New Technologies Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N99/00. Mapped technology areas include Physics.
When was this patent published?
Published January 11, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).