Transductive feature selection with maximum-relevancy and minimum-redundancy criteria

US9483739B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9483739-B2
Application number  US-201314030708-A
Country             US
Kind code           B2
Filing date         Sep 18, 2013
Priority date       Jan 21, 2013
Publication date    Nov 1, 2016
Grant date          Nov 1, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various embodiments select features from a feature space. In one embodiment, a set of training samples and a set of test samples are received. The set of training samples includes a set of features and a class value. The set of test samples includes the set of features absent the class value. A relevancy with respect to the class value is determined for each of a plurality of unselected features based on the set of training samples. A redundancy with respect to one or more of the set of features is determined for each of the plurality of unselected features in the first set of features based on the set of training samples and the set of test samples. A set of features is selected from the plurality of unselected features based on the relevancy and the redundancy determined for each of the plurality of unselected features.
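The selection loop described in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes every feature is already discrete, estimates mutual information from raw value counts, and averages redundancy over the features already selected (the usual mRMR form). The names `mutual_information`, `select_features`, and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) of two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def select_features(train_X, train_y, test_X, k):
    """Greedy selection: relevancy from training data, redundancy from training + test data.

    train_X and test_X map a feature name to its list of discrete values (assumed layout).
    """
    k = min(k, len(train_X))
    # Test samples carry no class value, but they still inform feature-to-feature statistics.
    pooled = {f: train_X[f] + test_X[f] for f in train_X}
    relevancy = {f: mutual_information(train_X[f], train_y) for f in train_X}
    selected = []
    while len(selected) < k:
        best, best_score = None, float("-inf")
        for f in train_X:
            if f in selected:
                continue
            redundancy = 0.0
            if selected:
                redundancy = sum(
                    mutual_information(pooled[f], pooled[s]) for s in selected
                ) / len(selected)
            score = relevancy[f] - redundancy
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
    return selected


# Toy data: "b" duplicates "a"; "c" is weakly relevant but not redundant with "a".
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
train_X = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "c": [0, 0, 0, 1, 0, 1, 1, 1],
}
test_X = {"a": [0, 1, 0, 1], "b": [0, 1, 0, 1], "c": [0, 0, 1, 1]}
picked = select_features(train_X, train_y, test_X, 2)
print(picked)
```

In this toy run the duplicate feature is penalized by its redundancy with the first pick, so the weaker but non-redundant feature is chosen second. Pooling the unlabeled test rows only affects the redundancy term, which is what makes the scheme transductive.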

First claim

Opening claim text (preview).

What is claimed is:

1. An information processing system for selecting features from a feature space, the information processing system comprising: a memory; a processor communicatively coupled to the memory; and a feature selection module communicatively coupled to the memory and the processor, wherein the feature selection module is configured to perform a method comprising: obtaining, by a processor, a set of training samples and a set of test samples, wherein the set of training samples comprises a first set of features and a class value, and wherein the set of test samples comprises a second set of features, where the second set of features is the first set of features absent the class value; determining, for each of a plurality of unselected features in a plurality of features comprising the first and second set of features, a relevancy with respect to the class value based on the set of training samples; determining, for each of the plurality of unselected features, a redundancy with respect to the plurality of features based on both the set of training samples and the set of test samples; selecting a set of features from the plurality of unselected features based on the relevancy and the redundancy determined for each of the plurality of unselected features, wherein the selecting is performed based on;

$$\max_{x_j \in X - S_{m-1}} \left[ I\left(x_j^{\text{training}};\, c^{\text{training}}\right) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I\left(x_j^{\text{training+test}};\, c^{\text{training+test}}\right) \right],$$

where $x_j$ is a jth feature that is sample independent, $x_j^{\text{training}}$ is a jth feature based on the set of training samples, $x_j^{\text{training+test}}$ is a jth based on the set of training samples and the set of test samples, i is an integer, X is a set of all features, $S_{m-1}$ is a set of m−1 features, c is the class value, and I is mutual information; and programming a processor to perform at least one of a set of classification operations and a set of regression operations based on the set of features that have been selected.

2. The information processing system of claim 1, wherein each of the set of features that has been selected has a maximum relevancy among each of the plurality of unselected features with respect to the class value based on the set of training samples, and has a minimum redundancy among each of the plurality of unselected features with respect to the set of features based on the set of training samples and the set of test samples.

3. The information processing system of claim 1, wherein the relevancy is determined based on mutual information between a given unselected feature in the plurality of unselected features and the class value based on the set of training samples.

4. The information processing system of claim 3, wherein the mutual information is determined based on comprising: determining that the class value is within a given threshold; rounding the class value; and multiplying the class value by a scalar, wherein an entropy of the class value after being multiplied by the scalar is within a given threshold of an original entropy of the class value.

5. The information processing system of claim 1, wherein the redundancy is determined based on mutual information between a given unselected feature in the plurality of unselected features and each feature in the plurality of features.

6. The information processing system of claim 5, wherein the method further comprises: storing a set of counts for each of a set of values used to determine the mutual information between the given unselected feature and the plurality of features; and determining the mutual information between the given unselected feature and the plurality of features based on the set of counts that has been stored.

7. A non-transitory computer program product for selecting features from a feature space, the computer program product comprising: a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising: obtaining, by a processor, a set of training samples and a set of test samples, wherein the set of training samples comprises a first set of features and a class value, and wherein the set of test samples comprises a second set of features, where the second set of features is the first set of features absent the class value; determining, for each of a plurality of unselected features in a plurality of features comprising the first and second set of features, a relevancy with respect to the class value based on the set of training samples; determining, for each of the plurality of unselected features, a redundancy with respect to the plur…
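Claim 6 refers to storing counts and computing mutual information from the stored counts. One way this might look is sketched below; the `CountCache` class, its attribute names, and the dictionary-of-columns layout are assumptions made for illustration and are not taken from the patent text. Features are assumed to be discrete.

```python
from collections import Counter
from math import log2


class CountCache:
    """Stores value counts once so that repeated mutual-information queries reuse them."""

    def __init__(self, columns):
        self.columns = columns  # feature name -> list of discrete values
        self.n = len(next(iter(columns.values())))
        self.marginals = {}     # feature name -> Counter of values
        self.joints = {}        # (name_a, name_b) -> Counter of value pairs

    @staticmethod
    def _key(a, b):
        # Mutual information is symmetric, so one ordered key serves both directions.
        return (a, b) if a <= b else (b, a)

    def marginal(self, name):
        if name not in self.marginals:
            self.marginals[name] = Counter(self.columns[name])
        return self.marginals[name]

    def joint(self, a, b):
        key = self._key(a, b)
        if key not in self.joints:
            first, second = key
            self.joints[key] = Counter(zip(self.columns[first], self.columns[second]))
        return self.joints[key]

    def mutual_information(self, a, b):
        first, second = self._key(a, b)
        p_first, p_second = self.marginal(first), self.marginal(second)
        return sum(
            (c / self.n) * log2(c * self.n / (p_first[x] * p_second[y]))
            for (x, y), c in self.joint(first, second).items()
        )


cache = CountCache({"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]})
mi_ab = cache.mutual_information("a", "b")
mi_ba = cache.mutual_information("b", "a")  # served from the stored counts
```

Because a greedy selection loop evaluates the same feature pairs many times, keeping the counts around avoids rescanning the sample columns on every iteration.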

Assignees

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination · CPC title

  • Physics · mapped topic

  • Extracting rules from data · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9483739B2 cover?
It covers selecting features from a feature space using a set of training samples, which include a class value, and a set of test samples, which do not. Relevancy to the class value is determined for each unselected feature from the training samples, redundancy with respect to other features is determined from both the training and test samples, and features are selected based on both measures.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 1, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).