Who is the assignee on this patent?

Microsoft Technology Licensing Llc

What technology area does this patent fall under?

Primary CPC classification G06N99/005. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 9, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Error-driven feature ideation in machine learning

US2016162803A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2016162803-A1
Application number	US-201414562750-A
Country	US
Kind code	A1
Filing date	Dec 7, 2014
Priority date	Dec 7, 2014
Publication date	Jun 9, 2016
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are technologies directed to a feature ideator. The feature ideator can initiate a classifier that analyzes a training set of data in a classification process. The feature ideator can generate one or more suggested features relating to errors generated during the classification process. The feature ideator can generate an output to cause the errors to be rendered in a format that provides for an interaction with a user. A user can review the summary of the errors or the individual errors and select one or more features to increase the accuracy of the classifier.
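The abstract describes an interactive loop. A classifier is run over labeled training data, its errors are surfaced, candidate features are suggested, and a user-selected feature is applied before the classifier is retrained. The Python sketch below illustrates only that control flow and is not the patented implementation. Everything in it is a hypothetical assumption: the toy keyword classifier, the training set, the function names, and the rule of suggesting frequent words from misclassified positive examples. The `choose` callback stands in for the user's selection.

```python
from collections import Counter

# Hypothetical toy training set: (text, label) pairs, where label 1 = positive class.
TRAINING_SET = [
    ("cheap flights to paris", 1),
    ("hotel in rome", 1),
    ("cheap hotel deals", 1),
    ("quarterly earnings report", 0),
    ("earnings call transcript", 0),
    ("paris climate report", 0),
]


def classify(text, applied_features):
    """Hypothetical toy classifier: predict 1 if any applied feature word appears in the text."""
    return int(any(word in applied_features for word in text.split()))


def find_errors(training_set, applied_features):
    """Return the labeled examples that the current classifier misclassifies."""
    return [(text, label) for text, label in training_set
            if classify(text, applied_features) != label]


def accuracy(training_set, applied_features):
    """Fraction of the training set classified correctly."""
    return 1 - len(find_errors(training_set, applied_features)) / len(training_set)


def suggest_features(errors, applied_features, top_n=3):
    """Suggest the most frequent words in misclassified positives that are not yet applied.

    Only false negatives are mined, because this toy classifier can only add positive keywords.
    """
    counts = Counter(word for text, label in errors if label == 1 for word in text.split())
    return [word for word, _ in counts.most_common() if word not in applied_features][:top_n]


def ideation_loop(training_set, choose, max_rounds=10):
    """Repeat: find errors, suggest features, let `choose` pick one, then "retrain"."""
    applied = set()
    for _ in range(max_rounds):
        errors = find_errors(training_set, applied)
        suggestions = suggest_features(errors, applied)
        if not errors or not suggestions:
            break
        # "Retraining" the toy classifier is just adding the chosen keyword.
        applied.add(choose(suggestions))
    return applied


if __name__ == "__main__":
    # Simulate a user who always accepts the top suggestion.
    features = ideation_loop(TRAINING_SET, choose=lambda s: s[0])
    print(sorted(features), accuracy(TRAINING_SET, features))
```

With the simulated user always accepting the top suggestion, the loop applies `cheap` and then `hotel` and reaches full accuracy on the toy training set. A real feature ideator would retrain a statistical classifier rather than grow a keyword set.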

Claims

Full claim text (claims 1 to 20).

What is claimed is:

1. A method of feature ideation, comprising: determining a plurality of errors in a training set of labeled textual data; determining a set of candidate features to correct at least one error of the plurality of errors; receiving a selection of at least one candidate feature of the set of candidate features to be an applied feature; and retraining a classifier based on the applied feature.

2. The method of claim 1, wherein determining a plurality of errors in a training set of labeled textual data comprises: receiving a training set of data comprising a plurality of labeled textual data; and initiating the classifier to examine the labeled textual data to determine the plurality of errors.

3. The method of claim 2, further comprising deconstructing the plurality of labeled textual data into constituent components.

4. The method of claim 1, further comprising generating an error percent by determining a percentage of textual data identified correctly by the classifier.

5. The method of claim 1, further comprising: receiving a selection of at least one feature candidate of the set of feature candidates for further exploration; and presenting a plurality of words or n-grams associated with the selection of the at least one feature candidate of the set of feature candidates for further exploration.

6. The method of claim 1, further comprising rendering a featuring area comprising the applied feature.

7. The method of claim 1, further comprising: determining an updated plurality of errors in a training set of labeled textual data based on the applied feature; displaying a set of updated feature candidates based on the training set to correct at least one error of the updated plurality of errors; receiving a selection of at least one feature candidate of the updated set of feature candidates to be a second applied feature; and retraining a classifier based on the second applied feature.

8. The method of claim 7, further comprising updating the featuring area with a second set of candidate features determined by the classifier trained with the second applied feature.

9. The method of claim 1, further comprising displaying a frequency indicator proximate to at least one of the set of feature candidates, the frequency indicator indicating a frequency of occurrences in which the at least one of the set of feature candidates is associated with an error and a frequency of occurrences in which the at least one of the set of feature candidates is associated with a positive match or an estimated impact of adding the at least one of the set of feature candidates as the applied feature.

10. A computer comprising: a processor; and a computer-readable medium in communication with the processor, the computer-readable medium comprising computer-executable instructions that, when executed by the processor, cause the processor to: initiate a classifier of a feature ideator to determine a plurality of errors in a training set of labeled textual data; initiate a candidate feature generator of the feature ideator to determine a set of feature candidates based on the training set to correct at least one error of the plurality of errors; and initiate the feature ideator to receive a selection of at least one feature candidate of the set of feature candidates to be an applied feature and to retrain the classifier based on the applied feature.

11. The computer of claim 10, further comprising computer-executable instructions to: determine contrast terms that do not generate an error; and display the contrast terms.

12. The computer of claim 11, wherein the contrast terms displayed and the set of feature candidates displayed are summarized by computer-executable instructions to: obtain a frequency of words occurring as a potential member of the set of feature candidate and as a potential member of the plurality of contrast terms; calculate a difference in frequency between the occurrence of the words as a potential member of the set of feature candidate and as a potential member of the plurality of contrast terms; select a number of words occurring more often as errors as the feature candidates; and select a number of words occurring more often as contrasts as the contrast terms.

13. The computer of claim 12, further comprising computer-executable instructions to calculate an improvement score to be obtained if a selected feature candidate or a selected contrast term were used to create a new feature.

14. The computer of claim 13, wherein the computer-executable instructions to calculate an improvement is performed using a logarithmic loss technique.

15. The computer of claim 12, further comprising computer-executable instructions to rank the feature candidates and the contrast terms by the improvement score associated with each of the feature candidates and the contrast terms.

16. The computer of claim 15, further comprising computer-executable instructions to display a number of the feature candidates having a certain improvement score as a set of feature candidates and a number of the contrast terms selected having a certain improvement score as the contrast terms.

17. A computer-readable medium having computer-executable instructions thereupon that, when executed by a computer, cause the computer to: determine a plurality of errors associated with classifying a training set of data; determine a plurality of candidate features associated with at least one of the plurality of errors; and render a feature ideation user interface comprising: a featuring area comprising a create feature section for receiving an input to initiate a feature idealization process and an applied feature section for displaying currently applied features; a feature candidate section for displaying the candidate features; and a contrast term section for displaying contrast terms, the contrast terms comprising terms that are properly classified.

18. The computer-readable storage medium of claim 17, wherein the feature ideation user interface further comprises a focus selection control configured to receive an input of which of the error type to apply to the candidate features displayed in the feature candidate section.

19. The computer-readable storage medium of claim 17, wherein the feature ideation user interface further comprises a frequency indicator proximate to at least one of the candidate features or at least one of the contrast terms, the frequency indicator comprising a top bar having a certain length to indicate a frequency of the at least one of the candidate features or the at least one of the contrast terms in positive documents and a lower bar having a certain length that indicates a frequency of the at least one of the candidate features or the at least one of the contrast terms term in negative documents.

20. The computer-readable storage medium of claim 17, wherein the feature ideation user interface further comprises an accuracy percentage indicator displaying an accuracy of the classifier.
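Claims 12 through 15 outline how candidate features and contrast terms could be summarized and ranked. The first step compares how often words occur in misclassified versus correctly classified examples. The second step scores each word by an expected improvement computed with a logarithmic loss. The Python sketch below is one hypothetical reading of those steps, not the patented implementation. The share-of-documents frequency measure, the 0.5 decision threshold, the Laplace-smoothed rescoring used as the improvement proxy, the toy data, and every function name are assumptions.

```python
import math

# Hypothetical sketch: names, frequency measure, and log-loss proxy are assumptions.


def log_loss(labels, probs, eps=1e-12):
    """Mean binary logarithmic loss (cross-entropy) of probabilities against 0/1 labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def split_errors(labels, probs, threshold=0.5):
    """Return (error indices, contrast indices): misclassified vs correctly classified examples."""
    errors, contrasts = [], []
    for i, (y, p) in enumerate(zip(labels, probs)):
        (errors if int(p >= threshold) != y else contrasts).append(i)
    return errors, contrasts


def frequency_difference(tokens, errors, contrasts):
    """Per word: share of error docs containing it minus share of contrast docs containing it."""
    def share(word, idx):
        return sum(word in tokens[i] for i in idx) / len(idx) if idx else 0.0
    vocab = set().union(*tokens)
    return {w: share(w, errors) - share(w, contrasts) for w in vocab}


def improvement_score(word, tokens, labels, probs, alpha=1.0):
    """Log-loss reduction if every doc containing `word` were rescored with the word's
    Laplace-smoothed positive rate (a crude, assumed proxy for adding `word` as a feature)."""
    hits = {i for i, t in enumerate(tokens) if word in t}
    rate = (sum(labels[i] for i in hits) + alpha) / (len(hits) + 2 * alpha)
    new_probs = [rate if i in hits else p for i, p in enumerate(probs)]
    return log_loss(labels, probs) - log_loss(labels, new_probs)


def ideate(docs, labels, probs, k=3):
    """Select candidates and contrast terms by frequency difference, then rank by score."""
    tokens = [set(d.split()) for d in docs]
    errors, contrasts = split_errors(labels, probs)
    diff = frequency_difference(tokens, errors, contrasts)
    # Words seen more often in errors become candidates; more often in contrasts, contrast terms.
    # Ties are broken alphabetically so the output is deterministic.
    candidates = sorted((w for w in diff if diff[w] > 0), key=lambda w: (-diff[w], w))[:k]
    contrast_terms = sorted((w for w in diff if diff[w] < 0), key=lambda w: (diff[w], w))[:k]

    def by_score(words):
        # Highest estimated improvement first.
        return sorted(words, key=lambda w: (-improvement_score(w, tokens, labels, probs), w))

    return by_score(candidates), by_score(contrast_terms)


if __name__ == "__main__":
    docs = ["cheap flights", "hotel deals", "cheap hotel", "earnings report", "climate report"]
    labels = [1, 1, 1, 0, 0]
    probs = [0.9, 0.2, 0.8, 0.1, 0.3]  # hypothetical scores from the current classifier
    print(ideate(docs, labels, probs))
```

On this toy data, `deals` has the larger frequency difference, but `hotel` has the larger estimated log-loss improvement, so the ranking step lists `hotel` first. The contrast terms all receive negative scores under this proxy, because rescoring already correct documents with a smoothed rate makes their predictions less confident.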

Assignees

Microsoft Technology Licensing Llc

Classifications

G06F11/162
Displays · CPC title
G06F16/35
Clustering; Classification · CPC title
G06F11/0787
Storage of error reports, e.g. persistent data storage, storage using memory protection · CPC title
G06N99/005 (primary)
Physics · mapped topic
G06N20/00 (primary)
Machine learning · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 54884401