What technology area does this patent fall under?

Primary CPC classification G06N20/00. Mapped technology areas include Physics.

When was this patent published?

Publication date: Sep 1, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Large-scale anomaly detection with relative density-ratio estimation

US2016253598A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2016253598-A1
Application number	US-201514634515-A
Country	US
Kind code	A1
Filing date	Feb 27, 2015
Priority date	Feb 27, 2015
Publication date	Sep 1, 2016
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a set of training data consisting of inliers may be obtained. A supervised classification model may be trained using the set of training data to identify outliers. The supervised classification model may be applied to generate an anomaly score for a data point. It may be determined whether the data point is an outlier based, at least in part, upon the anomaly score.
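The abstract does not spell out how a supervised model can be trained when only inliers are labeled. One plausible reading, consistent with claim 2 but an assumption here, is to label the known inliers as one class and the unlabeled points to be screened as the other, then treat the classifier's posterior as the anomaly score. The sketch below uses a tiny k-nearest-neighbour classifier as a stand-in for the model; the function names, the toy data, the value of `k`, and the 0.5 threshold are all hypothetical and do not come from the patent.

```python
def knn_inlier_probability(x, train, test, k=5):
    """Estimate P(class = train | x) as the share of training points
    among the k nearest neighbours of x in the pooled sample."""
    # Label 1 = known inlier (training set), label 0 = unlabeled test point.
    pooled = [(abs(x - t), 1) for t in train] + [(abs(x - t), 0) for t in test]
    pooled.sort(key=lambda pair: pair[0])
    nearest = pooled[:k]
    return sum(label for _, label in nearest) / k


def anomaly_score(x, train, test, k=5):
    """Higher score = the neighbourhood of x contains few known inliers."""
    return 1.0 - knn_inlier_probability(x, train, test, k)


def is_outlier(x, train, test, threshold=0.5, k=5):
    """Flag x when its anomaly score exceeds the (illustrative) threshold."""
    return anomaly_score(x, train, test, k) > threshold


# Toy one-dimensional data: inliers cluster near 1.0, a few test points sit near 5.0.
train_inliers = [0.9, 1.0, 1.1, 1.2, 0.8, 1.05, 0.95, 1.15]
test_points = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9, 5.1, 1.05]

flagged = [x for x in test_points if is_outlier(x, train_inliers, test_points)]
```

With this toy data, the test points near 5.0 have almost no training inliers in their neighbourhood and are flagged, while the points near 1.0 are not. A production system would replace the k-NN stand-in with whatever classifier is actually used, for example the GBDT named in claim 3.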

First claim

Opening claim text (preview).

What is claimed is:
1. A method, comprising: obtaining a set of training data consisting of inliers; training a supervised classification model using the set of training data to identify outliers; applying the supervised classification model to generate an anomaly score for a data point; and determining whether the data point is an outlier based, at least in part, upon the anomaly score.
2. The method as recited in claim 1, wherein the supervised classification model comprises a supervised two-class classification model that estimates a relative importance measure, the relative importance measure being a ratio of training and test data densities.
3. The method as recited in claim 1, wherein the supervised classification model comprises a gradient boosted decision tree (GBDT) algorithm.
4. The method as recited in claim 1, wherein the supervised classification model performs feature selection to select one or more features upon which to generate anomaly scores for data points.
5. The method as recited in claim 1, wherein the set of training data comprises email account data corresponding to non-spammers, and wherein determining whether the data point is an outlier comprises determining whether the data point is a compromised email account.
6. The method as recited in claim 1, wherein the set of training data comprises images of semiconductors, and wherein determining whether the data point is an outlier comprises determining whether the data point corresponds to a faulty semiconductor.
7. The method as recited in claim 1, wherein the set of training data comprises speaker data.
8. An apparatus, comprising: a processor; and a memory storing thereon computer-readable instructions, the computer-readable instructions being configured to: obtain a set of training data consisting of inliers; train a supervised classification model using the set of training data to identify outliers; apply the supervised classification model to generate an anomaly score for a data point; and determine whether the data point is an outlier based, at least in part, upon the anomaly score.
9. The apparatus as recited in claim 8, wherein the supervised classification model comprises a supervised two-class classification model that estimates a relative importance measure, the relative importance measure being a ratio of training and test data densities.
10. The apparatus as recited in claim 8, wherein the supervised classification model comprises a gradient boosted decision tree (GBDT) algorithm.
11. The apparatus as recited in claim 8, wherein the supervised classification model performs feature selection to select one or more features upon which to generate anomaly scores for data points.
12. The apparatus as recited in claim 8, wherein the set of training data comprises email account data corresponding to non-spammers, and wherein determining whether the data point is an outlier comprises determining whether the data point is a compromised email account.
13. The apparatus as recited in claim 8, wherein the set of training data comprises images of semiconductors, and wherein determining whether the data point is an outlier comprises determining whether the data point corresponds to a faulty semiconductor.
14. The apparatus as recited in claim 8, wherein the set of training data comprises speaker data.
15. A non-transitory computer-readable storage medium, comprising: instructions for obtaining a set of training data consisting of inliers; instructions for training a supervised classification model using the set of training data to identify outliers; instructions for applying the supervised classification model to generate an anomaly score for a data point; and instructions for determining whether the data point is an outlier based, at least in part, upon the anomaly score.
16. The non-transitory computer-readable storage medium as recited in claim 15, wherein the supervised classification model comprises a supervised two-class classification model that estimates a relative importance measure, the relative importance measure being a ratio of training and test data densities.
17. The non-transitory computer-readable storage medium as recited in claim 15, wherein the supervised classification model comprises a gradient boosted decision tree (GBDT) algorithm.
18. The non-transitory computer-readable storage medium as recited in claim 15, wherein the supervised classification model performs feature selection to select one or more features upon which to generate anomaly scores for data points.
19. The non-transitory computer-readable storage medium as recited in claim 15, wherein the set of training data comprises email account data corresponding to non-spammers, and wherein determining whether the data point is an outlier comprises determining whether the data point is a compromised email account.
20. The non-transitory computer-readable storage medium as recited in claim 15, wherein the set of training data comprises image data or speaker data.
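Claims 2, 9, and 16 describe a two-class model that estimates a ratio of training and test data densities. A standard way to obtain such a ratio from any probabilistic classifier, which may or may not match the patent's own procedure, is Bayes' rule: p_train(x) / p_test(x) = (n_test / n_train) * P(train | x) / P(test | x). The sketch below assumes the ratio is taken as training over test (the claims do not fix the direction) and also shows the alpha-relative form known from the general density-ratio literature, r_alpha = r / (alpha * r + (1 - alpha)), which is bounded above by 1 / alpha. The function names and the default `alpha` are illustrative.

```python
def density_ratio(p_train_given_x, n_train, n_test):
    """Turn a classifier posterior P(train | x) into p_train(x) / p_test(x)
    via Bayes' rule, correcting for unequal sample sizes."""
    p_test_given_x = 1.0 - p_train_given_x
    if p_test_given_x == 0.0:
        return float("inf")  # classifier is certain x came from the training set
    return (n_test / n_train) * (p_train_given_x / p_test_given_x)


def relative_density_ratio(ratio, alpha=0.5):
    """Alpha-relative ratio r / (alpha * r + (1 - alpha)), for 0 < alpha < 1.
    Unlike the plain ratio, it can never exceed 1 / alpha."""
    if ratio == float("inf"):
        return 1.0 / alpha
    return ratio / (alpha * ratio + (1.0 - alpha))


# A point the classifier assigns to the training class with probability 0.2,
# given equally sized training and test samples (illustrative numbers).
r = density_ratio(0.2, n_train=100, n_test=100)
r_alpha = relative_density_ratio(r, alpha=0.5)
```

Under this convention a small ratio means the point lies where test data is dense but training inliers are sparse, so it is a candidate outlier. The bounded relative form is often preferred because the plain ratio can blow up where the denominator density is close to zero.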

Assignees

Yahoo Inc

Inventors

Classifications

G06F21/552
involving long-term monitoring or reporting · CPC title
G06N20/00 · Primary
Machine learning · CPC title
G06N99/005 · Primary
Physics · mapped topic
G06N20/20 · Primary
Ensemble learning · CPC title
G06N20/10
using kernel methods, e.g. support vector machines [SVM] · CPC title

Patent family

Related publications grouped by family.

View patent family 56798289
