What technology area does this patent fall under?

Primary CPC classification G06N3/088. Mapped technology areas include Physics.

When was this patent published?

Publication date: Aug 16, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Generic workflow for classification of highly imbalanced datasets using deep learning

US11416748B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11416748-B2
Application number	US-201916718524-A
Country	US
Kind code	B2
Filing date	Dec 18, 2019
Priority date	Dec 18, 2019
Publication date	Aug 16, 2022
Grant date	Aug 16, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and computer-readable storage media for providing a binary classifier include receiving a biased dataset, the biased data set including a plurality of records, each record being assigned to a class of a plurality of classes, one class including a majority class, performing data engineering on at least a portion of the biased dataset to provide a revised dataset, providing a trained deep autoencoder (DAE) by training a DAE using only records assigned to the majority class from the revised dataset, the trained DAE including a binary classifier that classifies records into one of the majority class and a minority class, validating the trained DAE using validation data that is based on at least a portion of the biased dataset, and providing the trained DAE for production use within a production system.
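The data flow in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: min-max scaling stands in for the "data engineering" step, and the function names, the `holdout_every` parameter, and the labels `"ok"` and `"fraud"` are hypothetical. The scaling bounds are taken over the whole dataset for brevity.

```python
from typing import List, Tuple

Record = Tuple[List[float], str]  # (feature values, class label)


def min_max_scale(rows: List[List[float]], lo: List[float],
                  hi: List[float]) -> List[List[float]]:
    """Scale each feature to [0, 1] using the supplied per-feature bounds."""
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def prepare(records: List[Record], majority_label: str,
            holdout_every: int = 4) -> Tuple[List[List[float]], List[Record]]:
    """Return (training rows, validation records).

    Training rows hold majority-class records only. Validation keeps
    every minority record plus every `holdout_every`-th majority record,
    so both classes are represented.
    """
    features = [f for f, _ in records]
    lo = [min(col) for col in zip(*features)]
    hi = [max(col) for col in zip(*features)]
    scaled = min_max_scale(features, lo, hi)

    train: List[List[float]] = []
    validation: List[Record] = []
    for i, (row, (_, label)) in enumerate(zip(scaled, records)):
        if label != majority_label or i % holdout_every == 0:
            validation.append((row, label))
        else:
            train.append(row)
    return train, validation


# Hypothetical imbalanced dataset: five majority records, one minority record.
records = [
    ([10.0, 1.0], "ok"), ([12.0, 1.2], "ok"), ([11.0, 1.1], "ok"),
    ([14.0, 1.4], "ok"), ([13.0, 1.3], "ok"), ([20.0, 0.0], "fraud"),
]
train, validation = prepare(records, majority_label="ok")
```

The autoencoder would then be fitted on `train` alone, and `validation` would be used to check how well reconstruction error separates the two classes.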

First claim

Claim text, starting with the first independent claim (claims 1 to 20 are shown).

What is claimed is:
1. A computer-implemented method for providing a binary classifier, the method being executed by one or more processors and comprising: receiving a biased dataset, the biased dataset comprising a plurality of records, each record being assigned to a class of a plurality of classes, one class comprising a majority class; performing data engineering on at least a portion of the biased dataset to provide a revised dataset; providing a trained deep autoencoder (DAE) by training a DAE using only records assigned to the majority class from the revised dataset, the trained DAE comprising the binary classifier that classifies records into one of the majority class and a minority class; validating the trained DAE using validation data that is based on at least a portion of the biased dataset; and providing the trained DAE for production use within a production system.
2. The method of claim 1, wherein the DAE comprises a first hidden layer having a different number of neurons than an input layer, and a second hidden layer having a different number of neurons than the first hidden layer.
3. The method of claim 2, wherein the DAE further comprises a third hidden layer having a lower number of neurons than the second hidden layer and having a greater number of neurons than an output layer.
4. The method of claim 1, wherein the data engineering comprises one of reducing a dimensionality of records and expanding a dimensionality of records in the at least a portion of the biased dataset.
5. The method of claim 1, wherein the data engineering comprises scaling feature values of records in the at least a portion of the biased dataset.
6. The method of claim 1, wherein the production use of the trained DAE comprises: providing a record as input to the trained DAE; receiving at least one value as output from the trained DAE, the at least one value being generated based on processing of the record through the trained DAE and representing an error in recreation of the record by the trained DAE; and assigning the record to one of the majority class and the minority class based on the at least one value.
7. The method of claim 1, wherein the validation data comprises records of the majority class and records of the minority class.
8. A non-transitory computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations for providing a binary classifier, the operations comprising: receiving a biased dataset, the biased dataset comprising a plurality of records, each record being assigned to a class of a plurality of classes, one class comprising a majority class; performing data engineering on at least a portion of the biased dataset to provide a revised dataset; providing a trained deep autoencoder (DAE) by training a DAE using only records assigned to the majority class from the revised dataset, the trained DAE comprising the binary classifier that classifies records into one of the majority class and a minority class; validating the trained DAE using validation data that is based on at least a portion of the biased dataset; and providing the trained DAE for production use within a production system.
9. The computer-readable storage medium of claim 8, wherein the DAE comprises a first hidden layer having a different number of neurons than an input layer, and a second hidden layer having a different number of neurons than the first hidden layer.
10. The computer-readable storage medium of claim 9, wherein the DAE further comprises a third hidden layer having a lower number of neurons than the second hidden layer and having a greater number of neurons than an output layer.
11. The computer-readable storage medium of claim 8, wherein the data engineering comprises one of reducing a dimensionality of records and expanding a dimensionality of records in the at least a portion of the biased dataset.
12. The computer-readable storage medium of claim 8, wherein the data engineering comprises scaling feature values of records in the at least a portion of the biased dataset.
13. The computer-readable storage medium of claim 8, wherein the production use of the trained DAE comprises: providing a record as input to the trained DAE; receiving at least one value as output from the trained DAE, the at least one value being generated based on processing of the record through the trained DAE and representing an error in recreation of the record by the trained DAE; and assigning the record to one of the majority class and the minority class based on the at least one value.
14. The computer-readable storage medium of claim 8, wherein the validation data comprises records of the majority class and records of the minority class.
15. A system, comprising: a computing device; and a computer-readable storage device coupled to the computing device and having instructions stored thereon which, when executed by the computing device, cause the computing device to perform operations for providing a binary classifier, the operations comprising: receiving a biased dataset, the biased dataset comprising a plurality of records, each record being assigned to a class of a plurality of classes, one class comprising a majority class; performing data engineering on at least a portion of the biased dataset to provide a revised dataset; providing a trained deep autoencoder (DAE) by training a DAE using only records assigned to the majority class from the revised dataset, the trained DAE comprising the binary classifier that classifies records into one of the majority class and a minority class; validating the trained DAE using validation data that is based on at least a portion of the biased dataset; and providing the trained DAE for production use within a production system.
16. The system of claim 15, wherein the DAE comprises a first hidden layer having a different number of neurons than an input layer, and a second hidden layer having a different number of neurons than the first hidden layer.
17. The system of claim 16, wherein the DAE further comprises a third hidden layer having a lower number of neurons than the second hidden layer and having a greater number of neurons than an output layer.
18. The system of claim 15, wherein the data engineering comprises one of reducing a dimensionality of records and expanding a dimensionality of records in the at least a portion of the biased dataset.
19. The system of claim 15, wherein the data engineering comprises scaling feature values of records in the at least a portion of the biased dataset.
20. The system of claim 15, wherein the production use of the trained DAE comprises: providing a record as input to the trained DAE; receiving at least one value as output from the trained DAE, the at least one value being generated based on processing of the record through the trained DAE and representing an error in recreation of the record by the trained DAE; and assigning the record to one of the majority class and the minority class based on the at least one value.
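The production-use step in claims 6, 13, and 20 (feed a record through the trained DAE, obtain an error value for its recreation, assign a class from that value) might look like the sketch below. Several things here are assumptions rather than claim language: mean squared error as the error value, a fixed `threshold` as the assignment rule, and `toy_autoencoder`, a hand-set 2-to-1-to-2 linear map that merely stands in for a trained deep autoencoder.

```python
from typing import Callable, List, Sequence

Reconstructor = Callable[[Sequence[float]], List[float]]


def reconstruction_error(record: Sequence[float],
                         reconstruct: Reconstructor) -> float:
    """Mean squared error between a record and its recreation."""
    recreated = reconstruct(record)
    return sum((x - y) ** 2 for x, y in zip(record, recreated)) / len(record)


def classify(record: Sequence[float], reconstruct: Reconstructor,
             threshold: float) -> str:
    """Assign 'minority' when the recreation error exceeds the threshold."""
    error = reconstruction_error(record, reconstruct)
    return "minority" if error > threshold else "majority"


def toy_autoencoder(record: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a trained DAE (weights set by hand).

    Encodes two features into their mean and decodes the mean back to
    both outputs, so records with equal features are recreated exactly.
    """
    code = 0.5 * record[0] + 0.5 * record[1]
    return [code, code]


# A record resembling the majority pattern and one that does not.
print(classify([0.4, 0.4], toy_autoencoder, threshold=0.01))
print(classify([0.9, 0.1], toy_autoencoder, threshold=0.01))
```

Because the model only ever sees majority-class records during training, it recreates them with low error, while records unlike that class tend to produce a larger error. In practice the threshold would presumably be tuned on validation data containing both classes, as in claims 7 and 14.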

Assignees

SAP SE

Inventors

Classifications

G06N3/088 · Primary
Non-supervised learning, e.g. competitive learning · CPC title
G06N3/047
Probabilistic or stochastic networks · CPC title
G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G06N3/045
Combinations of networks · CPC title
G06N3/0499
Feedforward networks · CPC title
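
For readers unfamiliar with CPC symbols, the codes above break down into section, class, subclass, main group, and subgroup. The parser below is a small illustrative sketch; `parse_cpc`, `CpcSymbol`, and the regular expression are hypothetical helpers and not part of any official CPC tooling.

```python
import re
from typing import NamedTuple

# Top-level CPC sections and their titles.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}


class CpcSymbol(NamedTuple):
    section: str      # e.g. "G"
    cpc_class: str    # e.g. "06"
    subclass: str     # e.g. "N"
    main_group: str   # e.g. "3"
    subgroup: str     # e.g. "088"


_CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a CPC symbol such as 'G06N3/088' into its components."""
    match = _CPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a recognizable CPC symbol: {symbol!r}")
    return CpcSymbol(*match.groups())


primary = parse_cpc("G06N3/088")
print(primary.section, CPC_SECTIONS[primary.section])
```

Section G is why this page maps the patent to "Physics" even though the subject matter is machine learning.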

Patent family

Related publications grouped by family.

View patent family 76438160

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11416748B2 cover?: Methods, systems, and computer-readable storage media for providing a binary classifier include receiving a biased dataset, the biased data set including a plurality of records, each record being assigned to a class of a plurality of classes, one class including a majority class, performing data engineering on at least a portion of the biased dataset to provide a revised dataset, providing a tr…
Who is the assignee on this patent?: SAP SE