Generic workflow for classification of highly imbalanced datasets using deep learning

US11416748B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11416748-B2
Application number: US-201916718524-A
Country: US
Kind code: B2
Filing date: Dec 18, 2019
Priority date: Dec 18, 2019
Publication date: Aug 16, 2022
Grant date: Aug 16, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and computer-readable storage media for providing a binary classifier include receiving a biased dataset, the biased data set including a plurality of records, each record being assigned to a class of a plurality of classes, one class including a majority class, performing data engineering on at least a portion of the biased dataset to provide a revised dataset, providing a trained deep autoencoder (DAE) by training a DAE using only records assigned to the majority class from the revised dataset, the trained DAE including a binary classifier that classifies records into one of the majority class and a minority class, validating the trained DAE using validation data that is based on at least a portion of the biased dataset, and providing the trained DAE for production use within a production system.
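The training side of the workflow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: a tiny linear autoencoder with a single hidden unit stands in for the deep autoencoder, min-max scaling stands in for the data engineering step, and the toy records (an amount and a fee), the class `TinyAutoencoder`, and all function names are assumptions made for illustration.

```python
import random

def fit_min_max(records):
    """Per-feature (min, max) bounds, computed from the given records only."""
    return [(min(col), max(col)) for col in zip(*records)]

def scale(record, bounds):
    """Min-max scale one record; a constant feature maps to 0.0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(record, bounds)]

class TinyAutoencoder:
    """Linear n -> k -> n autoencoder trained by SGD (stand-in for a DAE)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        self.enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
                    for _ in range(n_hidden)]
        self.dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                    for _ in range(n_features)]

    def encode(self, x):
        return [sum(w * v for w, v in zip(row, x)) for row in self.enc]

    def reconstruct(self, x):
        h = self.encode(x)
        return [sum(w * v for w, v in zip(row, h)) for row in self.dec]

    def train(self, records, epochs=500, lr=0.05):
        for _ in range(epochs):
            for x in records:
                h = self.encode(x)
                out = [sum(w * v for w, v in zip(row, h)) for row in self.dec]
                # Gradient of 0.5 * squared error with respect to the output.
                err = [o - v for o, v in zip(out, x)]
                # Back-propagate to the hidden layer before updating the decoder.
                grad_h = [sum(err[i] * self.dec[i][j] for i in range(len(x)))
                          for j in range(len(h))]
                for i, row in enumerate(self.dec):
                    for j in range(len(h)):
                        row[j] -= lr * err[i] * h[j]
                for j, row in enumerate(self.enc):
                    for i in range(len(x)):
                        row[i] -= lr * grad_h[j] * x[i]

def reconstruction_error(model, x):
    """Mean squared error between a record and its reconstruction."""
    out = model.reconstruct(x)
    return sum((o - v) ** 2 for o, v in zip(out, x)) / len(x)

# Invented toy records: (amount, fee, label). Majority fees are 2% of the amount.
dataset = [(float(a), a * 0.02, "majority") for a in range(100, 1100, 100)]
dataset.append((200.0, 18.0, "minority"))

majority = [r[:2] for r in dataset if r[2] == "majority"]
bounds = fit_min_max(majority)                    # data engineering: scaling
train_set = [scale(r, bounds) for r in majority]  # majority-class records only

model = TinyAutoencoder(n_features=2, n_hidden=1)
model.train(train_set)

majority_errors = [reconstruction_error(model, x) for x in train_set]
minority_error = reconstruction_error(model, scale((200.0, 18.0), bounds))
```

Because the model only ever sees majority-class records, it learns to recreate their structure (here, the fixed fee ratio) and recreates the unusual record poorly, which is the signal the classifier relies on.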

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for providing a binary classifier, the method being executed by one or more processors and comprising: receiving a biased dataset, the biased dataset comprising a plurality of records, each record being assigned to a class of a plurality of classes, one class comprising a majority class; performing data engineering on at least a portion of the biased dataset to provide a revised dataset; providing a trained deep autoencoder (DAE) by training a DAE using only records assigned to the majority class from the revised dataset, the trained DAE comprising the binary classifier that classifies records into one of the majority class and a minority class; validating the trained DAE using validation data that is based on at least a portion of the biased dataset; and providing the trained DAE for production use within a production system.

2. The method of claim 1, wherein the DAE comprises a first hidden layer having a different number of neurons than an input layer, and a second hidden layer having a different number of neurons than the first hidden layer.

3. The method of claim 2, wherein the DAE further comprises a third hidden layer having a lower number of neurons than the second hidden layer and having a greater number of neurons than an output layer.

4. The method of claim 1, wherein the data engineering comprises one of reducing a dimensionality of records and expanding a dimensionality of records in the at least a portion of the biased dataset.

5. The method of claim 1, wherein the data engineering comprises scaling feature values of records in the at least a portion of the biased dataset.

6. The method of claim 1, wherein the production use of the trained DAE comprises: providing a record as input to the trained DAE; receiving at least one value as output from the trained DAE, the at least one value being generated based on processing of the record through the trained DAE and representing an error in recreation of the record by the trained DAE; and assigning the record to one of the majority class and the minority class based on the at least one value.

7. The method of claim 1, wherein the validation data comprises records of the majority class and records of the minority class.

8. A non-transitory computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations for providing a binary classifier, the operations comprising: receiving a biased dataset, the biased dataset comprising a plurality of records, each record being assigned to a class of a plurality of classes, one class comprising a majority class; performing data engineering on at least a portion of the biased dataset to provide a revised dataset; providing a trained deep autoencoder (DAE) by training a DAE using only records assigned to the majority class from the revised dataset, the trained DAE comprising the binary classifier that classifies records into one of the majority class and a minority class; validating the trained DAE using validation data that is based on at least a portion of the biased dataset; and providing the trained DAE for production use within a production system.

9. The computer-readable storage medium of claim 8, wherein the DAE comprises a first hidden layer having a different number of neurons than an input layer, and a second hidden layer having a different number of neurons than the first hidden layer.

10. The computer-readable storage medium of claim 9, wherein the DAE further comprises a third hidden layer having a lower number of neurons than the second hidden layer and having a greater number of neurons than an output layer.

11. The computer-readable storage medium of claim 8, wherein the data engineering comprises one of reducing a dimensionality of records and expanding a dimensionality of records in the at least a portion of the biased dataset.

12. The computer-readable storage medium of claim 8, wherein the data engineering comprises scaling feature values of records in the at least a portion of the biased dataset.

13. The computer-readable storage medium of claim 8, wherein the production use of the trained DAE comprises: providing a record as input to the trained DAE; receiving at least one value as output from the trained DAE, the at least one value being generated based on processing of the record through the trained DAE and representing an error in recreation of the record by the trained DAE; and assigning the record to one of the majority class and the minority class based on the at least one value.

14. The computer-readable storage medium of claim 8, wherein the validation data comprises records of the majority class and records of the minority class.

15. A system, comprising: a computing device; and a computer-readable storage device coupled to the computing device and having instructions stored thereon which, when executed by the computing device, cause the computing device to perform operations for providing a binary classifier, the operations comprising: receiving a biased dataset, the biased dataset comprising a plurality of records, each record being assigned to a class of a plurality of classes, one class comprising a majority class; performing data engineering on at least a portion of the biased dataset to provide a revised dataset; providing a trained deep autoencoder (DAE) by training a DAE using only records assigned to the majority class from the revised dataset, the trained DAE comprising the binary classifier that classifies records into one of the majority class and a minority class; validating the trained DAE using validation data that is based on at least a portion of the biased dataset; and providing the trained DAE for production use within a production system.

16. The system of claim 15, wherein the DAE comprises a first hidden layer having a different number of neurons than an input layer, and a second hidden layer having a different number of neurons than the first hidden layer.

17. The system of claim 16, wherein the DAE further comprises a third hidden layer having a lower number of neurons than the second hidden layer and having a greater number of neurons than an output layer.

18. The system of claim 15, wherein the data engineering comprises one of reducing a dimensionality of records and expanding a dimensionality of records in the at least a portion of the biased dataset.

19. The system of claim 15, wherein the data engineering comprises scaling feature values of records in the at least a portion of the biased dataset.

20. The system of claim 15, wherein the production use of the trained DAE comprises: providing a record as input to the trained DAE; receiving at least one value as output from the trained DAE, the at least one value being generated based on processing of the record through the trained DAE and representing an error in recreation of the record by the trained DAE; and assigning the record to one of the majority class and the minority class based on the at least one value.
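The production-use and validation steps in the dependent claims (a recreation error per record, a class assignment based on that value, and validation data containing both classes) can be sketched as follows. The claims do not say how the error value is turned into a class, so the fixed threshold here is an assumption, as are the mean-squared-error measure, the `stand_in_dae` function that takes the place of a trained DAE, and the sample records.

```python
def reconstruction_error(record, reconstruction):
    """Mean squared error between a record and its recreation."""
    return sum((a - b) ** 2 for a, b in zip(record, reconstruction)) / len(record)

def assign_class(error, threshold):
    """Assumed decision rule: a large recreation error means minority class."""
    return "minority" if error > threshold else "majority"

def stand_in_dae(record):
    """Hypothetical stand-in for a trained DAE.

    It recreates every feature as the record mean, so records whose
    features agree are recreated well and all others are not.
    """
    mean = sum(record) / len(record)
    return [mean] * len(record)

def validate(model, labelled_records, threshold):
    """Count confusion-matrix cells, treating the minority class as positive."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for record, label in labelled_records:
        error = reconstruction_error(record, model(record))
        predicted = assign_class(error, threshold)
        if predicted == "minority":
            counts["tp" if label == "minority" else "fp"] += 1
        else:
            counts["fn" if label == "minority" else "tn"] += 1
    return counts

# Invented validation records containing both classes.
validation = [
    ([0.10, 0.10], "majority"),
    ([0.50, 0.52], "majority"),
    ([0.90, 0.88], "majority"),
    ([0.10, 0.90], "minority"),
    ([0.80, 0.20], "minority"),
]

THRESHOLD = 0.01  # assumed value; in practice it would be tuned on validation data
counts = validate(stand_in_dae, validation, THRESHOLD)
recall = counts["tp"] / (counts["tp"] + counts["fn"])
```

Validating on records of both classes matters because the model was trained on the majority class alone: only mixed validation data shows whether the chosen threshold actually separates the two.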

Assignees

SAP SE

Inventors

Classifications

  • G06N3/088 · Primary

    Non-supervised learning, e.g. competitive learning · CPC title

  • Probabilistic or stochastic networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Combinations of networks · CPC title

  • Feedforward networks · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11416748B2 cover?
Methods, systems, and computer-readable storage media for providing a binary classifier include receiving a biased dataset, the biased data set including a plurality of records, each record being assigned to a class of a plurality of classes, one class including a majority class, performing data engineering on at least a portion of the biased dataset to provide a revised dataset, providing a tr…
Who is the assignee on this patent?
SAP SE
What technology area does this patent fall under?
Primary CPC classification G06N3/088. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 16, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).