Building ensembles for deep learning by parallel data splitting

US11195097B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11195097-B2
Application number: US-201916609130-A
Country: US
Kind code: B2
Filing date: Jul 2, 2019
Priority date: Jul 16, 2018
Publication date: Dec 7, 2021
Grant date: Dec 7, 2021

Abstract

Official abstract text for this publication.

Computer-implemented systems and methods build ensembles for deep learning through parallel data splitting by creating and training an ensemble of up to 2^n ensemble members based on a single base network and a selection of n network elements. The ensemble members are created by the “blasting” process, in which training data are selected for each of the up to 2^n ensemble members such that each of the ensemble members trains with updates in a different direction from each of the other ensemble members. The ensemble members may also be trained with joint optimization.
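The data-splitting idea in the abstract can be illustrated with a small sketch. It is not taken from the patent text: the function names `sign_code` and `blast_split`, the sample numbers, and the convention that a zero derivative counts as "not positive" are all assumptions made for illustration. Each training item is given an n-bit code from the signs of its partial derivatives at the n selected network elements, and items sharing a code form the training set for one ensemble member, so at most 2^n groups can arise.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def sign_code(partials: Sequence[float]) -> Tuple[int, ...]:
    """Map n partial derivatives to an n-bit vector (1 if positive, else 0).

    Treating zero as "not positive" is an assumption of this sketch.
    """
    return tuple(1 if p > 0 else 0 for p in partials)


def blast_split(
    per_item_partials: Sequence[Sequence[float]],
) -> Dict[Tuple[int, ...], List[int]]:
    """Group training-item indices by the sign pattern of their partials."""
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for index, partials in enumerate(per_item_partials):
        groups[sign_code(partials)].append(index)
    return dict(groups)


# Hypothetical partial derivatives for 5 training items and n = 2 elements.
partials = [
    [0.3, -0.1],
    [-0.2, -0.4],
    [0.5, 0.7],
    [0.1, -0.9],
    [-0.6, 0.2],
]
subsets = blast_split(partials)
for code, items in sorted(subsets.items()):
    print(code, items)
```

In this sketch each key of `subsets` plays the role of one n-bit Boolean vector, and the listed item indices would be the training data for the corresponding copy of the base network.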

First claim

Opening claim text (preview).

What is claimed is:

1. A method for building a machine learning ensemble, the method comprising: selecting, by a computer system, n selected network elements of a base machine-learning network, where n>1; making, by the computer system, M copies of the base machine-learning network, wherein the value of M is greater than or equal to 2, and less than or equal to 2^n; wherein prior to making the M copies of the base machine-learning network, iteratively for each training data item in an initial set of training data items: computing, by a computer system, in a forward computation through the base machine-learning network, an activation value for each non-input layer node of the base machine-learning network; and computing, by the computer system, in a back-propagation computation through the base machine-learning network: for each non-input layer node, a partial derivative for an objective of the base machine-learning network with respect to the activation value for the non-input layer node; and for each directed arc in the base machine-learning network, a partial derivative for the objective with respect to a weight parameter for the directed arc; training, by the computer system, each of the M copies of the base machine-learning network such that each of the M copies of the base machine-learning network is trained to change its learned parameters in a different direction than any of the other M copies, wherein training each of the M copies of the base machine-learning network comprises training, by the computer system, the m-th copy of the base network, where m=1, . . . , M, with an m-th set of training data items, wherein the m-th set of training data items comprises each training data item in an initial set of training data items where there is agreement between a value of a k-th bit of an n-bit Boolean vector and a sign for the k-th selected network element of the n selected network elements of the base network for the training data item, for all k=1, . . . , n, where: if the k-th selected network element is a node, the value of the k-th bit of the n-bit Boolean vector is compared to the sign of the partial derivative for the objective of the base network with respect to the activation value for the node to determine agreement; and if the k-th selected network element is a directed arc, the value of the k-th bit of the n-bit Boolean vector is compared to the sign of the partial derivative with respect to the weight parameter for the directed arc to determine agreement; and combining, by the computer system, the M copies of the base machine-learning network into an ensemble.

2. The method of claim 1, wherein the base machine-learning network comprises a base neural network.

3. The method of claim 2, wherein: the base neural network comprises a plurality of nodes and plurality of directed arcs; each directed arc is between two nodes of the base neural network; and the n selected network elements of the base machine-learning network comprise s nodes of the base neural network and t directed arcs of the base neural network, where s and t are integers greater than or equal to zero, and where s+t=n.

4. The method of claim 2, wherein the base neural network comprises a base deep neural network.

5. The method of claim 4, wherein the base deep neural network comprises a base feed forward deep neural network.

6. The method of claim 1, wherein the selected network elements of the base machine-learning network are selected by a machine-learning learning coach.

7. The method of claim 1, further comprising, after combining the base machine-learning network and the M copies of the base machine-learning network into the ensemble, training, by the computer system, the ensemble with a joint optimization network.

8. The method of claim 1, wherein training each of the M copies of the base machine-learning network comprises: partitioning an initial set of training data for the M copies into M subsets of training data; and training each of the M copies on a separate subset of training data.

9. The method of claim 8, where the M subsets of training data comprise M unique subsets of training data.

10. The method of claim 9, wherein the M unique subsets of training data comprise M disjoint sets of training data.

11. The method of claim 8, wherein there is an upper limit F on the number of M subsets of training data on which every training data example in the initial set of training data can be included, such that no training data examples in the initial training set may be placed into more than F of the M subsets.

12. The method of claim wherein training each of the M copies of the base machine-learning network comprises: training, by the computer system, a machine-learning classifier, to classify each training data item into at least one of two or more classification categories, wherein training the machine-learning classifier comprises using partial derivatives computed in the back-propagation computation through the base network as input variables; partitioning, by the computer system, the training data items into subsets of training data items based on the classification categories; and training, by the computer system, each of the M copies of the base machine-learning network with one of the subsets of training data items.

13. The method of claim 12, wherein the machine-learning classifier is trained to classify data items to the two or more classification categories based on a distance measure between pairs of training data items.

14. The method of claim 13, wherein the distance measure is computed using a formula that comprises a hyperparameter, wherein the hyperparameter is a relative weight given to the distance measure compared to a weight given to a difference in signs of partial derivatives for the pairs of training data items.

15. The method of claim 12, wherein the machine-learning classifier comprises a classifier form selected from the group consisting of a decision tree, a neural network and a clustering algorithm.

16. The method of claim 12, wherein training the machine-learning classifier is trained through supervised learning.

17. A computer system for building a machine learning ensemble, the computer system comprising one or more processing units that are programmed to: select n selected network elements of a base machine-learning network, where n>1; make M copies of the base machine-learning network, wherein the value of M is greater than or equal to 2, and less than or equal to 2^n; wherein prior to making the M copies of the base machine-learning network, iteratively for each training data item in an initial set of training data items: compute, in a forward computation through the base machine-learning network, an activation value for each non-input layer node of the base machine-learning network; and compute in a back-propagation computation through the base machine-learning network: for each non-input layer node, a partial derivative for an objective of the base machine-learning network with respect to the activation value for the node; and for each directed arc in the base machine-learning network, a partial derivative for the objective with respect to a weight parameter for the directed arc; train each of the M copies of the base machine-learning network such that each of the M copies of the base machine-learning network is trained to change its learned parameters in a different direction than any of the other M copies, wherein the one or more processing units are programmed to train each of the M copies of the base machine-learning network…
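Claims 13 and 14 mention a distance measure between pairs of training data items with a hyperparameter that weights it against a difference in signs of partial derivatives, but the preview does not give the formula. The following is only one hypothetical reading: a Euclidean distance between the items, scaled by an assumed `weight` hyperparameter, plus a count of selected elements whose partial derivatives differ in sign. The function names, the choice of Euclidean distance, and the additive form are all assumptions of this sketch, not the patent's formula.

```python
import math
from typing import Sequence


def sign_mismatches(grads_a: Sequence[float], grads_b: Sequence[float]) -> int:
    """Count selected elements whose partial derivatives differ in sign.

    A zero derivative is treated as "not positive" (an assumption).
    """
    return sum(1 for a, b in zip(grads_a, grads_b) if (a > 0) != (b > 0))


def combined_distance(
    item_a: Sequence[float],
    item_b: Sequence[float],
    grads_a: Sequence[float],
    grads_b: Sequence[float],
    weight: float,
) -> float:
    """Hypothetical combined measure: weight * Euclidean distance + sign mismatches."""
    return weight * math.dist(item_a, item_b) + sign_mismatches(grads_a, grads_b)


# Hypothetical items and partial derivatives, for illustration only.
d = combined_distance([0.0, 0.0], [3.0, 4.0], [0.1, -0.2], [0.3, 0.5], weight=0.5)
print(d)
```

With a larger `weight`, items that are close in input space are grouped together even when their derivative signs differ; with a smaller one, sign agreement dominates. A clustering algorithm or other classifier of the kinds listed in claim 15 could consume such a measure, though how the patent does so is not shown on this page.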

Classifications

  • Combinations of networks

  • G06N3/084 (Primary): Backpropagation, e.g. using gradient descent

  • Modifying the architecture, e.g. adding, deleting or silencing nodes or connections

  • Supervised learning

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11195097B2 cover?
Computer-implemented systems and methods build ensembles for deep learning through parallel data splitting by creating and training an ensemble of up to 2^n ensemble members based on a single base network and a selection of n network elements. The ensemble members are created by the “blasting” process, in which training data are selected for each of the up to 2^n ensemble members such that each…
Who is the assignee on this patent?
D5AI LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 7, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).