Mitigating overfitting in training machine trained networks

US10586151B1 · US · B1

Patent metadata
Publication number: US-10586151-B1
Application number: US-201615224632-A
Country: US
Kind code: B1
Filing date: Jul 31, 2016
Priority date: Jul 31, 2015
Publication date: Mar 10, 2020
Grant date: Mar 10, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Some embodiments of the invention provide a novel method for training a multi-layer node network that mitigates against overfitting the adjustable parameters of the network for a particular problem. During training, the method of some embodiments adjusts the modifiable parameters of the network by iteratively identifying different interior-node, influence-attenuating masks that effectively specify different sampled networks of the multi-layer node network. An interior-node, influence-attenuating mask specifies attenuation parameters that are applied (1) to the outputs of the interior nodes of the network in some embodiments, (2) to the inputs of the interior nodes of the network in other embodiments, or (3) to the outputs and inputs of the interior nodes in still other embodiments. In each mask, the attenuation parameters can be any one of several values (e.g., three or more values) within a range of values (e.g., between 0 and 1).
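As a rough illustration of what an interior-node, influence-attenuating mask might look like, the sketch below draws one attenuation parameter per interior node from a small set of values between 0 and 1. The names `ATTENUATION_LEVELS`, `sample_mask`, and `attenuate`, the four specific values, and the uniform random choice are assumptions made for this example; the abstract does not say how masks are identified.

```python
import random

# Assumed set of allowed attenuation values: three or more values within 0..1.
ATTENUATION_LEVELS = (0.25, 0.5, 0.75, 1.0)

def sample_mask(layer_sizes, rng):
    """Return one attenuation parameter per interior node, grouped by layer."""
    return [[rng.choice(ATTENUATION_LEVELS) for _ in range(n)]
            for n in layer_sizes]

def attenuate(values, layer_mask):
    """Scale each node value (an input or an output) by its attenuation parameter."""
    return [v * a for v, a in zip(values, layer_mask)]

rng = random.Random(0)
mask = sample_mask([4, 3], rng)        # two interior layers with 4 and 3 nodes
outputs = [1.0, -2.0, 0.5, 3.0]        # example outputs of the first interior layer
attenuated = attenuate(outputs, mask[0])
```

Unlike a binary keep-or-drop mask, every node still contributes here; a value below 1 only reduces its influence for that training iteration.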

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of configuring a machine trained (MT) network comprising input and output nodes, and interior nodes between the input and output nodes, each node producing an output, and each interior or output node receiving as inputs a set of outputs of a set of other nodes, each node having a set of configurable parameters for training, the method comprising: iteratively selecting a plurality of influence-attenuating masks, each mask specifying a different plurality of attenuation parameters for applying to the inputs of each of the interior nodes, said attenuation parameters for each of a set of one or more masks including at least three different values; and for each mask: selecting an input set of values with a known output set of values; using the mask to forward propagate the input set of values through the network's nodes to produce a computed output set of values; and using the mask to back propagate a set of error values that quantifies a difference between the input set's known output set and the input set's computed output set, said back propagation assigning error values from later nodes to earlier nodes and adjusting the configurable parameters of the nodes based on (i) the assigned error values and (ii) the plurality of attenuation parameters.

2. The method of claim 1, wherein the attenuation parameters for each of the masks fall within a range of values between 0 and 1, and excluding 0 but including 1.

3. The method of claim 1, wherein the attenuation parameters for each of the masks fall within a range of values between 0 and 1.

4. The method of claim 1, wherein each of a plurality of nodes comprises: a linear component that uses a set of weight coefficients to combine a set of output values of a set of nodes to compute a first calculated value; and a nonlinear component to compute a second calculated value from the node's first calculated value, wherein the configurable parameters of the network comprise at least the set of weight coefficients of the set of nodes.

5. The method of claim 4, wherein using the mask to forward propagate comprises multiplying each weight coefficient associated with a node's input by an attenuation parameter specified for that input in the selected mask.

6. The method of claim 1, wherein using the mask to forward propagate comprises multiplying each node's input by the attenuation parameter specified for node in the selected mask.

7. The method of claim 6, wherein in a mask, an attenuation parameter of 1 for a node leaves the node's input unaffected, while an attenuation parameter less than 1 for a node reduces the node's input and thereby diminishes the influence of the node's input during the training of the network for that mask.

8. The method of claim 1, wherein back propagating the error values for the masks selected in different iterations averages the configurable parameters that are produced for the plurality of the selected masks to obtain the configurable parameters of the MT network.

9. A non-transitory machine readable medium storing a program for configuring a machine trained (MT) network comprising input and output nodes, and interior nodes between the input and output nodes, each node producing an output, and each interior or output node receiving as inputs a set of outputs of a set of other nodes, each node having a set of configurable parameters for training, the program comprising sets of instructions for: iteratively selecting a plurality of influence-attenuating masks, each mask specifying a different plurality of attenuation parameters for applying to the inputs of each of the interior nodes, said attenuation parameters for each of a set of one or more masks including at least three different values; and for each mask: selecting an input set of values with known output set of values; using the mask to forward propagate the input set of values through the network's nodes to produce a computed output set of values; and using the mask to back propagate a set of error values that quantifies a difference between the input set's known output set and the input set's computed output set, said back propagation assigning error values from later nodes to earlier nodes and adjusting the configurable parameters of the nodes based on (i) the assigned error values and (ii) the plurality of attenuation parameters.

10. The non-transitory machine readable medium of claim 9, wherein the attenuation parameters for each of the masks fall within a range of values between 0 and 1, and excluding 0.

11. The non-transitory machine readable medium of claim 9, wherein each of a plurality of nodes comprises: a linear component that uses a set of weight coefficients to combine a set of output values of a set of nodes to compute a first calculated value; and a nonlinear component to compute a second calculated value from the node's first calculated value, wherein the configurable parameters of the network comprise at least the set of weight coefficients of the set of nodes.

12. The non-transitory machine readable medium of claim 11, wherein the set of instructions for using the mask to forward propagate comprises a set of instructions for multiplying each weight coefficient associated with a node's input by an attenuation parameter specified for that input in the selected mask.

13. The non-transitory machine readable medium of claim 9, wherein the set of instructions for using the mask to forward propagate comprises a set of instructions for multiplying each node's input by the attenuation parameter specified for node in the selected mask.

14. The non-transitory machine readable medium of claim 13, wherein in a mask, an attenuation parameter of 1 for a node leaves the node's input unaffected, while an attenuation parameter less than 1 for a node reduces the node's input and thereby diminishes the influence of the node's input during the training of the network for that mask.

15. A method of configuring a machine trained (MT) network comprising input and output nodes, and interior nodes between the input and output nodes, each node producing an output, and each interior or output node receiving a set of outputs of a set of other nodes, each node having a set of configurable parameters for training, the method comprising: iteratively selecting a plurality of influence-attenuating masks, each mask specifying a different plurality of attenuation parameters for applying to the outputs of each of the interior nodes, said attenuation parameters for each of a set of one or more masks including at least three different values; and for each mask: selecting an input set of values with known output set of values; using the mask to forward propagate the input set of values through the network's nodes to produce a computed output set of values; and using the mask to back propagate a set of error values that quantifies a difference between the input set's known output set and the input set's computed output set, said back propagation assigning error values from later nodes to earlier nodes and adjusting the configurable parameters of the nodes based on (i) the assigned error values and (ii) the plurality of attenuation parameters.

16. The method of claim 15, wherein the attenuation parameters for each of the set of masks fall within a range of values between 0 and 1, and excluding 0 but including 1.

17. The method of claim 15, wherein the attenuation parameters for each of the set of masks fall within a range of values between 0 and 1.

18. The method of claim 15, wherein each of
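The per-mask loop recited in the independent claims (select a mask, select a training example, forward propagate with the mask, back propagate with the mask) can be sketched for a tiny network with one interior layer. This is only an illustration under stated assumptions: the squared-error loss, the `tanh` nonlinearity, plain gradient descent, the toy data, the attenuation values in `LEVELS`, and the names `forward` and `train_step` are all choices made for this example and are not taken from the patent text.

```python
import math
import random

LEVELS = (0.25, 0.5, 0.75, 1.0)  # assumed attenuation values (0 excluded, 1 included)

def forward(p, mask, x):
    """Forward propagate x; mask[j][i] attenuates input i of interior node j."""
    h = []
    for j in range(len(p["w"])):
        # Scaling the weight or scaling the input gives the same product a * w * x.
        z = sum(a * w * xi for a, w, xi in zip(mask[j], p["w"][j], x)) + p["b"][j]
        h.append(math.tanh(z))            # nonlinear component of the node
    y = sum(v * hj for v, hj in zip(p["v"], h)) + p["c"]
    return h, y

def train_step(p, mask, x, target, lr=0.05):
    """One masked forward and backward pass; returns the loss before the update."""
    h, y = forward(p, mask, x)
    err = y - target                      # error at the output node
    for j in range(len(p["w"])):
        dz = err * p["v"][j] * (1.0 - h[j] ** 2)   # error assigned to interior node j
        p["v"][j] -= lr * err * h[j]
        p["b"][j] -= lr * dz
        for i in range(len(x)):
            # The weight update depends on the error and on the attenuation parameter.
            p["w"][j][i] -= lr * dz * mask[j][i] * x[i]
    p["c"] -= lr * err
    return 0.5 * err ** 2

rng = random.Random(0)
params = {
    "w": [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)],
    "b": [0.0] * 3,
    "v": [rng.uniform(-0.5, 0.5) for _ in range(3)],
    "c": 0.0,
}
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 2.0)]

losses = []
for step in range(200):
    mask = [[rng.choice(LEVELS) for _ in range(2)] for _ in range(3)]  # new mask per iteration
    x, t = data[step % len(data)]
    losses.append(train_step(params, mask, x, t))
```

Because a fresh mask is drawn on every iteration, the loss recorded from one step to the next is noisy; the sketch makes no claim about convergence speed or about how the claimed method selects its masks.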

Assignees

Perceive Corp

Inventors

Classifications

  • G06N3/084 (Primary)

    Backpropagation, e.g. using gradient descent · CPC title

  • modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

  • G06N3/08 (Primary)

    Learning methods · CPC title

  • Activation functions · CPC title

  • Supervised learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10586151B1 cover?
Some embodiments of the invention provide a novel method for training a multi-layer node network that mitigates against overfitting the adjustable parameters of the network for a particular problem. During training, the method of some embodiments adjusts the modifiable parameters of the network by iteratively identifying different interior-node, influence-attenuating masks that effectively spec…
Who is the assignee on this patent?
Perceive Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 10, 2020 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).