Training network to minimize worst-case error

US2025068912A1 · US · A1

Patent metadata
Publication number: US-2025068912-A1
Application number: US-202418787715-A
Country: US
Kind code: A1
Filing date: Jul 29, 2024
Priority date: Nov 29, 2016
Publication date: Feb 27, 2025
Grant date: (none listed)

Abstract

Some embodiments provide a method for configuring a machine-trained (MT) network that includes multiple configurable weights to train. The method propagates a set of inputs through the MT network to generate a set of output probability distributions. Each input has a corresponding expected output probability distribution. The method calculates a value of a continuously-differentiable loss function that includes a term approximating an extremum function of the difference between the expected output probability distributions and generated set of output probability distributions. The method trains the weights by back-propagating the calculated value of the continuously-differentiable loss function.
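
The idea of replacing a hard maximum with a smooth, trainable term can be illustrated with a minimal Python sketch. It assumes (these choices are not stated in the abstract) that the per-input error is the cross-entropy between the expected and generated distributions, and that a hypothetical sharpness parameter `alpha` scales the log-sum-exp. The names `cross_entropy`, `smooth_max`, and `smooth_max_grad` are illustrative only.

```python
import math

def cross_entropy(expected, generated, eps=1e-12):
    """Assumed per-input error: cross-entropy of generated vs. expected distribution."""
    return -sum(e * math.log(g + eps) for e, g in zip(expected, generated))

def smooth_max(values, alpha=10.0):
    """Log-sum-exp approximation of max(values); differentiable everywhere."""
    m = max(values)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(alpha * (v - m)) for v in values)) / alpha

def smooth_max_grad(values, alpha=10.0):
    """Gradient of smooth_max w.r.t. each value: a softmax over alpha * values."""
    m = max(values)
    exps = [math.exp(alpha * (v - m)) for v in values]
    total = sum(exps)
    return [x / total for x in exps]

# Three hypothetical inputs, each with a one-hot expected distribution.
expected = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
generated = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]

errors = [cross_entropy(e, g) for e, g in zip(expected, generated)]
loss_term = smooth_max(errors)      # smooth stand-in for the worst-case error
weights = smooth_max_grad(errors)   # how strongly each input drives the update
```

Because the gradient weights form a softmax over the errors, the worst-performing input dominates the update while every input still contributes a nonzero gradient. That is what makes the term usable with back-propagation, whereas a hard maximum passes gradient only through the single largest error.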

Claims

1. A method for configuring a machine-trained (MT) network comprising a plurality of configurable weights to train, the method comprising: propagating a set of inputs through the MT network to generate a set of output probability distributions, each input having a corresponding expected output probability distribution included in a set of expected output probability distributions; calculating a value of a continuously-differentiable loss function comprising a term that approximates an extremum function of a difference between each output probability distribution in the set of output probability distributions and the corresponding expected output probability distribution; and training the plurality of configurable weights by back-propagating the calculated value of the continuously-differentiable loss function.

2. The method of claim 1, wherein the term approximates a maximum function, wherein training the plurality of configurable weights comprises minimizing the term.

3. The method of claim 1, wherein the term approximates a minimum function, wherein training the plurality of configurable weights comprises maximizing the term.

4. The method of claim 1, wherein the term is a log-sum-exponent function.

5. The method of claim 1, wherein: the MT network receives an input and outputs one of a set of discrete categories for the input; the set of inputs comprises a plurality of inputs for each category included in the set of discrete categories; the set of output probability distributions comprises an output probability distribution for each category of input; and the set of expected output probability distributions comprises an expected output probability distribution for each category of input.

6. The method of claim 5, wherein the expected output probability distribution for a particular category of input is 1 and 0 for each other category.

7. The method of claim 5, wherein the term comprises a natural logarithm of a summation of a plurality of exponential functions, wherein an index of the summation is the set of discrete categories, wherein an exponent of each exponential function for a particular input category is a function of the output probability distribution for the particular input category.

8. The method of claim 7, wherein the function of the output probability distribution is an entropy calculation for the output probability distribution.

9. The method of claim 8, wherein the entropy calculation for a particular generated output probability distribution comprises a sum over each discrete probability in the output probability distribution of the discrete probability multiplied by a negative of a base-2 logarithm of the discrete probability.

10. The method of claim 8, wherein the term that approximates the extremum function biases training of the plurality of configurable weights towards weight values that minimize a maximum of the entropy calculation for the set of discrete categories.

11. The method of claim 1, wherein the MT network comprises input nodes, output nodes, and interior nodes between the input nodes and the output nodes, wherein each node produces an output value and each interior node and each output node receives as input values a set of output values of other nodes and applies a set of the plurality of configurable weights to each received input value.

12. The method of claim 1 further comprising performing the propagating, calculating, and back-propagating iteratively.

13. The method of claim 1, wherein training the plurality of configurable weights comprises: back propagating the calculated value through the MT network to determine, for each weight, a rate of change in the calculated value relative to a rate of change in a particular weight; and modifying each particular weight according to the determined rate of change for the particular weight.

14. The method of claim 1, wherein the MT network is for embedding into a device after training is complete.

15. The method of claim 1, wherein propagating the set of inputs through the MT network comprises calculating an output value for each interior node and output node, wherein calculating the output value for a particular node comprises: receiving a set of input values from a set of other nodes; calculating a linear summation of each input value multiplied by a corresponding weight value; and applying a non-linear function to the linear summation to calculate the output value for the particular node.

16. A non-transitory machine-readable medium storing a program which when executed by at least one processing unit configures a machine-trained (MT) network comprising a plurality of configurable weights to train, the program comprising sets of instructions for: propagating a set of inputs through the MT network to generate a set of output probability distributions, each input having a corresponding expected output probability distribution included in a set of expected output probability distributions; calculating a value of a continuously-differentiable loss function comprising a term that approximates an extremum function of a difference between each output probability distribution in the set of output probability distributions and the corresponding expected output probability distribution; and training the plurality of configurable weights by back-propagating the calculated value of the continuously-differentiable loss function.

17. The non-transitory machine-readable medium of claim 16, wherein: the MT network receives an input and outputs one of a set of discrete categories for the input; the set of inputs comprises a plurality of inputs for each category included in the set of discrete categories; the set of output probability distributions comprises an output probability distribution for each category of input; and the set of expected output probability distributions comprises an expected output probability distribution for each category of input.

18. The non-transitory machine-readable medium of claim 17, wherein the expected output probability distribution for a particular category of input is 1 for the particular category and 0 for each other category.

19. The non-transitory machine-readable medium of claim 17, wherein the term comprises a natural logarithm of a summation of a plurality of exponential functions, wherein an index of the summation is the set of discrete categories, wherein an exponent of each exponential function for a particular input category is a function of the output probability distribution for the particular input category.

20. The non-transitory machine-readable medium of claim 19, wherein the function of the output probability distribution is an entropy calculation for the output probability distribution.

21-22. (canceled)
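
The term described in claims 7 through 10 (a natural logarithm of a sum, over categories, of exponentials of a per-category entropy) can be sketched in a few lines of Python. This is an illustration under assumptions, not the applicant's implementation: the claim text shown here does not say how a single distribution per category is obtained from multiple inputs, so the sketch takes one distribution per category as given. The category names, the sample values, and the function names `entropy_bits` and `log_sum_exp_term` are hypothetical.

```python
import math

def entropy_bits(dist):
    """Entropy as in claim 9: sum of p * (-log2 p), skipping zero-probability entries."""
    return sum(p * -math.log2(p) for p in dist if p > 0.0)

def log_sum_exp_term(per_category_dists):
    """Natural log of a sum, indexed by category, of exp(entropy of that category's output)."""
    entropies = [entropy_bits(d) for d in per_category_dists.values()]
    m = max(entropies)  # shift by the max so exp() cannot overflow
    return m + math.log(sum(math.exp(h - m) for h in entropies))

# Hypothetical generated output distributions, one per input category.
outputs = {
    "cat": [0.90, 0.05, 0.05],
    "dog": [0.40, 0.35, 0.25],   # the most uncertain category
    "bird": [0.05, 0.05, 0.90],
}

term = log_sum_exp_term(outputs)
worst = max(entropy_bits(d) for d in outputs.values())
```

Minimizing this term pushes down the largest per-category entropy, in line with the bias described in claim 10. The unscaled form used here overestimates the true maximum by at most the natural logarithm of the number of categories.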

Assignees

Perceive Corp

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks (CPC title)

  • Activation functions (CPC title)

  • G06N3/084 (Primary): Backpropagation, e.g. using gradient descent (CPC title)

Frequently asked questions

What does patent US2025068912A1 cover?
It covers a method for training the configurable weights of a machine-trained (MT) network. A set of inputs is propagated through the network to generate output probability distributions, a continuously-differentiable loss function is calculated that includes a term approximating an extremum function of the difference between the expected and generated distributions, and the weights are trained by back-propagating the calculated loss value.
Who is the assignee on this patent?
Perceive Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/084 (Backpropagation, e.g. using gradient descent). Mapped technology areas include Physics.
When was this patent published?
It was published on Feb 27, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).