Training network to minimize worst-case error

US12051000B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12051000-B2
Application number  US-202217962789-A
Country             US
Kind code           B2
Filing date         Oct 10, 2022
Priority date       Nov 29, 2016
Publication date    Jul 30, 2024
Grant date          Jul 30, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Some embodiments provide a method for configuring a machine-trained (MT) network that includes multiple configurable weights to train. The method propagates a set of inputs through the MT network to generate a set of output probability distributions. Each input has a corresponding expected output probability distribution. The method calculates a value of a continuously-differentiable loss function that includes a term approximating an extremum function of the difference between the expected output probability distributions and generated set of output probability distributions. The method trains the weights by back-propagating the calculated value of the continuously-differentiable loss function.
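The abstract's "term approximating an extremum function" can be illustrated with a log-sum-exp, which is a continuously-differentiable stand-in for a maximum. The following is a minimal sketch, not the patented formulation: the use of cross-entropy as the per-input difference measure, the `sharpness` parameter, and all function names are assumptions made for illustration.

```python
import math

def cross_entropy(expected, generated, eps=1e-12):
    """Per-input error between an expected and a generated distribution.
    Cross-entropy is an assumed choice; the abstract only says 'difference'."""
    return -sum(e * math.log(g + eps) for e, g in zip(expected, generated))

def smooth_max(values, sharpness=10.0):
    """Log-sum-exp approximation of max(values).
    'sharpness' is a hypothetical knob: larger values track the true max more closely."""
    m = max(values)  # subtract the max before exponentiating for numerical stability
    total = sum(math.exp(sharpness * (v - m)) for v in values)
    return m + math.log(total) / sharpness

# Toy data: three inputs, two categories, one-hot expected outputs.
expected = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
generated = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]

errors = [cross_entropy(e, g) for e, g in zip(expected, generated)]
worst_case_loss = smooth_max(errors)  # dominated by the worst-classified input
```

Because the log-sum-exp always lies between the true maximum and the maximum plus log(n)/sharpness, its gradient concentrates on the inputs with the largest error while remaining defined everywhere, which is what makes it usable with back-propagation.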

First claim

Opening claim text (preview).

What is claimed is:

1. A method for training a classification network that classifies inputs into a plurality of different categories, the method comprising: propagating a set of inputs through the classification network to generate a set of output probability distributions, the generated output probability distribution for each input providing a probability of the input belonging to each of the categories, each input having a corresponding expected output probability distribution that specifies a particular one of the categories to which the input belongs; calculating a value of a continuously-differentiable loss function comprising a term that approximates a maximum of entropy calculations for each of the different categories; and using the calculated continuously-differentiable loss function value to train weights of the classification network, wherein the term that approximates the maximum of the entropy calculations biases the training of the weights towards reducing a difference between the expected output probability distributions and the generated output probability distributions for inputs belonging to a category with the largest entropy calculations.

2. The method of claim 1, wherein calculating the value of the continuously-differentiable loss function comprises calculating the entropy for each of the different categories.

3. The method of claim 2, wherein calculating the entropy for each of the different categories comprises, for each of the categories: calculating an average of the generated output probability distributions for the inputs belonging to the category; and calculating the entropy of the average of the generated output probability distributions for the inputs belonging to the category.

4. The method of claim 2, wherein calculating the entropy for each of the different categories comprises using a log-sum-exponent formulation that highlights inputs with the largest divergence between expected output probability distributions and generated output probability distributions.

5. The method of claim 4, wherein the term that approximates the maximum of the entropy calculations is a log-sum-exponent term that uses the log-sum-exponent formulation of the entropy as its exponent.

6. The method of claim 5, wherein: the summation in the log-sum-exponent term is a summation over the plurality of different categories; and the summation in the log-sum-exponent formulation of the entropy calculation for a particular category is a summation over the inputs belonging to the particular category.

7. The method of claim 1, wherein: the set of inputs comprises a plurality of inputs for each of the categories; and for each category, the expected output probability distribution for each input belonging to the category is 1 for the category to which the input belongs and 0 for each other category.

8. The method of claim 1, wherein using the calculated continuously-differentiable loss function value to train the weights of the classification network comprises: back-propagating the calculated loss function value to determine, for each of a plurality of the weights of the classification network, a rate of change in the calculated loss function value relative to a rate of change in the weight; and modifying each respective weight of the plurality of weights according to the respective rate of change determined for the weight.

9. The method of claim 1, wherein the classification network is for embedding into a device after the classification network is trained.

10. The method of claim 1, wherein the inputs are images and the plurality of categories are different types of objects represented in the images.

11. A non-transitory machine-readable medium storing a program which when executed by at least one processing unit trains a classification network that classifies inputs into a plurality of different categories, the program comprising sets of instructions for: propagating a set of inputs through the classification network to generate a set of output probability distributions, the generated output probability distribution for each input providing a probability of the input belonging to each of the categories, each input having a corresponding expected output probability distribution that specifies a particular one of the categories to which the input belongs; calculating a value of a continuously-differentiable loss function comprising a term that approximates a maximum of entropy calculations for each of the different categories; and using the calculated continuously-differentiable loss function value to train weights of the classification network, wherein the term that approximates the maximum of the entropy calculations biases the training of the weights towards reducing a difference between the expected output probability distributions and the generated output probability distributions for inputs belonging to a category with the largest entropy calculations.

12. The non-transitory machine-readable medium of claim 11, wherein the set of instructions for calculating the value of the continuously-differentiable loss function comprises a set of instructions for calculating the entropy for each of the different categories.

13. The non-transitory machine-readable medium of claim 12, wherein the set of instructions for calculating the entropy for each of the different categories comprises sets of instructions for, for each of the categories: calculating an average of the generated output probability distributions for the inputs belonging to the category; and calculating the entropy of the average of the generated output probability distributions for the inputs belonging to the category.

14. The non-transitory machine-readable medium of claim 12, wherein the set of instructions for calculating the entropy for each of the different categories comprises a set of instructions for using a log-sum-exponent formulation that highlights inputs with the largest divergence between expected output probability distributions and generated output probability distributions.

15. The non-transitory machine-readable medium of claim 14, wherein the term that approximates the maximum of the entropy calculations is a log-sum-exponent term that uses the log-sum-exponent formulation of the entropy as its exponent.

16. The non-transitory machine-readable medium of claim 15, wherein: the summation in the log-sum-exponent term is a summation over the plurality of different categories; and the summation in the log-sum-exponent formulation of the entropy calculation for a particular category is a summation over the inputs belonging to the particular category.

17. The non-transitory machine-readable medium of claim 11, wherein: the set of inputs comprises a plurality of inputs for each of the categories; and for each category, the expected output probability distribution for each input belonging to the category is 1 for the category to which the input belongs and 0 for each other category.

18. The non-transitory machine-readable medium of claim 11, wherein the set of instructions for using the calculated continuously-differentiable loss function value to train the weights of the classification network comprises sets of instructions for: back-propagating the calculated loss function value to determine, for each of a plurality of the weights of the classification network, a rate of change in the calculated loss function value relative to a rate of change in the weight; and modifying each respective weight of the plurality of weights according to the respecti
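The per-category calculation described in claims 1 to 3 can be sketched as follows. This is an illustrative reading, not the patent's own implementation: it assumes "entropy" means Shannon entropy, uses a plain log-sum-exp over categories as the term approximating the maximum, and does not attempt the nested log-sum-exponent formulation of claims 4 to 6. All function names are hypothetical.

```python
import math
from collections import defaultdict

def entropy(dist, eps=1e-12):
    """Shannon entropy of a probability distribution (assumed meaning of 'entropy')."""
    return -sum(p * math.log(p + eps) for p in dist)

def category_entropies(generated, labels, num_categories):
    """For each category: average the generated distributions of the inputs
    belonging to it, then take the entropy of that average (claim 3)."""
    groups = defaultdict(list)
    for dist, label in zip(generated, labels):
        groups[label].append(dist)
    result = {}
    for label, dists in groups.items():
        avg = [sum(d[k] for d in dists) / len(dists) for k in range(num_categories)]
        result[label] = entropy(avg)
    return result

def log_sum_exp(values):
    """Continuously-differentiable approximation of max(values)."""
    m = max(values)  # shift by the max for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in values))

# Toy data: five inputs, three categories; labels give each input's true category.
generated = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.30, 0.40, 0.30],
    [0.20, 0.50, 0.30],
    [0.10, 0.10, 0.80],
]
labels = [0, 0, 1, 1, 2]

per_category = category_entropies(generated, labels, num_categories=3)
loss_term = log_sum_exp(list(per_category.values()))
hardest = max(per_category, key=per_category.get)  # category with the largest entropy
```

In this toy data, category 1 has the flattest average distribution and therefore the largest entropy, so it contributes most to `loss_term` and would receive the largest share of the gradient during training.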

Assignees

Perceive Corp

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Activation functions · CPC title

  • G06N3/084 · Primary

    Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12051000B2 cover?
Some embodiments provide a method for configuring a machine-trained (MT) network that includes multiple configurable weights to train. The method propagates a set of inputs through the MT network to generate a set of output probability distributions. Each input has a corresponding expected output probability distribution. The method calculates a value of a continuously-differentiable loss funct…
Who is the assignee on this patent?
Perceive Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 30, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).