Method for Optimizing Neural Networks
US-2019138896-A1 · May 9, 2019 · US
US11741369B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11741369-B2 |
| Application number | US-202117514701-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 29, 2021 |
| Priority date | Dec 14, 2017 |
| Publication date | Aug 29, 2023 |
| Grant date | Aug 29, 2023 |
Official abstract text for this publication.
Some embodiments provide a method for training a machine-trained (MT) network that processes inputs using network parameters. The method propagates a set of input training items through the MT network to generate a set of output values. The set of input training items comprises multiple training items for each of multiple categories. The method identifies multiple training item groupings in the set of input training items. Each grouping includes at least two training items in a first category and at least one training item in a second category. The method calculates a value of a loss function as a summation of individual loss functions for each of the identified training item groupings. The individual loss function for each particular training item grouping is based on the output values for the training items of the grouping. The method trains the network parameters using the calculated loss function value.
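The grouping and summation steps in the abstract can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: it assumes each grouping is an ordered triplet of two items from one category and one item from another, and the names `enumerate_triplets`, `summed_loss`, and `grouping_loss` are hypothetical.

```python
from itertools import permutations


def enumerate_triplets(labels):
    """Return every (anchor, positive, negative) index triplet.

    Assumption: a "grouping" is two distinct items that share a category
    (anchor, positive) plus one item from a different category (negative).
    """
    triplets = []
    for a, p in permutations(range(len(labels)), 2):
        if labels[a] != labels[p]:
            continue  # anchor and positive must share a category
        for n in range(len(labels)):
            if labels[n] != labels[a]:
                triplets.append((a, p, n))
    return triplets


def summed_loss(outputs, labels, grouping_loss):
    """Sum a per-grouping loss over all identified groupings.

    `grouping_loss` is a hypothetical callable taking the three output
    values (anchor, positive, negative) and returning a float.
    """
    return sum(
        grouping_loss(outputs[a], outputs[p], outputs[n])
        for a, p, n in enumerate_triplets(labels)
    )


# Two "cat" items and one "dog" item yield two triplets: (0, 1, 2) and (1, 0, 2).
example_triplets = enumerate_triplets(["cat", "cat", "dog"])
```

The number of triplets grows quickly with batch size, so a practical trainer would likely compute the sum in a vectorized form instead of looping as this sketch does.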
Opening claim text (preview).
The invention claimed is:
1. A method for training a machine-trained (MT) network that classifies inputs into categories, the method comprising: propagating a set of input training items through the MT network to generate output vectors for each of the input training items; identifying a triplet of input training items comprising an anchor input training item of a first category, a positive input training item of the first category, and a negative input training item of a second, different category; calculating a value of a loss function for the triplet based on a probability that the output vector for the anchor input training item is classified in the same category as the output vector for the positive input training item rather than the same category as the output vector for the negative input training item; and using the calculated loss function value for the triplet to train the MT network, wherein the trained MT network is for embedding into a device to classify input items.
2. The method of claim 1, wherein the input items are images and the categories comprise different types of objects found in the images.
3. The method of claim 1, wherein the probability that the output vector for the anchor input training item is classified in the same category as the output vector for the positive input training item rather than the same category as the output vector for the negative input training item is based on assumptions that (i) a distribution of output vectors for each of the categories is a normal distribution and (ii) for each of the categories, a variance of the normal distribution of the output vectors for the category is the same as a variance of the normal distribution of the output vectors for the other categories.
4. The method of claim 1, wherein: the output vectors are vectors in an N-dimensional space; and for each category, the output vectors for input training items of the category are clustered in the N-dimensional space.
5. The method of claim 4, wherein the loss function is a function of proximity of the output vector for the anchor input training item to the output vectors for the positive and negative input training items in the N-dimensional space.
6. The method of claim 1, wherein the triplet is a first triplet comprising a first anchor input training item, a first positive input training item, and a first negative input training item, the method further comprising: identifying a second triplet of input training items comprising a second anchor input training item of a third category, a positive input training item of the third category, and a negative input training item of a fourth, different category; calculating a value of the loss function for the second triplet based on a probability that the output vector for the second anchor input training item is classified in the same category as the output vector for the second positive input training item rather than the same category as the output vector for the second negative input training item, wherein using the calculated loss function value for the first triplet to train the MT network comprises using the calculated loss function values for the first and second triplets to train the MT network.
7. The method of claim 6, wherein the first anchor input training item is also the second negative input training item, wherein the first and fourth categories are the same.
8. The method of claim 6, wherein the first anchor input training item is also the second positive input training item, wherein the first and third categories are the same.
9. The method of claim 1 further comprising: identifying each triplet in the set of input training items; and calculating values of the loss function for each identified triplet, wherein using the calculated loss function value to train the MT network comprises: summing the calculated loss function values for each of the identified triplets; and using the summed loss function values to train the MT network.
10. The method of claim 1, wherein using the calculated loss function value for the triplet to train the MT network comprises: backpropagating the calculated loss function value through the MT network to determine, for each of a set of parameters of the MT network, a rate of change in the calculated loss function value relative to a rate of change in the parameter; and modifying each parameter in the set of parameters according to the determined rate of change for the parameter.
11. A non-transitory machine-readable medium storing a program which when executed by at least one processing unit trains a machine-trained (MT) network that classifies inputs into categories, the program comprising sets of instructions for: propagating a set of input training items through the MT network to generate output vectors for each of the input training items; identifying a triplet of input training items comprising an anchor input training item of a first category, a positive input training item of the first category, and a negative input training item of a second, different category; calculating a value of a loss function for the triplet based on a probability that the output vector for the anchor input training item is classified in the same category as the output vector for the positive input training item rather than the same category as the output vector for the negative input training item; and using the calculated loss function value for the triplet to train the MT network, wherein the trained MT network is for embedding into a device to classify input items.
12. The non-transitory machine-readable medium of claim 11, wherein the input items are images and the categories comprise different types of objects found in the images.
13. The non-transitory machine-readable medium of claim 11, wherein the probability that the output vector for the anchor input training item is classified in the same category as the output vector for the positive input training item rather than the same category as the output vector for the negative input training item is based on assumptions that (i) a distribution of output vectors for each of the categories is a normal distribution and (ii) for each of the categories, a variance of the normal distribution of the output vectors for the category is the same as a variance of the normal distribution of the output vectors for the other categories.
14. The non-transitory machine-readable medium of claim 11, wherein: the output vectors are vectors in an N-dimensional space; and for each category, the output vectors for input training items of the category are clustered in the N-dimensional space.
15. The non-transitory machine-readable medium of claim 14, wherein the loss function is a function of proximity of the output vector for the anchor input training item to the output vectors for the positive and negative input training items in the N-dimensional space.
16. The non-transitory machine-readable medium of claim 11, wherein the triplet is a first triplet comprising a first anchor input training item, a first positive input training item, and a first negative input training item, the program further comprising sets of instructions for: identifying a second triplet of input training items comprising a second anchor input training item of a third category, a positive input training item of the third category, and a negative input training item of a fourth, different category; calculating a value of the loss function for the second triplet based on a probability that the output vector for the second anchor input training item is classifie
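
Claims 3 and 13 tie the triplet probability to two assumptions: output vectors for each category are normally distributed, and every category shares the same variance. The sketch below shows one formula consistent with those assumptions; it is not taken from the patent's description. It further assumes isotropic distributions, equal category priors, and that the positive and negative output vectors stand in for their category centers, in which case the probability reduces to a logistic function of the difference in squared distances. The function names and the `sigma` parameter are hypothetical.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def _logit(anchor, positive, negative, sigma):
    # Under the stated assumptions the log-odds that the anchor belongs
    # with the positive rather than the negative is
    # (d(anchor, negative)^2 - d(anchor, positive)^2) / (2 * sigma^2).
    d_an = squared_distance(anchor, negative)
    d_ap = squared_distance(anchor, positive)
    return (d_an - d_ap) / (2.0 * sigma ** 2)


def same_category_probability(anchor, positive, negative, sigma=1.0):
    """Probability that the anchor is grouped with the positive item."""
    z = _logit(anchor, positive, negative, sigma)
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def triplet_loss(anchor, positive, negative, sigma=1.0):
    """Negative log of the probability above, computed stably.

    -log(sigmoid(z)) = log(1 + exp(-z)) = max(-z, 0) + log1p(exp(-|z|)).
    """
    z = _logit(anchor, positive, negative, sigma)
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# An anchor equidistant from the positive and negative items gives
# probability 0.5, so the loss equals log(2).
p_equal = same_category_probability([0.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

Because this loss is a smooth function of the output vectors, it has a well-defined gradient with respect to every network parameter, which is what the backpropagation step in claim 10 requires.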
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Backpropagation, e.g. using gradient descent · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Validation; Performance evaluation; Active pattern learning techniques · CPC title