Using batches of training items for training a network

US11741369B2 · US · B2

Patent metadata
Publication number: US-11741369-B2
Application number: US-202117514701-A
Country: US
Kind code: B2
Filing date: Oct 29, 2021
Priority date: Dec 14, 2017
Publication date: Aug 29, 2023
Grant date: Aug 29, 2023

Abstract

Official abstract text for this publication.

Some embodiments provide a method for training a machine-trained (MT) network that processes inputs using network parameters. The method propagates a set of input training items through the MT network to generate a set of output values. The set of input training items comprises multiple training items for each of multiple categories. The method identifies multiple training item groupings in the set of input training items. Each grouping includes at least two training items in a first category and at least one training item in a second category. The method calculates a value of a loss function as a summation of individual loss functions for each of the identified training item groupings. The individual loss function for each particular training item grouping is based on the output values for the training items of the grouping. The method trains the network parameters using the calculated loss function value.
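The grouping and summation steps described in the abstract can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the patent's implementation: the function names (`enumerate_triplets`, `batch_loss`, `hinge_triplet_loss`) are hypothetical, each grouping is taken to be a triplet of two items from one category and one item from another, and the individual loss is a conventional margin-based stand-in chosen only to make the summation concrete.

```python
from itertools import permutations


def squared_distance(u, v):
    """Squared Euclidean distance between two output vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def enumerate_triplets(labels):
    """Yield (anchor, positive, negative) index triples from one batch.

    The anchor and positive share a category; the negative belongs to a
    different category. Every ordered same-category pair is used.
    """
    n = len(labels)
    for a, p in permutations(range(n), 2):
        if labels[a] != labels[p]:
            continue
        for neg in range(n):
            if labels[neg] != labels[a]:
                yield a, p, neg


def hinge_triplet_loss(anchor, positive, negative, margin=1.0):
    """Stand-in individual loss (assumption): a classic margin loss."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def batch_loss(outputs, labels, triplet_loss=hinge_triplet_loss):
    """Sum the individual loss over every grouping found in the batch."""
    return sum(
        triplet_loss(outputs[a], outputs[p], outputs[neg])
        for a, p, neg in enumerate_triplets(labels)
    )


# Toy batch: two categories with two items each (hypothetical values).
labels = ["cat", "cat", "dog", "dog"]
outputs = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
triplets = list(enumerate_triplets(labels))
total = batch_loss(outputs, labels)
```

Because every item in the batch can serve as anchor, positive, or negative in several groupings, a single forward pass over the batch yields many loss terms; in this toy batch of four items there are eight triplets.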

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for training a machine-trained (MT) network that classifies inputs into categories, the method comprising: propagating a set of input training items through the MT network to generate output vectors for each of the input training items; identifying a triplet of input training items comprising an anchor input training item of a first category, a positive input training item of the first category, and a negative input training item of a second, different category; calculating a value of a loss function for the triplet based on a probability that the output vector for the anchor input training item is classified in the same category as the output vector for the positive input training item rather than the same category as the output vector for the negative input training item; and using the calculated loss function value for the triplet to train the MT network, wherein the trained MT network is for embedding into a device to classify input items.

2. The method of claim 1, wherein the input items are images and the categories comprise different types of objects found in the images.

3. The method of claim 1, wherein the probability that the output vector for the anchor input training item is classified in the same category as the output vector for the positive input training item rather than the same category as the output vector for the negative input training item is based on assumptions that (i) a distribution of output vectors for each of the categories is a normal distribution and (ii) for each of the categories, a variance of the normal distribution of the output vectors for the category is the same as a variance of the normal distribution of the output vectors for the other categories.

4. The method of claim 1, wherein: the output vectors are vectors in an N-dimensional space; and for each category, the output vectors for input training items of the category are clustered in the N-dimensional space.

5. The method of claim 4, wherein the loss function is a function of proximity of the output vector for the anchor input training item to the output vectors for the positive and negative input training items in the N-dimensional space.

6. The method of claim 1, wherein the triplet is a first triplet comprising a first anchor input training item, a first positive input training item, and a first negative input training item, the method further comprising: identifying a second triplet of input training items comprising a second anchor input training item of a third category, a positive input training item of the third category, and a negative input training item of a fourth, different category; calculating a value of the loss function for the second triplet based on a probability that the output vector for the second anchor input training item is classified in the same category as the output vector for the second positive input training item rather than the same category as the output vector for the second negative input training item, wherein using the calculated loss function value for the first triplet to train the MT network comprises using the calculated loss function values for the first and second triplets to train the MT network.

7. The method of claim 6, wherein the first anchor input training item is also the second negative input training item, wherein the first and fourth categories are the same.

8. The method of claim 6, wherein the first anchor input training item is also the second positive input training item, wherein the first and third categories are the same.

9. The method of claim 1 further comprising: identifying each triplet in the set of input training items; and calculating values of the loss function for each identified triplet, wherein using the calculated loss function value to train the MT network comprises: summing the calculated loss function values for each of the identified triplets; and using the summed loss function values to train the MT network.

10. The method of claim 1, wherein using the calculated loss function value for the triplet to train the MT network comprises: backpropagating the calculated loss function value through the MT network to determine, for each of a set of parameters of the MT network, a rate of change in the calculated loss function value relative to a rate of change in the parameter; and modifying each parameter in the set of parameters according to the determined rate of change for the parameter.

11. A non-transitory machine-readable medium storing a program which when executed by at least one processing unit trains a machine-trained (MT) network that classifies inputs into categories, the program comprising sets of instructions for: propagating a set of input training items through the MT network to generate output vectors for each of the input training items; identifying a triplet of input training items comprising an anchor input training item of a first category, a positive input training item of the first category, and a negative input training item of a second, different category; calculating a value of a loss function for the triplet based on a probability that the output vector for the anchor input training item is classified in the same category as the output vector for the positive input training item rather than the same category as the output vector for the negative input training item; and using the calculated loss function value for the triplet to train the MT network, wherein the trained MT network is for embedding into a device to classify input items.

12. The non-transitory machine-readable medium of claim 11, wherein the input items are images and the categories comprise different types of objects found in the images.

13. The non-transitory machine-readable medium of claim 11, wherein the probability that the output vector for the anchor input training item is classified in the same category as the output vector for the positive input training item rather than the same category as the output vector for the negative input training item is based on assumptions that (i) a distribution of output vectors for each of the categories is a normal distribution and (ii) for each of the categories, a variance of the normal distribution of the output vectors for the category is the same as a variance of the normal distribution of the output vectors for the other categories.

14. The non-transitory machine-readable medium of claim 11, wherein: the output vectors are vectors in an N-dimensional space; and for each category, the output vectors for input training items of the category are clustered in the N-dimensional space.

15. The non-transitory machine-readable medium of claim 14, wherein the loss function is a function of proximity of the output vector for the anchor input training item to the output vectors for the positive and negative input training items in the N-dimensional space.

16. The non-transitory machine-readable medium of claim 11, wherein the triplet is a first triplet comprising a first anchor input training item, a first positive input training item, and a first negative input training item, the program further comprising sets of instructions for: identifying a second triplet of input training items comprising a second anchor input training item of a third category, a positive input training item of the third category, and a negative input training item of a fourth, different category; calculating a value of the loss function for the second triplet based on a probability that the output vector for the second anchor input training item is classifie
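Claims 1, 3, and 10 can be made concrete with a small sketch. The exact loss formula is not given in this excerpt, so what follows is one illustrative form under labeled assumptions: if the positive and negative output vectors are treated as centers of normal distributions with the same variance `sigma ** 2` and equal priors, the probability that the anchor belongs with the positive rather than the negative reduces to a sigmoid of the difference in squared distances, and the loss is its negative log. The names `probabilistic_triplet_loss`, `embed`, and `gradient_step` are hypothetical, the "network" is a toy per-dimension scaling, and central finite differences stand in for backpropagation.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two output vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def probabilistic_triplet_loss(anchor, positive, negative, sigma=1.0):
    """Negative log-probability that the anchor is grouped with the positive.

    Assumption: equal-variance normal distributions, which gives
    p = sigmoid((d_an^2 - d_ap^2) / (2 * sigma^2)).
    """
    z = (squared_distance(anchor, negative)
         - squared_distance(anchor, positive)) / (2.0 * sigma ** 2)
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def embed(x, params):
    """Toy stand-in for the MT network: scale each input dimension."""
    return [w * xi for w, xi in zip(params, x)]


def triplet_loss_for_params(params, triplet, sigma=1.0):
    """Propagate the three inputs, then evaluate the triplet loss."""
    a, p, n = (embed(x, params) for x in triplet)
    return probabilistic_triplet_loss(a, p, n, sigma)


def gradient_step(params, loss_fn, learning_rate=0.1, eps=1e-6):
    """One gradient-descent update; finite differences replace backprop."""
    grads = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grads.append((loss_fn(up) - loss_fn(down)) / (2.0 * eps))
    return [w - learning_rate * g for w, g in zip(params, grads)]


# Hypothetical inputs: (anchor, positive, negative).
triplet = ([1.0, 1.0], [1.0, 2.0], [2.0, 1.0])
params = [1.0, 1.0]
loss_before = triplet_loss_for_params(params, triplet)
new_params = gradient_step(params, lambda w: triplet_loss_for_params(w, triplet))
loss_after = triplet_loss_for_params(new_params, triplet)
```

With the starting parameters the anchor is equidistant from the positive and the negative, so the modeled probability is one half. The update adjusts each parameter against its rate of change in the loss, which pushes the negative away from the anchor and pulls the positive closer.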

Assignees

Perceive Corp

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet]

  • Supervised learning

  • G06N3/084 (Primary): Backpropagation, e.g. using gradient descent

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting

  • Validation; Performance evaluation; Active pattern learning techniques

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11741369B2 cover?
Some embodiments provide a method for training a machine-trained (MT) network that processes inputs using network parameters. The method propagates a set of input training items through the MT network to generate a set of output values. The set of input training items comprises multiple training items for each of multiple categories. The method identifies multiple training item groupings in the…
Who is the assignee on this patent?
Perceive Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 29, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).