Dynamic loss function based on statistics in loss layer of deep convolutional neural network

US2017300811A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2017300811-A1
Application number  US-201615099077-A
Country             US
Kind code           A1
Filing date         Apr 14, 2016
Priority date       Apr 14, 2016
Publication date    Oct 19, 2017
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an example embodiment, a loss layer of a deep convolutional neural network is modified to include a dynamically changing function that adjusts based on statistical analysis of the samples, and specifically an analysis of which sample images showed the most deviation between their assigned professionalism score and an expected professionalism score. This allows outliers in training data to be automatically handled.
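
As a rough illustration of the idea in the abstract, the sketch below recomputes an outlier cutoff from each batch's own error statistics and leaves the most deviating samples out of the loss. It is not the patent's implementation: the function name `trimmed_mean_loss`, the squared-error measure, the mean-plus-k-standard-deviations rule, and the default value of `k` are all assumptions made for this example.

```python
import statistics


def trimmed_mean_loss(assigned, expected, k=2.0):
    """Mean squared deviation over samples not flagged as outliers.

    assigned: scores attached to the training samples (the labels)
    expected: scores currently predicted for the same samples
    k: cutoff in standard deviations (hypothetical tuning knob, k >= 0)
    Returns (loss, indices of the samples that were kept).
    """
    # Per-sample deviation between assigned and expected score.
    errors = [(a - e) ** 2 for a, e in zip(assigned, expected)]

    # The cutoff is derived from this batch, so it changes with the data.
    mean = statistics.fmean(errors)
    std = statistics.pstdev(errors)
    cutoff = mean + k * std

    # Samples above the cutoff are treated as outliers and ignored.
    kept = [i for i, err in enumerate(errors) if err <= cutoff]
    loss = statistics.fmean([errors[i] for i in kept])
    return loss, kept


# Nine consistent samples and one whose label disagrees strongly
# with the prediction (made-up numbers).
assigned = [0.5] * 9 + [0.1]
expected = [0.5] * 9 + [0.9]
loss, kept = trimmed_mean_loss(assigned, expected)
print(loss, kept)
```

Because the cutoff comes from the batch itself, the single disagreeing sample among ten is excluded here, while a batch whose errors are all equal would keep every sample.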

First claim

Opening claim text (preview).

What is claimed is:

1. A computerized method of training a deep convolutional neural network (DCNN), the method comprising:
  training the DCNN by:
    inputting a current plurality of samples to the DCNN, each of the samples having a label, the inputting including, for each sample:
      passing the sample to a convolutional layer of the DCNN, the convolutional layer comprising one or more filters having dynamically adjustable weights, the one or more filters configured to filter the sample to produce an output volume for the corresponding sample, the output volume comprising a different feature map for each of the one or more filters;
      passing the output volume from the convolutional layer through a nonlinearity layer, the nonlinearity layer applying a nonlinearity function to the output volume from the convolutional layer;
      passing the output volume from the nonlinearity layer through a pooling layer, the pooling layer lowering spatial dimensions of the output volume from the nonlinearity layer;
      passing the output volume from the pooling layer through a classification layer, the classification layer comprising a specialized convolutional layer having a filter designed to output a prediction for the sample based on the output volume from the pooling layer;
      passing the sample through a loss layer, the loss layer applying a loss function to the sample, resulting in an indication of a level of error in the prediction from the classification layer in comparison to the label of the sample;
    ranking each of the current plurality of samples based on their corresponding levels of error;
    applying a dynamic loss function to the current plurality of samples to eliminate lower ranked samples from consideration;
    determining whether a combination of the levels of error for the current plurality of samples not eliminated from consideration by the dynamic loss function transgresses a preset threshold; and
    in response to a determination that the combination of the levels of error transgresses a preset threshold, updating weights of the one or more filters in the convolutional layers of the DCNN to reduce the combination of the levels of error and repeating the training of the DCNN using a different plurality of samples and the updated weights.

2. The method of claim 1, wherein the DCNN comprises multiple stages, each stage containing a different convolutional layer, nonlinearity layer, and pooling layer.

3. The method of claim 1, wherein the dynamic loss function is based on statistics regarding the current plurality of samples.

4. The method of claim 1, wherein the dynamic loss function is based on statistics regarding the current plurality of samples and an additional plurality of samples previously used to train the DCNN, yet is applied just to the current plurality of samples.

5. The method of claim 1, wherein the dynamic loss function is designed to automatically become stricter as more iterations of the training occur.

6. The method of claim 1, wherein the dynamic loss function eliminates a preset percentage of samples from consideration.

7. The method of claim 1, wherein the dynamic loss function eliminates samples from consideration by weighting each sample's probability of being an outlier assuming a Gaussian distribution of errors.

8. A system for training a deep convolutional neural network (DCNN), the system comprising a computer readable medium having instructions stored thereon, which, when executed by a processor, cause the system to:
  train the DCNN by:
    inputting a current plurality of samples to the DCNN, each of the samples having a label, the inputting including, for each sample:
      passing the sample to a convolutional layer of the DCNN, the convolutional layer comprising one or more filters having dynamically adjustable weights, the one or more filters configured to filter the sample to produce an output volume for the corresponding sample, the output volume comprising a different feature map for each of the one or more filters;
      passing the output volume from the convolutional layer through a nonlinearity layer, the nonlinearity layer applying a nonlinearity function to the output volume from the convolutional layer;
      passing the output volume from the nonlinearity layer through a pooling layer, the pooling layer lowering spatial dimensions of the output volume from the nonlinearity layer;
      passing the output volume from the pooling layer through a classification layer, the classification layer comprising a specialized convolutional layer having a filter designed to output a prediction for the sample based on the output volume from the pooling layer;
      passing the sample through a loss layer, the loss layer applying a loss function to the sample, resulting in an indication of a level of error in the prediction from the classification layer in comparison to the label of the sample;
    ranking each of the current plurality of samples based on their corresponding levels of error;
    applying a dynamic loss function to the current plurality of samples to eliminate lower ranked samples from consideration;
    determining whether a combination of the levels of error for the current plurality of samples not eliminated from consideration by the dynamic loss function transgresses a preset threshold; and
    in response to a determination that the combination of the levels of error transgresses a preset threshold, updating weights of the one or more filters in the convolutional layers of the DCNN to reduce the combination of the levels of error and repeating the training of the DCNN using a different plurality of samples and the updated weights.

9. The system of claim 8, wherein the DCNN comprises multiple stages, each stage containing a different convolutional layer, nonlinearity layer, and pooling layer.

10. The system of claim 8, wherein the dynamic loss function is based on statistics regarding the current plurality of samples.

11. The system of claim 8, wherein the dynamic loss function is based on statistics regarding the current plurality of samples and an additional plurality of samples previously used to train the DCNN, yet is applied just to the current plurality of samples.

12. The system of claim 8, wherein the dynamic loss function is designed to automatically become stricter as more iterations of the training occur.

13. The system of claim 8, wherein the dynamic loss function eliminates a preset percentage of samples from consideration.

14. The system of claim 8, wherein the dynamic loss function eliminates samples from consideration by weighting each sample's probability of being an outlier assuming a Gaussian distribution of errors.

15. A non-transitory machine-readable storage medium comprising instructions, which when implemented by one or more machines, cause the one or more machines to perform operations for training a deep convolutional neural network (DCNN), the operations comprising:
  training the DCNN by:
    inputting a current plurality of samples to the DCNN, each of the samples having a label, the inputting including, for each sample:
      passing the sample to a convolutional layer of the DCNN, the convolutional layer comprising one or more filters having dynamically adjustable weights, the one or more filters configured to filter the sample to produce an output volume for the corresponding sample, the output volume comprising a different feature map for each of the one or more filters;
      passing the output volume from the convolutional layer through a nonlinearity layer, the nonlinearity layer applying a nonlinearity function to the output volume from the convolutional layer; p…
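
The ranking, elimination, and threshold steps of claim 1, together with the variants in claims 5 to 7, can be sketched in plain Python as follows. This is a hedged illustration, not the claimed implementation: every function name and constant is hypothetical, "lower ranked" is read here as "highest error", the combination of errors is taken to be their mean, and the Gaussian weight is taken to be the one-sided upper-tail probability of each error. The convolutional forward pass and the weight update themselves are left out.

```python
import math
import statistics


def drop_fraction(iteration, start=0.05, step=0.01, cap=0.30):
    """Fraction of samples to eliminate; grows with the iteration
    count so the filter becomes stricter over time (cf. claim 5).
    start, step and cap are made-up values."""
    return min(cap, start + step * iteration)


def eliminate_by_rank(errors, fraction):
    """Rank samples by error and drop the worst `fraction` of them
    (cf. claims 1 and 6). Returns the indices still in consideration."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    # Small epsilon guards against float rounding just below an integer.
    n_drop = int(len(errors) * fraction + 1e-9)
    return sorted(order[: len(errors) - n_drop])


def gaussian_weights(errors):
    """Soft variant (cf. claim 7): weight each sample by the probability
    that a Gaussian fitted to the batch errors exceeds its error.
    Typical samples get weights near 0.5 or above, outliers near 0."""
    mean = statistics.fmean(errors)
    std = statistics.pstdev(errors)
    if std == 0:
        return [1.0] * len(errors)
    return [0.5 * math.erfc((e - mean) / (std * math.sqrt(2))) for e in errors]


def needs_update(errors, kept, threshold):
    """Combine the surviving errors (here: their mean) and report
    whether the result exceeds the preset threshold."""
    combined = statistics.fmean([errors[i] for i in kept])
    return combined > threshold
```

In a training loop under these assumptions, one would compute the per-sample errors for a batch, call `eliminate_by_rank(errors, drop_fraction(iteration))`, and update the filter weights only while `needs_update` returns true. Claim 4's variant would compute the statistics over earlier batches as well; that is not shown here.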

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017300811A1 cover?
In an example embodiment, a loss layer of a deep convolutional neural network is modified to include a dynamically changing function that adjusts based on statistical analysis of the samples, and specifically an analysis of which sample images showed the most deviation between their assigned professionalism score and an expected professionalism score. This allows outliers in training data to…
Who is the assignee on this patent?
Linkedin Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 19, 2017 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).