What technology area does this patent fall under?

Primary CPC classification G06N3/063. Mapped technology areas include Physics.

When was this patent published?

Publication date Nov 26, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Information processing apparatus and information processing method

US2020372336A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2020372336-A1
Application number	US-202015931629-A
Country	US
Kind code	A1
Filing date	May 14, 2020
Priority date	May 23, 2019
Publication date	Nov 26, 2020
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Each of a plurality of processors enters, to a model representing a neural network and including a common first weight, first data different from that used by the other processors, calculates an error gradient for the first weight, and integrates the gradients calculated by each processor. Each processor stores the first weight in a memory and updates the weight of the model to a second weight based on a hyperparameter value different from those used by the other processors, the integrated error gradient, and the first weight. Each processor enters common second data to the model, compares the evaluation results acquired by each processor, and selects a common hyperparameter value. Each processor updates the weight of the model to a third weight based on the selected hyperparameter value, the integrated error gradient, and the first weight stored in the memory.
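The sequence in the abstract can be sketched as a single training step. This is only an illustrative reading, not the applicant's implementation: the scalar linear model, the squared-error loss, the plain gradient-descent update, the use of a learning rate as the "hyperparameter", and the use of lowest loss as a stand-in for "highest accuracy" are all assumptions, and every function name is hypothetical. The processors are simulated as list entries in one process.

```python
# Sketch of one training step from the abstract, simulated in a single process.
# Assumptions (not from the patent): model y = w * x, squared-error loss,
# plain SGD update, learning rate as the hyperparameter, lowest loss = best.

def grad(w, batch):
    """Mean gradient of 0.5 * (w*x - y)^2 with respect to w over a batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def loss(w, batch):
    """Mean squared-error loss of the model y = w * x over a batch."""
    return sum(0.5 * (w * x - y) ** 2 for x, y in batch) / len(batch)


def train_step(w_first, local_batches, common_batch, learning_rates):
    # 1. Each processor computes a gradient on its own (different) first data.
    local_grads = [grad(w_first, b) for b in local_batches]
    # 2. Integrate the gradients (averaged here, as an all-reduce might do).
    g = sum(local_grads) / len(local_grads)
    # 3. w_first stays stored; each processor tries its own hyperparameter value.
    w_second = [w_first - lr * g for lr in learning_rates]
    # 4. Every processor evaluates its trial weight on the same second data.
    scores = [loss(w, common_batch) for w in w_second]
    # 5. Compare the results and select one hyperparameter value for everyone.
    best = min(range(len(scores)), key=scores.__getitem__)
    lr_selected = learning_rates[best]
    # 6. Redo the update from the stored first weight with the selected value.
    w_third = w_first - lr_selected * g
    return w_third, lr_selected


if __name__ == "__main__":
    local = [[(1.0, 2.0)], [(2.0, 4.0)]]   # different data per processor
    common = [(1.0, 2.0), (3.0, 6.0)]      # shared evaluation data
    print(train_step(0.0, local, common, [0.1, 0.4]))
```

Under these assumptions the integrated gradient is computed once per step, and trying several hyperparameter values costs only one extra evaluation pass per processor, because the third weight is rebuilt from the stored first weight rather than from a fresh gradient computation.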

First claim

Opening claim text (preview).

What is claimed is:

1. An information processing apparatus comprising: a plurality of processors; and a plurality of memories corresponding to the plurality of processors, wherein each of the plurality of processors is configured to execute a process including: entering, to a model which represents a neural network and which includes a first weight common among the plurality of processors, first data different from first data used by other processors, calculating an error gradient with respect to the first weight based on an output of the model, and integrating the error gradient and other error gradients calculated by the other processors to obtain an integrated error gradient, storing the first weight in a corresponding memory among the plurality of memories and updating a weight of the model to a second weight based on a hyperparameter value different from hyperparameter values used by the other processors, the integrated error gradient, and the first weight, entering second data common among the plurality of processors to the model, evaluating accuracy of an output of the model, comparing an evaluation result of the accuracy with evaluation results acquired by the other processors, and selecting a hyperparameter value common among the plurality of processors, and updating the weight of the model to a third weight based on the selected hyperparameter value, the integrated error gradient, and the first weight stored in the corresponding memory.

2. The information processing apparatus according to claim 1, wherein, among a plurality of hyperparameter values corresponding to the plurality of processors, a hyperparameter value corresponding to a processor that has achieved a highest output accuracy is selected as the hyperparameter value common among the plurality of processors.

3. The information processing apparatus according to claim 1, wherein the hyperparameter value different from the hyperparameter values used by the other processors is generated by applying an adjustment coefficient different from adjustment coefficients used by the other processors to a hyperparameter basic value common among the plurality of processors.

4. The information processing apparatus according to claim 1, wherein identification information is assigned to a process executed by each of the plurality of processors, the hyperparameter value different from the hyperparameter values used by the other processors is determined from identification information assigned to a corresponding process, and one item of identification information common among the plurality of processors is selected based on the comparing of the evaluation result of the accuracy with the evaluation results acquired by the other processors, and the hyperparameter value common among the plurality of processors is determined from the selected one item of identification information.

5. The information processing apparatus according to claim 1, wherein the model includes a plurality of first weights, and calculation of error gradients with respect to first weights for which error gradients have not been calculated among the plurality of first weights and transfer of error gradients that have been calculated among the plurality of processors are performed in a parallel manner.

6. An information processing method comprising: entering, by each of a plurality of processors of a computer, to a model which represents a neural network and which includes a first weight common among the plurality of processors, first data different from first data used by other processors, calculating an error gradient with respect to the first weight based on an output of the model, and integrating the error gradients calculated by the plurality of processors to obtain an integrated error gradient, storing, by each of the plurality of processors, the first weight in a corresponding memory and updating a weight of the model to a second weight based on a hyperparameter value different from hyperparameter values used by the other processors, the integrated error gradient, and the first weight, entering, by each of the plurality of processors, second data common among the plurality of processors to the model, evaluating accuracy of an output of the model, comparing results of the evaluating performed by the plurality of processors, and selecting a hyperparameter value common among the plurality of processors, and updating, by each of the plurality of processors, the weight of the model to a third weight based on the selected hyperparameter value, the integrated error gradient, and the first weight stored in the corresponding memory.

7. A non-transitory computer-readable recording medium storing therein a computer program that causes a computer including a plurality of processors to execute a process comprising: causing each of the plurality of processors to enter, to a model which represents a neural network and which includes a first weight common among the plurality of processors, first data different from first data used by other processors, calculate an error gradient with respect to the first weight based on an output of the model, and integrate the error gradients calculated by the plurality of processors to obtain an integrated error gradient, causing each of the plurality of processors to store the first weight in a corresponding memory and update a weight of the model to a second weight based on a hyperparameter value different from hyperparameter values used by the other processors, the integrated error gradient, and the first weight, causing each of the plurality of processors to enter second data common among the plurality of processors to the model, evaluate accuracy of an output of the model, compare results of the evaluations performed by the plurality of processors, and select a hyperparameter value common among the plurality of processors, and causing each of the plurality of processors to update the weight of the model to a third weight based on the selected hyperparameter value, the integrated error gradient, and the first weight stored in the corresponding memory.
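Claims 2 to 4 describe how the per-processor hyperparameter values are generated and how the common value is chosen. The sketch below is one possible reading, not the claimed implementation: the coefficient table, the use of an integer rank as the "identification information", the tie-breaking rule, and all function names are assumptions made for illustration. In a real multi-processor system the list of accuracies would first have to be exchanged among the processors.

```python
# Hypothetical adjustment coefficients, indexed by process identifier (rank).
# The values are illustrative only and do not come from the patent.
ADJUSTMENT_COEFFICIENTS = [0.5, 1.0, 2.0, 4.0]


def hyperparameter_for(rank, base_value):
    """Claims 3 and 4: derive a per-process value from its rank and a common base value."""
    coefficient = ADJUSTMENT_COEFFICIENTS[rank % len(ADJUSTMENT_COEFFICIENTS)]
    return base_value * coefficient


def select_rank(accuracies):
    """Claim 2: pick the rank with the highest accuracy (ties go to the lowest rank)."""
    return max(range(len(accuracies)), key=lambda r: (accuracies[r], -r))


def common_hyperparameter(accuracies, base_value):
    """Claim 4: every process maps the selected rank back to the same value."""
    return hyperparameter_for(select_rank(accuracies), base_value)


if __name__ == "__main__":
    accuracies = [0.71, 0.83, 0.83, 0.60]  # one evaluation result per rank
    print(select_rank(accuracies), common_hyperparameter(accuracies, 0.1))
```

One consequence of this reading is that the processors only need to agree on a rank: each can recompute the selected hyperparameter value locally from the shared base value, so the value itself never has to be transmitted.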

Assignees

Fujitsu Ltd

Inventors

Classifications

G06N3/045
Combinations of networks · CPC title
G06N3/0985
Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/098
Distributed learning, e.g. federated learning · CPC title
G06N3/09
Supervised learning · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 70480143

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020372336A1 cover?: Each of a plurality of processors enters, to a model representing a neural network and including a common first weight, first data different from that used by the other processors, calculates an error gradient for the first weight, and integrates the gradients calculated by each processor. Each processor stores the first weight in a memory and updates the weight of the model to a second weight …
Who is the assignee on this patent?: Fujitsu Ltd