US2020372336A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020372336-A1 |
| Application number | US-202015931629-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 14, 2020 |
| Priority date | May 23, 2019 |
| Publication date | Nov 26, 2020 |
| Grant date | — |
Official abstract text for this publication.
Each of a plurality of processors enters, to a model representing a neural network and including a common first weight, first data different from that used by the other processors, calculates an error gradient for the first weight, and integrates the gradients calculated by each processor. Each processor stores the first weight in a memory and updates the weight of the model to a second weight based on a hyperparameter value different from those used by the other processors, the integrated error gradient, and the first weight. Each processor enters common second data to the model, compares the evaluation results acquired by each processor, and selects a common hyperparameter value. Each processor updates the weight of the model to a third weight based on the selected hyperparameter value, the integrated error gradient, and the first weight stored in the memory.
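The four steps in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the applicant's implementation: the model is a one-weight linear function, the hyperparameter is taken to be a learning rate, "integrating" the gradients is assumed to mean averaging them, accuracy is measured as squared-error loss (lower is better), and the names `gradient`, `loss`, and `training_step` are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def gradient(w: float, batch: Sequence[Sample]) -> float:
    """Mean gradient of 0.5 * (w*x - y)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def loss(w: float, batch: Sequence[Sample]) -> float:
    """Mean squared-error loss; stands in for the accuracy evaluation."""
    return sum(0.5 * (w * x - y) ** 2 for x, y in batch) / len(batch)


def training_step(
    w1: float,
    shards: Sequence[Sequence[Sample]],
    validation: Sequence[Sample],
    learning_rates: Sequence[float],
) -> Tuple[float, float]:
    """Simulate one iteration; each list index plays the role of one processor."""
    # Step 1: every processor computes a gradient on its own data shard,
    # then the gradients are integrated (assumed here: averaged).
    local_grads: List[float] = [gradient(w1, shard) for shard in shards]
    integrated = sum(local_grads) / len(local_grads)

    # Step 2: keep the first weight, then try a tentative update with a
    # different learning rate on each processor (the second weights).
    saved_w1 = w1
    second_weights = [saved_w1 - lr * integrated for lr in learning_rates]

    # Step 3: evaluate every tentative model on the same validation data
    # and pick the learning rate with the best result.
    scores = [loss(w2, validation) for w2 in second_weights]
    best = min(range(len(scores)), key=lambda i: scores[i])
    chosen_lr = learning_rates[best]

    # Step 4: every processor redoes the update from the saved first weight
    # with the chosen learning rate, so all replicas hold the same third weight.
    w3 = saved_w1 - chosen_lr * integrated
    return w3, chosen_lr


# Toy data generated from y = 2x; three simulated processors, two shards
# would also work as long as the list lengths match the intended setup.
shards = [[(1.0, 2.0)], [(2.0, 4.0)]]
validation = [(1.0, 2.0), (3.0, 6.0)]
w3, chosen_lr = training_step(0.0, shards, validation, [0.1, 0.4, 1.0])
```

Because the final update starts again from the stored first weight, the tentative second weights are discarded after evaluation, and every replica ends the iteration with an identical model.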
Opening claim text (preview).
What is claimed is:

1. An information processing apparatus comprising: a plurality of processors; and a plurality of memories corresponding to the plurality of processors, wherein each of the plurality of processors is configured to execute a process including: entering, to a model which represents a neural network and which includes a first weight common among the plurality of processors, first data different from first data used by other processors, calculating an error gradient with respect to the first weight based on an output of the model, and integrating the error gradient and other error gradients calculated by the other processors to obtain an integrated error gradient, storing the first weight in a corresponding memory among the plurality of memories and updating a weight of the model to a second weight based on a hyperparameter value different from hyperparameter values used by the other processors, the integrated error gradient, and the first weight, entering second data common among the plurality of processors to the model, evaluating accuracy of an output of the model, comparing an evaluation result of the accuracy with evaluation results acquired by the other processors, and selecting a hyperparameter value common among the plurality of processors, and updating the weight of the model to a third weight based on the selected hyperparameter value, the integrated error gradient, and the first weight stored in the corresponding memory.

2. The information processing apparatus according to claim 1, wherein, among a plurality of hyperparameter values corresponding to the plurality of processors, a hyperparameter value corresponding to a processor that has achieved a highest output accuracy is selected as the hyperparameter value common among the plurality of processors.

3. The information processing apparatus according to claim 1, wherein the hyperparameter value different from the hyperparameter values used by the other processors is generated by applying an adjustment coefficient different from adjustment coefficients used by the other processors to a hyperparameter basic value common among the plurality of processors.

4. The information processing apparatus according to claim 1, wherein identification information is assigned to a process executed by each of the plurality of processors, the hyperparameter value different from the hyperparameter values used by the other processors is determined from identification information assigned to a corresponding process, and one item of identification information common among the plurality of processors is selected based on the comparing of the evaluation result of the accuracy with the evaluation results acquired by the other processors, and the hyperparameter value common among the plurality of processors is determined from the selected one item of identification information.

5. The information processing apparatus according to claim 1, wherein the model includes a plurality of first weights, and calculation of error gradients with respect to first weights for which error gradients have not been calculated among the plurality of first weights and transfer of error gradients that have been calculated among the plurality of processors are performed in a parallel manner.

6. An information processing method comprising: entering, by each of a plurality of processors of a computer, to a model which represents a neural network and which includes a first weight common among the plurality of processors, first data different from first data used by other processors, calculating an error gradient with respect to the first weight based on an output of the model, and integrating the error gradients calculated by the plurality of processors to obtain an integrated error gradient, storing, by each of the plurality of processors, the first weight in a corresponding memory and updating a weight of the model to a second weight based on a hyperparameter value different from hyperparameter values used by the other processors, the integrated error gradient, and the first weight, entering, by each of the plurality of processors, second data common among the plurality of processors to the model, evaluating accuracy of an output of the model, comparing results of the evaluating performed by the plurality of processors, and selecting a hyperparameter value common among the plurality of processors, and updating, by each of the plurality of processors, the weight of the model to a third weight based on the selected hyperparameter value, the integrated error gradient, and the first weight stored in the corresponding memory.

7. A non-transitory computer-readable recording medium storing therein a computer program that causes a computer including a plurality of processors to execute a process comprising: causing each of the plurality of processors to enter, to a model which represents a neural network and which includes a first weight common among the plurality of processors, first data different from first data used by other processors, calculate an error gradient with respect to the first weight based on an output of the model, and integrate the error gradients calculated by the plurality of processors to obtain an integrated error gradient, causing each of the plurality of processors to store the first weight in a corresponding memory and update a weight of the model to a second weight based on a hyperparameter value different from hyperparameter values used by the other processors, the integrated error gradient, and the first weight, causing each of the plurality of processors to enter second data common among the plurality of processors to the model, evaluate accuracy of an output of the model, compare results of the evaluations performed by the plurality of processors, and select a hyperparameter value common among the plurality of processors, and causing each of the plurality of processors to update the weight of the model to a third weight based on the selected hyperparameter value, the integrated error gradient, and the first weight stored in the corresponding memory.
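Claims 2 to 4 describe how each process could derive its own hyperparameter value from a shared basic value and from its identification information, and how the common value follows from the identifier of the best-performing process. The sketch below is one possible reading, with several assumptions: the identification information is an integer rank, the adjustment coefficients are powers of two, the accuracy figures are made up, and the function names are hypothetical. The claims only require that the coefficients differ between processes.

```python
from typing import Sequence


def adjustment_coefficient(rank: int, world_size: int) -> float:
    """Hypothetical mapping from a process rank to a distinct coefficient.

    Powers of two centred on the middle rank, e.g. 0.25, 0.5, 1, 2 for
    four processes.
    """
    return 2.0 ** (rank - world_size // 2)


def hyperparameter_for(rank: int, base_value: float, world_size: int) -> float:
    """Apply the rank-specific coefficient to the shared basic value (claim 3)."""
    return base_value * adjustment_coefficient(rank, world_size)


def select_rank(accuracies: Sequence[float]) -> int:
    """Return the rank whose tentative model scored the highest accuracy (claim 2)."""
    return max(range(len(accuracies)), key=lambda r: accuracies[r])


world_size = 4
base_lr = 0.1  # hyperparameter basic value shared by all processes

# Each process can compute its own candidate from its rank alone.
candidates = [hyperparameter_for(r, base_lr, world_size) for r in range(world_size)]

# Made-up evaluation results on the common second data, indexed by rank.
accuracies = [0.71, 0.80, 0.78, 0.65]

# Every process selects the same rank, then recomputes the common value
# from that rank (claim 4).
best_rank = select_rank(accuracies)
common_lr = hyperparameter_for(best_rank, base_lr, world_size)
```

Under this reading, the processes would only need to agree on a single rank, since each can regenerate the corresponding hyperparameter value locally from the shared basic value.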
Combinations of networks · CPC title
Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Distributed learning, e.g. federated learning · CPC title
Supervised learning · CPC title