US9390370B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9390370-B2 |
| Application number | US-201313783812-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 4, 2013 |
| Priority date | Aug 28, 2012 |
| Publication date | Jul 12, 2016 |
| Grant date | Jul 12, 2016 |
Official abstract text for this publication.
A method for training a neural network includes receiving labeled training data at a master node, generating, by the master node, partitioned training data from the labeled training data and a held-out set of the labeled training data, determining a plurality of gradients for the partitioned training data, wherein the determination of the gradients is distributed across a plurality of worker nodes, determining a plurality of curvature matrix-vector products over the plurality of samples of the partitioned training data, wherein the determination of the plurality of curvature matrix-vector products is distributed across the plurality of worker nodes, and determining, by the master node, a second-order optimization of the plurality of gradients and the plurality of curvature matrix-vector products, producing a trained neural network configured to perform a structured classification task using a sequence-discriminative criterion.
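The master/worker split described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a linear model with squared loss (so the Gauss-Newton curvature matrix equals the Hessian), runs the "workers" as ordinary function calls in one process, and uses hypothetical names (`partition`, `worker_gradient`, `worker_curvature_product`, `master_sum`) that do not come from the patent.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (features, label); hypothetical data layout


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def partition(data: List[Sample], held_out: int, workers: int):
    """Master step: set aside a held-out set, then split the rest round-robin."""
    held, train = data[:held_out], data[held_out:]
    return held, [train[i::workers] for i in range(workers)]


def worker_gradient(w: List[float], shard: List[Sample]) -> List[float]:
    """Worker step: gradient of 0.5 * sum((w.x - y)^2) over one shard."""
    g = [0.0] * len(w)
    for x, y in shard:
        residual = dot(w, x) - y
        for j, xj in enumerate(x):
            g[j] += residual * xj
    return g


def worker_curvature_product(v: List[float], shard: List[Sample]) -> List[float]:
    """Worker step: curvature matrix-vector product G v = sum((x.v) x) over one shard.

    The matrix G is never formed explicitly; only its action on v is computed.
    """
    out = [0.0] * len(v)
    for x, _ in shard:
        s = dot(x, v)
        for j, xj in enumerate(x):
            out[j] += s * xj
    return out


def master_sum(parts: List[List[float]]) -> List[float]:
    """Master step: add up the per-worker vectors element by element."""
    return [sum(col) for col in zip(*parts)]
```

Because both the gradient and the curvature product are sums over samples, adding the per-shard results on the master gives the same vector as a single pass over all the training data, which is what makes the data-parallel split possible.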
Opening claim text (preview).
What is claimed is:
1. A method for training a neural network, the method comprising: receiving labeled training data at a master node; generating, by the master node, partitioned training data from the labeled training data and a held-out set of the labeled training data; determining a plurality of gradients for the partitioned training data, wherein the determination of the gradients is distributed across a plurality of worker nodes; determining a plurality of curvature matrix-vector products over a plurality of samples of the partitioned training data, wherein the determination of the plurality of curvature matrix-vector products is distributed across the plurality of worker nodes; and determining, by the master node, a second-order optimization of the plurality of gradients and the plurality of curvature matrix-vector products, wherein the second-order optimization forms a plurality of quadratic approximations of a loss function corresponding to the gradients determined by the worker nodes, the plurality of quadratic approximations of the loss function being formed using the curvature matrix-vector products, the second-order optimization selecting, from the plurality of quadratic approximations, a quadratic approximation determined to reduce a loss on the held-out set of the labeled training data, and producing a trained neural network having network parameters corresponding to the quadratic approximation selected, wherein the trained neural network is configured to perform a structured classification task.
2. The method of claim 1, further comprising assigning, by the master node, the partitioned training data to the plurality of worker nodes.
3. The method of claim 1, further comprising coordinating, by the master node, activity of the plurality of worker nodes.
4. The method of claim 1, wherein the second-order optimization comprises a Hessian-free optimization.
5. The method of claim 1, wherein the trained neural network comprises a plurality of nodes connected by a plurality of edges, wherein the second-order optimization determines weights for the plurality of edges, and wherein the weights are the network parameters.
6. The method of claim 1, further comprising generating, by the master node, the held-out set of the labeled training data, wherein determining the second-order optimization further comprises the iterative steps of: determining an actual loss for the gradient of the quadratic approximation selected based on the held-out set of the labeled training data; and adjusting a damping parameter according to a comparison of the actual loss to a predicted loss of the quadratic approximation selected, wherein the damping parameter controls the quadratic approximation.
7. The method of claim 1, wherein the master node and the plurality of worker nodes constitute a computer system configured to produce the trained neural network, wherein the plurality of worker nodes perform data-parallel computations to determine the plurality of gradients and curvature matrix-vector products, and wherein the master node performs a computation to determine the second-order optimization.
8. A computer program product for training a neural network, the computer program product comprising a non-transitory computer readable storage medium having program code embodied therewith, the program code readable by a processor to: receive labeled training data; generate partitioned training data from the labeled training data and a held-out set of the labeled training data; assign the partitioned training data to a plurality of worker nodes; receive a plurality of gradients and a plurality of curvature matrix-vector products from the plurality of worker nodes; and determine a second-order optimization of the plurality of gradients and the plurality of curvature matrix-vector products, using the held-out set and a damping parameter determined using the held-out set, producing a trained neural network configured to perform a structured classification task using a sequence-discriminative criterion.
9. The computer program product of claim 8, wherein the processor coordinates activity of the plurality of worker nodes.
10. The computer program product of claim 8, wherein the second-order optimization comprises a Hessian-free optimization.
11. The computer program product of claim 8, wherein the processor generates the held-out set of the labeled training data, and determines the second-order optimization by determining an actual loss for a current gradient of a quadratic approximation of a loss function based on the held-out set of the labeled training data, and adjusting the damping parameter according to a comparison of the actual loss to a predicted loss of the current quadratic approximation of the loss function, wherein the damping parameter controls the quadratic approximation.
12. The computer program product of claim 8, wherein the trained neural network comprises a plurality of nodes connected by a plurality of edges, wherein the second-order optimization determines weights for the plurality of edges, and wherein the weights are the network parameters.
13. A system for training deep neural network acoustic models comprising: a plurality of distributed worker computing devices configured to perform data-parallel computation of gradients and curvature matrix-vector products for partitioned training data generated from labeled training data; and a master computing device connected to the plurality of distributed worker computing devices by inter-process communication flow, wherein the master computing device is configured to determine a second-order optimization given the gradients and the curvature matrix-vector products and to coordinate activity of the plurality of distributed worker computing devices, wherein the second-order optimization forms a plurality of quadratic approximations of a loss function corresponding to the gradients determined by the distributed worker computing devices, the plurality of quadratic approximations of the loss function being formed using the curvature matrix-vector products, the second-order optimization selecting, from the plurality of quadratic approximations, a quadratic approximation determined to reduce a loss on a held-out set of the labeled training data, and producing a trained neural network having network parameters corresponding to the quadratic approximation selected, wherein the trained neural network is configured to perform a structured classification task.
14. The system of claim 13, wherein the plurality of distributed worker computing devices are each configured to: receive partitioned training data from the master computing device; determine the gradients for the partitioned training data; and determine the curvature matrix-vector products over the partitioned training data.
15. The system of claim 13, wherein the master computing device is configured to: receive the labeled training data; generate the partitioned training data from the labeled training data and the held-out set of the labeled training data; assign the partitioned training data to the plurality of distributed worker computing devices; and receive the gradients and the curvature matrix-vector products from the plurality of distributed worker computing devices.
16. The system of claim 13, wherein the master computing device coordinates activity of the plurality of distributed worker computing devices.
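The master-side steps recited in claims 1, 6, and 11 (forming candidate quadratic approximations from curvature matrix-vector products, selecting one by held-out loss, and adjusting a damping parameter) might be sketched as below. This is one loose reading under stated assumptions, not the claimed method itself: each conjugate-gradient iterate is treated as a candidate step, the `matvec` callable stands in for the summed worker products, and the 0.25 / 0.75 thresholds with the 1.5 and 2/3 factors are a conventional Levenberg-Marquardt style heuristic from the Hessian-free optimization literature rather than values taken from the patent. All function names are hypothetical.

```python
from typing import Callable, List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def cg_candidates(matvec: Callable[[Vec], Vec], grad: Vec,
                  damping: float, iters: int) -> List[Vec]:
    """Run CG on (G + damping*I) d = -grad; keep every iterate as a candidate step."""
    d = [0.0] * len(grad)
    r = [-g for g in grad]          # residual for the starting point d = 0
    p = r[:]
    rr = dot(r, r)
    candidates: List[Vec] = []
    for _ in range(iters):
        if rr < 1e-20:              # already converged
            break
        ap = [a + damping * pi for a, pi in zip(matvec(p), p)]
        alpha = rr / dot(p, ap)
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        candidates.append(d[:])
    return candidates


def predicted_change(matvec: Callable[[Vec], Vec], grad: Vec, d: Vec) -> float:
    """Quadratic model q(d) = grad.d + 0.5 * d.G d; negative means a predicted decrease."""
    return dot(grad, d) + 0.5 * dot(d, matvec(d))


def select_step(candidates: List[Vec], w: Vec,
                held_out_loss: Callable[[Vec], float]) -> Vec:
    """Pick the candidate step whose updated weights give the lowest held-out loss."""
    return min(candidates,
               key=lambda d: held_out_loss([wi + di for wi, di in zip(w, d)]))


def adjust_damping(damping: float, actual_change: float, predicted: float) -> float:
    """Compare actual to predicted loss change and rescale the damping (assumed heuristic)."""
    if predicted >= 0.0:            # model predicts no decrease: trust it less
        return damping * 1.5
    rho = actual_change / predicted
    if rho < 0.25:                  # model was too optimistic: increase damping
        return damping * 1.5
    if rho > 0.75:                  # model was accurate: decrease damping
        return damping * 2.0 / 3.0
    return damping
```

A larger damping value keeps the step closer to a scaled gradient step, while a smaller one trusts the curvature information more, which is the sense in which the damping parameter "controls the quadratic approximation" in this sketch.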