Tool for facilitating efficiency in machine learning

US11410024B2 · US · B2

Patent metadata
Publication number: US-11410024-B2
Application number: US-201715581152-A
Country: US
Kind code: B2
Filing date: Apr 28, 2017
Priority date: Apr 28, 2017
Publication date: Aug 9, 2022
Grant date: Aug 9, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A mechanism is described for facilitating smart distribution of resources for deep learning autonomous machines. A method of embodiments, as described herein, includes detecting one or more sets of data from one or more sources over one or more networks, and introducing a library to a neural network application to determine optimal point at which to apply frequency scaling without degrading performance of the neural network application at a computing device.
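The abstract does not say how the optimal frequency-scaling point is found. One common way to reason about it, offered here only as an illustrative sketch and not as Intel's method, is slack reclamation. In a synchronous training step every node waits for the slowest one, so a node that finishes early can run at a lower core frequency without lengthening the step. The sketch assumes compute time scales as 1/f (compute-bound work); the names `NodeStep` and `pick_frequencies`, the safety margin, and the timings are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeStep:
    name: str
    compute_s: float  # hypothetical input: per-iteration compute time measured at f_max


def pick_frequencies(steps: List[NodeStep], f_max_mhz: int, f_min_mhz: int,
                     margin: float = 0.05) -> Dict[str, int]:
    """Hypothetical slack-reclamation heuristic, not the patented algorithm.

    The step time is set by the slowest node (t_crit). Assuming time ~ 1/f,
    running node i at f_i = f_max * t_i / ((1 - margin) * t_crit) stretches
    its work to (1 - margin) * t_crit, so it still arrives before the
    slowest node and the synchronized step is not delayed.
    """
    t_crit = max(s.compute_s for s in steps)
    budget = (1.0 - margin) * t_crit  # headroom so a slowed node does not become critical
    plan = {}
    for s in steps:
        # Round up so rounding never pushes a slowed node past the budget.
        target = math.ceil(f_max_mhz * s.compute_s / budget)
        # Clamp to the supported frequency range; the critical node stays at f_max.
        plan[s.name] = max(f_min_mhz, min(f_max_mhz, target))
    return plan


if __name__ == "__main__":
    steps = [NodeStep("n0", 0.10), NodeStep("n1", 0.08),
             NodeStep("n2", 0.05), NodeStep("n3", 0.02)]
    print(pick_frequencies(steps, f_max_mhz=2000, f_min_mhz=800))
    # {'n0': 2000, 'n1': 1685, 'n2': 1053, 'n3': 800}
```

Here the slowest node keeps full frequency, the two intermediate nodes are slowed just enough to finish inside the budget, and the fastest node bottoms out at the assumed 800 MHz floor.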

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a graphics processor to: detect one or more sets of data from one or more sources over one or more networks; cause a neural network application to implement a library comprising machine learning primitives, wherein the machine learning primitives are usable to analyze a skew pattern observed in a distributed gradient synchronization implemented by the neural network application; implement, using the neural network application, the distributed gradient synchronization using a tree structure such that local weight vectors start at one or more nodes represented as leaves of the tree structure and communicate up to a root of the tree structure; determine, using the machine learning primitives of the library as implemented by the neural network application, a point to apply frequency scaling in the graphics processor that does not degrade performance of the neural network application, the point determined based on analysis of the skew pattern generated by the distributed gradient synchronization implemented via the tree structure; and determine, using the library as implemented by the neural network application, a core frequency of the frequency scaling applied at the point, wherein the library is to account for skew characteristics associated with the distributed gradient synchronization to decide the core frequency.

2. The apparatus of claim 1, wherein the skew characteristics comprise at least the skew pattern.

3. The apparatus of claim 1, wherein the graphics processor is further operable to introduce a sparse matrix representation for weights to overlap communication and computation across the one or more nodes associated with the neural network application to reduce communication costs.

4. The apparatus of claim 1, wherein the graphics processor is further to automatically analyze failed execution of programs including or relevant to the neural network application to obtain insights on one or more faults of hardware performance counters.

5. The apparatus of claim 4, wherein the graphics processor is further to provide one or more of successful execution information obtained from successful execution of programs and failed execution information obtained from failed execution of programs to a trained network model to seek out one or more of the hardware performance counters that are regarded as faulty or outside a range of approval.

6. The apparatus of claim 1, wherein the graphics processor is further to perform local error propagation by computing high precision and low precision for local weights and compute local errors at each of the one or more nodes, wherein performing the local error propagation further comprises facilitating weight synchronization across the one or more nodes to track the local errors for accuracy and reduced communication.

7. The apparatus of claim 1, wherein the apparatus comprises an autonomous machine comprising one or more of a vehicle, a device, or an equipment, wherein the autonomous machine comprises one or more processors including the graphics processor, wherein the graphics processor is co-located with an application processor on a common semiconductor package.

8. A method comprising: detecting, by a graphics processor, one or more sets of data from one or more sources over one or more networks; causing, by the graphics processor, a neural network application to implement a library comprising machine learning primitives, wherein the machine learning primitives are usable to analyze a skew pattern observed in a distributed gradient synchronization implemented by the neural network application; implementing, using the neural network application, the distributed gradient synchronization using a tree structure such that local weight vectors start at one or more nodes represented as leaves of the tree structure and communicate up to a root of the tree structure; determining, using the machine learning primitives of the library as implemented by the neural network application, a point to apply frequency scaling in the graphics processor that does not degrade performance of the neural network application, the point determined based on analysis of the skew pattern generated by the distributed gradient synchronization implemented via the tree structure; and determining, using the library as implemented by the neural network application, a core frequency of the frequency scaling applied at the point, wherein the library is to account for skew characteristics associated with the distributed gradient synchronization to decide the core frequency.

9. The method of claim 8, wherein the skew characteristics comprise at least the skew pattern.

10. The method of claim 8, further comprising introducing sparse matrix representation for weights to overlap communication and computation across the one or more nodes associated with the neural network application to reduce communication costs.

11. The method of claim 8, further comprising automatically analyzing failed execution of programs including or relevant to the neural network application to obtain insights on one or more faults of hardware performance counters.

12. The method of claim 11, further comprising providing one or more of successful execution information obtained from successful execution of programs and failed execution information obtained from failed execution of programs to a trained network model to seek out one or more of the hardware performance counters that are regarded as faulty or outside a range of approval.

13. The method of claim 8, further comprising performing local error propagation by computing high precision and low precision for local weights and compute local errors at each of the one or more nodes, wherein performing the local error propagation further comprises facilitating weight synchronization across the one or more nodes to track the local errors for accuracy and reduced communication.

14. The method of claim 8, wherein the graphics processor is part of an autonomous machine comprising one or more of a vehicle, a device, or an equipment, wherein the autonomous machine comprises one or more processors including the graphics processor, wherein the graphics processor is co-located with an application processor on a common semiconductor package.

15. At least one non-transitory machine-readable medium comprising instructions that when executed by a local computing device, cause the local computing device to perform operations comprising: detecting, by a graphics processor of the local computing device, one or more sets of data from one or more sources over one or more networks; causing, by the graphics processor, a neural network application to implement a library comprising machine learning primitives, wherein the machine learning primitives are usable to analyze a skew pattern observed in a distributed gradient synchronization implemented by the neural network application; implementing, using the neural network application, the distributed gradient synchronization using a tree structure such that local weight vectors start at one or more nodes represented as leaves of the tree structure and communicate up to a root of the tree structure; determining, using the machine learning primitives of the library as implemented by the neural network application, a point to apply frequency scaling in the graphics processor that does not degrade performance of the neural network application, the point determined based on analysis of the skew pattern generated by the distributed gradient synchronization implemented via the tree structure; and determining, using the li…
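Claims 1, 8 and 15 recite a tree-structured gradient synchronization in which local weight vectors start at the leaves and travel up to the root, plus a skew pattern observed during that synchronization. The claims do not say how skew is measured. The sketch below is one illustrative reading, not the patented implementation: it sums leaf vectors pairwise up a binary tree and, at each merge, records how long the earlier child waited for the later one. The readiness timestamps, the per-hop latency `hop_s`, and the function name `tree_reduce` are hypothetical.

```python
from typing import List, Tuple


def tree_reduce(vectors: List[List[float]], ready: List[float],
                hop_s: float = 0.0) -> Tuple[List[float], float, List[Tuple[int, float]]]:
    """Sum leaf vectors pairwise up a binary tree toward the root (illustrative).

    ready[i] is the hypothetical time at which leaf i finished its local step.
    At every merge the earlier child waits for the later one; that wait is
    recorded as the skew for that merge. Returns (root_vector,
    root_ready_time, skews), where skews is a list of (level, wait_seconds).
    """
    assert vectors and len(vectors) == len(ready)
    level_vecs = [list(v) for v in vectors]
    level_ready = list(ready)
    skews: List[Tuple[int, float]] = []
    level = 0
    while len(level_vecs) > 1:
        next_vecs, next_ready = [], []
        for i in range(0, len(level_vecs), 2):
            if i + 1 == len(level_vecs):
                # Odd node out is promoted unchanged to the next level.
                next_vecs.append(level_vecs[i])
                next_ready.append(level_ready[i])
                continue
            a, b = level_ready[i], level_ready[i + 1]
            skews.append((level, abs(a - b)))  # time the earlier child sits idle
            next_vecs.append([x + y for x, y in zip(level_vecs[i], level_vecs[i + 1])])
            next_ready.append(max(a, b) + hop_s)  # parent can send only after both arrive
        level_vecs, level_ready = next_vecs, next_ready
        level += 1
    return level_vecs[0], level_ready[0], skews


if __name__ == "__main__":
    root, t_root, skews = tree_reduce(
        [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],  # local weight vectors at the leaves
        ready=[1.0, 1.5, 1.2, 1.2],                          # hypothetical finish times (s)
        hop_s=0.1,
    )
    print(root, round(t_root, 3), [(lvl, round(w, 3)) for lvl, w in skews])
```

In this example the largest level-0 skew (0.5 s) points at the leaf that finished at 1.0 s as the one idling longest, which is the kind of signal a frequency-scaling decision could consume.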

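Claims 6 and 13 describe local error propagation: each node computes high- and low-precision versions of its local weights, computes local errors, and tracks them during weight synchronization for accuracy and reduced communication. A well-known technique with that shape is error-feedback quantization, shown here as a hedged sketch rather than the claimed method. The uniform quantization step and the class name `ErrorFeedbackNode` are assumptions.

```python
from typing import List, Optional


class ErrorFeedbackNode:
    """Error-feedback quantization (illustrative; not the patented method).

    Each update is kept in full precision locally, only a coarsely quantized
    copy is sent, and the rounding error is carried into the next update so
    that it is deferred rather than lost.
    """

    def __init__(self, step: float):
        self.step = step  # assumed uniform quantization step of the low-precision format
        self.residual: Optional[List[float]] = None  # local error not yet transmitted

    def compress(self, update: List[float]) -> List[float]:
        if self.residual is None:
            self.residual = [0.0] * len(update)
        # High-precision value: this update plus previously untransmitted error.
        corrected = [u + r for u, r in zip(update, self.residual)]
        # Low-precision value actually communicated to the other nodes.
        low = [round(c / self.step) * self.step for c in corrected]
        # Local error, kept on this node and folded into the next call.
        self.residual = [c - q for c, q in zip(corrected, low)]
        return low


if __name__ == "__main__":
    node = ErrorFeedbackNode(step=0.5)
    updates = [[0.3, -0.2], [0.3, -0.2], [0.3, -0.2]]
    sent = [node.compress(u) for u in updates]
    print(sent)
    print(node.residual)
```

Over the three calls the transmitted values sum to [1.0, -0.5] while the true updates sum to [0.9, -0.6]; the difference equals the residual still held on the node (up to floating-point rounding), and that residual never exceeds half a quantization step.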
Assignees

Intel Corp

Inventors

Classifications

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • Combinations of networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Distributed learning, e.g. federated learning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11410024B2 cover?
A mechanism is described for facilitating smart distribution of resources for deep learning autonomous machines. A method of embodiments, as described herein, includes detecting one or more sets of data from one or more sources over one or more networks, and introducing a library to a neural network application to determine optimal point at which to apply frequency scaling without degrading per…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 9, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).