Systems and methods for automatically generating code for deep learning systems

US10157045B2 · US · B2

Patent metadata
Publication number: US-10157045-B2
Application number: US-201715816606-A
Country: US
Kind code: B2
Filing date: Nov 17, 2017
Priority date: Nov 17, 2016
Publication date: Dec 18, 2018
Grant date: Dec 18, 2018


Abstract

Official abstract text for this publication.

Systems and methods may automatically generate code for deep learning networks. The systems and methods may provide a code generation framework for generating target specific code. The code generation framework may include one or more predefined class hierarchies for constructing objects of the generated code. The objects of the class hierarchies may provide an interface to predefined libraries of deep learning functions optimized for use on a target platform. The systems and methods may perform one or more optimizations on the code being generated.
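The layered design the abstract describes (layer objects built from a class hierarchy, reaching platform libraries only through an interface layer) can be illustrated with a small Python sketch. Every name here (`Backend`, `ReferenceBackend`, `Layer`, `ReLULayer`, `FullyConnectedLayer`, `Network`) is hypothetical, `ReferenceBackend` is a pure-Python stand-in for a target library such as cuDNN or MKL-DNN, and the setup/predict/cleanup split follows the wording of claim 1. It is an illustration under those assumptions, not the patentee's implementation.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical interface layer: one subclass per target library."""

    @abstractmethod
    def relu(self, x): ...

    @abstractmethod
    def matvec(self, weights, x): ...


class ReferenceBackend(Backend):
    """Plain-Python stand-in for a platform library such as cuDNN or MKL-DNN."""

    def relu(self, x):
        return [max(0.0, v) for v in x]

    def matvec(self, weights, x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]


class Layer(ABC):
    """Base layer class type defining setup, predict, and cleanup functions."""

    def __init__(self, backend):
        self.backend = backend  # layers call the library only via this object

    def setup(self):
        pass  # e.g. allocate buffers or load parameters

    @abstractmethod
    def predict(self, x): ...

    def cleanup(self):
        pass  # e.g. release buffers


class ReLULayer(Layer):
    def predict(self, x):
        return self.backend.relu(x)


class FullyConnectedLayer(Layer):
    def __init__(self, backend, weights):
        super().__init__(backend)
        self.weights = weights

    def predict(self, x):
        return self.backend.matvec(self.weights, x)


class Network:
    """Shape of the emitted code: instantiate layers, then setup/predict/cleanup."""

    def __init__(self, layers):
        self.layers = layers

    def setup(self):
        for layer in self.layers:
            layer.setup()

    def predict(self, x):
        for layer in self.layers:
            x = layer.predict(x)
        return x

    def cleanup(self):
        for layer in self.layers:
            layer.cleanup()


backend = ReferenceBackend()
net = Network([
    FullyConnectedLayer(backend, [[1.0, -1.0], [0.5, 0.5]]),
    ReLULayer(backend),
])
net.setup()
out = net.predict([2.0, 3.0])
net.cleanup()
print(out)  # [0.0, 2.5]
```

In this sketch, retargeting means supplying another `Backend` subclass; the layer classes stay independent of any processor architecture, which mirrors the claim language.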

Claims

Opening claim text (preview; the excerpt breaks off partway through claim 19).

What is claimed is:

1. A method comprising:
for a source program that implements a deep learning network, the deep learning network including network layers and network parameters, storing, in a memory, a framework that includes a base layer class type that defines at least one of a setup function, a predict function, or a cleanup function, a plurality of subclass types of a class hierarchy that inherit from the base layer class type, the plurality of subclass types representing abstractions of functionality performed by deep learning network layer types, where the plurality of subclass types are independent of any particular processor architecture, and an interface layer that interfaces the class hierarchy to sets of predefined deep learning functions, where the sets of predefined deep learning functions are configured for execution at target hardware platforms;
generating code, by a processor coupled to the memory, for executing the source program on a target platform, the generating including: generating, by the processor, one or more in-memory intermediate representations (IRs) for the source program; mapping a group of the network layers of the deep learning network to respective ones of the plurality of subclass types; adding to the one or more IRs statements to instantiate objects for the respective ones of the plurality of subclass types that map to the group of network layers of the deep learning network, and first calls to perform the at least one of the setup function, the predict function, or the cleanup function on the objects, utilizing the one or more IRs to produce the code, where the code includes the objects instantiated in response to the statements and second calls from the objects instantiated in response to the statements to a selected one of the sets of predefined deep learning functions via the interface layer; and
linking the selected one of the sets of predefined deep learning functions to the generated code.

2. The method of claim 1 wherein the plurality of subclass types of the class hierarchy includes at least one of: an input layer subclass type; a convolution layer subclass type; an activation function layer subclass type; a regularization layer subclass type; a pooling layer subclass type; a fully connected layer subclass type; a classification output subclass type; or a regression output subclass type.

3. The method of claim 2 wherein the activation function layer subclass type is at least one of: a Rectified Linear Unit (ReLU) layer subclass type; a Linear layer subclass type; a Sigmoid layer subclass type; a Tansig layer subclass type; a Tanh layer subclass type; a leaky ReLU layer subclass type; or a clipped ReLU layer subclass type.

4. The method of claim 1 wherein a first IR of the one or more IRs includes nodes that correspond to the network layers of the deep learning network, the method further comprising: determining an execution schedule by analyzing the first IR; identifying two of the nodes of the first IR whose corresponding network layers can share a memory buffer; and modifying the first IR or a second IR of the one or more IRs to share the memory buffer between the network layers that correspond to the two nodes.

5. The method of claim 1 wherein the target platform includes execution units, and a first IR of the one or more IRs includes nodes that correspond to the network layers of the deep learning network, the method further comprising: creating a dependency graph having elements that represent the nodes of the first IR; applying a partitioning algorithm to the dependency graph to organize the nodes of the first IR into dense connection structures, wherein the dense connection structures are associated with respective ones of the execution units of the target platform; and assigning the nodes of the first IR to the execution units of the target platform based on the dense connection structures.

6. The method of claim 5 wherein the partitioning algorithm is a clique partitioning algorithm, and the dense connection structures are cliques.

7. The method of claim 5 wherein the execution units are asynchronous Compute Unified Device Architecture (CUDA) streams of a Graphics Processing Unit (GPU) or cores of a multicore Central Processing Unit (CPU).

8. The method of claim 1 wherein the sets of predefined deep learning functions, include at least one of: CUDA Basic Linear Algebra Subprograms (cuBLAS); CUDA Deep Neural Network (cuDNN); Math Kernel Library for Deep Neural Networks (MKL-DNN); or ARM Compute library.

9. The method of claim 1 further comprising: assigning compile time conditions to the class hierarchy, where the compile time conditionals indicate a data characteristic for the generated code; and implementing the data characteristic in the generated code.

10. The method of claim 9 wherein the data characteristic is a data type or a data arrangement.

11. The method of claim 10 wherein the data type is one of double precision floating point, single precision floating point, half precision floating point, or fixed point, and the data arrangement is row major or column major.

12. The method of claim 1 further comprising: producing an executable from the generated code; and deploying the executable on the target platform to implement the deep learning network.

13. The method of claim 1 further comprising: adding a new subclass type to the class hierarchy, where the new subclass type provides an abstraction of custom functionality for implementing one or more of the deep learning network layer types.

14. The method of claim 1 further comprising: adding a new subclass type to the class hierarchy, where the new subclass type provides an abstraction of functionality for implementing a new deep learning network layer type.

15. The method of claim 1 further comprising: revising the class hierarchy to implement new versions of the plurality of subclass types for a new target hardware platform.

16. The method of claim 1 wherein the framework is structured as an object-oriented class hierarchy.

17. The method of claim 1 further comprising: separating the network parameters from the one or more IRs for the source program; storing the network parameters in one or more data structures; and incorporating the one or more data structures storing the network parameters into the generated code.

18. The method of claim 1 further comprising: importing the deep learning network from a first format into a second format that is compatible with the framework.

19. One or more non-transitory computer-readable media, having stored thereon instructions that when executed by a computing device, cause the computing device to perform operations comprising: for a source program that implements a deep learning network, the deep learning network including network layers and network parameters, storing, in a memory, a framework that includes a base layer class type that defines at least one of a setup function, a predict function, or a cleanup function, a plurality of subclass types of a class hierarchy that inherit from the base layer class type, the plurality of subclass types representing abstractions of functionality performed by deep learning network layer types, where the plurality of subclass types are independent of any particular processor architecture, and an interface layer that interfaces the class hierarchy to sets of predefined deep learning functions, where the sets of predefined deep learning fu
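Claim 4 describes deriving an execution schedule from the IR and letting two layers share a memory buffer, but it does not say how shareable pairs are found. A minimal sketch, assuming a simple liveness rule (a layer's output buffer may be reused once every layer that reads it has run) and a layer graph given as a plain dict of layer name to input layers, could look like this. `plan_buffers` and the layer names are hypothetical, and the greedy reuse strategy is our assumption rather than the patent's method.

```python
from graphlib import TopologicalSorter


def plan_buffers(deps):
    """Assign each layer's output to a buffer id, reusing buffers that are dead.

    deps maps each layer name to the list of layers whose outputs it reads.
    """
    # Execution schedule: a topological order of the dependency graph.
    schedule = list(TopologicalSorter(deps).static_order())
    position = {name: i for i, name in enumerate(schedule)}

    # Last schedule step at which each layer's output is still read.
    last_use = {name: position[name] for name in schedule}
    for name, inputs in deps.items():
        for src in inputs:
            last_use[src] = max(last_use[src], position[name])

    free, assignment, next_id = [], {}, 0
    for step, name in enumerate(schedule):
        # Take a free buffer if one exists, otherwise create a new one.
        # This happens before inputs are released, so a layer never
        # writes its output into a buffer it is still reading.
        if free:
            assignment[name] = free.pop()
        else:
            assignment[name] = next_id
            next_id += 1
        # Release input buffers whose last reader is this layer
        # (dict.fromkeys drops duplicate inputs while keeping order).
        for src in dict.fromkeys(deps.get(name, [])):
            if last_use[src] == step:
                free.append(assignment[src])
    return schedule, assignment


deps = {
    "conv1": ["input"],
    "relu1": ["conv1"],
    "conv2": ["relu1"],
    "relu2": ["conv2"],
    "fc": ["relu2"],
}
schedule, buffers = plan_buffers(deps)
print(schedule)  # ['input', 'conv1', 'relu1', 'conv2', 'relu2', 'fc']
print(buffers)   # {'input': 0, 'conv1': 1, 'relu1': 0, 'conv2': 1, 'relu2': 0, 'fc': 1}
```

For this straight-line network the six layer outputs fit in two alternating buffers; a graph with skip connections keeps more outputs alive at once and therefore needs more buffers.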

Assignees

Mathworks Inc

Classifications

  • Combinations of networks · CPC title

  • Shells for specifying net layout · CPC title

  • Activation functions · CPC title

  • Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title

  • G06F8/35 (primary) · model driven · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10157045B2 cover?
It covers automatic generation of target-specific code for deep learning networks using a framework of predefined class hierarchies whose objects interface to libraries of deep learning functions optimized for a target platform, with optional optimizations applied to the generated code (see the abstract above).
Who is the assignee on this patent?
Mathworks Inc
What technology area does this patent fall under?
Primary CPC classification G06F8/35. Mapped technology areas include Physics.
When was this patent published?
Published December 18, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).