Deep learning model scheduling

US11526728B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11526728-B2
Application numberUS-201816018784-A
CountryUS
Kind codeB2
Filing dateJun 26, 2018
Priority dateApr 9, 2018
Publication dateDec 13, 2022
Grant dateDec 13, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and computer-executable instructions for determining a computation schedule for a recurrent neural network (RNN). A matrix multiplication (MM) directed-acyclic graph (DAG) is received for the RNN. Valid phased computation schedules for the RNN are generated. Each of the valid phase computation schedule includes an ordering of MM operations. For each of the plurality of valid phased computation schedules, each of the MM operations is partitioned to processor cores based on L3 cache to L2 cache data movement. The RNN is executed based on the valid phased computation schedules. A final computation schedule is stored. The final computation schedule is used for future executions of the RNN.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method for determining a computation schedule for a recurrent neural network (RNN), the method comprising: receiving a matrix multiplication (MM) directed-acyclic graph (DAG) data structure constructed to model computations of the RNN, the DAG having nodes and edges, with each node representing an MM and each edge representing a dependency between the MMs; generating a plurality of valid phased computation schedules for the RNN from the MM-DAG, wherein each of the valid phase computation schedules includes an ordering of MM operations, and wherein the plurality of valid phased computation schedules have been pruned based on time dependencies to retain valid phased computation schedules that increase reuse of weights; partitioning, for each of the plurality of valid phased computation schedules, each of the MM operations to a plurality of processor cores based on L3 cache to L2 cache data movement to increase reuse of weights by minimizing data movement between caches; executing, for each of the plurality of valid phased computation schedules, the RNN based on the partitioning; and storing a final computation schedule based on the executing, wherein the final computation schedule is used for subsequent executions of the RNN, and wherein the plurality of valid phased computation schedules comprises the final computation schedule. 2. The method of claim 1 , wherein generating a plurality of valid phased computation schedules for the RNN comprises generating schedules with time-independent phases before time-dependent phases. 3. The method of claim 1 , wherein the partitioning further comprises mapping a partition of an MM operation to a single processor core, wherein a weight matrix is reused over a sequence of MM operations, wherein the partition of the MM operation is part of the sequence of MM operations, and wherein a part of the weight matrix is stored in an L2 cache of the single processor core. 4. The method of claim 3 , further comprising: determining two MM operations in a phase have a shared input matrix; and fusing the two MM operations into a single MM operation. 5. The method of claim 4 , further comprising determining a plurality of parallelism degrees for multiple MM operations in a first phase for a first phased computation schedule, wherein the first phase computation schedule is executed with each of the plurality of parallelism levels. 6. The method of claim 5 , wherein a selected degree of parallelism is less than the number of the plurality of processor cores. 7. The method of claim 1 , wherein the partitioning minimizes total data movement from an L3 cache to an L2 cache of a processor core for the computations of the RNN. 8. The method of claim 1 , further comprising: receiving a request to execute the RNN; and executing the RNN with the final computation schedule. 9. The method of claim 1 , further comprising determining the fastest executing valid phase computation schedule based on the executing, wherein the fastest executing valid phase computation schedule is the final computation schedule. 10. A system for determining a computation schedule for a recurrent neural network (RNN), the system comprising: an electronic processor configured: receive a matrix multiplication (MM) directed-acyclic graph (DAG) data structure constructed to model computations of the RNN, the DAG having nodes and edges, with each node representing an MM and each edge representing a dependency between the MMs; generate a plurality of valid phased computation schedules for the RNN from the MM-DAG, wherein each of the valid phase computation schedules includes an ordering of MM operations, and wherein the plurality of valid phased computation schedules have been pruned based on time dependencies to retain valid phased computation schedules that increase reuse of weights; partition, for each of the plurality of valid phased computation schedules, each of the MM operations to a plurality of processor cores based on L3 cache to L2 cache data movement to increase reuse of weights by minimizing data movement between caches; cause execution, for each of the plurality of valid phased computation schedules, of the RNN based on the partitioning; and store a final computation schedule based on the execution, wherein the final computation schedule is used for future executions of the RNN, and wherein the plurality of valid phased computation schedules comprises the final computation schedule. 11. The system of claim 10 , wherein to generate a plurality of valid phased computation schedules for the RNN the electronic processor is configured to generate schedules with time-independent phases before time-dependent phases. 12. The system of claim 10 , wherein to partition the electronic processor is further configured to map a partition of an MINI operation to a single processor core, wherein a weight matrix is reused over a sequence of MINI operations, wherein the partition of the MINI operation is part of the sequence of MINI operations, and wherein a part of the weight matrix is stored in an L2 cache of the single processor core. 13. The system of claim 12 , wherein the electronic processor is further configured to: determine two MM operations in a phase have a shared input matrix; and fuse the two MINI operations into a single MM operation. 14. The system of claim 13 , wherein the electronic processor is further configured to determine a plurality of parallelism degrees for multiple MM operations in a first phase for a first phased computation schedule, wherein the first phase computation schedule is executed with each of the plurality of parallelism levels. 15. The system of claim 14 , wherein a selected degree of parallelism is less than the number of the plurality of processor cores. 16. A non-transitory computer-readable storage medium storing computer-executable instructions for determining a computation schedule for a recurrent neural network (RNN), the stored instructions comprising: instructions to receive a matrix multiplication (MM) directed-acyclic graph (DAG) data structure constructed to model computations of the RNN, the DAG having nodes and edges, with each node representing an MM and each edge representing a dependency between the MMs; instructions to generate a plurality of valid phased computation schedules for the RNN from the MM-DAG, wherein each of the valid phase computation schedules includes an ordering of MM operations, and wherein the plurality of valid phased computation schedules have been pruned based on time dependencies to retain valid phased computation schedules that increase reuse of weights; instructions to partition, for each of the plurality of valid phased computation schedules, each of the MM operations to a plurality of processor cores based on L3 cache to L2 cache data movement to increase reuse of weights by minimizing data movement between caches; instructions to execute, for each of the plurality of valid phased computation schedules, the RNN based on the partitioning; and instructions to store a final computation schedule based on the executing, wherein the final computation schedule is used for future executions of the RNN, and wherein the plurality of valid phased computation schedules comprises the final computation schedule. 17. The non-transitory computer-readable storage medium of claim 16 , wherein the instructions to generate a plurality of valid phased computation schedules for the RNN comprise instructions to generate schedules with time-independent phases before time-dependent phases.

Assignees

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • G06N3/063Primary

    using electronic means · CPC title

  • Shells for specifying net layout · CPC title

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • Learning methods · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11526728B2 cover?
Systems, methods, and computer-executable instructions for determining a computation schedule for a recurrent neural network (RNN). A matrix multiplication (MM) directed-acyclic graph (DAG) is received for the RNN. Valid phased computation schedules for the RNN are generated. Each of the valid phase computation schedule includes an ordering of MM operations. For each of the plurality of valid p…
Who is the assignee on this patent?
Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Dec 13 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).