What technology area does this patent fall under?

Primary CPC classification G06F9/5016. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 17, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Asynchronous distributed data flow for machine learning workloads

US11556381B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11556381-B2
Application number	US-202217738909-A
Country	US
Kind code	B2
Filing date	May 6, 2022
Priority date	May 7, 2021
Publication date	Jan 17, 2023
Grant date	Jan 17, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing machine learning workloads, e.g., computations for training a neural network or computing an inference using a neural network, across multiple hardware accelerators. One of the systems comprises a plurality of accelerator islands, each hardware accelerator island comprising a respective plurality of hardware devices that include a plurality of hardware accelerators and a corresponding host for each of the plurality of hardware accelerators; and a respective scheduler for each of the accelerator islands that is configured to schedule workloads across the plurality of accelerators and corresponding hosts in the accelerator island, wherein the system is configured to: receive data representing a machine learning workload; and assign a respective portion of the machine learning workload to each of the plurality of accelerator islands for scheduling by the respective scheduler for the accelerator island.
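The top-level assignment described in the abstract can be pictured with a short Python sketch. It is illustrative only and is not taken from the patent: the names `IslandScheduler` and `assign_workload` are hypothetical, and the round-robin placement is an assumption, since the abstract does not say how portions are chosen. The one-message-per-island step follows the wording of the first claim for regular computations.

```python
from dataclasses import dataclass, field


@dataclass
class IslandScheduler:
    """Hypothetical per-island scheduler; records each message it receives."""
    island_id: int
    inbox: list = field(default_factory=list)

    def receive(self, shards):
        # One message carries the island's whole portion of the workload.
        self.inbox.append(list(shards))


def assign_workload(shards, schedulers):
    """Split shards across islands and send one message per island."""
    portions = {s.island_id: [] for s in schedulers}
    for i, shard in enumerate(shards):
        # Round-robin placement is an assumption made for this sketch.
        target = schedulers[i % len(schedulers)]
        portions[target.island_id].append(shard)
    for s in schedulers:
        s.receive(portions[s.island_id])
    return portions


schedulers = [IslandScheduler(0), IslandScheduler(1)]
portions = assign_workload(["shard_a", "shard_b", "shard_c"], schedulers)
```

Each scheduler ends up with exactly one entry in its `inbox`, regardless of how many shards its island was given.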

First claim

Opening claim text (preview).

What is claimed is:
1. A system comprising: a plurality of accelerator islands, each accelerator island comprising a respective plurality of hardware devices that include a plurality of hardware accelerators and a corresponding host for each of the plurality of hardware accelerators; and a respective scheduler for each of the accelerator islands that is configured to schedule workloads across the plurality of accelerators and corresponding hosts in the accelerator island, wherein the system is configured to: receive data representing a machine learning workload; and assign a respective portion of the machine learning workload to each of the plurality of accelerator islands for scheduling by the respective scheduler for the accelerator island, comprising assigning the respective portion of the machine learning workload to each of the plurality of accelerator islands by sending a single message to the respective scheduler for the accelerator island when the respective portion of the machine learning workload is a regular computation.
2. The system of claim 1, wherein the data representing the machine learning workload is data representing a sharded dataflow program comprising a plurality of shards.
3. The system of claim 2, wherein assigning the respective portion of the machine learning workload to each of the plurality of accelerator islands comprises assigning one or more shards of the sharded dataflow program to each of the plurality of accelerator islands.
4. The system of claim 1, wherein each scheduler is configured to, when the respective portion of the machine learning workload assigned to the accelerator island is a regular computation, schedule the portion of the computation using parallel asynchronous dispatch.
5. The system of claim 4, wherein scheduling the portion of the computation using parallel asynchronous dispatch comprises: generating a schedule that assigns, to each of a set of the hardware accelerators in the accelerator island, a respective set of one or more operations that takes as input and output of one or more respective other operations that are performed by another one of the hardware accelerators in the accelerator island; determining, for each of the set of hardware accelerators, a respective size of the output of the one or more respective other operations; and transmitting, in parallel and to the corresponding host for each of the set of hardware accelerators, respective future data specifying the respective size of the output of the one or more respective other operations.
6. The system of claim 5, wherein the respective future data causes the corresponding host to (i) allocate memory on the hardware accelerator for storing the output of the one or more respective other operations and (ii) transmit data to a corresponding host of the accelerator assigned to the one or more respective other operations that identifies the allocated memory.
7. The system of claim 6, wherein the corresponding host of the accelerator assigned to the one or more respective other operations is configured to cause the accelerator assigned to the one or more respective other operations to transmit the output of the respective other operations to the allocated memory.
8. The system of claim 7, wherein the output is transmitted over an accelerator interconnect network.
9. One or more computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform the operations comprising: receiving data representing a machine learning workload by a plurality of accelerator islands, wherein each accelerator island comprises a respective plurality of hardware devices that include a plurality of hardware accelerators and a corresponding host for each of the plurality of hardware accelerators, and wherein each accelerator island has a respective scheduler that is configured to schedule workloads across the plurality of accelerators and corresponding hosts in the accelerator island; and assigning a respective portion of the machine learning workload to each of the plurality of accelerator islands for scheduling by the respective scheduler for the accelerator island, comprising assigning the respective portion of the machine learning workload to each of the plurality of accelerator islands by sending a single message to the respective scheduler for the accelerator island when the respective portion of the machine learning workload is a regular computation.
10. The computer-readable storage media of claim 9, wherein each scheduler is configured to, when the respective portion of the machine learning workload assigned to the accelerator island is a regular computation, schedule the portion of the computation using parallel asynchronous dispatch.
11. A method comprising: receiving data representing a machine learning workload by a plurality of accelerator islands, wherein each accelerator island comprises a respective plurality of hardware devices that include a plurality of hardware accelerators and a corresponding host for each of the plurality of hardware accelerators, and wherein each accelerator island has a respective scheduler that is configured to schedule workloads across the plurality of accelerators and corresponding hosts in the accelerator island; and assigning a respective portion of the machine learning workload to each of the plurality of accelerator islands for scheduling by the respective scheduler for the accelerator island, comprising assigning the respective portion of the machine learning workload to each of the plurality of accelerator islands by sending a single message to the respective scheduler for the accelerator island when the respective portion of the machine learning workload is a regular computation.
12. The method of claim 11, wherein the data representing the machine learning workload is data representing a sharded dataflow program comprising a plurality of shards.
13. The method of claim 12, wherein assigning the respective portion of the machine learning workload to each of the plurality of accelerator islands comprises assigning one or more shards of the sharded dataflow program to each of the plurality of accelerator islands.
14. The method of claim 11, wherein each scheduler is configured to, when the respective portion of the machine learning workload assigned to the accelerator island is a regular computation, schedule the portion of the computation using parallel asynchronous dispatch.
15. The method of claim 14, wherein scheduling the portion of the computation using parallel asynchronous dispatch comprises: generating a schedule that assigns, to each of a set of the hardware accelerators in the accelerator island, a respective set of one or more operations that takes as input and output of one or more respective other operations that are performed by another one of the hardware accelerators in the accelerator island; determining, for each of the set of hardware accelerators, a respective size of the output of the one or more respective other operations; and transmitting, in parallel and to the corresponding host for each of the set of hardware accelerators, respective future data specifying the respective size of the output of the one or more respective other operations.
16. The method of claim 15, wherein the respective future data causes the corresponding host to (i) allocate memory on the hardware accelerator for storing the output of the one or more respective other operations and (ii) transmit data to a corresponding host of the accelerator assigned to the one or more respecti
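Claims 5 to 7 (mirrored in claims 15 and 16) describe parallel asynchronous dispatch: because output sizes are known before any operation runs, the scheduler can notify every host at once, each consuming host can pre-allocate a buffer and report it to the producing host, and the producer then writes its output straight into that buffer. The following Python sketch is one possible reading of those steps, not the patented implementation. The `Host` class, the `dispatch` function, and the use of `asyncio` and a dictionary to stand in for hosts and accelerator memory are all assumptions made for illustration, and the sketch supports only one consumer per operation.

```python
import asyncio


class Host:
    """Hypothetical host paired with one accelerator; memory is a dict."""

    def __init__(self, name):
        self.name = name
        self.memory = {}        # buffer id -> bytearray ("accelerator memory")
        self.buffer_ready = {}  # op -> Future resolved with (host, buffer id)

    async def on_future(self, producer_op, size, producer_host):
        # (i) Allocate memory for an output that does not exist yet.
        buf_id = f"{self.name}/{producer_op}"
        self.memory[buf_id] = bytearray(size)
        # (ii) Tell the producer's host which buffer was allocated.
        producer_host.buffer_ready[producer_op].set_result((self, buf_id))

    async def run(self, op, payload):
        # Wait until the consumer has announced its buffer, then write to it.
        consumer, buf_id = await self.buffer_ready[op]
        consumer.memory[buf_id][:] = payload


async def dispatch(schedule):
    """schedule: list of (op, producer host, consumer host, output bytes)."""
    loop = asyncio.get_running_loop()
    for op, producer, _, _ in schedule:
        producer.buffer_ready[op] = loop.create_future()
    # Output sizes are known up front, so all messages go out in parallel.
    await asyncio.gather(
        *(c.on_future(op, len(out), p) for op, p, c, out in schedule),
        *(p.run(op, out) for op, p, c, out in schedule),
    )


a, b = Host("host_a"), Host("host_b")
asyncio.run(dispatch([("matmul", a, b, b"\x01\x02\x03\x04")]))
```

After the run, the four output bytes sit in the buffer that `host_b` allocated before the producing operation started, and `host_a` holds no copy of them.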

Assignees

Google LLC

Classifications

G06F9/5016 · Primary
the resource being the memory · CPC title
G06F9/4881 · Primary
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title
G06N3/08
Learning methods · CPC title
G06F9/5072
Grid computing · CPC title
G06N3/063
using electronic means · CPC title

Patent family

Related publications grouped by family.

View patent family 81927920

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11556381B2 cover?: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing machine learning workloads, e.g., computations for training a neural network or computing an inference using a neural network, across multiple hardware accelerators. One of the systems comprises a plurality of accelerator islands, each hardware accelerator island comprising a respect…
Who is the assignee on this patent?: Google LLC