Batching inputs to a machine learning model

US10789544B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10789544-B2
Application number: US-201615091381-A
Country: US
Kind code: B2
Filing date: Apr 5, 2016
Priority date: Apr 5, 2016
Publication date: Sep 29, 2020
Grant date: Sep 29, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for batching inputs to machine learning models. One of the methods includes receiving a stream of requests, each request identifying a respective input for processing by a first machine learning model; adding the respective input from each request to a first queue of inputs for processing by the first machine learning model; determining, at a first time, that a count of inputs in the first queue as of the first time equals or exceeds a maximum batch size and, in response: generating a first batched input from the inputs in the queue as of the first time so that a count of inputs in the first batched input equals the maximum batch size, and providing the first batched input for processing by the first machine learning model.
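
The full-batch path described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the name `drain_full_batches`, the value of `MAX_BATCH_SIZE`, and the use of integers as stand-ins for model inputs are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical value; the abstract does not fix a number.
MAX_BATCH_SIZE = 4


def drain_full_batches(queue, max_batch_size=MAX_BATCH_SIZE):
    """Pop batches of exactly max_batch_size inputs while enough are queued."""
    batches = []
    while len(queue) >= max_batch_size:
        # Take the oldest inputs first so requests are served in arrival order.
        batches.append([queue.popleft() for _ in range(max_batch_size)])
    return batches


# Stand-in for a stream of requests: each integer plays the role of one input.
queue = deque()
for request_input in range(10):
    queue.append(request_input)

batches = drain_full_batches(queue)
print(batches)      # two full batches of four inputs each
print(list(queue))  # the two inputs that did not fill a batch stay queued
```

Inputs that do not fill a batch remain in the queue; the claims describe what happens to them once a latency limit is exceeded.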

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a stream of requests, each request identifying a respective input tensor for processing by a machine learning model in a plurality of machine learning models, wherein each identified input tensor comprises respective input values and has a number of dimensions; adding the respective input tensor from each request to a queue of input tensors for processing by the machine learning model, wherein each machine learning model in the plurality of machine learning models is configured to receive a fixed-size input tensor that is a combination of a fixed number of tensors having the number of dimensions; determining, at a first time, that a count of input tensors in the queue of input tensors for processing by the machine learning model, as of the first time, is at least equal to the fixed number of tensors and in response: generating a first batched input tensor from input tensors in the queue as of the first time comprising adding input tensors to the first batched input tensor until the first batched input tensor comprises a number of input tensors equal to the fixed number of tensors, and providing the first batched input tensor for processing by the machine learning model to a computing device configured to execute the machine learning model; and determining, at a second time, that (i) a count of input tensors in the queue as of the second time is less than the fixed number of tensors and (ii) that an oldest input tensor in the queue is older than a respective latency parameter for the machine learning model, and in response: generating a second batched input tensor from the input tensors in the queue as of the second time, comprising adding each input tensor in the queue to the second batched input tensor, and adding placeholder tensors having the number of dimensions to the second batched input tensor until the second batched input tensor comprises a number of tensors equal to the fixed number of tensors, and providing the second batched input tensor for processing by the machine learning model to the computing device configured to execute the machine learning model.

2. The method of claim 1, wherein each placeholder input is a copy of a respective input tensor in the queue.

3. The method of claim 1, further comprising: receiving a second machine learning output generated by the machine learning model for the second batched input tensor; determining, for each input tensor having the number of dimensions in the second batched input tensor, a respective portion of the second machine learning output that corresponds to the input tensor; for each placeholder tensor in the second batched input tensor, discarding the respective portion of the second machine learning output that corresponds to the placeholder tensor; and providing a respective portion of the second machine learning output corresponding to each input tensor in the second batched input tensor that is not a placeholder tensor.

4. The method of claim 1, wherein processing by the first machine learning model is managed by a computational graph system that represents operations of the machine learning model during processing of a given batched input tensor as a computational graph, wherein the computational graph comprises a plurality of nodes and directed edges, wherein each node represents a respective operation, and wherein each directed edge connects a respective first node to a respective second node that represents an operation that receives, as an input tensor, an output tensor of an operation represented by the respective first node.

5. The method of claim 4, wherein the computational graph system processes batched input tensors by assigning the operations of the machine learning model represented by the nodes in the computational graph among a plurality of computing devices.

6. The method of claim 4, wherein providing the first batched input tensor for processing by the machine learning model comprises providing the first batched input tensor to the computational graph system, and wherein providing the second batched input tensor for processing by the machine learning model comprises providing the second batched input tensor to the computational graph system.

7. The method of claim 1, further comprising: processing each machine learning model of the plurality of machine learning models on a respective device of a plurality of computing devices; adding input tensors to queues of machine learning models based on availability of respective computing devices processing the models to process the input tensors.

8. The method of claim 1, further comprising: receiving a first batched output tensor from the machine learning model, wherein the first batched output tensor comprises respective output values corresponding to the respective input values for each input tensor; and generating, from the first batched output tensor, a plurality of output tensors, each output tensor comprising respective output values corresponding to a respective identified input tensor.

9. The method of claim 1, wherein the machine learning model further specifies a plurality of acceptable numbers of tensors, each acceptable number less than the fixed number of tensors, and wherein the method further comprises: determining at the second time (i) that the count of input tensors in the queue as of the second time is less than the fixed number of tensors, (ii) that the oldest input tensor in the queue is older than the respective latency parameter for the machine learning model, and (iii) that the count of input tensors in the queue is between a first acceptable number and a second acceptable number of tensors of the plurality of acceptable numbers of tensors, wherein the second acceptable number of tensors is larger than the first acceptable number, and in response, generating a third batched input tensor from the input tensors, comprising adding each input tensor in the queue to the third batched input tensor, and adding placeholder tensors having the number of dimensions to the third batched input tensor until the third batched input tensor comprises a number of tensors equal to the second acceptable number of tensors, and providing the third batched input tensor for processing by the machine learning model to the computing device configured to execute the machine learning model.

10. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: receiving a stream of requests, each request identifying a respective input tensor for processing by a first machine learning model in a plurality of machine learning models, wherein each identified input tensor comprises respective input values and has a number of dimensions; adding the respective input tensor from each request to a queue of input tensors for processing by the machine learning model, wherein each machine learning model in the plurality of machine learning models is configured to receive a fixed-size input tensor that is a combination of a fixed number of tensors having the number of dimensions; determining, at a first time, that a count of input tensors in the queue of input tensors for processing by the machine learning model, as of the first time, is at least equal to the fixed number of tensors and in response: generating a first batched input tensor from input tensors in the queue as of the first time comprising adding input tensors to the first batched input tensor until the first batched input tensor comprises a number of input tensors equal to the fixed number of tensors, and providing th
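
The timeout path in claim 1, the placeholder handling in claims 2 and 3, and the smaller acceptable sizes in claim 9 can be sketched roughly as follows. This is one possible reading, not the patented implementation. The constants `FIXED_BATCH`, `ACCEPTABLE`, and `LATENCY_S`, the function names, the use of plain Python lists as tensors, and the choice to pad up to the smallest permitted size that fits are all assumptions made for illustration.

```python
from collections import deque

FIXED_BATCH = 8      # hypothetical "fixed number of tensors"
ACCEPTABLE = (2, 4)  # hypothetical smaller sizes the model also accepts (claim 9)
LATENCY_S = 0.05     # hypothetical latency parameter, in seconds


def target_size(count, fixed=FIXED_BATCH, acceptable=ACCEPTABLE):
    """Smallest permitted batch size that can hold `count` inputs."""
    return min(s for s in (*acceptable, fixed) if s >= count)


def next_batch(queue, now, fixed=FIXED_BATCH, latency=LATENCY_S):
    """Return (batch, real_count), or None if the queue should keep waiting.

    `queue` holds (arrival_time, tensor) pairs, oldest first.
    """
    if len(queue) >= fixed:
        # Enough inputs are queued: emit a full batch, no padding needed.
        return [queue.popleft()[1] for _ in range(fixed)], fixed
    if queue and now - queue[0][0] > latency:
        # The oldest input has waited too long: flush everything and pad.
        items = [tensor for _, tensor in queue]
        queue.clear()
        real = len(items)
        # Placeholders here are copies of a queued tensor (the option in claim 2).
        items += [items[-1]] * (target_size(real, fixed) - real)
        return items, real
    return None


def strip_placeholders(batched_output, real_count):
    """Keep only the outputs that correspond to real inputs (claim 3)."""
    return batched_output[:real_count]


# Three inputs arrived at t = 0.00, 0.01 and 0.02 s; it is now t = 0.10 s.
queue = deque([(0.00, [1.0, 2.0]), (0.01, [3.0, 4.0]), (0.02, [5.0, 6.0])])
batch, real = next_batch(queue, now=0.10)
outputs = [sum(tensor) for tensor in batch]  # stand-in for running the model
results = strip_placeholders(outputs, real)
print(len(batch), real, results)
```

Because placeholders are appended after the real inputs, the sketch can discard their outputs by slicing; a system that interleaves placeholders would need to track their positions instead.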

Assignees

Inventors

Classifications

  • G06N20/00 (Primary)

    Machine learning · CPC title

  • Queue · CPC title

  • G06F9/546 (Primary)

    Message passing systems or structures, e.g. queues · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10789544B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for batching inputs to machine learning models. One of the methods includes receiving a stream of requests, each request identifying a respective input for processing by a first machine learning model; adding the respective input from each request to a first queue of inputs for processing by the firs…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 29, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
This page lists 2 related publications: citations in our corpus, or other publications sharing the same primary CPC.