Reducing data transfer to machine learning accelerator hardware

US11662986B1 · US · B1

Patent metadata
Field: Value
Publication number: US-11662986-B1
Application number: US-202016825282-A
Country: US
Kind code: B1
Filing date: Mar 20, 2020
Priority date: Mar 20, 2020
Publication date: May 30, 2023
Grant date: May 30, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer program compiled for a machine learning accelerator hardware and associated with a default input data size is received. An execution of an operation of the computer program is initiated. It is identified that a data size of an input data of the operation is smaller than the default input data size. The smaller data size of the input data of the operation rather than the default input data size is caused to be transferred to the machine learning accelerator hardware for the input data of the operation.
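
The size check described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the helper names `tensor_nbytes` and `bytes_to_transfer`, the float32 element size, the example shapes, and the fallback to the default size when the input is not smaller are all hypothetical.

```python
from math import prod


def tensor_nbytes(shape, itemsize=4):
    """Size in bytes of a dense tensor (itemsize=4 assumes float32 elements)."""
    return prod(shape) * itemsize


def bytes_to_transfer(actual_nbytes, default_nbytes):
    """Return how many bytes to send to the accelerator.

    If the actual input is smaller than the default size associated with the
    compiled program, only the actual size is sent. Falling back to the
    default size otherwise is an assumption of this sketch.
    """
    if actual_nbytes < default_nbytes:
        return actual_nbytes
    return default_nbytes


# Hypothetical shapes: the program assumes a batch of 64, the request has 8.
default_size = tensor_nbytes((64, 224, 224, 3))
actual_size = tensor_nbytes((8, 224, 224, 3))
transfer_size = bytes_to_transfer(actual_size, default_size)
```

In this example only one eighth of the default payload would cross the bus, which is the saving the abstract is pointing at.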

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: receiving a computer program compiled for a machine learning accelerator hardware and associated with a default input data size; initiating an execution of an operation of the computer program; receiving a data size of an input data of the operation; identifying that the data size of the input data of the operation is smaller than the default input data size; and causing the data size of the input data of the operation that is smaller than the default input data size to be transferred to the machine learning accelerator hardware for the input data of the operation, including by: utilizing a device manager component configured to manage the machine learning accelerator hardware and provide a direct memory transfer instruction using the data size of the input data of the operation that is smaller than the default input data size; receiving the direct memory transfer instruction using a driver component that is configured to be an interface between the device manager component and the machine learning accelerator hardware; and utilizing the driver component to generate a peripheral component interconnect bus compatible transfer command to transfer the data size of the input data of the operation based on the received direct memory transfer instruction.

2. The method of claim 1, wherein the machine learning accelerator hardware includes one or more of the following components: an application-specific integrated circuit, a graphics processing unit, or a field-programmable gate array.

3. The method of claim 1, wherein the operation is a convolution operation.

4. The method of claim 1, wherein the operation is a personalized recommendation system operation.

5. The method of claim 1, wherein the operation is part of a machine learning inference operation.

6. The method of claim 1, wherein the data size of the input data is a size of a tensor of raw data.

7. The method of claim 6, wherein the tensor of raw data includes image data or embedding table data.

8. The method of claim 6, wherein the tensor includes data organized along one or more dimensions corresponding to one or more of the following properties: batch size, height, width, or depth.

9. The method of claim 1, further comprising receiving a request to execute the operation.

10. The method of claim 9, wherein the request is received via a network.

11. The method of claim 1, further comprising receiving the data size of the input data, the default input data size, or both the data size of the input data and the default input data size from a requestor of the operation.

12. The method of claim 1, further comprising receiving the data size of the input data, the default input data size, or both the data size of the input data and the default input data size as metadata in a container that also includes the input data.

13. The method of claim 1, further comprising returning a result of the execution of the operation.

14. The method of claim 1, wherein initiating the execution of the operation includes loading the computer program into a software runtime environment that is configured to communicate with the machine learning accelerator hardware.

15. A system, comprising: a processor configured to: receive a computer program compiled for a machine learning accelerator hardware and associated with a default input data size; initiate an execution of an operation of the computer program; receive a data size of an input data of the operation; identify that the data size of the input data of the operation is smaller than the default input data size; and cause the data size of the input data of the operation that is smaller than the default input data size to be transferred to the machine learning accelerator hardware for the input data of the operation, including by being configured to: utilize a device manager component configured to manage the machine learning accelerator hardware and provide a direct memory transfer instruction using the data size of the input data of the operation that is smaller than the default input data size; utilize a driver component configured to be an interface between the device manager component and the machine learning accelerator hardware to receive the direct memory transfer instruction; and utilize the driver component to generate a peripheral component interconnect bus compatible transfer command to transfer the data size of the input data of the operation based on the received direct memory transfer instruction; the machine learning accelerator hardware; and a memory coupled to the machine learning accelerator hardware.

16. A computer program product, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for: receiving a computer program compiled for a machine learning accelerator hardware and associated with a default input data size; initiating an execution of an operation of the computer program; receiving a data size of an input data of the operation; identifying that the data size of the input data of the operation is smaller than the default input data size; and causing the data size of the input data of the operation that is smaller than the default input data size to be transferred to the machine learning accelerator hardware for the input data of the operation, including by: utilizing a device manager component configured to manage the machine learning accelerator hardware and provide a direct memory transfer instruction using the data size of the input data of the operation that is smaller than the default input data size; receiving the direct memory transfer instruction using a driver component that is configured to be an interface between the device manager component and the machine learning accelerator hardware; and utilizing the driver component to generate a peripheral component interconnect bus compatible transfer command to transfer the data size of the input data of the operation based on the received direct memory transfer instruction.

17. The computer program product of claim 16, wherein the machine learning accelerator hardware includes one or more of the following components: an application-specific integrated circuit, a graphics processing unit, or a field-programmable gate array.

18. The computer program product of claim 16, wherein the operation is a convolution operation.

19. The computer program product of claim 16, wherein the operation is a personalized recommendation system operation.

20. The computer program product of claim 16, wherein the operation is part of a machine learning inference operation.
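
The device manager and driver flow recited in the independent claims can be sketched in a few lines. This is only an illustrative model under stated assumptions: the classes `DmaInstruction`, `BusTransferCommand`, `Driver`, and `DeviceManager`, their fields, and the byte counts are hypothetical names invented here, and no real PCI or driver API is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DmaInstruction:
    """Hypothetical direct memory transfer instruction from the device manager."""
    host_offset: int
    device_offset: int
    nbytes: int


@dataclass(frozen=True)
class BusTransferCommand:
    """Hypothetical stand-in for a PCI-bus-compatible transfer command."""
    src: int
    dst: int
    length: int


class Driver:
    """Illustrative interface between the device manager and the accelerator."""

    def __init__(self):
        self.issued = []  # commands a real driver would hand to the hardware

    def submit(self, instr):
        # Translate the DMA instruction into a bus-level transfer command.
        cmd = BusTransferCommand(instr.host_offset, instr.device_offset, instr.nbytes)
        self.issued.append(cmd)
        return cmd


class DeviceManager:
    """Illustrative manager that sizes the transfer by the actual input."""

    def __init__(self, driver, default_nbytes):
        self.driver = driver
        self.default_nbytes = default_nbytes

    def transfer_input(self, actual_nbytes, host_offset=0, device_offset=0):
        # Use the smaller actual size when it is below the default size.
        # Capping at the default for larger inputs is an assumption here.
        nbytes = min(actual_nbytes, self.default_nbytes)
        instr = DmaInstruction(host_offset, device_offset, nbytes)
        return self.driver.submit(instr)


driver = Driver()
manager = DeviceManager(driver, default_nbytes=4096)
command = manager.transfer_input(actual_nbytes=1024)
```

Here the driver receives an instruction for 1024 bytes rather than the 4096-byte default, so the generated command moves only the data actually present.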

Assignees

Meta Platforms Inc

Inventors

Classifications

  • G06F8/41 · Primary

    Compilation · CPC title

  • Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • G06F9/445 · Primary

    Program loading or initiating (bootstrapping G06F9/4401; security arrangements for program loading or initiating G06F21/57) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11662986B1 cover?
A computer program compiled for a machine learning accelerator hardware and associated with a default input data size is received. An execution of an operation of the computer program is initiated. It is identified that a data size of an input data of the operation is smaller than the default input data size. The smaller data size of the input data of the operation rather than the default input…
Who is the assignee on this patent?
Meta Platforms Inc
What technology area does this patent fall under?
Primary CPC classification G06F8/41. Mapped technology areas include Physics.
When was this patent published?
Publication date May 30, 2023 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).