Memory operation for systolic array

US11501145B1 · US · B1

Patent metadata
Field	Value
Publication number	US-11501145-B1
Application number	US-201916573201-A
Country	US
Kind code	B1
Filing date	Sep 17, 2019
Priority date	Sep 17, 2019
Publication date	Nov 15, 2022
Grant date	Nov 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one example, a neural network accelerator executes instructions to: load a first weight data element of an array of weight data elements from a memory into a systolic array; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on first coordinates of the first weight data element, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.
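The access pattern in the abstract can be illustrated with a small software model. The sketch below is not taken from the patent: the function names (`strided_fetch`, `conv2d_weight_stationary`, `conv2d_reference`), the row-major layout, and the lack of padding are all assumptions made for illustration. It keeps one weight element fixed at a time, reads a given number of input elements starting at an address derived from that weight's coordinates, skips `stride - 1` elements between reads, and accumulates the products into the output array.

```python
def strided_fetch(memory, start, count, skip):
    """Read `count` elements beginning at `start`, skipping `skip` elements between reads."""
    step = skip + 1
    return [memory[start + i * step] for i in range(count)]


def conv2d_weight_stationary(inputs, weights, stride):
    """Strided 2D convolution (no padding), processing one weight element at a time."""
    height, width = len(inputs), len(inputs[0])
    rows, cols = len(weights), len(weights[0])
    out_h = (height - rows) // stride + 1
    out_w = (width - cols) // stride + 1
    # Assumption: inputs are stored flat in row-major order.
    memory = [x for row in inputs for x in row]
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(rows):
        for s in range(cols):
            w = weights[r][s]  # the weight element held fixed
            for e in range(out_h):
                # The first address depends on the weight coordinates (r, s).
                start = (r + e * stride) * width + s
                fetched = strided_fetch(memory, start, out_w, stride - 1)
                for f, x in enumerate(fetched):
                    out[e][f] += w * x
    return out


def conv2d_reference(inputs, weights, stride):
    """Direct sliding-window convolution, used only to check the sketch above."""
    rows, cols = len(weights), len(weights[0])
    out_h = (len(inputs) - rows) // stride + 1
    out_w = (len(inputs[0]) - cols) // stride + 1
    return [
        [
            sum(
                weights[r][s] * inputs[e * stride + r][f * stride + s]
                for r in range(rows)
                for s in range(cols)
            )
            for f in range(out_w)
        ]
        for e in range(out_h)
    ]
```

Both functions should return the same output array for any input, weight array, and stride that fit without padding; only the order in which products are accumulated differs.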

First claim

Opening claim text (preview).

What is claimed is:

1. A method for performing a convolution operation in a neural network accelerator, comprising: loading a first weight data element of an array of weight data elements from a memory into a systolic array of the neural network accelerator, the first weight data element being at first coordinates and associated with a first input channel within the array of weight data elements; receiving a first subset of input data elements of an array of input data elements to multiply with the first weight data element to generate a first output tile of an output data array, the first subset of input data elements being selected from a first contiguous region of the memory and based on the first coordinates of the first weight data element, a stride of the convolution operation, and a location of the first output tile in the output data array; streaming each input data element of the first subset from the first contiguous region of the memory into the systolic array to multiply with the first weight data element to generate the first output tile; receiving a selection of a second subset of the input data elements to multiply with the first weight data element to generate a second output tile of the output data array, the second subset being selected from a second contiguous region of the memory and based on the first coordinates of the first weight data element and on the stride of the convolution operation; streaming each input data element of the second subset from the second contiguous region into the systolic array to multiply with the first weight data element to generate the second output tile; and assembling an output data array of the convolution operation from the first output tile and the second output tile.

2. The method of claim 1, wherein the memory comprises a plurality of partitions; wherein each partition of the plurality of partitions stores a part of a chunk of input data elements of one or more input channels, the chunk of the input data elements being stored across the plurality of partitions following a repetitive sequential order; wherein the first contiguous region stores a first part of a first chunk of input data elements, the first part of the first chunk corresponding to the first input channel; wherein the second contiguous region stores a first part of a second chunk of input data elements, the first part of the second chunk corresponding to the first input channel; and the first contiguous region and the second contiguous region are in a first partition of the plurality of partitions.

3. The method of claim 2, wherein: the first partition also stores part of a first part of a third chunk of input data elements, the first part of the third chunk corresponding to a different input channel from the first input channel; and the first contiguous region and the second contiguous region are separated by the first part of the third chunk of input data elements.

4. The method of claim 3, wherein: each input data element of the array of input data elements is associated with an identifier of the chunk that includes the each input data element and a location of the chunk in the memory; and the first subset of input data elements are selected based on one or more identifiers of one or more chunks that include the first subset of input data elements and one or more locations of the one or more chunks indicating that the first subset of input data elements are stored in a contiguous region of the memory.

5. The method of claim 3, further comprising: storing, at different times, the first output tile and the second output tile at a summation buffer; wherein a size of the chunk of input data elements is based on a size of the summation buffer and the stride of the convolution operation.

6. A non-transitory computer readable medium storing instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array, the first weight data element having first coordinates in the array of weight data elements; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on the first coordinates, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; load the first input data elements into the systolic array; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.

7. The non-transitory computer readable medium of claim 6, wherein the input data elements are stored in the memory following at least one of: a row-major order or a column-major order.

8. The non-transitory computer readable medium of claim 6, wherein: the memory comprises a plurality of partitions; each partition is coupled with a row of the systolic array; each partition stores input data elements of one or more input channels; and the first input data elements are obtained from a first partition of the plurality of partitions and are streamed into a first row of the systolic array.

9. The non-transitory computer readable medium of claim 8, wherein the convolution operation is between the array of weight data elements and an array of input data elements; wherein the array of input data elements comprises input data elements associated with a plurality of input data channels and is fragmented into a plurality of chunks, each chunk of the plurality of chunks comprising a subset of the array of input data elements and associated with at least a subset of a plurality of input channels; and wherein each partition of the plurality of partitions stores input data elements associated with an input channel of the each chunk in a contiguous region.

10. The non-transitory computer readable medium of claim 9, wherein each input data element of the array of input data elements is associated with an attribute comprising: an identifier of a chunk of the plurality of the chunks that includes the each input data element, and a location of the chunk in the memory.

11. The non-transitory computer readable medium of claim 10, wherein the plurality of chunks comprise: a first chunk of input data elements associated with a first subset of the plurality of input channels; a second chunk of input data elements associated with the first subset of the plurality of input channels; and a third chunk of input data elements associated with a second subset of the plurality of input channels.

12. The non-transitory computer readable medium of claim 11, wherein the attributes of the first input data elements indicate: the first input data elements are included in the first chunk and the second chunk; and the first chunk and the second chunk are stored in a first contiguous region in the first partition; and wherein the first address is part of the first contiguous region.

13. The non-transitory computer readable medium of claim 12, wherein input data elements of the first chunk, the second chunk, and the third chunk are stored in a contiguous region in the each partition; wherein input data elements of the first chunk and of the second chunk are separated by input data elements of the third chunk in the contiguous region in the ea
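Claims 1 and 5 describe producing the output as separate tiles that pass through a summation buffer at different times. The following is a minimal one-dimensional sketch of that idea, not the claimed hardware: the function name `conv1d_tiled`, the `buffer_size` parameter, and the flat input list are hypothetical. Each tile reuses a buffer of limited size, and for each weight element the inputs are read at addresses spaced by the stride.

```python
def conv1d_tiled(inputs, weights, stride, buffer_size):
    """Strided 1D convolution (no padding) computed one output tile at a time."""
    n_out = (len(inputs) - len(weights)) // stride + 1
    output = []
    for tile_start in range(0, n_out, buffer_size):
        tile_len = min(buffer_size, n_out - tile_start)
        # Stand-in for the summation buffer; it holds one tile at a time.
        buffer = [0] * tile_len
        for k, w in enumerate(weights):
            # First input address for this tile and this weight element.
            first = tile_start * stride + k
            for j in range(tile_len):
                buffer[j] += w * inputs[first + j * stride]
        # Assemble the output array from the finished tile.
        output.extend(buffer)
    return output
```

In this model a tile of `buffer_size` outputs reads inputs spanning `(buffer_size - 1) * stride + 1` addresses per weight element, which gives one way to see why the amount of input handled together could depend on both the buffer size and the stride.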

Assignees

Amazon Tech Inc

Classifications

G06N3/063 (Primary) · using electronic means
G06N3/02 · Neural networks
G06F15/8046 · Systolic arrays
G06N3/0464 (Primary) · Convolutional networks [CNN, ConvNet]

Patent family

Related publications grouped by family.

View patent family 84000669

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11501145B1 cover?: In one example, a neural network accelerator executes instructions to: load a first weight data element of an array of weight data elements from a memory into a systolic array; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent inp…
Who is the assignee on this patent?: Amazon Tech Inc
What technology area does this patent fall under?: Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?: Publication date Nov 15, 2022 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Target port with distributed transactions

Digital signal conditioner system

Accelerated quantized multiply-and-add operations

Multi-layer neural network processing by a neural network accelerator using host communicated merged weights and a package of per-layer instructions

Computational array microprocessor system with variable latency memory access

Apparatus and method for clustering data in streaming clustering without reducing precision
