US11501145B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11501145-B1 |
| Application number | US-201916573201-A |
| Country | US |
| Kind code | B1 |
| Filing date | Sep 17, 2019 |
| Priority date | Sep 17, 2019 |
| Publication date | Nov 15, 2022 |
| Grant date | Nov 15, 2022 |
Official abstract text for this publication.
In one example, a neural network accelerator executes instructions to: load a first weight data element of an array of weight data elements from a memory into a systolic array; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on first coordinates of the first weight data element, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.
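The access pattern in the abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware: it assumes a single input channel, no padding, and row-major storage, and the names `strided_fetch` and `conv2d_weight_stationary` are hypothetical. It also assumes that the number of elements skipped between adjacent reads is `stride - 1`; the abstract states only that this number is based on the stride. For each weight element, the start address is derived from the weight's coordinates, a fixed number of input elements is read with a fixed skip, and the products are accumulated into the output array.

```python
def strided_fetch(memory, start, count, skip):
    """Read `count` elements beginning at `start`, skipping `skip` elements between reads."""
    step = skip + 1
    return [memory[start + i * step] for i in range(count)]


def conv2d_weight_stationary(inputs, weights, stride):
    """Single-channel 2D convolution (no padding), processed one weight element at a time."""
    in_h, in_w = len(inputs), len(inputs[0])
    k_h, k_w = len(weights), len(weights[0])
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1

    # Assumption: input elements are stored in row-major order in a flat memory.
    memory = [x for row in inputs for x in row]
    out = [[0] * out_w for _ in range(out_h)]

    for r in range(k_h):
        for s in range(k_w):
            w = weights[r][s]  # the weight element held fixed for this pass
            for e in range(out_h):
                # Start address depends on the weight coordinates (r, s) and the stride.
                start = (r + e * stride) * in_w + s
                row = strided_fetch(memory, start, out_w, stride - 1)
                for f, x in enumerate(row):
                    out[e][f] += w * x  # accumulate partial sums per output element
    return out


def conv2d_direct(inputs, weights, stride):
    """Reference implementation: sum over the kernel window for each output element."""
    k_h, k_w = len(weights), len(weights[0])
    out_h = (len(inputs) - k_h) // stride + 1
    out_w = (len(inputs[0]) - k_w) // stride + 1
    return [
        [
            sum(
                weights[r][s] * inputs[e * stride + r][f * stride + s]
                for r in range(k_h)
                for s in range(k_w)
            )
            for f in range(out_w)
        ]
        for e in range(out_h)
    ]


if __name__ == "__main__":
    image = [[4 * i + j for j in range(4)] for i in range(4)]
    kernel = [[1, 0], [0, 1]]
    print(conv2d_weight_stationary(image, kernel, stride=2))
```

With a 4×4 input holding the values 0 to 15, the kernel `[[1, 0], [0, 1]]`, and a stride of 2, the weight at coordinates (1, 1) reads addresses 5 and 7 for the first output row and 13 and 15 for the second, so each pass touches only the inputs that pair with that weight.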
Opening claim text (preview).
What is claimed is:
1. A method for performing a convolution operation in a neural network accelerator, comprising: loading a first weight data element of an array of weight data elements from a memory into a systolic array of the neural network accelerator, the first weight data element being at first coordinates and associated with a first input channel within the array of weight data elements; receiving a first subset of input data elements of an array of input data elements to multiply with the first weight data element to generate a first output tile of an output data array, the first subset of input data elements being selected from a first contiguous region of the memory and based on the first coordinates of the first weight data element, a stride of the convolution operation, and a location of the first output tile in the output data array; streaming each input data element of the first subset from the first contiguous region of the memory into the systolic array to multiply with the first weight data element to generate the first output tile; receiving a selection of a second subset of the input data elements to multiply with the first weight data element to generate a second output tile of the output data array, the second subset being selected from a second contiguous region of the memory and based on the first coordinates of the first weight data element and on the stride of the convolution operation; streaming each input data element of the second subset from the second contiguous region into the systolic array to multiply with the first weight data element to generate the second output tile; and assembling an output data array of the convolution operation from the first output tile and the second output tile.
2. The method of claim 1, wherein the memory comprises a plurality of partitions; wherein each partition of the plurality of partitions stores a part of a chunk of input data elements of one or more input channels, the chunk of the input data elements being stored across the plurality of partitions following a repetitive sequential order; wherein the first contiguous region stores a first part of a first chunk of input data elements, the first part of the first chunk corresponding to the first input channel; wherein the second contiguous region stores a first part of a second chunk of input data elements, the first part of the second chunk corresponding to the first input channel; and the first contiguous region and the second contiguous region are in a first partition of the plurality of partitions.
3. The method of claim 2, wherein: the first partition also stores part of a first part of a third chunk of input data elements, the first part of the third chunk corresponding to a different input channel from the first input channel; and the first contiguous region and the second contiguous region are separated by the first part of the third chunk of input data elements.
4. The method of claim 3, wherein: each input data element of the array of input data elements is associated with an identifier of the chunk that includes the each input data element and a location of the chunk in the memory; and the first subset of input data elements are selected based on one or more identifiers of one or more chunks that include the first subset of input data elements and one or more locations of the one or more chunks indicating that the first subset of input data elements are stored in a contiguous region of the memory.
5. The method of claim 3, further comprising: storing, at different times, the first output tile and the second output tile at a summation buffer; wherein a size of the chunk of input data elements is based on a size of the summation buffer and the stride of the convolution operation.
6. A non-transitory computer readable medium storing instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array, the first weight data element having first coordinates in the array of weight data elements; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on the first coordinates, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; load the first input data elements into the systolic array; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.
7. The non-transitory computer readable medium of claim 6, wherein the input data elements are stored in the memory following at least one of: a row-major order or a column-major order.
8. The non-transitory computer readable medium of claim 6, wherein: the memory comprises a plurality of partitions; each partition is coupled with a row of the systolic array; each partition stores input data elements of one or more input channels; and the first input data elements are obtained from a first partition of the plurality of partitions and are streamed into a first row of the systolic array.
9. The non-transitory computer readable medium of claim 8, wherein the convolution operation is between the array of weight data elements and an array of input data elements; wherein the array of input data elements comprises input data elements associated with a plurality of input data channels and is fragmented into a plurality of chunks, each chunk of the plurality of chunks comprising a subset of the array of input data elements and associated with at least a subset of a plurality of input channels; and wherein each partition of the plurality of partitions stores input data elements associated with an input channel of the each chunk in a contiguous region.
10. The non-transitory computer readable medium of claim 9, wherein each input data element of the array of input data elements is associated with an attribute comprising: an identifier of a chunk of the plurality of the chunks that includes the each input data element, and a location of the chunk in the memory.
11. The non-transitory computer readable medium of claim 10, wherein the plurality of chunks comprise: a first chunk of input data elements associated with a first subset of the plurality of input channels; a second chunk of input data elements associated with the first subset of the plurality of input channels; and a third chunk of input data elements associated with a second subset of the plurality of input channels.
12. The non-transitory computer readable medium of claim 11, wherein the attributes of the first input data elements indicate: the first input data elements are included in the first chunk and the second chunk; and the first chunk and the second chunk are stored in a first contiguous region in the first partition; and wherein the first address is part of the first contiguous region.
13. The non-transitory computer readable medium of claim 12, wherein input data elements of the first chunk, the second chunk, and the third chunk are stored in a contiguous region in the each partition; wherein input data elements of the first chunk and of the second chunk are separated by input data elements of the third chunk in the contiguous region in the ea
using electronic means · CPC title
Neural networks · CPC title
Systolic arrays · CPC title
Convolutional networks [CNN, ConvNet] · CPC title