US12198034B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12198034-B2 |
| Application number | US-202117481588-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 22, 2021 |
| Priority date | Sep 22, 2020 |
| Publication date | Jan 14, 2025 |
| Grant date | Jan 14, 2025 |
Official abstract text for this publication.
A data processing system and method are disclosed, for implementing a windowed operation in at least three traversed dimensions. The data processing system maps the windowed operation in at least three traversed dimensions to a plurality of constituent windowed operations in two traversed dimensions. This plurality of 2-D windowed operations is implemented as such in at least one hardware accelerator. The data processing system assembles the results of the constituent 2-D windowed operations to produce the result of the windowed operation in at least three traversed dimensions.
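The decomposition described in the abstract can be illustrated with a minimal sketch for the convolution case. It is an illustration under stated assumptions, not the patented implementation: it assumes a single channel, stride 1, no padding, and the cross-correlation convention common in neural networks, and it selects height and width as the two traversed dimensions. The function names (conv2d, conv3d_direct, conv3d_via_2d) are hypothetical and do not appear in the patent.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation (stride 1, no padding) on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def conv3d_direct(volume, kernel):
    """Reference 3-D cross-correlation with shifts in depth, height and width."""
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_d = len(volume) - kd + 1
    out_h = len(volume[0]) - kh + 1
    out_w = len(volume[0][0]) - kw + 1
    return [
        [
            [
                sum(volume[z + a][y + i][x + j] * kernel[a][i][j]
                    for a in range(kd) for i in range(kh) for j in range(kw))
                for x in range(out_w)
            ]
            for y in range(out_h)
        ]
        for z in range(out_d)
    ]


def conv3d_via_2d(volume, kernel):
    """Same result, built only from 2-D convolutions plus a summation."""
    kd = len(kernel)
    out_d = len(volume) - kd + 1
    result = []
    for z in range(out_d):
        # One partial result per kernel slice; each is a 2-D operation
        # that shifts only over height and width.
        partials = [conv2d(volume[z + a], kernel[a]) for a in range(kd)]
        # Assemble the partial results by element-wise summation.
        summed = [
            [sum(p[y][x] for p in partials) for x in range(len(partials[0][0]))]
            for y in range(len(partials[0]))
        ]
        result.append(summed)
    return result


# Small integer example: a 3x4x4 volume and a 2x2x2 kernel.
volume = [[[(z * 16 + y * 4 + x) % 7 for x in range(4)]
           for y in range(4)] for z in range(3)]
kernel = [[[1, -1], [2, 0]], [[0, 3], [-2, 1]]]
assert conv3d_via_2d(volume, kernel) == conv3d_direct(volume, kernel)
```

In this sketch each call to conv2d stands in for work that would be dispatched to a 2-D hardware accelerator, while the summation plays the role of the assembly step.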
Opening claim text (preview).
What is claimed is:

1. A method of implementing a windowed operation in at least three traversed dimensions, the windowed operation comprising applying a window having at least three dimensions to data having at least three traversed dimensions, with shifts of the window in all three traversed dimensions, the method comprising: selecting two dimensions of the at least three traversed dimensions; mapping the windowed operation to a plurality of constituent 2-D windowed operations in the selected two dimensions, each 2-D windowed operation comprising applying a slice of the window to a slice of the data, with shifts of the slice of the window in only two dimensions; implementing each of the plurality of 2-D windowed operations by at least one hardware accelerator, each 2-D windowed operation producing a respective partial result; and assembling the partial results to produce the result of the windowed operation, wherein the windowed operation is implemented as part of a neural network; wherein the neural network comprises a plurality of layers, wherein the windowed operation in at least three traversed dimensions is a first windowed operation forming one layer of said neural network, the plurality of constituent 2-D windowed operations is a first plurality of constituent 2-D windowed operations, and the partial results are first partial results, the neural network comprising another layer comprising a second windowed operation in at least three traversed dimensions, the method further comprising: mapping the neural network to a restructured neural network, wherein the first windowed operation is mapped to the first plurality of constituent 2-D windowed operations and the second windowed operation is mapped to a second plurality of constituent 2-D windowed operations; and implementing the second plurality of constituent 2-D windowed operations by the at least one hardware accelerator; wherein mapping the neural network to the restructured neural network comprises: identifying that the first windowed operation and the second windowed operation are not supported by the at least one hardware accelerator, and in response, mapping them respectively to the first plurality and the second plurality of constituent 2-D windowed operations.

2. The method of claim 1, wherein the data comprises zero-padded data, and wherein mapping the windowed operation to the plurality of constituent 2-D windowed operations comprises: excluding from the plurality of constituent 2-D windowed operations a 2-D windowed operation that would be applied to a slice of the zero-padded data that consists solely of zeros.

3. The method of claim 1, further comprising, when implementing one of the plurality of 2-D windowed operations in the at least one hardware accelerator, storing at least a part of the slice of the data or at least a part of the slice of the window in a local memory of the at least one hardware accelerator, and when subsequently implementing another one of the plurality of 2-D windowed operations in the at least one hardware accelerator, reusing the stored part.

4. The method of claim 1, wherein the windowed operation is one of: a convolution operation, wherein each of the 2-D windowed operations is a 2-D convolution operation, and wherein assembling the partial results comprises combining them by summing the partial results; a maximum operation, wherein each of the 2-D windowed operations is a maximum operation, and wherein assembling the partial results comprises combining them by identifying the maximum among the partial results; a minimum operation, wherein each of the 2-D windowed operations is a minimum operation, and wherein assembling the partial results comprises combining them by identifying the minimum among the partial results; and a mean pooling operation, wherein each of the 2-D windowed operations is a mean pooling operation, and wherein assembling the partial results comprises combining them by calculating the mean of the partial results.

5. The method of claim 1, wherein the windowed operation includes a bias addition, wherein mapping the windowed operation to the plurality of constituent 2-D windowed operations comprises mapping the bias addition to a selected one of the 2-D windowed operations, wherein the bias addition is performed in the at least one hardware accelerator as part of the selected 2-D windowed operation (730a).

6. The method of claim 1, wherein the data comprises one of the following, or a derivative thereof: video data comprising two spatial dimensions and one temporal dimension; and volumetric data, comprising three spatial dimensions.

7. A non-transitory computer readable storage medium having stored thereon computer readable code configured to cause the method of claim 1 to be performed when the code is run.

8. The method of claim 1, wherein the selected two dimensions are the two dimensions of the at least three traversed dimensions with the largest number of shifts of the window.

9. A data processing system for implementing a windowed operation in at least three traversed dimensions, the windowed operation comprising applying a window having at least three dimensions to data having at least three traversed dimensions, with shifts of the window in all three traversed dimensions, the data processing system comprising: a transformation unit, configured to select two dimensions of the at least three traversed dimensions and map the windowed operation to a plurality of constituent 2-D windowed operations in the selected two dimensions, each 2-D windowed operation comprising applying a slice of the window to a slice of the data, with shifts of the slice of the window in only two dimensions; at least one hardware accelerator, comprising circuitry configured to implement the plurality of 2-D windowed operations, each 2-D windowed operation producing a respective partial result; and an assembly unit, configured to assemble the partial results to produce the result of the windowed operation, wherein the windowed operation is implemented as part of a neural network; wherein the neural network comprises a plurality of layers, wherein the windowed operation in at least three traversed dimensions is a first windowed operation forming one layer of said neural network, the plurality of constituent 2-D windowed operations is a first plurality of constituent 2-D windowed operations, and the partial results are first partial results; the neural network comprising another layer comprising a second windowed operation in at least three traversed dimensions; wherein the transformation unit is configured to map the neural network to a restructured neural network, wherein the first windowed operation is mapped to the first plurality of constituent 2-D windowed operations and the second windowed operation is mapped to a second plurality of constituent 2-D windowed operations, wherein the at least one hardware accelerator is further configured to implement the second plurality of constituent 2-D windowed operations; wherein the transformation unit is configured to, when mapping the neural network to the restructured neural network, identify that the first windowed operation and the second windowed operation are not supported by the at least one hardware accelerator; and in response, map them respectively to the first plurality and the second plurality of constituent 2-D windowed operations.

10. The data processing system of claim 9, wherein the at least one hardware accelerator comprises any one, or any combination of two or more of: one or more convolution engines, comprising circuitry configured to perform convolution calculations; a pooling unit, comprising circuitry configured
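The maximum-operation variant recited in claim 4 can be sketched in the same spirit. This is a hedged illustration only: the function names (maxpool2d, maxpool3d_via_2d), the (depth, height, width) ordering, and the window and stride parameters are assumptions made for the example, and padding is not handled.

```python
def maxpool2d(image, kh, kw, sh, sw):
    """2-D max pooling with window (kh, kw) and stride (sh, sw), no padding."""
    out_h = (len(image) - kh) // sh + 1
    out_w = (len(image[0]) - kw) // sw + 1
    return [
        [
            max(image[y * sh + i][x * sw + j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def maxpool3d_via_2d(volume, window, stride):
    """3-D max pooling assembled from 2-D max pooling over depth slices."""
    kd, kh, kw = window
    sd, sh, sw = stride
    out_d = (len(volume) - kd) // sd + 1
    result = []
    for z in range(out_d):
        # One 2-D pooling per depth slice covered by the 3-D window.
        partials = [maxpool2d(volume[z * sd + a], kh, kw, sh, sw)
                    for a in range(kd)]
        # Assemble by taking the element-wise maximum of the partial results.
        combined = [
            [max(p[y][x] for p in partials) for x in range(len(partials[0][0]))]
            for y in range(len(partials[0]))
        ]
        result.append(combined)
    return result


# 4x4x4 volume whose values increase with every coordinate.
volume = [[[z * 16 + y * 4 + x for x in range(4)]
           for y in range(4)] for z in range(4)]
pooled = maxpool3d_via_2d(volume, window=(2, 2, 2), stride=(2, 2, 2))
```

Swapping max for min in both functions gives the minimum operation in the same way; the convolution case combines the partial results by summation instead.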
Convolutional networks [CNN, ConvNet] · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Learning methods · CPC title
Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title
using electronic means · CPC title