Electronic device having graphics processor and acceleration method thereof

US12423378B2 · US · B2

Patent metadata
Publication number: US-12423378-B2
Application number: US-202117228895-A
Country: US
Kind code: B2
Filing date: Apr 13, 2021
Priority date: Sep 29, 2020
Publication date: Sep 23, 2025
Grant date: Sep 23, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A graphics processor includes a texel unit and an execution unit. The texel unit includes a loading module. The execution unit includes an im2col module to execute an im2col algorithm to expand an original matrix to obtain an expansion matrix according to the size of a kernel. The execution unit multiplies the expansion matrix and the kernel to obtain a feature map matrix. The loading module calculates feature coordinates of each element of the feature map matrix according to the coordinates of the expansion matrix, and obtains the original coordinates of each element of the original matrix according to the feature coordinates, the size of the kernel, a stride, and padding. The loading module reads at least one of the memory blocks covered by the original coordinates of each element of the original matrix, and outputs data corresponding to the original coordinates in the memory blocks.
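The expand-then-multiply step described in the abstract can be illustrated in software. The following is a minimal sketch only, not the patented hardware path: the function names, the row-per-output-element layout of the expansion matrix, and the zero-filled padding are all assumptions made for illustration, since the abstract does not fix a layout.

```python
def im2col(original, kernel_h, kernel_w, stride=1, padding=0):
    """Expand a 2-D matrix so that each row holds one kernel-sized window.

    Assumed layout (not stated in the abstract): one row per feature map
    element, one column per kernel element, both in row-major order.
    """
    height, width = len(original), len(original[0])
    out_h = (height + 2 * padding - kernel_h) // stride + 1
    out_w = (width + 2 * padding - kernel_w) // stride + 1

    def at(y, x):
        # Positions that fall in the padding border read as zero.
        if 0 <= y < height and 0 <= x < width:
            return original[y][x]
        return 0

    expansion = []
    for out_y in range(out_h):
        for out_x in range(out_w):
            expansion.append([
                at(out_y * stride - padding + ky, out_x * stride - padding + kx)
                for ky in range(kernel_h)
                for kx in range(kernel_w)
            ])
    return expansion, out_h, out_w


def convolve_via_im2col(original, kernel, stride=1, padding=0):
    """Multiply the expansion matrix by the flattened kernel.

    The kernel is not flipped, which matches the usual neural-network
    meaning of "convolution".
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    expansion, out_h, out_w = im2col(original, kernel_h, kernel_w, stride, padding)
    flat_kernel = [value for row in kernel for value in row]
    flat_feature = [sum(a * b for a, b in zip(row, flat_kernel)) for row in expansion]
    # Fold the flat result back into an out_h x out_w feature map matrix.
    return [flat_feature[r * out_w:(r + 1) * out_w] for r in range(out_h)]


if __name__ == "__main__":
    original = [[r * 4 + c for c in range(4)] for r in range(4)]  # values 0..15
    ones = [[1] * 3 for _ in range(3)]
    print(convolve_via_im2col(original, ones))  # [[45, 54], [81, 90]]
```

With a 4*4 input and a 3*3 kernel at stride 1 and padding 0, the expansion matrix in this sketch has 4 rows of 9 values each, so the convolution reduces to a single matrix-vector product.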

First claim

Opening claim text (preview).

What is claimed is:

1. An electronic device, comprising: a graphics processor, configured to accelerate a convolution calculation, configured to: read an original matrix used for the convolution calculation from a memory outside the graphics processor; wherein the memory comprises a plurality of memory blocks, each of which is adjacent to another and is the same size, and the original matrix is stored by a specific data configuration in at least one of the memory blocks; execute an im2col algorithm, expand the original matrix to obtain an expansion matrix according to a size of a kernel, and define expansion coordinates of each element in the expansion matrix; multiply the expansion matrix and the kernel to obtain a feature map matrix corresponding to the original matrix; receive the expansion coordinates, calculate feature coordinates of each element of the feature map matrix according to the expansion coordinates, and obtain original coordinates of each element of the original matrix according to the feature coordinates, the size of the kernel, a stride, and padding; and read the at least one of the memory blocks covered by the original coordinates of each element of the original matrix, and sends data corresponding to the original coordinates in the at least one of the memory blocks into the im2col algorithm; wherein the graphics processor comprises: a return buffer, receiving and storing data of the original matrix, or the data corresponding to the original coordinates in the at least one memory blocks; a data expander, expanding the original matrix using an im2col operation to obtain the expansion matrix; a data multiplexer, selecting data required for the convolution calculation in the expansion matrix according to the graphics processor; and an output merge buffer, combining the data in the expansion matrix selected by the data multiplexer, and outputting the combined data selected by the data multiplexer to a register file.

2. The electronic device as claimed in claim 1, wherein the graphics processor further comprises the register file, to store the data in the original matrix, data in the expansion matrix, and data in the feature map matrix in the convolution calculation.

3. The electronic device as claimed in claim 2, wherein the graphics processor executes the convolution calculation according to the data in the original matrix, the data in the expansion matrix, and the data in the feature map matrix in the register file.

4. The electronic device as claimed in claim 1, wherein the graphics processor further comprises an L1 cache; in the convolution calculation, the L1 cache reads and stores the original matrix for the convolution calculation from the memory for the graphics processor to access.

5. The electronic device as claimed in claim 1, wherein the graphics processor further comprises a second memory, to store a result of the convolution calculation executed by the graphics processor in the memory.

6. The electronic device as claimed in claim 1, wherein a size of each memory block included in the memory is a matrix size of 4*8.

7. The electronic device as claimed in claim 1, wherein the size of the kernel is a matrix size of 3*3, the stride is equal to 1, and the padding is equal to 0.

8. A method for accelerating a convolution calculation, applied to a graphics processor, comprising: the graphics processor receiving an original matrix from a memory outside the graphics processor; wherein the memory comprises a plurality of memory blocks, each of which is adjacent to another and is the same size, and the original matrix is stored by a specific data configuration in at least one of the memory blocks; the graphics processor executing an im2col algorithm, and expanding the original matrix to obtain an expansion matrix according to a size of a kernel; wherein each element in the expansion matrix has expansion coordinates; the graphics processor multiplying the expansion matrix and the kernel to obtain a feature map matrix corresponding to the original matrix; the graphics processor calculating feature coordinates of each element of the feature map matrix according to the expansion coordinates; the graphics processor obtaining original coordinates of each element of the original matrix according to the feature coordinates, the size of the kernel, a stride, and padding; the graphics processor reading the at least one of the memory blocks covered by the original coordinates of each element of the original matrix, and outputting data corresponding to the original coordinates in the at least one of the memory blocks; wherein the step of executing the im2col algorithm comprises: receiving and storing by a return buffer data of the original matrix, or the data corresponding to the original coordinates in the at least one memory blocks; expanding by a data expander the original matrix using an im2col operation to obtain the expansion matrix; selecting by a data multiplexer data required for the convolution calculation in the expansion matrix according to the graphics processor; combining by an output merge buffer the data in the expansion matrix selected by the data multiplexer, and outputting the combined data selected by the data multiplexer to a register file.

9. The method as claimed in claim 8, wherein the step of executing the im2col algorithm further comprises storing by the register file the data in the original matrix, data in the expansion matrix, and data in the feature map matrix in the convolution calculation.

10. The method as claimed in claim 9, wherein the step of executing the im2col algorithm further comprises executing by the graphics processor the convolution calculation according to the data in the original matrix, the data in the expansion matrix, and the data in the feature map matrix in the register file.

11. The method as claimed in claim 8, wherein the step of executing the im2col algorithm further comprises reading and storing by an L1 cache the original matrix for the convolution calculation from the memory for the graphics processor to access in the convolution calculation.

12. The method as claimed in claim 8, wherein the step of executing the im2col algorithm further comprises storing by a second memory a result of the convolution calculation executed by the graphics processor in the memory.

13. The method as claimed in claim 8, wherein a size of each memory block included in the memory is a matrix size of 4*8.

14. The method as claimed in claim 8, wherein the size of the kernel is a matrix size of 3*3, the stride is equal to 1, and the padding is equal to 0.
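The coordinate chain recited in the claims (expansion coordinates, then feature coordinates, then original coordinates, then memory blocks) can be sketched as follows. The claims name the inputs to each step but this page does not give the formulas, so the arithmetic below is an assumption based on the conventional im2col index mapping. The function names, the row-major layouts, and the reading of "4*8" as 4 rows by 8 columns are likewise hypothetical.

```python
BLOCK_ROWS, BLOCK_COLS = 4, 8  # assumed orientation of the 4*8 memory block


def expansion_to_original(exp_row, exp_col, out_w, kernel_w, stride=1, padding=0):
    """Map expansion coordinates to feature and original coordinates.

    Assumes each expansion-matrix row corresponds to one feature map element
    and each column to one kernel element, both in row-major order.
    """
    feature_y, feature_x = divmod(exp_row, out_w)
    kernel_y, kernel_x = divmod(exp_col, kernel_w)
    orig_y = feature_y * stride - padding + kernel_y
    orig_x = feature_x * stride - padding + kernel_x
    return (feature_y, feature_x), (orig_y, orig_x)


def locate_in_blocks(orig_y, orig_x):
    """Return (block index, offset inside the block) for an original coordinate."""
    block = (orig_y // BLOCK_ROWS, orig_x // BLOCK_COLS)
    offset = (orig_y % BLOCK_ROWS, orig_x % BLOCK_COLS)
    return block, offset


def blocks_for_expansion_row(exp_row, out_w, kernel_h, kernel_w,
                             height, width, stride=1, padding=0):
    """Collect every memory block touched by one row of the expansion matrix."""
    blocks = set()
    for exp_col in range(kernel_h * kernel_w):
        _, (y, x) = expansion_to_original(
            exp_row, exp_col, out_w, kernel_w, stride, padding)
        # Coordinates in the padding border need no memory read.
        if 0 <= y < height and 0 <= x < width:
            blocks.add(locate_in_blocks(y, x)[0])
    return blocks


if __name__ == "__main__":
    # 8 x 16 original matrix, 3*3 kernel, stride 1, padding 0 (as in claims 7 and 14).
    height, width, kernel = 8, 16, 3
    out_w = width - kernel + 1  # 14 feature map columns
    print(expansion_to_original(20, 5, out_w, kernel))  # ((1, 6), (2, 8))
    print(locate_in_blocks(2, 8))                       # ((0, 1), (2, 0))
    print(sorted(blocks_for_expansion_row(48, out_w, kernel, kernel, height, width)))
```

Under these assumptions a single 3*3 window can straddle up to four 4*8 blocks, as the last call shows for feature element (3, 6). This suggests why the claims describe reading "at least one" memory block per set of original coordinates.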

Assignees

Glenfly Tech Co Ltd

Inventors

Classifications

  • G06F17/15 · Primary

    Correlation function computation {including computation of convolution operations (arithmetic circuits for sum of products per se, e.g. multiply-accumulators G06F7/5443; digital filters, e.g. FIR, IIR, adaptive filters H03H17/00)} · CPC title

  • Memory management · CPC title

  • G06T1/20 · Primary

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • with two or more cache hierarchy levels (with multilevel cache hierarchies G06F12/0811) · CPC title

  • Performance improvement · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12423378B2 cover?
A graphics processor includes a texel unit and an execution unit. The texel unit includes a loading module. The execution unit includes an im2col module to execute an im2col algorithm to expand an original matrix to obtain an expansion matrix according to the size of a kernel. The execution unit multiplies the expansion matrix and the kernel to obtain a feature map matrix. The loading module ca…
Who is the assignee on this patent?
Glenfly Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F17/15. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 23, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).