Artificial intelligence integrated circuit

US11625587B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11625587-B2
Application number: US-202016745675-A
Country: US
Kind code: B2
Filing date: Jan 17, 2020
Priority date: Jan 3, 2020
Publication date: Apr 11, 2023
Grant date: Apr 11, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An artificial intelligence integrated circuit is provided. The artificial intelligence integrated circuit includes a flash memory, a dynamic random access memory (DRAM), and a memory controller. The flash memory is configured to store a logical-to-physical mapping (L2P) table that is divided into a plurality of group-mapping (G2P) tables. The memory controller includes a first processing core and a second processing core. The first processing core receives a host access command from a host. When a specific G2P table corresponding to a specific logical address in the host access command is not stored in the DRAM, the first processing core determines whether the second processing core has loaded the specific G2P table from the flash memory to the DRAM according to the values in a first column in a first bit map and in a second column of a second bit map.
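
The lookup described in the abstract text can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not define what the two bit maps encode, so the sketch assumes that the first bit map marks a load requested by the first core and the second bit map marks a load completed by the second core. The names `ENTRIES_PER_GROUP`, `group_index`, and `G2PLoader` are hypothetical.

```python
# Hypothetical sketch of a G2P-table lookup guarded by two bit maps.
# Assumption: each G2P table covers a fixed number of logical addresses.
ENTRIES_PER_GROUP = 1024


def group_index(logical_address: int) -> int:
    """Map a logical address to the index of the G2P table that covers it."""
    return logical_address // ENTRIES_PER_GROUP


class G2PLoader:
    """Tracks which G2P tables have been copied from flash to DRAM."""

    def __init__(self, num_groups: int) -> None:
        self.dram = {}                        # group index -> loaded G2P table
        self.request_bits = [0] * num_groups  # assumed meaning: load requested
        self.done_bits = [0] * num_groups     # assumed meaning: load completed

    def request_load(self, group: int) -> None:
        """First core: mark the group as requested."""
        self.request_bits[group] = 1

    def complete_load(self, group: int, table: list) -> None:
        """Second core: place the table in DRAM and mark it as loaded."""
        self.dram[group] = table
        self.done_bits[group] = 1

    def is_loaded(self, group: int) -> bool:
        """First core: check one column in each bit map."""
        return self.request_bits[group] == 1 and self.done_bits[group] == 1


loader = G2PLoader(num_groups=4)
group = group_index(2050)          # logical address 2050 falls in group 2
loader.request_load(group)
pending = loader.is_loaded(group)  # False: the second core has not finished
loader.complete_load(group, table=[0] * ENTRIES_PER_GROUP)
ready = loader.is_loaded(group)    # True: both bits are now set
```

In a real controller the two cores would run concurrently and the bit maps would live in shared memory; the sketch runs both roles sequentially to keep the control flow visible.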

First claim

Opening claim text (preview).

What is claimed is:

1. An artificial intelligence integrated circuit, comprising:
a command processor, configured to analyze a command queue to generate one or more tasks;
a plurality of processing elements, each processing element being disposed in parallel;
a task constructor, configured to receive the task from the command processor to generate a plurality of threads to control the processing elements;
a level-1 (L1) cache; and
a level-2 (L2) cache;
wherein each processing element comprises:
a plurality of arithmetic logic units (ALUs), configured to perform arithmetic and logic operations;
a plurality of deep-learning accelerators, configured to perform hardware multiplication-addition operations, activation functions, and pooling;
a common register file, configured to store data and intermediate results of operations performed by the ALUs and deep-learning accelerators; and
an access controller, configured to control data access to the L1 cache and the L2 cache;
wherein the access controller is configured to control the L1 cache and L2 cache to dynamically prefetch data stored in a memory unit external to the artificial intelligence integrated circuit, and the prefetched data is for use by matrix multiplication-addition operations performed by the deep-learning accelerators;
wherein the L1 cache comprises a first preload circuit and the L2 cache comprises a second preload circuit, and the first preload circuit and the second preload circuit prefetch data from the L2 cache and the memory unit, respectively;
wherein when the access controller is tasked to write first data to the L1 cache, the first preload circuit sends the first data to a first data compressor for a first data compression process to generate second data, and the first data compressor writes the second data to the L2 cache;
wherein the second preload circuit sends the second data to a second data compressor for a second data compression process to generate third data, and the second data compressor writes the third data to the memory unit.

2. The artificial intelligence integrated circuit as claimed in claim 1, wherein the memory unit is a dynamic random access memory.

3. The artificial intelligence integrated circuit as claimed in claim 1, wherein the memory unit is a host buffer memory of a host that is electrically connected to the artificial intelligence integrated circuit.

4. The artificial intelligence integrated circuit as claimed in claim 1, wherein the first data compression process is tasked to compress the first data using a compression algorithm for expanded matrix data to generate the second data, and the second data compression process is tasked to compress the second data using a residue-based image-compression algorithm and a sparse-matrix-compression algorithm to generate the third data.

5. The artificial intelligence integrated circuit as claimed in claim 1, wherein when the access controller is tasked to read the third data stored in the memory unit, the second preload circuit sends the third data to a second decompression circuit to perform a second data decompression process on the third data to obtain the second data, wherein the first preload circuit directly transmits the second data to a first decompression circuit in each processing element to perform a first data decompression process on the second data to obtain the first data, and stores the first data in the common register file of each processing element.

6. The artificial intelligence integrated circuit as claimed in claim 1, wherein the artificial intelligence integrated circuit supports application programming interfaces (API) of OpenCL, CUDA, and DirectCompute.

7. The artificial intelligence integrated circuit as claimed in claim 1, wherein the artificial intelligence integrated circuit does not comprise a three-dimensional (3D) graphics rendering module.

8. The artificial intelligence integrated circuit as claimed in claim 4, wherein the deep-learning accelerator in each processing element comprises: a matrix multiplication-addition calculator, configured to perform a matrix multiplication-addition calculation on the first data to obtain a first matrix calculation result; an activation-function circuit, configured to perform activation on the first matrix calculation result to generate a second matrix calculation result; and a pooling circuit, configured to perform pooling on the second matrix calculation result to generate a final result, and to store the final result in the common register file.

9. The artificial intelligence integrated circuit as claimed in claim 8, wherein in response to the first data for matrix convolution calculation stored in the common register file being ready, the deep-learning accelerator loads the first data to a register file in the deep-learning accelerator, and loads the first data from the register file to the matrix multiplication-addition calculator to perform matrix multiplication-addition operations.

10. The artificial intelligence integrated circuit as claimed in claim 8, wherein the first preload circuit and the second preload circuit can be set to a hardware mode or a software mode, wherein in response to the first preload circuit and the second preload circuit being set to the hardware mode, the first preload circuit and the second preload circuit performs address prediction using the previously fetched data, and respectively prefetch data from the L2 cache and the memory unit according to the predicted address, wherein in response to the first preload circuit and the second preload circuit being set to the software mode, the first preload circuit and the second preload circuit respectively fetch data from the L2 cache and the memory unit according to hint information from software.

11. The artificial intelligence integrated circuit as claimed in claim 8, wherein the matrix multiplication-addition calculator supports matrix multiplication in any matrix size and accelerated multiplication of sparse matrices, and determines calculations of loops according to size and sparsity of matrices.

12. The artificial intelligence integrated circuit as claimed in claim 8, wherein the activation-function circuit supports rectified linear unit (ReLU), sigmod, and tanh functions.

13. The artificial intelligence integrated circuit as claimed in claim 8, wherein the pooling circuit performs mean pooling or max pooling on the second matrix calculation result to generate the final result.
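
The three-stage data path recited in claims 8, 12, and 13 (matrix multiplication-addition, then activation, then pooling) can be illustrated in software. The sketch below is only a functional model under stated assumptions: the claims do not specify operand layout, pooling window, or stride, so it assumes dense lists of lists and non-overlapping 2x2 windows. The function names `matmul_add`, `activate`, `pool`, and `dla_forward` are hypothetical.

```python
import math


def matmul_add(a, b, c):
    """Return a @ b + c for dense matrices stored as lists of lists."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) + c[i][j]
         for j in range(cols)]
        for i in range(len(a))
    ]


# The activation functions named in claim 12.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
}


def activate(m, kind="relu"):
    """Apply the chosen activation element-wise."""
    f = ACTIVATIONS[kind]
    return [[f(x) for x in row] for row in m]


def pool(m, size=2, mode="max"):
    """Non-overlapping pooling; assumes both dimensions divide by `size`."""
    out = []
    for i in range(0, len(m), size):
        row = []
        for j in range(0, len(m[0]), size):
            window = [m[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window) if mode == "max"
                       else sum(window) / len(window))
        out.append(row)
    return out


def dla_forward(a, b, c, activation="relu", pooling="max"):
    """Chain the three stages in the order recited in claim 8."""
    first = matmul_add(a, b, c)           # first matrix calculation result
    second = activate(first, activation)  # second matrix calculation result
    return pool(second, 2, pooling)       # final result


identity = [[1, 0], [0, 1]]
weights = [[1, -2], [3, 4]]
bias = [[0, 0], [0, 0]]
max_result = dla_forward(identity, weights, bias, "relu", "max")
mean_result = dla_forward(identity, weights, bias, "relu", "mean")
```

With these inputs the product is `[[1, -2], [3, 4]]`, ReLU zeroes the negative entry, and the single 2x2 window pools to 4 (max) or 2.0 (mean). The claimed hardware performs these stages in dedicated circuits and keeps intermediate results in the common register file; the Python lists stand in for that storage.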

Assignees

Shanghai Zhaoxin Semiconductor Co Ltd, Glenfly Tech Co Ltd

Classifications

  • Quantised networks; Sparse networks; Compressed networks

  • Convolutional networks [CNN, ConvNet]

  • using a plurality of independent parallel functional units

  • G06N3/063 (Primary): using electronic means

  • to perform operations on data operands

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11625587B2 cover?
An artificial intelligence integrated circuit is provided. The artificial intelligence integrated circuit includes a flash memory, a dynamic random access memory (DRAM), and a memory controller. The flash memory is configured to store a logical-to-physical mapping (L2P) table that is divided into a plurality of group-mapping (G2P) tables. The memory controller includes a first processing core a…
Who is the assignee on this patent?
Shanghai Zhaoxin Semiconductor Co Ltd, Glenfly Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 11, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).