US9928261B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9928261-B2 |
| Application number | US-201414582314-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 24, 2014 |
| Priority date | Jan 29, 2014 |
| Publication date | Mar 27, 2018 |
| Grant date | Mar 27, 2018 |
Official abstract text for this publication.
An information processing system, computer readable storage medium, and method for accelerated radix sort processing of data elements in an array in memory. The information processing system stores an array of data elements in a buffer memory in an application specific integrated circuit radix sort accelerator. The array has a head end and a tail end. The system radix sort processes, with a head processor, data elements starting at the head end of the array, progressively advancing toward the tail end of the array. The system also radix sort processes, with a tail processor, data elements starting at the tail end of the array, progressively advancing toward the head end of the array, with the tail processor working contemporaneously with the head processor on data elements in the array.
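The head-end and tail-end processing described in the abstract can be illustrated with a small sketch. It assumes a one-bit radix per pass (the abstract does not fix the digit width) and interleaves the two "processors" in a single loop rather than running them in parallel hardware. The names `partition_by_bit` and `msd_radix_sort`, and the `key_bits` parameter, are hypothetical and do not come from the patent.

```python
def partition_by_bit(data, lo, hi, mask):
    """Partition data[lo:hi] in place so keys with (key & mask) == 0 come first.

    `head` plays the role of the head processor and `tail` the tail processor.
    They are interleaved here (an assumption for illustration); the patent
    describes them as working contemporaneously.
    Returns the index of the first key whose masked bit is set.
    """
    head, tail = lo, hi - 1
    while head <= tail:
        if data[head] & mask == 0:
            head += 1            # element already belongs at the head end
        elif data[tail] & mask != 0:
            tail -= 1            # element already belongs at the tail end
        else:
            # head holds a "1" key and tail holds a "0" key: exchange them
            data[head], data[tail] = data[tail], data[head]
            head += 1
            tail -= 1
    return head


def msd_radix_sort(data, key_bits=32):
    """Sort non-negative integers below 2**key_bits in place, one bit per pass."""
    pending = [(0, len(data), key_bits - 1)]   # (start, end, bit index) sub-problems
    while pending:
        lo, hi, bit = pending.pop()
        if hi - lo < 2 or bit < 0:
            continue
        mid = partition_by_bit(data, lo, hi, 1 << bit)
        pending.append((lo, mid, bit - 1))     # keys with the bit clear
        pending.append((mid, hi, bit - 1))     # keys with the bit set
```

Each partition produces two independent sub-ranges, so separate workers could sort them without coordinating. This sketch simply handles them one after another.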
Opening claim text (preview).
What is claimed is:
1. An information processing system comprising: at least one host processor; main memory, communicatively coupled with the at least one host processor; non-volatile memory, communicatively coupled with the at least one host processor; a radix sort memory manager stored in the non-volatile memory and communicatively coupled with the at least one host processor for radix sort processing data elements in an array in main memory; and an application specific integrated circuit radix sort accelerator (Accelerator), communicatively coupled with the at least one host processor and the main memory, the Accelerator and the host processor contemporaneously share access to the data elements in the array in the main memory in a data streaming architecture in which the Accelerator radix sort processes the data elements in the array in main memory, the Accelerator comprising: buffer memory for at least storing data elements from the array in main memory in a plurality of radix sort buckets in the buffer memory, the buffer memory capacity to store data elements from the array in the main memory being a small portion of the array of data elements in the main memory; a plurality of radix sort processors for radix sort processing the data elements in each radix sort bucket in the plurality of radix sort buckets in the buffer memory in the Accelerator; and a pre-fetching engine for copying a small portion of the data elements from the array in the main memory to the buffer memory in the Accelerator to be radix sort processed, and for transferring data elements, in the data streaming architecture, between the array in main memory and a radix sort bucket selected by the pre-fetching engine from the plurality of radix sort buckets in the buffer memory in the Accelerator, the pre-fetching engine predicting which data elements will be needed in a near future radix sort process in the Accelerator, transferring the needed data elements, and hiding memory latency when accessing the array of data elements in the main memory, wherein the pre-fetching engine transfers at least one data element from the array in main memory to the selected radix sort bucket, based on determining that a total number of data elements in the selected radix sort bucket reaches a low threshold of data elements remaining to be radix sort processed by the plurality of radix sort processors, and wherein the pre-fetching engine transfers at least one data element from the selected radix sort bucket to the array in main memory, based on determining that a total number of data elements in the selected radix sort bucket reaches a high threshold of data elements remaining to be radix sort processed by the plurality of radix sort processors.
2. The information processing system of claim 1, wherein the pre-fetching engine transfers data elements between the array in main memory and the selected radix sort bucket contemporaneously with the plurality of radix sort processors radix sort processing data elements in the selected radix sort bucket.
3. The information processing system of claim 2, wherein the pre-fetching engine transfers data elements between the array in main memory and the selected radix sort bucket before a total number of data elements to be radix sort processed in the selected radix sort bucket by the plurality of radix sort processors reaches zero.
4. The information processing system of claim 1, wherein the radix sort memory manager, interoperating with the at least one host processor, communicates with a processor in the Accelerator to structurally decompose a large radix sorting problem of radix sorting data elements in the array into a set of multiple independent smaller radix sub-sorting problems of radix sorting data elements in the array, the Accelerator creating and informing the at least one host processor of the creation of such set of multiple independent smaller radix sub-sorting problems from the larger radix sorting problem, the radix sort memory manager assigning to the Accelerator a first independent smaller radix sub-sorting problem in the set of multiple independent smaller radix sub-sorting problems to be sorted by the Accelerator, and the radix sort memory manager assigning to the at least one host processor a second independent smaller radix sub-sorting problem in the set of multiple independent smaller radix sub-sorting problems to be sorted by the at least one host processor.
5. A computer readable storage medium, comprising computer instructions which, responsive to being executed by a processor of a plurality of processors of an information processing system, cause the processor to perform operations for accelerated radix sort processing of an array of data elements in main memory, the operations comprising: transferring a portion of data elements from an array of data elements being radix sort processed in main memory to an array of data elements in a radix sort bucket in a first memory in an application specific integrated circuit radix sort accelerator (Accelerator) to be radix sort processed in the Accelerator, the first memory capacity to store data elements from the array in the main memory being a small portion of the array of data elements in the main memory; storing the transferred data elements in the array of data elements in the radix sort bucket in the first memory, the array having a head end and a tail end; radix sort processing, with a head processor, data elements in the array of data elements in the radix sort bucket starting at the head end of the array and progressively advancing radix sort processing data elements toward the tail end of the array; and radix sort processing, with a tail processor, data elements in the array of data elements in the radix sort bucket starting at the tail end of the array and progressively advancing radix sort processing data elements toward the head end of the array, the tail processor radix sort processing data elements in the array contemporaneously with the head processor radix sort processing data elements in the array.
6. The computer readable storage medium of claim 5, wherein the array of data elements in the radix sort bucket in the first memory comprises a first radix sort bucket in a plurality of radix sort buckets in the first memory, and the computer readable storage medium further comprising computer instructions which, responsive to being executed by a processor of a plurality of processors of the information processing system, cause the processor to perform operations comprising: radix sort processing, with at least one of the head processor and the tail processor, data elements in the first radix sort bucket by the at least one of the head processor and the tail processor using a respective one of a head pointer and tail pointer to point to each data element in the first radix sort bucket; applying a radix sort mask to a data element in the first radix sort bucket pointed to by a respective one of the head pointer and tail pointer, thereby identifying a significant radix sort symbol in the data element; determining, based on the identified significant radix sort symbol in the data element, whether the data element belongs in the first radix sort bucket according to a radix sort algorithm; and progressively advancing a value in the respective one of the head pointer and tail pointer to point to a next data element in the first radix sort bucket, based on determining that the data element belongs in the first radix sort bucket according to the radix sort algorithm.
7. The computer readable storage medium of claim 6, wherein progressively advancing a value in the head pointer comprises updating a value in the head pointer to point with the value in the head pointer to a next data element in the first radi
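The low and high thresholds recited for the pre-fetching engine in claim 1 resemble a watermark scheme, which can be sketched as follows. This is a toy model under stated assumptions: main memory is represented by a `deque` of pending elements, written-back elements go to a plain list, and exactly one element moves per decision. The names `BucketBuffer` and `prefetch_step`, and the choice of which element to write back, are hypothetical and are not specified by the claim text shown here.

```python
from collections import deque


class BucketBuffer:
    """Toy model of one radix sort bucket held in the accelerator's buffer memory."""

    def __init__(self, low, high):
        if not 0 <= low < high:
            raise ValueError("expected 0 <= low < high")
        self.low = low          # refill when this few elements remain
        self.high = high        # write back when this many elements are waiting
        self.items = deque()    # elements still to be radix sort processed


def prefetch_step(bucket, main_memory, written_back):
    """Make one transfer decision for `bucket`.

    main_memory  : deque of elements not yet brought into the buffer (assumption)
    written_back : list standing in for elements returned to main memory (assumption)
    Returns "fill", "spill", or "idle".
    """
    if len(bucket.items) <= bucket.low and main_memory:
        # Low threshold reached: bring in an element before the bucket runs dry.
        bucket.items.append(main_memory.popleft())
        return "fill"
    if len(bucket.items) >= bucket.high:
        # High threshold reached: move an element out to free buffer space.
        written_back.append(bucket.items.pop())
        return "spill"
    return "idle"
```

Calling `prefetch_step` repeatedly while the sort processors drain `bucket.items` keeps the bucket between the two thresholds. That matches the intent in claim 3 of transferring data before the count of remaining elements reaches zero.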
Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory · CPC title
Temporary buffering, e.g. using volatile buffer or dedicated buffer blocks · CPC title
Physics · mapped topic
Latency reduction · CPC title
Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers {sorting methods in general}(G06F7/36 takes precedence) · CPC title