Optimizing for energy efficiency via near memory compute in scalable disaggregated memory architectures
US-2024338132-A1 · Oct 10, 2024 · US
US12430204B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12430204-B2 |
| Application number | US-202117553623-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 16, 2021 |
| Priority date | Dec 16, 2021 |
| Publication date | Sep 30, 2025 |
| Grant date | Sep 30, 2025 |
Official abstract text for this publication.
A near memory compute system includes multiple computation nodes, such as nodes for parallel distributed processing. The nodes include a memory device to store data and compute hardware to perform a computation on the data. Error correction code (ECC) logic performs ECC on the data prior to computation on the data by the compute hardware. The node also includes residue check logic to perform a residue check on a result of the computation.
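The residue check described in the abstract can be illustrated with a minimal Python sketch. It assumes a multiply-accumulate as the computation and a modulus of 3; the abstract fixes neither. The function names `predicted_residue` and `residue_check` are hypothetical and do not come from the patent. The idea is that the residue of the result can be predicted from the residues of the operands alone, so a mismatch signals a fault in the compute path.

```python
MODULUS = 3  # assumption: the source does not specify a residue modulus


def predicted_residue(a: int, b: int, acc: int, modulus: int = MODULUS) -> int:
    """Predict the residue of a*b + acc using only the operand residues."""
    return ((a % modulus) * (b % modulus) + (acc % modulus)) % modulus


def residue_check(result: int, expected: int, modulus: int = MODULUS) -> bool:
    """Return True when the result's residue matches the predicted residue."""
    return result % modulus == expected


# Hypothetical operands standing in for data read from the node's memory
# after ECC decoding and correction.
a, b, acc = 7, 5, 4
expected = predicted_residue(a, b, acc)

good_result = a * b + acc        # stands in for the compute hardware
faulty_result = good_result ^ 1  # simulate a single-bit fault in the result

print(residue_check(good_result, expected))    # fault-free result passes
print(residue_check(faulty_result, expected))  # single-bit fault is flagged
```

With modulus 3, any single-bit flip changes the result by a power of two, which is never a multiple of 3, so the check catches every single-bit error in the result. Multi-bit errors whose net change is a multiple of 3 would pass undetected.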
Opening claim text (preview).
What is claimed is:

1. An apparatus comprising: a memory of a computation node, the memory to store data; compute hardware of the computation node, to perform a computation on the data; error correction code (ECC) logic to perform ECC decoding and correction on the data prior to computation with a code that includes both ECC bits and residue check bits; and residue check logic to perform a residue check on a result of the computation with the residue check bits of the code.

2. The apparatus of claim 1, wherein the memory is to store ECC bits for the data, wherein the ECC logic is to perform the ECC decoding and correction on the data with the ECC bits from the memory.

3. The apparatus of claim 1, wherein the memory is to store a residue check value, wherein the residue check logic is to perform the residue check on the result of the computation with the residue check value.

4. The apparatus of claim 3, wherein the residue check value comprises a modulo value of the data.

5. The apparatus of claim 1, wherein the result of the computation is to be stored back in the memory.

6. The apparatus of claim 5, wherein the ECC logic is to encode ECC bits to store for the result of the computation.

7. The apparatus of claim 5, wherein the memory is to store the data, ECC bits, and a residue check value, where the ECC bits and residue check value represent a two-dimensional array.

8. The apparatus of claim 7, wherein the memory is to store multiple rows having data bits and associated residue check values, and a row having parity bits, with a parity bit of a bit location to indicate parity for a column made up of the rows of data bits in the bit location or a column made up of the rows of bits in the bit location.

9. The apparatus of claim 1, wherein the computation node comprises a node of a parallel distributed processing system having multiple parallel distributed processing nodes.

10. The apparatus of claim 9, wherein the result of the computation is to be forwarded to another parallel distributed processing node.

11. A computer system, comprising: a host processor; and accelerator hardware coupled to the host processor, to receive a request for parallel distributed processing, the accelerator hardware including multiple processing nodes, wherein an individual processing node includes: a memory to store data; a compute unit to perform a computation on the data; error correction code (ECC) logic to perform ECC decoding and correction on the data prior to computation with a code that includes both ECC bits and residue check bits; and residue check logic to perform a residue check on a result of the computation with the residue check bits of the code.

12. The computer system of claim 11, wherein the memory is to store a residue check value, wherein the residue check logic is to perform the residue check on the result of the computation with the residue check value.

13. The computer system of claim 11, wherein the ECC logic is to encode ECC bits to store for the result of the computation.

14. The computer system of claim 11, wherein the memory is to store the data, ECC bits, and a residue check value as a two-dimensional array.

15. The computer system of claim 11, wherein the result of the computation is to be forwarded from one node to another.

16. The computer system of claim 11, including one or more of: wherein the host processor comprises a multicore processor; a display communicatively coupled to the host processor; or a network interface communicatively coupled to the host processor.

17. A method for computation, comprising: storing data in a memory of a computation node; performing a computation on the data with compute hardware of the computation node; performing error correction code (ECC) decoding and correction on the data prior to performing the computation with a code that includes both ECC bits and residue check bits; and performing a residue check on a result of the computation with the residue check bits of the code.

18. The method of claim 17, wherein performing the residue check comprises performing the residue check on the result of the computation with a residue check value stored in the memory.

19. The method of claim 17, further comprising: receiving the result of the computation; and encoding ECC bits from the result to store in the memory for the result of the computation.

20. The method of claim 17, further comprising: storing in the memory the data, ECC bits, and a residue check value as a two-dimensional array.

21. The method of claim 17, wherein the computation node comprises a node of a parallel distributed processing system having multiple parallel distributed processing nodes, and further comprising: forwarding the result of the computation to another parallel distributed processing node.
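The claims recite a two-dimensional arrangement: rows of data bits with associated residue check values, plus one row of parity bits in which each bit covers a column. The claims describe this layout but not a decoding procedure, so the following Python sketch is only one plausible way such a layout could be used. It assumes a modulus of 3, parity computed over the data bits only, and at most one corrupted row. The names `build_parity_row` and `locate_and_correct` are hypothetical.

```python
from functools import reduce
from operator import xor

MODULUS = 3  # assumption: the claims do not fix a residue modulus


def build_parity_row(rows: list[int]) -> int:
    """XOR all data rows; bit k of the result is the parity of column k."""
    return reduce(xor, rows, 0)


def locate_and_correct(rows: list[int], residues: list[int],
                       parity_row: int, modulus: int = MODULUS) -> list[int]:
    """Use residues to find the bad row and column parity to find bad bits."""
    # Nonzero syndrome bits mark the columns whose parity no longer matches.
    syndrome = reduce(xor, rows, parity_row)
    # A row is suspect when its stored residue disagrees with its data.
    bad_rows = [i for i, (row, res) in enumerate(zip(rows, residues))
                if row % modulus != res]
    if syndrome == 0 and not bad_rows:
        return list(rows)  # nothing to fix
    if len(bad_rows) == 1 and syndrome != 0:
        fixed = list(rows)
        fixed[bad_rows[0]] ^= syndrome  # flip the mismatching columns back
        return fixed
    raise ValueError("error pattern not correctable by this sketch")


# Hypothetical 4-bit data rows with their residues and the parity row.
rows = [0b1011, 0b0110, 0b1100]
residues = [r % MODULUS for r in rows]
parity_row = build_parity_row(rows)

corrupted = list(rows)
corrupted[1] ^= 0b0100  # simulate a single-bit error in row 1

print(locate_and_correct(corrupted, residues, parity_row) == rows)
```

In this sketch the residue plays the role of the row dimension and the parity row plays the role of the column dimension, so their intersection locates the failing bit. An actual implementation of the claimed code may combine the ECC bits and residue check bits differently.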
Learning methods · CPC title
Architecture, e.g. interconnection topology · CPC title
in sector programmable memories, e.g. flash disk (G06F11/1072 takes precedence) · CPC title
using arrangements adapted for a specific error detection or correction feature · CPC title