US2016011996A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016011996-A1 |
| Application number | US-201514701371-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 30, 2015 |
| Priority date | Jan 8, 2010 |
| Publication date | Jan 14, 2016 |
| Grant date | — |
Official abstract text for this publication.
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five-dimensional torus network that maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. The node design integrates a list-based prefetcher. The memory system implements transactional memory, thread-level speculation, and a multiversioning cache that also improves the soft error rate, and it supports DMA functionality allowing for parallel processing message-passing.
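The five-dimensional torus named in the abstract can be illustrated with a minimal sketch of wraparound addressing. This is not the patent's routing logic: the function names `torus_neighbors` and `torus_distance`, the coordinate tuples, and the dimension sizes are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]


def torus_neighbors(coord: Coord, dims: Coord) -> List[Coord]:
    """Return the nodes one hop away from `coord` in a torus of shape `dims`.

    Each axis contributes a -1 and a +1 link, with wraparound at the edges,
    so a 5-D torus gives 10 links per node. (Hypothetical helper, not taken
    from the patent text.)
    """
    if len(coord) != len(dims):
        raise ValueError("coord and dims must have the same number of axes")
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (coord[axis] + step) % size  # wrap around the ring
            neighbors.append(tuple(nxt))
    return neighbors


def torus_distance(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum hop count between two nodes, taking the shorter way per axis."""
    total = 0
    for x, y, size in zip(a, b, dims):
        delta = abs(x - y)
        total += min(delta, size - delta)
    return total


if __name__ == "__main__":
    shape = (4, 4, 4, 4, 4)  # assumed sizes, for illustration only
    origin = (0, 0, 0, 0, 0)
    print(torus_neighbors(origin, shape))
    print(torus_distance(origin, (3, 0, 0, 0, 0), shape))
```

Because every axis wraps, a node at the edge of one dimension is a single hop from the node at the opposite edge, which is what keeps the worst-case hop count low as the node count grows.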
Opening claim text (preview).
1. A massively parallel computing structure comprising: a plurality of processing nodes interconnected by multiple independent networks, each node including a plurality of processing elements for performing computation or communication activity as required when performing parallel algorithm operations, a first of said networks includes an n-dimensional torus network, n is an integer equal to or greater than 5, including communication links interconnecting said nodes for providing high-speed, low latency point-to-point and multicast packet communications among said nodes or independent partitioned subsets thereof; said n-dimensional torus network for enabling point-to-point, all-to-all, collective (broadcast, reduce) and global barrier and notification functions among said nodes or independent partitioned subsets thereof, wherein combinations of said networks interconnecting said nodes are collaboratively or independently utilized according to bandwidth and latency requirements of an algorithm for optimizing algorithm processing performance; wherein each said processing element is multi-way hardware threaded supporting transactional memory execution and thread level speculation, wherein said plurality of processing elements are configured to run speculative threads in parallel; a cache memory associated with each said processing element at each node, said associated cache memory including a second level (L2) cache supporting thread-level speculative operations (TLS), said TLS operations handling multiple versions of data, and a DMA (direct memory access) network interface for transferring data to/from a cache memory, said DMA interface enabling internode communications that overlap with computations running concurrently on the nodes, wherein a processing element retrieves data by issuing a command and passing the command to each of a stream prefetch engine and a list prefetch engine, the stream prefetch engine and the list prefetch engine for prefetching data to be needed in subsequent clock cycles in the processor in response to the passed command.
2. The massively parallel computing structure as claimed in claim 1, wherein n is 5, said 5-D torus network is utilized to enable simultaneous computing and message communication activities among individual nodes and partitioned subsets of nodes according to bandwidth and latency requirements of an algorithm being performed.
3. The massively parallel computing structure as claimed in claim 2, wherein said 5-D network is utilized to enable simultaneous computing and message communication activities among individual nodes and independent parallel processing among one or more partitioned subsets of said plurality of nodes according to needs of a parallel algorithm.
4. The massively parallel computing structure as claimed in claim 3, wherein said 5-D network is utilized to enable dynamic switching between computing and message communication activities among individual nodes according to needs of a parallel algorithm.
5. The massively parallel computing structure as claimed in claim 3, wherein the stream prefetch engine is configured to: determine a slowest data or instruction stream and a fastest data or instruction stream, based on speeds of data or instruction streams processed by the processor; decrease a prefetching depth of the slowest data or instruction stream, the prefetching depth referring to a specific amount of data or instructions to be prefetched; and increase the prefetching depth of the fastest data or instruction stream by the decreased prefetching depth of the slowest data or instruction stream.
6. The massively parallel computing structure as claimed in claim 5, further comprising a look-up engine for determining whether data requested in the command has been prefetched, said look-up engine comprising: a comparator for comparing an address in the command and addresses for which prefetch requests have been issued.
7. The massively parallel computing structure as claimed in claim 6, wherein the stream prefetch engine issues a load command for the requested data to a memory system in response to determining that the requested data has not been prefetched, wherein the stream prefetch engine and the list prefetch engine work simultaneously.
8. The massively parallel computing structure as claimed in claim 3, further comprising: a messaging system associated with a node, said messaging system comprising: a plurality of network transmit devices for transmitting message packets over a network; a network injection queue associated with a network transmit device, each said network injection queue adapted to buffer a packet to be transmitted; injection control unit for receiving and processing requests from processor units at a node for transmitting messages over a network via one or more network transmit devices; a plurality of parallel distributed injection messaging engine units (iMEs) each providing a multi-channel DMA function, each injection messaging engine unit operatively connected with said injection control unit and configured to read data in said associated memory system via said interconnect device, and forming a packet belonging to said message, said packet including a packet header and said read data, an interconnect interface device having one or more ports for coupling each injection message engine unit of said distributed plurality with said interconnect device, each port adapted for forwarding data content read from specified locations in associated memory system to at least one requesting injection messaging engine unit in parallel, said associated memory system including a plurality of injection memory buffers, each injection memory buffer adapted to receive, from a processor, a descriptor associated with a message to be transmitted over a network, said descriptor including a specified target address having said data to be included in said message, one of said injection messaging engine units accessing said descriptor data for reading said data to be included in said message from said memory system, wherein a network transmit device provides a signal to indicate to a corresponding said injection messaging engine unit whether or not there is space in a corresponding network injection queue for writing packet data to the network injection queue, wherein, at said node, two or more packets associated with two or more different messages may be simultaneously formed by a respective two or more injection messaging engine units, in parallel, for simultaneous transmission over said network.
9. The massively parallel computing structure as claimed in claim 8, wherein said messaging system further comprises: a plurality of receiver devices for receiving message packets from a network, a network reception queue associated with a receiver device, each network reception queue adapted to buffer said received packet, a reception control unit for receiving information from a processor at a node for handling of packets received over a network; and, a plurality of parallel distributed reception messaging engine units (rMEs) each providing a multi-channel direct memory access (DMA) function, a reception messaging engine unit operatively connected with the reception control unit, said reception messaging engine unit initiates transfer of the received packet directly to a location in the associated memory system, wherein each associated reception message engine unit is coupled with an interconnect device having ports adapted for providing a connection to said interconnect device,
10. The massively parallel computing structure as claimed in claim 8, wherein said messaging system transfers blocks via one or more switch master por
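The prefetch-depth adjustment recited in claim 5 can be sketched in software as follows. This is only an illustrative model under stated assumptions: the `Stream` class, its `speed` and `depth` fields, the `rebalance_depths` function, and the one-unit `step` are hypothetical names and values, not details taken from the claims, which describe a hardware stream prefetch engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stream:
    name: str
    speed: float  # assumed metric, e.g. accesses observed per interval
    depth: int    # amount of data or instructions to prefetch ahead


def rebalance_depths(streams: List[Stream], step: int = 1) -> None:
    """Move prefetch depth from the slowest stream to the fastest one.

    Follows the three steps of claim 5: find the slowest and fastest
    streams, decrease the slowest stream's depth, and increase the fastest
    stream's depth by the same amount. `step` is an assumed tuning knob.
    """
    if len(streams) < 2:
        return
    slowest = min(streams, key=lambda s: s.speed)
    fastest = max(streams, key=lambda s: s.speed)
    if slowest is fastest:  # all speeds equal, nothing to shift
        return
    moved = min(step, slowest.depth)  # never drive a depth below zero
    slowest.depth -= moved
    fastest.depth += moved


if __name__ == "__main__":
    streams = [Stream("a", 1.0, 4), Stream("b", 5.0, 4), Stream("c", 3.0, 4)]
    rebalance_depths(streams)
    for s in streams:
        print(s.name, s.depth)
```

The total depth across streams stays constant in this sketch, which mirrors the claim's wording that the fastest stream gains exactly what the slowest stream gives up.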
with prefetch · CPC title
using pseudo-associative means, e.g. set-associative or hashing · CPC title
Details relating to cache prefetching · CPC title
using a cache · CPC title
Multiplexed DMA (G06F13/30 takes precedence) · CPC title