Vortex: Irregular Data Stream Support for Data-Intensive Applications

Donald Yeung, Nicholas Kohout, Sujata Ramasubramanian, Ilya Khazanov, and Rishi Kurichh
University of Maryland Systems and Computer Architecture Group
http://www.ee.umd.edu/~yeung/vortex

As the gap between processor speed and memory speed continues to widen, application performance becomes increasingly limited by the memory system. The applications most sensitive to this well-known "memory gap" are those with working sets that are too large to fit in cache. Such data-intensive applications tend to run "out of cache" and thus experience high memory stall overheads. Several techniques have been proposed to tolerate memory latency, such as dynamic instruction scheduling, prefetching, and stream buffers; however, these traditional techniques become less effective for applications with sparse and irregular memory access patterns, or for applications that traverse recursive data structures by chasing pointers.

We introduce Vortex, a memory system architecture that uses application-specific information about the layout of data objects in memory to actively move data to the processor. The crux of Vortex lies in a flexible data streaming technique called Multi-Dimensional Streams (MDS). MDS allows the programmer or compiler to compactly describe to the memory system the stream of memory references that the processor will make to a data structure. Each address stream is specified as an N-dimensional object consisting of N (base, length, stride) descriptors. The MDS descriptors also allow the specification of memory indirection, which makes it possible to describe the address streams that arise from traversing recursive data structures. At runtime, under the control of special prefetch instructions, the application can selectively activate different MDS descriptors to initiate prefetching in the memory controller.
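To make the descriptor idea concrete, the following is a minimal software sketch of how N (base, length, stride) descriptors could expand into an address stream. It is not the Vortex hardware interface: the names `Descriptor`, `mds_addresses`, and `mds_indirect` are hypothetical, and two semantics are assumed here because the abstract does not define them. First, each generated address is taken to be the sum of every dimension's base plus index times stride. Second, indirection is modeled as loading a pointer from each streamed address and adding a fixed field offset.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Descriptor:
    """One hypothetical MDS dimension: (base, length, stride)."""
    base: int    # byte offset contributed by this dimension (assumed additive)
    length: int  # number of elements along this dimension
    stride: int  # byte distance between consecutive elements


def mds_addresses(dims: List[Descriptor]) -> Iterator[int]:
    """Yield addresses with the last descriptor varying fastest."""
    for index in product(*(range(d.length) for d in dims)):
        yield sum(d.base + i * d.stride for d, i in zip(dims, index))


def mds_indirect(dims: List[Descriptor], memory: Dict[int, int],
                 field_offset: int = 0) -> Iterator[int]:
    """Assumed indirection model: each streamed address holds a pointer,
    and the stream yields that pointer plus a field offset."""
    for addr in mds_addresses(dims):
        yield memory[addr] + field_offset


# A 2 x 3 tile of 8-byte elements in a matrix whose rows are 64 bytes apart.
tile = [Descriptor(base=0x1000, length=2, stride=64),
        Descriptor(base=0, length=3, stride=8)]
tile_stream = list(mds_addresses(tile))

# A 3-entry pointer array at 0x2000; each entry points to a record whose
# field of interest sits 16 bytes into the record. The dict stands in for DRAM.
memory = {0x2000: 0x9000, 0x2008: 0x5000, 0x2010: 0x7000}
pointer_array = [Descriptor(base=0x2000, length=3, stride=8)]
indirect_stream = list(mds_indirect(pointer_array, memory, field_offset=16))

print([hex(a) for a in tile_stream])
print([hex(a) for a in indirect_stream])
```

In this model, the two-descriptor tile produces a strided but non-contiguous stream from a few words of state, and the indirect stream reaches scattered records that no fixed stride could describe.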
The ability to generate irregular address streams and perform prefetching from within the memory system has several advantages. First, it provides the application with a "smart prefetch" capability. In Vortex, the application can prefetch a large amount of data by executing a small number of instructions, thus reducing the runtime cost of prefetching. Second, the "round trip" time between the issue of a prefetch and the arrival of the data is minimized because the prefetch is performed closer to the physical DRAMs. This reduces the critical path latency necessary to traverse a chain of pointers. Third, address generation in Vortex is decoupled from computation in the main CPU. This allows the "prefetch distance" between the memory controller and the main CPU to be adjusted dynamically. Lastly, the Vortex prefetcher has access to the raw DRAM interface; therefore, it can dynamically reorder prefetch accesses to draw long streams out of DRAM pages when the address stream exhibits spatial locality.

In this talk, we will present Vortex, describe how applications specify MDS descriptors, and discuss how applications can apply Vortex-style streaming to improve the performance of irregular data-intensive computations. We will also describe the architectural support needed in Vortex, both in the memory controller and in the processor stream buffers. Finally, we will report preliminary simulation results that demonstrate the performance gains possible on Vortex.
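The page-aware reordering listed among the advantages above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the Vortex controller's policy: the 2048-byte page size, the fixed reordering window, and the names `reorder_by_page` and `page_switches` are all hypothetical choices made for illustration.

```python
from typing import Dict, Iterable, List

PAGE_BYTES = 2048  # assumed DRAM page (row) size; real devices vary


def reorder_by_page(addresses: Iterable[int], window: int = 8,
                    page_bytes: int = PAGE_BYTES) -> List[int]:
    """Within each window of pending prefetches, issue addresses that fall
    in the same DRAM page back to back, keeping first-seen page order."""
    pending = list(addresses)
    issued: List[int] = []
    for start in range(0, len(pending), window):
        groups: Dict[int, List[int]] = {}  # page number -> addresses, in order
        for addr in pending[start:start + window]:
            groups.setdefault(addr // page_bytes, []).append(addr)
        for group in groups.values():
            issued.extend(group)
    return issued


def page_switches(addresses: Iterable[int],
                  page_bytes: int = PAGE_BYTES) -> int:
    """Count how often consecutive accesses land in different pages."""
    pages = [a // page_bytes for a in addresses]
    return sum(1 for prev, cur in zip(pages, pages[1:]) if prev != cur)


# Two interleaved streams, each walking its own DRAM page.
stream = [0x0000, 0x4000, 0x0040, 0x4040, 0x0080, 0x4080]
reordered = reorder_by_page(stream)

print(page_switches(stream), page_switches(reordered))
```

The window bounds how far a prefetch can be moved from its original position, which is one simple way to trade page locality against the order in which the processor will consume the data.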