Vortex:  Irregular Data Stream
	       Support for Data-Intensive Applications

		    Donald Yeung, Nicholas Kohout,
       Sujata Ramasubramanian, Ilya Khazanov, and Rishi Kurichh
			University of Maryland
	       Systems and Computer Architecture Group

		 http://www.ee.umd.edu/~yeung/vortex


As the gap between processor speed and memory speed continues to
widen, application performance becomes increasingly limited by the
memory system.  The applications most sensitive to this well-known
"memory gap" are those with working sets that are too large to fit in
cache.  Such data-intensive applications tend to run "out of cache"
and thus experience high memory stall overheads.  Several techniques
for tolerating memory latency have been proposed, such as dynamic
instruction scheduling, prefetching, and stream buffers; however, such
traditional techniques become less effective for applications with
sparse and irregular memory access patterns, or for applications that
traverse recursive data structures by chasing pointers.

We introduce Vortex, a memory system architecture that uses
application-specific information about the layout of data objects in
memory to actively move data to the processor.  The crux of Vortex
lies in a flexible data streaming technique called Multi-Dimensional
Streams (MDS).  MDS allows the programmer or compiler to compactly
describe to the memory system the stream of memory references that the
processor will make to a data structure.  Each address stream is
specified as an N-dimensional object consisting of N (base, length,
stride) descriptors.  Furthermore, the MDS descriptors also allow the
specification of memory indirection, thus enabling the description of
address streams that arise from traversing recursive data structures.
At runtime, under the control of special prefetch instructions, the
application can selectively activate different MDS descriptors to
initiate prefetching in the memory controller.
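
As an illustration of the descriptor idea, the sketch below expands a
list of (base, length, stride) triples into the address stream it
describes, and models indirection as following a pointer field.  The
abstract does not define the exact descriptor semantics, so the
nesting rule (first descriptor varies slowest, bases are summed), the
function names mds_addresses and chase, and the dictionary standing in
for memory are all assumptions made for this example, not the Vortex
interface.

```python
from itertools import product


def mds_addresses(descriptors):
    """Expand (base, length, stride) triples into a flat address stream.

    Assumed semantics (not taken from the Vortex design): each address
    is the sum over dimensions of base_d + i_d * stride_d, and the
    first descriptor in the list varies slowest.
    """
    ranges = [range(length) for _, length, _ in descriptors]
    for idx in product(*ranges):
        yield sum(base + i * stride
                  for (base, _, stride), i in zip(descriptors, idx))


def chase(memory, head, next_offset, max_nodes):
    """Assumed model of indirection: follow the pointer at node + next_offset.

    `memory` is a hypothetical dict mapping addresses to stored values.
    """
    node = head
    for _ in range(max_nodes):
        if node == 0:  # a null pointer ends the stream
            return
        yield node
        node = memory.get(node + next_offset, 0)


# 2-D stream: 3 rows of 4 elements, 8-byte elements, 256-byte row pitch.
stream = list(mds_addresses([(0x1000, 3, 256), (0, 4, 8)]))

# Linked list of three nodes whose "next" field sits at byte offset 8.
memory = {0x2008: 0x2040, 0x2048: 0x2010, 0x2018: 0}
nodes = list(chase(memory, 0x2000, 8, max_nodes=16))
```

The point of the sketch is that two triples describe twelve addresses
and a single (head, offset) pair describes an entire list traversal,
which is the kind of compactness the descriptors are meant to provide.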

The ability to generate irregular address streams and perform
prefetching from within the memory system has several advantages.
First, it provides the application with a "smart prefetch" capability.
In Vortex, the application can prefetch a large amount of data by
executing a small number of instructions, thus reducing the runtime
cost of prefetching.  Second, the "round trip" time between the issue
of a prefetch and the arrival of the data is minimized because the
prefetch is performed closer to the physical DRAMs.  This shortens the
critical-path latency of traversing a chain of pointers.
Third, address generation in Vortex is decoupled from computation in
the main CPU.  This allows the "prefetch distance" between the memory
controller and the main CPU to be adjusted dynamically.  Lastly, the
Vortex prefetcher has access to the raw DRAM interface; therefore, it
can dynamically reorder prefetch accesses to draw long streams out of
DRAM pages when spatial locality in the address stream exists.
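
The last advantage, reordering to exploit open DRAM pages, can be
sketched as follows.  This is a simplified model under stated
assumptions: a 2 KB page, a fixed-size window of pending prefetches,
and grouping purely by page number.  The names reorder_by_dram_page
and page_switches are hypothetical, and a real controller would also
account for banks and DRAM timing.

```python
def reorder_by_dram_page(addresses, page_bytes=2048, window=8):
    """Group pending prefetch addresses that fall in the same DRAM page.

    Reordering happens only within each window of `window` addresses,
    and pages are served in order of first appearance in the window.
    """
    out = []
    for start in range(0, len(addresses), window):
        groups = {}  # page number -> addresses, in first-seen order
        for addr in addresses[start:start + window]:
            groups.setdefault(addr // page_bytes, []).append(addr)
        for page_addrs in groups.values():
            out.extend(page_addrs)
    return out


def page_switches(addresses, page_bytes=2048):
    """Count how often consecutive accesses land in different pages."""
    pages = [a // page_bytes for a in addresses]
    return sum(1 for prev, cur in zip(pages, pages[1:]) if prev != cur)


# An address stream that alternates between two DRAM pages.
pending = [0x0000, 0x0800, 0x0040, 0x0840, 0x0080, 0x0880]
reordered = reorder_by_dram_page(pending)
```

In this toy stream the original order changes page on every access
(five switches), while the reordered stream changes page once.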

In this talk, we will present Vortex, describe how applications
specify MDS descriptors, and discuss how applications can apply
Vortex-style streaming to improve the performance of irregular
data-intensive computations.  We will also describe the architectural
support needed in Vortex, both in the memory controller and in the
processor stream buffers.  Finally, we will report preliminary
simulation results that demonstrate the performance gains possible on
Vortex.