Research Activities in the Vortex Group:

Current research activities within the Vortex research group include:
  • Compiler-Based Pre-Execution. Recently, pre-execution of critical instructions, such as cache-missing loads and hard-to-predict branches, has received significant attention within the architecture community. A critical issue for pre-execution is generating effective pre-execution code. Since writing this code by hand is labor-intensive and error-prone, automating its generation is an important area of research. We are building a compiler for pre-execution that extracts pre-execution code automatically via static analysis of program source code. Our research focuses on developing novel compiler algorithms that increase the ability of pre-execution threads to get ahead of the main computation thread. Such algorithms include program slicing, prefetch conversion, and speculative loop parallelization; the sketch below illustrates the first two.
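
    To make this concrete, the fragment below is a hand-written sketch
    (not actual compiler output) of program slicing and prefetch
    conversion applied to a simple loop: the slice keeps only the
    instructions needed to compute the address of the cache-missing
    load, and prefetch conversion replaces the load with GCC's
    non-binding __builtin_prefetch so the pre-execution thread never
    stalls on the miss.

        /* Original loop: the load from data[index[i]] misses
         * frequently in the cache. */
        void compute(int *data, int *index, int n, int *out)
        {
            for (int i = 0; i < n; i++)
                out[i] = data[index[i]] * 2;   /* cache-missing load */
        }

        /* Pre-execution slice: program slicing keeps only the address
         * computation, and prefetch conversion turns the cache-missing
         * load into a non-binding prefetch, so this thread can run
         * ahead of compute() without stalling. */
        void compute_slice(int *data, int *index, int n)
        {
            for (int i = 0; i < n; i++)
                __builtin_prefetch(&data[index[i]]);
        }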

  • Transparent Threads. Simultaneous Multithreading (SMT) processors increase overall processor throughput by issuing instructions from multiple threads simultaneously. Unfortunately, this boost in throughput comes at the expense of single-thread performance. To support latency-sensitive applications, we are developing new hardware resource-sharing mechanisms for SMT processors that preserve the single-thread performance of designated high-priority threads as much as possible, while still permitting low-priority threads to utilize resources whenever they are idle. In such an SMT processor, the low-priority threads never impact the performance of the high-priority threads, and hence are transparent (a simplified model of the sharing policy appears below). Current research focuses on applications of transparent threads, such as running low-priority processes in a multiprogrammed workload and implementing highly speculative subordinate threading optimizations.
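
    The toy model below illustrates the sharing policy in software (a
    simplified sketch, not our hardware design): each cycle, a
    fetch-slot arbiter serves the high-priority thread first, and
    low-priority threads receive only the slots it leaves idle, so they
    can never slow it down. The FETCH_WIDTH value and per-thread state
    are illustrative assumptions.

        #include <stdio.h>

        #define FETCH_WIDTH 8    /* fetch slots per cycle (assumed) */

        /* Per-thread state: instructions the thread could fetch this
         * cycle (0 when stalled, e.g., on an I-cache miss). */
        typedef struct { int demand; } thread_t;

        /* One cycle of a priority-aware fetch arbiter: the
         * high-priority thread is served first; low-priority threads
         * split the leftover slots, making them transparent to it. */
        static void fetch_cycle(thread_t *hi, thread_t *lo, int nlo)
        {
            int slots = FETCH_WIDTH;
            int take = hi->demand < slots ? hi->demand : slots;
            slots -= take;                 /* foreground served first */
            printf("high-priority fetched %d\n", take);

            for (int i = 0; i < nlo && slots > 0; i++) {
                take = lo[i].demand < slots ? lo[i].demand : slots;
                slots -= take;             /* background gets leftovers */
                printf("low-priority %d fetched %d\n", i, take);
            }
        }

        int main(void)
        {
            thread_t hi = { 5 };               /* foreground demand */
            thread_t lo[2] = { { 4 }, { 4 } }; /* background threads */
            fetch_cycle(&hi, lo, 2);           /* hi gets 5, lo share 3 */
            return 0;
        }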

  • Simultaneous Dynamic Compilation. Several researchers are studying dynamic compilation techniques that either translate binaries for platform compatibility or optimize code for performance. A critical issue in such systems is the runtime overhead of the dynamic compiler, which can limit the frequency of compiler invocation or the aggressiveness of compiler optimizations. We are investigating architectural support to mitigate the cost of runtime compilation. First, we are studying the benefits of performing dynamic compilation in helper threads that run simultaneously with computation threads in an SMT processor (see the sketch below). Second, we are applying the transparent threading mechanisms described above to dynamic compiler threads to further reduce their overhead. Our goal is to enable continuous use of dynamic compilers, and to investigate new optimization algorithms that become possible when the cost of compilation is near zero.
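
    The sketch below recasts the organization in software terms (an
    analogy using POSIX threads, not our proposed hardware support):
    while the main thread keeps executing an unoptimized routine, a
    compiler thread produces an optimized version concurrently and
    publishes it through an atomic function-pointer swap, so
    compilation costs the computation nothing. All names here are
    illustrative.

        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>
        #include <unistd.h>

        static long slow_sum(long n)       /* unoptimized version */
        {
            long s = 0;
            for (long i = 0; i < n; i++) s += i;
            return s;
        }

        static long fast_sum(long n)       /* "recompiled" version */
        {
            return n * (n - 1) / 2;        /* closed form */
        }

        /* Code pointer the computation thread dispatches through. */
        static _Atomic(long (*)(long)) sum_fn = slow_sum;

        /* Stand-in for the dynamic compiler: it works concurrently
         * with the computation, then installs the optimized code. */
        static void *compiler_thread(void *arg)
        {
            (void)arg;
            usleep(1000);                  /* "compilation" latency */
            atomic_store(&sum_fn, fast_sum);
            return NULL;
        }

        int main(void)
        {
            pthread_t tid;
            pthread_create(&tid, NULL, compiler_thread, NULL);

            /* The computation never pauses; it simply picks up the
             * optimized version once the compiler publishes it. */
            for (int iter = 0; iter < 5; iter++) {
                long (*fn)(long) = atomic_load(&sum_fn);
                printf("iter %d: sum = %ld\n", iter, fn(1000000L));
            }
            pthread_join(tid, NULL);
            return 0;
        }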

  • Software-Managed Memory Bandwidth. As processor performance continues to increase and latency tolerance techniques become widespread in response to the growing processor-memory performance gap, memory bandwidth will emerge as a critical performance bottleneck. One approach to mitigating a memory bandwidth shortage is to fetch data from memory into the cache more intelligently. Our studies show that as much as 50% to 75% of the data fetched into the L2 cache is never accessed by the processor, due to sparse memory reference patterns. We are developing compiler support and hardware mechanisms that enable applications to inform the memory system of their memory reference patterns. Using this application-level information, the memory system can fetch exactly the data needed by the processor on every cache miss, avoiding the fetch of useless data and reducing the memory bandwidth requirements of programs (see the sketch below).
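
    The fragment below sketches the kind of interface we have in mind.
    The function mbw_hint_indexed() is hypothetical, standing in for a
    proposed compiler-inserted hint (it is a no-op stub here): it
    describes an indexed reference pattern so that, on a miss, the
    memory system could fetch only the words the loop will touch
    instead of full cache lines of mostly unused data.

        #include <stddef.h>

        /* Hypothetical hint (not an existing API): describe an
         * upcoming indexed access pattern to the memory system. The
         * body is a no-op stub; in our proposal the call would program
         * hardware that fetches only the listed words on each miss. */
        static void mbw_hint_indexed(const void *base, const int *index,
                                     int count, size_t elem_size)
        {
            (void)base; (void)index; (void)count; (void)elem_size;
        }

        double sparse_dot(const double *a, const double *x,
                          const int *col, int nnz)
        {
            /* Without the hint, each miss on x[col[i]] pulls in a full
             * cache line, most of which is never touched when col[] is
             * sparse. */
            mbw_hint_indexed(x, col, nnz, sizeof(double));

            double sum = 0.0;
            for (int i = 0; i < nnz; i++)
                sum += a[i] * x[col[i]];
            return sum;
        }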

  • Pointer Prefetching. We are developing prefetching techniques for the pointer-intensive computations commonly found in non-numeric applications. Data dependences arising from pointer indirection force pointer-chasing memory references to execute sequentially. This serialization, known as the pointer-chasing problem, prevents conventional prefetching techniques from overlapping multiple cache misses. The techniques under investigation identify memory parallelism between independent pointer chains and exploit it to improve the effectiveness of prefetching, as the sketch below illustrates.
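
    The sketch below shows the memory parallelism these techniques
    target, using GCC's __builtin_prefetch as a stand-in for the
    prefetch mechanism: chasing a single list serializes every miss,
    while interleaving several independent lists lets prefetches for
    one chain overlap demand misses on the others.

        #include <stddef.h>

        typedef struct node { struct node *next; int val; } node;

        /* Serial pointer chasing: each miss must complete before the
         * next node's address is known, so misses never overlap (the
         * pointer-chasing problem). */
        long sum_list(node *p)
        {
            long s = 0;
            for (; p != NULL; p = p->next)
                s += p->val;
            return s;
        }

        /* Interleaved traversal of independent lists (advances each
         * list one node per round, consuming heads[]): while one
         * chain's node is being used, prefetches issued for the other
         * chains are in flight, overlapping their cache misses. */
        long sum_lists(node *heads[], int nlists)
        {
            long s = 0;
            int live = nlists;
            while (live > 0) {
                live = 0;
                for (int i = 0; i < nlists; i++) {
                    node *p = heads[i];
                    if (p == NULL) continue;
                    if (p->next)
                        __builtin_prefetch(p->next); /* overlap miss */
                    s += p->val;
                    heads[i] = p->next;
                    live++;
                }
            }
            return s;
        }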

Last updated: June 2000 by Donald Yeung (yeung@eng.umd.edu)