Programmable and Energy Efficient Extreme-Scale Processors

Overview:

Today, integrating 4 to 8 state-of-the-art cores or tens of smaller cores on a single chip is commonplace. Since Moore's Law scaling is expected to continue for the foreseeable future, processors with as many as 1000 cores will become possible within a few processor generations. This project investigates programming and architectural support for such extreme-scale processors. Recent areas of research include reuse distance analysis for evaluating extreme-scale processors, scalability of processors out to extreme scale, cache management techniques, locality optimizations, implicit synchronization, and techniques for power efficiency.

People:

Faculty

  • Donald Yeung

Students

  • Abdel-Hameed A. Badawy
  • Inseok Choi
  • Lisa Stechschulte
  • Meng-Ju Wu
  • Minshu Zhao

Alumni

  • Wanli Liu
  • Xu Yang

Publications:

  • Meng-Ju Wu and Donald Yeung. Identifying Optimal Multicore Cache Hierarchies for Loop-based Parallel Programs via Reuse Distance Analysis. To appear in Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness (MSPC-2012). Beijing, China. June 2012.

  • Meng-Ju Wu and Donald Yeung. Coherent Profiles: Enabling Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs. In Proceedings of the 20th International Conference on Parallel Architectures and Compilation Techniques. Galveston Island, TX. October 2011. (pdf, gzip'd postscript)

  • Eric Lau, Jason Miller, Inseok Choi, Donald Yeung, Saman Amarasinghe, and Anant Agarwal. Multicore Performance Optimization Using Partner Cores. In Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism (HotPar '11). Berkeley, CA. May 2011. (pdf)

  • Inseok Choi, Minshu Zhao, Xu Yang, and Donald Yeung. Experience with Improving Distributed Shared Cache Performance on Tilera's Tile Processor. IEEE Computer Architecture Letters. Vol 10, No 2. July-December 2011. (pdf, gzip'd postscript)

  • Wanli Liu and Donald Yeung. Using Aggressor Thread Information to Improve Shared Cache Management for CMPs. In Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques. Raleigh, NC. September 2009. (pdf, gzip'd postscript)

ACM permission notice:
    The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

ACM copyright notice:
    Copyright (c) 2000 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.

Funding:

This project is funded in part by the National Science Foundation under grant #CCF-1117042, in part by the Defense Advanced Research Projects Agency under contract #HR0011-10-9-0009, and in part by the National Reconnaissance Office.
Last updated: May 2012 by Donald Yeung (yeung@ece.umd.edu)