Programmable and Energy Efficient Extreme-Scale Processors


Today, integrating 12-16 state-of-the-art cores or tens of smaller cores on a single chip is commonplace. Since Moore's Law scaling is expected to continue for the foreseeable future, processors with 1000+ cores will become possible. This project investigates techniques for supporting such extreme-scale processors. One major focus is developing new evaluation methodologies, such as multicore reuse distance analysis, for rapidly assessing extreme-scale processors. Another focus is developing software and architectural support for extreme-scale processors, such as cache management and reconfiguration techniques, locality optimizations, and implicit synchronization techniques. Recently, the project has also begun looking at heterogeneous microprocessors in which both CPU cores and GPU cores are integrated on the same chip.
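For readers unfamiliar with reuse distance, the following is a minimal single-threaded sketch of the basic idea, not the project's multicore methodology. It assumes the common definition of reuse distance as the number of distinct addresses touched between two consecutive accesses to the same address. The function names `reuse_distances` and `lru_misses` and the example trace are hypothetical and chosen only for illustration.

```python
import math


def reuse_distances(trace):
    """Return the reuse (LRU stack) distance of each access in `trace`.

    A first-time access has no earlier reuse, so its distance is infinite.
    This list-based version is O(N * M) and meant only as an illustration.
    """
    stack = []      # LRU stack: most recently used address is at the end
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Number of distinct addresses touched since the last access to addr.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(math.inf)  # cold (first-touch) access
        stack.append(addr)
    return distances


def lru_misses(distances, capacity):
    """Count misses in a fully associative LRU cache holding `capacity` blocks.

    An access misses exactly when its reuse distance is >= capacity.
    """
    return sum(1 for d in distances if d >= capacity)


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "b"]  # hypothetical address trace
    dists = reuse_distances(trace)
    for capacity in (1, 2, 3):
        print(capacity, lru_misses(dists, capacity))
```

The appeal of this kind of analysis is that one pass over a trace yields miss counts for every cache capacity at once, under the fully associative LRU assumption stated above. Extending it to multiple cores requires accounting for how threads' memory references interleave, which this sketch does not attempt.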



  • Donald Yeung
  • Students

  • Daniel Gerzhoy
  • Xiaowu Sun
  • Alumni

  • Mike Badamo
  • Abdel-Hameed A. Badawy
  • Jeff Casarona
  • Inseok Choi
  • Wanli Liu
  • Lisa Stechschulte
  • Meng-Ju Wu
  • Xu Yang
  • Minshu Zhao
  • Mike Zuzak
  • Publications:

  • Michael Zuzak and Donald Yeung. Exploiting Multi-Loop Parallelism on Heterogeneous Microprocessors. In Proceedings of the 10th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2017), held in conjunction with HiPEAC-12. Stockholm, Sweden. January 2017. Best paper award. (pdf)

  • Minshu Zhao and Donald Yeung. Using Multicore Reuse Distance to Study Coherence Directories. To appear in ACM Transactions on Computer Systems.

  • Abdel-Hameed A. Badawy and Donald Yeung. Guiding Locality Optimizations for Graph Computations via Reuse Distance Analysis. IEEE Computer Architecture Letters. April 2017.
    (IEEE digital library distribution)

  • I. Stephen Choi and Donald Yeung. Multi-Cache Resizing via Greedy Coordinate Descent. Journal of Supercomputing. Vol. 73, No. 6. pp. 2402-2429. June 2017.
    (Springer digital library distribution)

  • Mike Badamo, Jeff Casarona, Minshu Zhao, and Donald Yeung. Identifying Power Efficient Multicore Cache Hierarchies via Reuse Distance Analysis. ACM Transactions on Computer Systems. Vol. 34, No. 1. Article 3. pp. 1-30. April 2016. (pdf)

  • Meng-Ju Wu, Minshu Zhao, and Donald Yeung. Studying Multicore Processor Scaling via Reuse Distance Analysis. In Proceedings of the 40th International Symposium on Computer Architecture (ISCA-XL). Tel-Aviv, Israel. June 2013. (pdf, gzip'd postscript)

  • Meng-Ju Wu and Donald Yeung. Identifying Optimal Multicore Cache Hierarchies for Loop-based Parallel Programs via Reuse Distance Analysis. In Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness (MSPC-2012). Beijing, China. June 2012. (pdf)

  • Meng-Ju Wu and Donald Yeung. Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs. ACM Transactions on Computer Systems. Vol. 31, No. 1. Article 1. pp. 1-37. February 2013. (pdf)

  • Eric Lau, Jason Miller, Inseok Choi, Donald Yeung, Saman Amarasinghe, and Anant Agarwal. Multicore Performance Optimization Using Partner Cores. In Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism (HotPar '11). Berkeley, CA. May 2011. (pdf)

  • Inseok Choi, Minshu Zhao, Xu Yang, and Donald Yeung. Experience with Improving Distributed Shared Cache Performance on Tilera's Tile Processor. IEEE Computer Architecture Letters. Vol. 10, No. 2. July-December 2011. (pdf, gzip'd postscript)

  • Wanli Liu and Donald Yeung. Using Aggressor Thread Information to Improve Shared Cache Management for CMPs. In Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques. Raleigh, NC. September 2009. (pdf, gzip'd postscript)

  • ACM permission notice:
    The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

    ACM copyright notice:
    Copyright 2013 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of portions of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page in print or the first screen in digital media. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Send written requests for republication to ACM Publications, Copyright & Permissions at the address above or fax +1 (212) 869-0481 or email. For other copying of articles that carry a code at the bottom of the first or last page, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.


  • This project is funded in part by the National Science Foundation under grants #CCF-1117042 and #CCF-1618963, in part by the Defense Advanced Research Projects Agency under contracts #HR0011-10-9-0009 and #HR0011-13-2-0005, and in part by the Naval Reconnaissance Office.
  • Last updated: July 2017 by Donald Yeung