Programmable and Energy Efficient Extreme-Scale Processors
Overview:
Today, integrating 12-16 state-of-the-art cores or tens of smaller
cores on a single chip is commonplace. Since Moore's Law scaling is
expected to continue for the foreseeable future, processors with 1000+
cores will become possible. This project is investigating techniques
for supporting such extreme-scale processors.
A major focus of the project is to develop new evaluation
methodologies, such as multicore reuse distance analysis, for rapidly
assessing extreme-scale processors.
(Click here for more details on multicore RD analysis.) Another focus of
the project is to develop software and architectural support for
extreme-scale processors, such as cache management and reconfiguration
techniques, locality optimizations, and implicit synchronization
techniques. Recently, the project has also begun looking at
heterogeneous microprocessors in which both CPU cores and GPU cores
are integrated on the same chip.
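
As a rough illustration of the basic idea behind reuse distance (RD)
analysis, the sketch below computes reuse distances for a small
single-threaded address trace and uses them to count misses in a fully
associative LRU cache. It is a minimal sketch, not the project's tools:
the function names (reuse_distances, lru_misses) and the example trace
are assumptions made for illustration, the stack search is deliberately
simple rather than efficient, and the multicore variants studied in the
publications below (which account for interleaving among threads) are
not shown.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return the reuse distance of each access in `trace`.

    The reuse distance of an access is the number of distinct addresses
    referenced since the previous access to the same address. A first-time
    access has no previous use and is reported as None.
    """
    stack = OrderedDict()  # LRU stack: most recently used key is last
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Depth from the top of the LRU stack (0 = most recently used).
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)
            stack[addr] = True
    return distances


def lru_misses(distances, capacity):
    """Count misses in a fully associative LRU cache of `capacity` blocks."""
    return sum(1 for d in distances if d is None or d >= capacity)


# Hypothetical six-access trace over three addresses.
trace = list("ABCABB")
dists = reuse_distances(trace)
print(dists)
print([lru_misses(dists, capacity) for capacity in (1, 2, 3)])
```

Because the distances are independent of cache size, a single pass over
the trace yields miss counts for every capacity, which is what makes
this style of analysis attractive for rapid design-space exploration.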
People:
Faculty
Alumni
Publications:
- Earlier Related Tech Report:
Daniel Gerzhoy and Donald Yeung. Pipelined CPU-GPU
Scheduling for Caches. University of Maryland
Institute for Advanced Computer Studies Technical Report,
UMIACS-TR-2021-01. March 2021.
(pdf)
Daniel Gerzhoy, Xiaowu Sun, Michael Zuzak, and Donald
Yeung. Nested MIMD-SIMD Parallelization for Heterogeneous
Microprocessors. ACM Transactions on Architecture and
Code Optimization. Vol. 16, No. 4, Article 48. December
2019.
(ACM
digital library distribution)
- Earlier Related Workshop Paper:
Michael Zuzak and Donald Yeung. Exploiting Multi-Loop
Parallelism on Heterogeneous Microprocessors.
In Proceedings of the 10th International Workshop on
Programmability and Architectures for Heterogeneous Multicores
(MULTIPROG-2017), held in conjunction with HiPEAC-12.
Stockholm, Sweden. January 2017. Best paper award.
(pdf)
- Earlier Related Tech Report:
Michael Zuzak and Donald Yeung. Exploiting Multi-Loop
Parallelism on Heterogeneous
Microprocessors. University of Maryland Institute for
Advanced Computer Studies Technical Report,
UMIACS-TR-2016-01.
(pdf)
Minshu Zhao and Donald Yeung. Using Multicore Reuse Distance
to Study Coherence Directories. ACM Transactions on
Computer Systems. Vol. 35, No. 2. Article 4. October 2017.
(ACM digital
library distribution)
- Earlier Related Conference Paper:
Minshu Zhao and Donald Yeung. Studying the Impact of
Multicore Processor Scaling on Directory Techniques via Reuse
Distance Analysis. In Proceedings of the 21st
International Symposium on High Performance Computer
Architecture (HPCA-XXI). San Francisco Bay Area,
CA. February 2015. (pdf,
gzip'd postscript)
- Earlier Related Tech Report:
Minshu Zhao and Donald Yeung. Studying Directory Access
Patterns via Reuse Distance Analysis and Evaluating Their Impact
on Multi-Level Directory Caches. University of Maryland
Institute for Advanced Computer Studies Technical Report,
UMIACS-TR-2014-01. January 2014.
(pdf)
Abdel-Hameed A. Badawy and Donald Yeung. Optimizing Locality
in Graph Computations using Reuse Distance Profiles.
In Proceedings of the 36th International Performance Computing
and Communications Conference. San Diego, CA. December
2017.
(IEEE digital library distribution)
- Earlier Related Journal Paper:
Abdel-Hameed A. Badawy and Donald Yeung. Guiding Locality
Optimizations for Graph Computations via Reuse Distance Analysis.
IEEE Computer Architecture Letters. Vol. 16, Issue
2. pp. 119-122. July - December 2017.
(IEEE
digital library distribution)
I. Stephen Choi and Donald Yeung. Multi-Cache Resizing via
Greedy Coordinate Descent. Journal of
Supercomputing. Vol. 73, No. 6. pp. 2402-2429. June
2017.
(Springer digital library distribution)
- Earlier Related Tech Report:
Inseok Choi and Donald Yeung. Symbiotic Cache Resizing for
CMPs with Shared LLC. University of Maryland Institute
for Advanced Computer Studies Technical Report,
UMIACS-TR-2013-02. September
2013. (pdf)
Mike Badamo, Jeff Casarona, Minshu Zhao, and Donald
Yeung. Identifying Power Efficient Multicore Cache
Hierarchies via Reuse Distance Analysis. ACM
Transactions on Computer Systems. Vol. 34, No. 1.
Article 3. pp. 1-30. April 2016.
(pdf)
Meng-Ju Wu, Minshu Zhao, and Donald Yeung. Studying Multicore
Processor Scaling via Reuse Distance Analysis.
In Proceedings of the 40th International Symposium
on Computer Architecture (ISCA-XL). Tel-Aviv, Israel.
June 2013. (pdf,
gzip'd postscript)
Meng-Ju Wu and Donald Yeung. Identifying Optimal Multicore
Cache Hierarchies for Loop-based Parallel Programs via Reuse Distance
Analysis. In Proceedings of the ACM SIGPLAN Workshop on
Memory Systems Performance and Correctness (MSPC-2012).
Beijing, China. June
2012. (pdf)
- Earlier Related Tech Report:
Meng-Ju Wu and Donald Yeung. Understanding Multicore Cache
Behavior of Loop-based Parallel Programs via Reuse Distance
Analysis. University of Maryland Institute for Advanced
Computer Studies Technical Report, UMIACS-TR-2012-01. January
2012. (pdf)
Meng-Ju Wu and Donald Yeung. Efficient Reuse Distance Analysis
of Multicore Scaling for Loop-based Parallel Programs.
ACM Transactions on Computer Systems. Vol. 31,
No. 1. Article 1. pp. 1-37. February 2013.
(pdf)
- Earlier Related Conference Paper:
Meng-Ju Wu and Donald Yeung. Coherent Profiles: Enabling
Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based
Parallel Programs. In Proceedings of the 20th International
Conference on Parallel Architectures and Compilation
Techniques. Galveston Island, TX. October 2011.
(pdf,
gzip'd postscript)
- Earlier Related Tech Report:
Meng-Ju Wu and Donald Yeung. Memory Performance Analysis for
Parallel Programs Using Concurrent Reuse Distance.
University of Maryland Institute for Advanced Computer Studies
Technical Report, UMIACS-TR-2010-10. October 2010.
(pdf)
Eric Lau, Jason Miller, Inseok Choi, Donald Yeung, Saman
Amarasinghe, and Anant Agarwal. Multicore Performance Optimization
Using Partner Cores. In Proceedings of the 3rd USENIX
Workshop on Hot Topics in Parallelism (HotPar '11). Berkeley,
CA. May 2011. (pdf)
Inseok Choi, Minshu Zhao, Xu Yang, and Donald Yeung.
Experience with Improving Distributed Shared Cache Performance on
Tilera's Tile Processor. IEEE Computer Architecture
Letters. Vol. 10, No. 2. July-December 2011.
(pdf, gzip'd
postscript)
- Earlier Related Workshop Paper:
Inseok Choi, Minshu Zhao, Xu Yang, and Donald Yeung. Early
Experience with Profiling and Optimizing Distributed Shared Cache
Performance on Tilera's Tile Processor. In Proceedings of
the 6th International Workshop on Unique Chips and Systems.
Atlanta, GA. December 2010.
One of 2 best papers out of 12 papers appearing in the workshop. (pdf, gzip'd postscript)
Wanli Liu and Donald Yeung. Using Aggressor Thread Information
to Improve Shared Cache Management for CMPs. In Proceedings
of the 18th International Conference on Parallel Architectures and
Compilation Techniques. Raleigh, NC. September 2009. (pdf, gzip'd postscript)
- Earlier Related Tech Report:
Wanli Liu and Donald Yeung. Probabilistic Replacement:
Enabling Flexible Use of Shared Caches for CMPs.
University of Maryland Institute for Advanced Computer
Studies Technical Report, UMIACS-TR-2008-13. July 2008.
(pdf)
ACM permission notice:
The documents contained in these directories are included by the
contributing authors as a means to ensure timely dissemination of
scholarly and technical work on a non-commercial basis. Copyright and
all rights therein are maintained by the authors or by other copyright
holders, notwithstanding that they have offered their works here
electronically. It is understood that all persons copying this
information will adhere to the terms and constraints invoked by each
author's copyright. These works may not be reposted without the
explicit permission of the copyright holder.
ACM copyright notice:
Copyright © 2013 by the Association for Computing Machinery,
Inc. (ACM). Permission to make digital or hard copies of portions of
this work for personal or classroom use is granted without fee
provided that the copies are not made or distributed for profit or
commercial advantage and that copies bear this notice and the full
citation on the first page in print or the first screen in digital
media. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy
otherwise, to republish, to post on servers, or to redistribute to
lists, requires prior specific permission and/or a fee. Send written
requests for republication to ACM Publications, Copyright &
Permissions at the address above or fax +1 (212) 869-0481 or email
permissions@acm.org. For other copying of articles that carry a code
at the bottom of the first or last page, copying is permitted provided
that the per-copy fee indicated in the code is paid through the
Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.
Funding:
Last updated: August 2021 by Donald Yeung (yeung@umd.edu)