Are you ready for the future of high performance computing? Is your application performance portable? The scale and complexity of high-end systems are increasing: nodes are becoming more parallel, with many processors per node, more threads per processor, longer vector lengths, more complex memory hierarchies, and potentially heterogeneous processing elements. These technology changes in the supercomputing industry are forcing computational scientists to address new critical system characteristics that will significantly impact the performance and scalability of applications, and they will require a paradigm shift in application development. One major change is that the dominant programming model of parallelism through message passing alone will not be feasible on this new generation of high performance systems. Application developers will have to hybridize their codes, adding multiple levels of parallelism. In addition, since these systems may have heterogeneous processors and multiple levels of memory hierarchy, application developers may also have to introduce pragmas or directives for better node utilization and performance portability across a wide range of systems. In this tutorial I will discuss these trends in the supercomputing industry, including programming paradigms and tools to support porting and tuning efforts, and will also discuss some of the challenges and open research problems that must be addressed to create applications and build system software for the new generation of high performance computing systems.
Dr. Luiz DeRose is a Senior Principal Engineer and the Programming Environments Director at Cray Inc., where he is responsible for the programming environment strategy for all Cray systems. Before joining Cray in 2004, he was a research staff member and the Tools Group Leader at the Advanced Computing Technology Center at IBM Research. Dr. DeRose holds a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. With more than 20 years of high performance computing experience and a deep knowledge of its programming environments, he has published more than 50 peer-reviewed articles in scientific journals, conferences, and book chapters, primarily on the topics of compilers and tools for high performance computing. Dr. DeRose participated in the definition and creation of the OpenACC standard for high-level programming of accelerators. He is the Organizing and Program Committee co-chair of the 10th International Workshop on OpenMP (IWOMP) in 2014, was the Global Chair for the Multicore and Manycore Programming topic at Euro-Par 2013, and was the Program Committee co-chair of the 21st International Conference on Parallel Architecture and Compilation Techniques (PACT-2012).
Programming large supercomputers presents several challenges: exposing concurrency, controlling load imbalance, and tolerating failures, among others. Addressing these challenges requires an emphasis on important concepts during application development: overdecomposition, asynchrony, migratability, and adaptivity. This tutorial presents Charm++, a programming paradigm that encapsulates these ideas. Charm++ provides an asynchronous, message-driven programming model via parallel objects and an adaptive runtime system that guides execution. It automatically overlaps communication and computation, balances load, tolerates failures, checkpoints for split execution, and promotes modularity, all while allowing programming in C++. Several widely used Charm++ applications thrive in computational science domains including biomolecular modeling and cosmology. The approach followed in this tutorial provides a guide for migrating applications from the reigning parallel programming paradigm (MPI) to Charm++.
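To give a flavor of the message-driven model described above, here is a minimal sketch in the usual Charm++ "hello" style (the names Hello, sayHi, and the module layout are illustrative, not code from the tutorial; it requires the Charm++ toolchain to build, so treat it as a sketch rather than a drop-in program). The work is over-decomposed into a chare array whose elements are migratable objects; the runtime delivers entry-method invocations as messages and schedules them adaptively.

```cpp
// hello.ci -- Charm++ interface file (processed by the charm translator):
// mainmodule hello {
//   mainchare Main {
//     entry Main(CkArgMsg* m);
//   };
//   array [1D] Hello {
//     entry Hello();
//     entry void sayHi();
//   };
// };

// hello.C -- an over-decomposed chare array; each element is a migratable
// object whose entry methods are invoked asynchronously via messages.
#include "hello.decl.h"

class Main : public CBase_Main {
 public:
  Main(CkArgMsg* m) {
    delete m;
    // Create 16 chares regardless of the number of cores: the runtime
    // maps (and may migrate) them across processors for load balance.
    CProxy_Hello arr = CProxy_Hello::ckNew(16);
    arr.sayHi();  // asynchronous broadcast; this call returns immediately
  }
};

class Hello : public CBase_Hello {
 public:
  Hello() {}
  void sayHi() {
    CkPrintf("Hi from element %d on PE %d\n", thisIndex, CkMyPe());
    // Contribute to an empty reduction; when all elements have reported,
    // the runtime invokes the callback, which shuts the program down.
    contribute(CkCallback(CkCallback::ckExit));
  }
};

#include "hello.def.h"
```

Note the contrast with MPI: there is no global barrier or explicit receive; the runtime drives execution by delivering messages to whichever chares have work, which is what enables the automatic overlap of communication and computation mentioned above.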
Dr. Esteban Meneses is an assistant professor at the Costa Rica Institute of Technology. His research interests include fault tolerance and load balancing for large-scale systems. He works on energy-efficient, low-overhead techniques for fault tolerance, and has developed message-logging protocols that exploit application characteristics to reduce the total memory footprint of the message log. He holds a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign, and received his bachelor's and master's degrees in Computer Science from the Costa Rica Institute of Technology.
Dr. Celso L. Mendes was at the National Center for Supercomputing Applications at Illinois, where he participated in the Blue Waters deployment project. He is currently a senior technologist at the National Institute for Space Research in Brazil. He has worked on performance analysis tools and techniques for parallel systems, and on applications of adaptive runtime systems for large parallel machines. He received a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign, and both an Engineer's degree and a Master of Engineering degree from the Aeronautics Technology Institute (ITA) in Brazil.
Dr. Laércio Lima Pilla holds an associate professor position at the Federal University of Santa Catarina, Brazil. He obtained his Ph.D. in Computer Science in 2014 in a joint doctorate between the Federal University of Rio Grande do Sul, Brazil, and the University of Grenoble, France. His research interests focus on topology-aware scheduling and software-based fault tolerance on GPUs.