Compiler technology has enabled the software advances of the last sixty years. It has given us machine-independent programming and improved productivity by automatically handling a number of issues, such as instruction selection and register allocation. However, in the parallel world of high performance computing, the impact of compiler technology has been small. Part of the reason is that the ambitious research projects of the last few decades, such as automatic parallelization and automatic generation of distributed memory programs à la High Performance Fortran, have yet to produce useful results. The absence of effective compiler technology has resulted in a lack of portability and low productivity in the programming of parallel machines. With these problems growing more serious, due to the popularization of parallelism and the complexity increase expected in future high-end machines, advances in compiler technology are now more important than ever. In this presentation, I will discuss the state of the long-standing problem of automatic parallelization and describe important new lines of research such as the identification of levels of abstraction that help both productivity and compilation, the development of a solid understanding of the automatic optimization process, the creation of a research methodology to enable the quantification of progress, and the development of an effective methodology for the interaction of programmers with compilers.
David Padua is the Donald Biggar Willet Professor of Computer Science at the University of Illinois at Urbana-Champaign, where he has been a faculty member since 1985. His areas of interest include compilers, software tools, and parallel computing. He has published more than 170 papers and has supervised the dissertations of more than 30 PhD students. Padua has served as a program committee member, program chair, or general chair for more than 70 conferences and workshops. He was the Editor-in-Chief of Springer-Verlag's Encyclopedia of Parallel Computing and is a member of the editorial boards of the IEEE Transactions on Parallel and Distributed Systems, the Journal of Parallel and Distributed Computing, and the International Journal of Parallel Programming. He received the 2015 IEEE Computer Society Harry H. Goode Award and is a fellow of the ACM and the IEEE.
Katherine Yelick is a Professor of Electrical Engineering and Computer Sciences at the University of California at Berkeley and the Associate Laboratory Director for Computing Sciences at Lawrence Berkeley National Laboratory. Her research is in programming languages, compilers, parallel algorithms, and automatic performance tuning. She is well known for her work in Partitioned Global Address Space languages, including co-inventing the Unified Parallel C (UPC) and Titanium languages. She and her students developed program analyses and optimization techniques for these languages and the Berkeley Lab team built compiler and runtime support that is used by several other research and production projects. She also led the Sparsity project, the first automatically tuned library for sparse matrix kernels, and she co-led the development of the Optimized Sparse Kernel Interface (OSKI). She has worked on interdisciplinary teams developing scientific applications ranging from simulations of chemistry, fusion, and blood flow in the heart to analysis problems in phylogenetics and genome assembly.
Yelick is an ACM Fellow and recent recipient of the ACM/IEEE Ken Kennedy award and the ACM-W Athena award. She is a member of the National Academies Computer Science and Telecommunications Board (CSTB) and the Computing Community Consortium (CCC), and she previously served on the California Council on Science and Technology and the LLNS/LANS Science and Technology Committee overseeing research at Los Alamos and Lawrence Livermore National Laboratories.
In 2012, IEEE formed a cross-societal initiative, the IEEE Rebooting Computing Initiative (RCI), and tasked it with fundamentally rethinking computing from devices and circuits up through architectures, languages, and algorithms. RCI has held four invitation-only summits of thought leaders that have influenced US policy decisions. Starting in 2014, RCI joined forces with the International Technology Roadmap for Semiconductors (ITRS, also known as "the Semiconductor Roadmap"). In 2016, ITRS moved to IEEE and became the IEEE International Roadmap for Devices and Systems (IRDS). The goal of the IRDS is to roadmap future computing and drive its requirements down to devices to guide the semiconductor industry. I will discuss both the IEEE RCI summits and the IRDS structure and goals.
Tom Conte is Professor of CS and ECE at Georgia Institute of Technology, where he directs the interdisciplinary Center for Research into Novel Computing Hierarchies. Since 2012, Tom has co-chaired (along with Elie Track) the IEEE-wide Rebooting Computing Initiative, whose goal is to entirely rethink how we compute, from algorithms down to semiconductor devices. He is also the vice chair of the IEEE International Roadmap for Devices and Systems (the successor to the International Technology Roadmap for Semiconductors). He travels around the world giving talks about how shifts in technology and the slowing of Moore's Law are about to cause a dramatic shift in how we compute. Tom is the past president of the IEEE Computer Society and a Fellow of the IEEE.
After several decades in which application programs and system architecture were decoupled by a relatively clean ISA, the last decade has seen increasing leakage across that interface. The advent of multicores has been accompanied by a growing amount of system-architecture internals becoming explicitly visible to the programmer. Although simple examples show that very important performance gains can be achieved on very specific architectures after a significant programming effort, this approach is hardly sustainable, and the productivity and maintainability of such codes are becoming serious issues.
The StarSs project at BSC promotes the vision that we need to reintroduce a clean interface between the programmer and the system, one that allows the former to express the data and computational aspects of algorithms and ideas while leaving to the system the responsibility of maximizing the usage of architectural resources. The basic concept is to provide, within otherwise traditional sequential programming languages, the same capabilities that superscalar processors provided for what was otherwise a sequential ISA. By raising the abstraction level from instructions to tasks, many techniques used at the microarchitectural level can be leveraged, and many new opportunities arise at the coarser granularity of task-based models. Today, the implementation of the system under the task-based interface is a mixture of architecture-aware runtime software and the available hardware architecture. The runtime is responsible for hiding memory architectures and heterogeneity from the programmer as well as for optimizing the schedule of computations. Nevertheless, this vision opens a huge research opportunity for runtime-aware architectures that provide support mechanisms to help the runtime and, in the long term, may result in a very tight fusion between the two.
The talk will elaborate on this vision. Although there is still a great deal of research to be done and best practices to promote, we will present examples based on the OmpSs incarnation of the StarSs concept that demonstrate how this vision can be implemented and its potential benefits.
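To make the dataflow idea concrete, the sketch below uses standard OpenMP task dependences as a stand-in for OmpSs-style in/out/inout task annotations (an assumption for illustration only; the actual OmpSs pragma syntax differs in detail). By declaring the data each task reads and writes, the programmer stays in a sequential-looking program while the runtime builds the task dependence graph and schedules tasks out of order when dependences allow, analogous to what a superscalar core does with instructions.

/* Minimal sketch, assuming OpenMP 4.0 task dependences (not OmpSs syntax).
 * Each task names the data it reads and writes; the runtime derives the
 * task graph and overlaps independent tasks automatically. */
#include <stdio.h>

#define N 4

void produce(double *block)           { *block = 1.0; }
void update(double *block, double in) { *block += in; }

int main(void) {
    double a[N] = {0}, b[N] = {0};

    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < N; i++) {
        /* Writer task: produces a[i]. */
        #pragma omp task depend(out: a[i])
        produce(&a[i]);

        /* Reader task: consumes a[i], produces b[i]. The runtime orders it
         * after the writer for the same i, while tasks for different i
         * may execute concurrently on different cores. */
        #pragma omp task depend(in: a[i]) depend(out: b[i])
        update(&b[i], a[i]);
    }
    /* The implicit barrier at the end of the parallel region waits for all tasks. */

    for (int i = 0; i < N; i++) printf("b[%d] = %f\n", i, b[i]);
    return 0;
}

The point of the annotations is that the dependence information, not an explicit schedule, is what the programmer supplies; the same code can then be mapped by the runtime onto SMP, accelerator, or cluster configurations without rewriting the algorithm.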
Jesus Labarta has been a full professor of Computer Architecture at the Technical University of Catalonia (UPC) since 1990. Since 1981 he has lectured on computer architecture, operating systems, computer networks, and performance evaluation. His research interests have centered on parallel computing, covering areas from multiprocessor architecture, memory hierarchy, and programming models to parallelizing compilers, operating systems, parallelization of numerical kernels, and performance analysis and prediction tools.
Since 2005 he has been responsible for the Computer Science Research Department within the Barcelona Supercomputing Center (BSC). He has been involved in research cooperation with many leading companies on HPC-related topics. His major current directions of work relate to performance analysis tools, programming models, and resource management. His team distributes the open-source BSC tools (Paraver and Dimemas) and performs research on increasing the intelligence embedded in performance analysis tools. He is involved in the development of the OmpSs programming model and its implementations for SMP, GPU, and cluster platforms. He has been involved in exascale activities such as IESP and EESI, where he was responsible for the Runtime and Programming Model sections of the respective roadmaps. He leads the programming models and resource management activities in the HPC subproject of the Human Brain Project.