Computer systems have recently undergone a fundamental transformation, from single-core processors to chips with ever higher core counts. The semiconductor industry now faces the infamous power and utilization walls. To meet these challenges, heterogeneity in design, at both the architecture and technology levels, will be the prevailing approach to energy-efficient computing, since specialized cores, accelerators, and the like can eliminate the energy overheads of general-purpose homogeneous cores. However, with future technological challenges pointing toward on-chip heterogeneity, and because of the traditional difficulty of parallel programming, it becomes imperative to produce new system software stacks that can take advantage of the heterogeneous hardware. As a case in point, the core count per chip continues to increase dramatically, while the available on-chip memory per core grows only marginally. Thus, data locality, already a must-have in high-performance computing, will become even more critical as memory technology progresses. In turn, this makes it crucial that new execution models be developed to better exploit the trends of future heterogeneous computing in many-core chips. We therefore propose a cross-cutting, cross-layer approach to address the challenges posed by future heterogeneous many-core chips.
Short Bio: Jean-Luc Gaudiot received the Diplôme d’Ingénieur from ESIEE, Paris, France, in 1976 and the M.S. and Ph.D. degrees in Computer Science from UCLA in 1977 and 1982, respectively. He is currently a Professor in the Electrical Engineering and Computer Science Department at the University of California, Irvine. Prior to joining UCI in 2002, he had been a Professor of Electrical Engineering at the University of Southern California since 1982. His research interests include multithreaded architectures, fault-tolerant multiprocessors, and implementation of reconfigurable architectures. He has published over 250 journal and conference papers. His research has been sponsored by NSF, DoE, and DARPA, as well as a number of industrial companies. He has served the community in various positions and has just been elected President of the IEEE Computer Society for 2017.
Parallel computers have come of age and need parallel software to justify their usefulness. There are two major avenues to getting programs to run in parallel: parallelizing compilers, and parallel languages and/or libraries. This talk presents our latest results using both approaches and draws some conclusions about their relative effectiveness and potential.
In the first part, we introduce the Hybrid Analysis (HA) compiler framework, which seamlessly integrates static and run-time analysis of memory references into a single framework capable of fully automatic loop-level parallelization. Experimental results on 26 benchmarks show full-program speedups superior to those obtained by the Intel Fortran compilers.
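As an illustration of the kind of run-time memory-reference analysis the abstract alludes to, the following minimal C++ sketch (hypothetical names, not HA's actual implementation) checks at run time whether an index array is duplicate-free before executing a loop in parallel, and falls back to the sequential loop otherwise.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical illustration (not the HA framework's code): the loop below
// writes a[idx[i]], so static analysis alone cannot prove the iterations
// independent -- idx is only known at run time.
static bool indices_are_disjoint(const std::vector<int>& idx) {
    std::unordered_set<int> seen;
    for (int v : idx)
        if (!seen.insert(v).second) return false;  // duplicate index => cross-iteration dependence
    return true;
}

void update(std::vector<double>& a, const std::vector<int>& idx) {
    if (indices_are_disjoint(idx)) {
        // Run-time test passed: iterations are independent and may run in parallel.
        #pragma omp parallel for
        for (long i = 0; i < static_cast<long>(idx.size()); ++i)
            a[idx[i]] += 1.0;
    } else {
        // Test failed: fall back to the original sequential loop.
        for (std::size_t i = 0; i < idx.size(); ++i)
            a[idx[i]] += 1.0;
    }
}
```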
In the second part of this talk, we present the Standard Template Adaptive Parallel Library (STAPL)-based approach to parallelizing code. STAPL is a collection of generic data structures and algorithms that provides a high-productivity parallel programming infrastructure analogous to the C++ Standard Template Library (STL). In this talk, we provide an overview of the major STAPL components, with particular emphasis on graph algorithms. We then present scalability results of real codes on petascale machines such as the IBM BG/Q and Cray systems.
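To make the STL analogy concrete, here is a minimal plain C++17 sketch (standard library only, not STAPL's own API) of the container-plus-generic-algorithm style that STAPL extends to distributed data structures and parallel algorithms.

```cpp
#include <execution>
#include <numeric>
#include <vector>

// Plain C++17 sketch of the STL-style generic interface that STAPL builds on:
// a container plus a generic algorithm parameterized by an execution policy.
// STAPL replaces std::vector and std::reduce with distributed pContainers and
// pAlgorithms, but the programming style is the same.
int main() {
    std::vector<double> v(1'000'000, 0.5);

    // Parallel reduction over the container using the standard execution policy.
    double sum = std::reduce(std::execution::par, v.begin(), v.end(), 0.0);

    return sum > 0.0 ? 0 : 1;
}
```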
Short Bio: Lawrence Rauchwerger is the Eppright Professor of Computer Science and Engineering at Texas A&M University and the co-Director of the Parasol Lab. He received an Engineer degree from the Polytechnic Institute of Bucharest, an M.S. in Electrical Engineering from Stanford University, and a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. He has held Visiting Faculty positions at the University of Illinois, Bell Labs, IBM T.J. Watson, and INRIA, Paris.
Rauchwerger’s approach to auto-parallelization, thread-level speculation and parallel code development has influenced industrial products at corporations such as IBM, Intel and Sun. Rauchwerger is an IEEE Fellow, an NSF CAREER award recipient and has chaired various IEEE and ACM conferences, most recently serving as Program Chair of PACT 2016 and PPoPP 2017.
In the first half of this talk, an overview of the two main driving forces behind these new trends, namely Data Centers and Autonomous Vehicles, is given.
In the second half of the talk, some of the related topics we have been focusing on are described.
Short Bio: Nader Bagherzadeh is a Professor of Computer Engineering in the Department of Electrical Engineering and Computer Science at the University of California, Irvine, where he served as chair from 1998 to 2003. Dr. Bagherzadeh has been involved in research and development in the areas of computer architecture, reconfigurable computing, VLSI chip design, network-on-chip, 3D chips, sensor networks, computer graphics, memory, and embedded systems since he received a Ph.D. degree from the University of Texas at Austin in 1987. He is a Fellow of the IEEE.
Professor Bagherzadeh has published more than 300 articles in peer-reviewed journals and conferences. Over the past thirty years, his former students have assumed key positions in software and computer systems design companies. He has been a PI or Co-PI on research grants for developing all aspects of next-generation computer systems, for embedded systems as well as general-purpose computing.
This talk evolved from a keynote I was asked to give at the first “Pioneering Processor Paradigms” Workshop at HPCA last February. The goal of the workshop was to look at key paradigms of the past to see what we could learn from them. The results are clear. Paradigms are proposed to solve problems at higher performance, do not quite fulfill their promise, and get replaced by newer paradigms that purport to do so… which, in turn, are replaced by still newer paradigms, and so on. For the most part, our field has grown by evolving one paradigm into the next, continuing to ride the performance wave that the dreamers want to take advantage of. What is also clear is that we have needed to lean on researchers working at other levels of the transformation hierarchy (and will need to do so even more as Moore’s Law comes to an end) if we are to continue to ride the crest of improved performance. However, when I point out some examples of evolved processor paradigms that require help from the entire transformation hierarchy, I immediately get pushback from critics who argue that portability will go out the window and that economics will prevent industry from getting on board. The naysayers are, of course, correct: my suggestions do hurt portability. But as for economics, I say, “Economics be damned!” I will explain why I believe this position makes sense.
Short bio: Yale Patt is a teacher at the local public university in Austin, Texas. He enjoys teaching the intensive required intro to computing course to 400 freshmen every other Fall, the advanced graduate course in microarchitecture every other Spring, and the senior-level computer architecture course whenever they let him. He also enjoys his visits to Brazil, and participating from time to time in SBAC, Brazil’s increasingly important computer conference. Dr. Patt has earned appropriate degrees from reputable universities and has received more than enough awards for his research and teaching. More detail can be found on his website.