SBAC-PAD'2012 Accepted Papers
Kevin Kai-Wei Chang, Rachata Ausavarungnirun, Chris Fallin and Onur Mutlu.
Adaptive Heterogeneous Throttling for On-Chip Networks
Ardavan Pedram, Andreas Gerstlauer and Robert van de Geijn.
On the Efficiency of Register File versus Broadcast Interconnect for Collective Communications in Data-Parallel Hardware Accelerators
George Chin, Andres Marquez, Sutanay Choudhury and John Feo.
Scalable Triadic Analysis of Large-Scale Graphs: Multi-Core vs. Multi-Processor vs. Multi-Threaded Shared Memory Architectures
Marco Alves, Khubaib Khubaib, Eiman Ebrahimi, Veynu Narasiman, Carlos Villavieja, Philippe Navaux and Yale Patt.
Energy Savings via Dead Sub-Block Prediction
Luiz Ramos and Ricardo Bianchini.
Exploiting Phase-Change Memory in Cooperative Caches
Diogo Sampaio, Rafael Martins, Sylvain Collange and Fernando Pereira.
Divergence Analysis with Affine Constraints
Biswabandan Panda and Shankar Balachandran.
CSHARP: Coherence and SHaring Aware Replacement Policies for Parallel Applications
Muneeb Khan, Andreas Sembrant and Erik Hagersten.
Low Overhead Instruction-Cache Modeling Using Instruction Reuse Profiles
Rance Rodrigues, Arunachalam Annamalai, Israel Koren and Sandip Kundu.
Scalable Thread Scheduling in Asymmetric Multicores for Power Efficiency
Teo Milanez, Sylvain Collange, Fernando Pereira and Wagner Meira.
Data and Instruction Uniformity in Minimal Multi-Threading
Vaibhav Sundriyal, Masha Sosonkina and Alexander Gaenko.
Runtime Procedure for Energy Savings in Applications with Point-to-point Communications
Alberto Ros, Ricardo Fernández-Pascual and Manuel E. Acacio.
Using Heterogeneous Networks to Improve Energy Efficiency in Direct Coherence Protocols for Many-Core CMPs
Peng Lu, Binoy Ravindran and Changsoo Kim.
VPC: Scalable, Low Downtime Checkpointing for Virtual Clusters
José Luis March, Salvador Petit, Julio Sahuquillo, Houcine Hassan and José Duato.
Efficiently Handling Memory Accesses to Improve QoS in Multicore Systems under Real-Time Constraints
Jiaxi Hu, Zhaosen Wang, Qiyuan Qiu, Weijun Xiao and David Lilja.
Sparse Fast Fourier Transform on GPUs and Multi-core CPUs
Jason Kane and Qing Yang.
Compression Speed Enhancements to Lempel-Ziv-Oberhumer for Multi-Core Systems
Joefon Jann, R. Sarma Burugula, Ching-Farn E. Wu and Kaoutar El Maghraoui.
An OS-Hypervisor Infrastructure for Automated OS Crash Diagnosis and Recovery in a Virtualized Environment
Murtaza Ali, Eric Stotzer, Francisco D. Igual and Robert A. van de Geijn.
Level-3 BLAS on the TI C6678 multi-core DSP
German Rodriguez, Cyriel Minkenberg, Ronald P. Luijten, Ramon Beivide, Patrick Geoffray, Jesus Labarta, Mateo Valero and Steve Poole.
The Network Adapter: The Missing Link between MPI Applications and Network Performance
Alessandro Morari, Antonino Tumeo, Simone Secchi, Oreste Villa and Mateo Valero.
Efficient Sorting on the Tilera Manycore Architecture
Alberto Sanz, Rafael Asenjo, Juan Lopez, Rafael Larrosa, Angeles Navarro and Vassily Litvinov.
Global data re-allocation via communication aggregation in Chapel
João Vicente Lima, Thierry Gautier and Nicolas Maillard.
Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs
Yoonho Park, Eric Van Hensbergen, Marius Hillenbrand, Todd Inglett, Bryan Rosenburg, Kyung Ryu and Robert Wisniewski.
FusedOS: Fusing LWK Performance with FWK Functionality in a Heterogeneous Environment
Akhil Langer, Jonathan Lifflander, Phil Miller, Kuo-Chuan Pan, Laxmikant Kale and Paul Ricker.
A Scalable Mesh Restructuring Algorithm for Distributed-Memory Adaptive Mesh Refinement
Vladimir Gajinov, Srdjan Stipic, Osman Unsal, Tim Harris, Eduard Ayguade and Adrian Cristal.
Integrating Dataflow Abstractions into the Shared Memory Model
Ghislain Landry Tsafack Chetsa, Laurent Lefevre, Jean-Marc Pierson, Patricia Stolf and Georges Da Costa.
Beyond CPU Frequency Scaling for a Fine-grained Energy Control of HPC Systems
Mohamed Ibrahim and Binoy Ravindran.
Transactional Forwarding: Supporting Highly-Concurrent STM in Asynchronous Distributed Systems
Rafael Auler, Paulo Centoducatte and Edson Borin.
ACCGen: An Automatic ArchC Compiler Generator
Ioannis Manousakis and Dimitrios Nikolopoulos.
BTL: A Framework for Measuring and Modeling Energy in Memory Hierarchies
Mark Richards, Abhishek Gupta, Osman Sarood and Laxmikant Kale.
Parallelizing Information Set Generation for Game Tree Search Applications
Alexandro Baldassin, João Carvalho, Leonardo Garcia and Rodolfo Azevedo.
Energy-Performance Tradeoffs in Software Transactional Memory
Ilie Tanase, George Almasi, Charles Archer and Hanhong Xue.
Network Endpoints for Clusters of SMPs
Nam Ma, Yinglong Xia and Viktor Prasanna.
Parallel Exact Inference on Multicore Using MapReduce
Esteban Meneses, Osman Sarood and Laxmikant Kale.
Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems
Mauricio Breternitz, Keith Lowery, Anton Chernoff, Patryk Kaminski and Leonardo Piga.
Cloud Workload Analysis with SWAT
Jaime Cohen, Luiz A. Rodrigues and Elias P. Duarte Jr.
A Parallel Implementation of Gomory-Hu's Cut Tree Algorithm