Performance Tuning of Scientific Applications

ISBN 9781439815694
Cat# K10806

  • Provides an overview of modern computer architecture
  • Presents tools and techniques for monitoring floating-point operation counts, integer operations, cache misses, and more
  • Explains how to encapsulate the performance behavior of applications and computer systems into relatively simple yet accurate models
  • Covers benchmark performance and analysis
  • Illustrates how to optimize the run-time performance of a scientific application or class of applications by using semi-automatic techniques and tools
  • Includes examples from such areas as solid mechanics, astrophysics, quantum chromodynamics, molecular dynamics, and environmental science


With contributions from some of the most notable experts in the field, Performance Tuning of Scientific Applications presents current research in performance analysis. The book focuses on the following areas.

Performance monitoring: Describes the state of the art in hardware and software tools that are commonly used for monitoring and measuring performance and managing large quantities of data

Performance analysis: Discusses modern approaches to computer performance benchmarking and presents results that offer valuable insight into these studies

Performance modeling: Explains how researchers deduce accurate performance models from raw performance data or from other high-level characteristics of a scientific computation

Automatic performance tuning: Explores ongoing research into automatic and semi-automatic techniques for optimizing computer programs to achieve superior performance on any computer platform

Application tuning: Provides examples showing how appropriate performance analysis and a few deft changes have yielded extremely high performance

Performance analysis has grown into a full-fledged, sophisticated field of empirical science. Describing useful research in modern performance science and engineering, this book helps real-world users of parallel computer systems to better understand both the performance vagaries arising in scientific applications and the practical means for improving performance.

Read about the book on HPCwire and insideHPC

Table of Contents

Introduction, David H. Bailey
"Twelve Ways to Fool the Masses"
Examples from Other Scientific Fields
Guidelines for Reporting High Performance
Modern Performance Science

Parallel Computer Architecture, Samuel W. Williams and David H. Bailey
Parallel Architectures
Processor (Core) Architecture
Memory Architecture
Network Architecture
Heterogeneous Architectures

Software Interfaces to Hardware Counters, Shirley V. Moore, Daniel K. Terpstra, and Vincent M. Weaver
Processor Counters
Off-Core and Shared Counter Resources
Platform Examples
Operating System Interfaces
PAPI in Detail
Counter Usage Modes
Uses of Hardware Counters
Caveats of Hardware Counters

Measurement and Analysis of Parallel Program Performance Using TAU and HPCToolkit, Allen D. Malony, John Mellor-Crummey, and Sameer S. Shende
Measurement Approaches
HPCToolkit Performance Tools
TAU Performance System

Trace-Based Tools, Jesus Labarta
Tracing and Its Motivation
Data Acquisition
Techniques to Identify Structure
The Future

Large-Scale Numerical Simulations on High-End Computational Platforms, Leonid Oliker, Jonathan Carter, Vincent Beckner, John Bell, Harvey Wasserman, Mark Adams, Stéphane Ethier, and Erik Schnetter
HPC Platforms and Evaluated Applications
GTC: Turbulent Transport in Magnetic Fusion
GTC Performance
OLYMPUS: Unstructured FEM in Solid Mechanics
Carpet: Higher-Order AMR in Relativistic Astrophysics
CASTRO: Compressible Astrophysics
MILC: Quantum Chromodynamics

Performance Modeling: The Convolution Approach, David H. Bailey, Allan Snavely, and Laura Carrington
Applications of Performance Modeling
Basic Methodology
Performance Sensitivity Studies

Analytic Modeling for Memory Access Patterns Based on Apex-MAP, Erich Strohmaier, Hongzhang Shan, and Khaled Ibrahim
Memory Access Characterization
Apex-MAP Model to Characterize Memory Access Patterns
Using Apex-MAP to Assess Processor Performance
Apex-MAP Extension for Parallel Architectures
Apex-MAP as an Application Proxy
Limitations of Memory Access Modeling

The Roofline Model, Samuel W. Williams
The Roofline
Bandwidth Ceilings
In-Core Ceilings
Arithmetic Intensity Walls
Alternate Roofline Models

End-to-End Auto-Tuning with Active Harmony, Jeffrey K. Hollingsworth and Ananta Tiwari
Sources of Tunable Data
Auto-Tuning Experience with Active Harmony

Languages and Compilers for Auto-Tuning, Mary Hall and Jacqueline Chame
Language and Compiler Technology
Interaction between Programmers and Compiler
Code Transformation
Higher-Level Capabilities

Empirical Performance Tuning of Dense Linear Algebra Software, Jack Dongarra and Shirley Moore
Background and Motivation
Auto-Tuning for Multicore
Auto-Tuning for GPUs

Auto-Tuning Memory-Intensive Kernels for Multicore, Samuel W. Williams, Kaushik Datta, Leonid Oliker, Jonathan Carter, John Shalf, and Katherine Yelick
Experimental Setup
Computational Kernels
Optimizing Performance
Automatic Performance Tuning

Flexible Tools Supporting a Scalable First-Principles MD Code, Bronis R. de Supinski, Martin Schulz, and Erik W. Draeger
Qbox: A Scalable Approach to First-Principles Molecular Dynamics
Experimental Setup and Baselines
Optimizing Qbox: Step by Step
Customizing Tool Chains with PnMPI

The Community Climate System Model, Patrick H. Worley
CCSM Overview
Parallel Computing and the CCSM
Case Study: Optimizing Interprocess Communication Performance in the Spectral Transform Method
Performance Portability: Supporting Options and Delaying Decisions
Case Study: Engineering Performance Portability into the Community Atmosphere Model
Case Study: Porting the Parallel Ocean Program to the Cray X1
Monitoring Performance Evolution
Performance at Scale

Tuning an Electronic Structure Code, David H. Bailey, Lin-Wang Wang, Hongzhang Shan, Zhengji Zhao, Juan Meza, Erich Strohmaier, and Byounghak Lee
LS3DF Algorithm Description
LS3DF Code Optimizations
Test Systems
Performance Results and Analysis
Science Results



Editor Bios

David Bailey is a chief technologist in the High Performance Computational Research Department at the Lawrence Berkeley National Laboratory. Dr. Bailey has published several books and numerous research studies on computational and experimental mathematics. He has received the ACM Gordon Bell Prize, the IEEE Sidney Fernbach Award, and the MAA Chauvenet Prize and Merten M. Hasse Prize.

Robert Lucas is the director of computational sciences in the Information Sciences Institute and a research associate professor in computer science in the Viterbi School of Engineering at the University of Southern California. Dr. Lucas has many years of experience working with high-end defense, national intelligence, and energy applications and simulations. His linear solvers are the computational kernels of electrical and mechanical CAD tools.

Samuel Williams is a researcher in the Future Technologies Group at the Lawrence Berkeley National Laboratory. Dr. Williams has authored or co-authored thirty technical papers, including several award-winning papers. His research interests include high-performance computing, auto-tuning, computer architecture, performance modeling, and VLSI.