1st Edition

Introduction to High Performance Computing for Scientists and Engineers

By Georg Hager and Gerhard Wellein, Copyright 2011
    356 Pages, 143 B/W Illustrations
    Published by CRC Press

    Written by high performance computing (HPC) experts, Introduction to High Performance Computing for Scientists and Engineers provides a solid introduction to current mainstream computer architecture, dominant parallel programming models, and useful optimization strategies for scientific HPC. Working in a scientific computing center, the authors have gained a unique perspective on the requirements and attitudes of both users and manufacturers of parallel computers.

    The text first introduces the architecture of modern cache-based microprocessors and discusses their inherent performance limitations before describing general optimization strategies for serial code on cache-based architectures. It then covers shared- and distributed-memory parallel computer architectures and the most relevant network topologies. After discussing parallel computing on a theoretical level, the authors show how to avoid or ameliorate typical performance problems associated with OpenMP. They then present cache-coherent nonuniform memory access (ccNUMA) optimization techniques, examine distributed-memory parallel programming with the Message Passing Interface (MPI), and explain how to write efficient MPI code. The final chapter focuses on hybrid programming with MPI and OpenMP.

    Users of high performance computers often have no idea which factors limit time to solution, or whether it makes sense to think about optimization at all. This book builds an intuitive understanding of performance limitations without requiring a deep background in computer science, and it prepares readers for studying more advanced literature.

    The authors received the Informatics Europe Curriculum Best Practices Award for Parallelism and Concurrency.

    Table of Contents

    Modern Processors
    Stored-program computer architecture
    General-purpose cache-based microprocessor architecture
    Memory hierarchies
    Multicore processors
    Multithreaded processors
    Vector processors

    Basic Optimization Techniques for Serial Code
    Scalar profiling
    Common sense optimizations
    Simple measures, large impact
    The role of compilers
    C++ optimizations

    Data Access Optimization
    Balance analysis and lightspeed estimates
    Storage order
    Case study: The Jacobi algorithm
    Case study: Dense matrix transpose
    Algorithm classification and access optimizations
    Case study: Sparse matrix-vector multiply

    Parallel Computers
    Taxonomy of parallel computing paradigms
    Shared-memory computers
    Distributed-memory computers
    Hierarchical (hybrid) systems
    Networks

    Basics of Parallelization
    Why parallelize?
    Parallelism
    Parallel scalability

    Shared-Memory Parallel Programming with OpenMP
    Short introduction to OpenMP
    Case study: OpenMP-parallel Jacobi algorithm
    Advanced OpenMP: Wavefront parallelization

    Efficient OpenMP Programming
    Profiling OpenMP programs
    Performance pitfalls
    Case study: Parallel sparse matrix-vector multiply

    Locality Optimizations on ccNUMA Architectures
    Locality of access on ccNUMA
    Case study: ccNUMA optimization of sparse MVM
    Placement pitfalls
    ccNUMA issues with C++

    Distributed-Memory Parallel Programming with MPI
    Message passing
    A short introduction to MPI
    Example: MPI parallelization of a Jacobi solver

    Efficient MPI Programming
    MPI performance tools
    Communication parameters
    Synchronization, serialization, contention
    Reducing communication overhead
    Understanding intranode point-to-point communication

    Hybrid Parallelization with MPI and OpenMP
    Basic MPI/OpenMP programming models
    MPI taxonomy of thread interoperability
    Hybrid decomposition and mapping
    Potential benefits and drawbacks of hybrid programming

    Appendix A: Topology and Affinity in Multicore Environments
    Appendix B: Solutions to the Problems

    Bibliography

    Index

    Biography

    Georg Hager is a senior research scientist in the high performance computing group of the Erlangen Regional Computing Center at the University of Erlangen-Nuremberg, Germany. Gerhard Wellein leads the high performance computing group of the Erlangen Regional Computing Center and is a professor in the Department of Computer Science at the University of Erlangen-Nuremberg.

    Georg Hager and Gerhard Wellein have developed a very approachable introduction to high performance computing for scientists and engineers. Their style and description is easy to read and follow. … This book presents a balanced treatment of the theory, technology, architecture, and software for modern high performance computers and the use of high performance computing systems. The focus on scientific and engineering problems makes this both educational and unique. I highly recommend this timely book for scientists and engineers. I believe this book will benefit many readers and provide a fine reference.
    —From the Foreword by Jack Dongarra, University of Tennessee, Knoxville, USA