1st Edition

Designing Scientific Applications on GPUs

Edited by Raphaël Couturier. Copyright 2014.
    498 pages, 118 B/W illustrations
    Published by Chapman & Hall

    Many of today’s complex scientific applications require vast amounts of computational power. General-purpose graphics processing units (GPGPUs) let researchers in a variety of fields harness the computational power of all the cores available inside graphics cards.

    Understand the Benefits of Using GPUs for Many Scientific Applications

    Designing Scientific Applications on GPUs shows you how to use GPUs for applications in diverse scientific fields, from physics and mathematics to computer science. The book explains the methods needed to design a scientific application for GPUs or to port an existing one to them. It covers image processing, numerical applications, methodology for designing efficient applications, optimization methods, and much more.

    Everything You Need to Design/Port Your Scientific Application on GPUs

    The first part of the book introduces GPUs and Nvidia’s CUDA programming model, currently the most widespread environment for designing GPU applications. The second part focuses on significant image processing applications on GPUs. The third part presents general methodologies for software development on GPUs, and the fourth part describes the use of GPUs for addressing several optimization problems. The fifth part covers many numerical applications, including obstacle problems, fluid simulation, and atomic physics models. The last part illustrates agent-based simulations, pseudorandom number generation, and the solution of large sparse linear systems for integer factorization. Some of the code presented in the book is available online.

    PRESENTATION OF GPUs
    Presentation of the GPU Architecture and the Cuda Environment Raphaël Couturier
    Introduction
    Brief history of the video card
    GPGPU
    Architecture of current GPUs
    Kinds of parallelism
    Cuda multithreading
    Memory hierarchy

    Introduction to Cuda Raphaël Couturier
    Introduction
    First example
    Second example: using CUBLAS
    Third example: matrix-matrix multiplication

    IMAGE PROCESSING
    Setting up the Environment Gilles Perrot
    Data transfers, memory management
    Performance measurements

    Implementing a Fast Median Filter Gilles Perrot
    Introduction
    Median filtering
    NVidia GPU tuning recipes
    A 3x3 median filter: using registers
    A 5x5 and more median filter

    Implementing an Efficient Convolution Operation on GPU Gilles Perrot
    Overview
    Definition
    Implementation
    Separable convolution

    SOFTWARE DEVELOPMENT
    Development of Software Components for Heterogeneous Many-Core Architectures Stefan L. Glimberg, Allan P. Engsig-Karup, Allan S. Nielsen, and Bernd Dammann
    Software development for heterogeneous architectures
    Heterogeneous library design for PDE solvers
    Model problems
    Optimization strategies for multi-GPU systems

    Development Methodologies for GPU and Cluster of GPUs Sylvain Contassot-Vivier, Stephane Vialle, and Jens Gustedt
    Introduction
    General scheme of synchronous code with computation/communication overlapping in GPU clusters
    General scheme of asynchronous parallel code with computation/communication overlapping
    Perspective: A unifying programming model

    OPTIMIZATION
    GPU-Accelerated Tree-Based Exact Optimization Methods Imen Chakroun and Nouredine Melab
    Introduction
    Branch-and-bound (B&B) algorithm
    Parallel B&B algorithms
    The flowshop scheduling problem
    GPU-accelerated B&B based on the parallel tree exploration (GPU-PTE-BB)
    GPU-accelerated B&B based on the parallel evaluation of bounds (GPU-PEB-BB)
    Thread divergence
    Memory access optimization
    Experiments

    Parallel GPU-Accelerated Metaheuristics Malika Mehdi, Ahcène Bendjoudi, Lakhdar Loukil, and Nouredine Melab
    Introduction
    Combinatorial optimization
    Parallel models for metaheuristics
    Challenges for the design of GPU-based metaheuristics
    State-of-the-art parallel metaheuristics on GPUs
    Frameworks for metaheuristics on GPUs
    Case study: Accelerating large neighborhood LS method on GPUs for solving the Q3AP

    Linear Programming on a GPU: A Case Study Xavier Meyer, Bastien Chopard, and Paul Albuquerque
    Introduction
    Simplex algorithm
    B&B algorithm
    CUDA considerations
    Implementations
    Performance model
    Measurements and analysis

    NUMERICAL APPLICATIONS
    Fast Hydrodynamics on Heterogeneous Many-Core Hardware Allan P. Engsig-Karup, Stefan L. Glimberg, Allan S. Nielsen, and Ole Lindberg
    On hardware trends and challenges in scientific applications
    On modeling paradigms for highly nonlinear and dispersive water waves
    Governing equations
    The numerical model
    Properties of the numerical model
    Numerical experiments

    Parallel Monotone Spline Interpolation and Approximation on GPUs Gleb Beliakov and Shaowu Liu
    Introduction
    Monotone splines
    Smoothing noisy data via parallel isotone regression

    Solving Linear Systems with GMRES and CG Methods on GPU Clusters Lilia Ziane Khodja, Raphaël Couturier, and Jacques Bahi
    Introduction
    Krylov iterative methods
    Parallel implementation on a GPU cluster
    Experimental results

    Solving Sparse Nonlinear Systems of Obstacle Problems on GPU Clusters Lilia Ziane Khodja, Raphaël Couturier, Jacques Bahi, Ming Chau, and Pierre Spitéri
    Introduction
    Obstacle problems
    Parallel iterative method
    Parallel implementation on a GPU cluster
    Experimental tests on a GPU cluster
    Red-black ordering technique

    Ludwig: Multiple GPUs for a Fluid Lattice Boltzmann Application Alan Gray and Kevin Stratford
    Introduction
    Background
    Single GPU implementation
    Multiple GPU implementation
    Moving solid particles

    Numerical Validation and GPU Performance in Atomic Physics Rachid Habel, Pierre Fortin, Fabienne Jézéquel, Jean-Luc Lamotte, and Stan Scott
    Introduction
    2DRMP and the PROP program
    Numerical validation of PROP in single precision
    Toward a complete deployment of PROP on GPUs
    Performance results
    Propagation of multiple concurrent energies on GPU

    GPU-Accelerated Envelope-Following Method Xuexin Liu, Sheldon Xiang-Dong Tan, Hai Wang, and Hao Yu
    Introduction
    The envelope-following method in a nutshell
    New parallel envelope-following method
    Numerical examples

    OTHER
    Implementing Multi-Agent Systems on GPU Guillaume Laville, Christophe Lang, Bénédicte Herrmann, Laurent Philippe, Kamel Mazouzi, and Nicolas Marilleau
    Introduction
    Running agent-based simulations
    A first practical example
    Second example
    Analysis and recommendations

    Pseudorandom Number Generator on GPU Raphaël Couturier and Christophe Guyeux
    Introduction
    Basic reminders
    Toward efficiency and improvement for CI PRNG
    Experiments

    Solving Large Sparse Linear Systems for Integer Factorization on GPUs Bertil Schmidt and Hoang-Vu Dang
    Introduction
    Block Wiedemann algorithm
    SpMV over GF(2) for NFS matrices using existing formats on GPUs
    A hybrid format for SpMV on GPUs
    SCOO for single-precision floating-point matrices
    Performance evaluation

    Index

    A Bibliography appears at the end of each chapter.

    Biography

    Raphaël Couturier is a professor of computer science at the University of Franche-Comté and vice head of the Computer Science Department at the FEMTO-ST Institute. He has co-authored over 80 articles in peer-reviewed journals and conferences. He received a Ph.D. from Henri Poincaré University. His research interests include parallel and distributed computation, numerical algorithms, GPU and FPGA computing, and asynchronous iterative algorithms.

    "This book covers not only the knowledge of GPU and CUDA programming, but also provides successful real applications in many domains, including signal processing, image processing, physics, and artificial intelligence. The most recent research outcome and the most recent progress of GPU architectures are included, such as multi-GPU programming and GPU clusters. I believe it is a very good reference for GPU and CUDA parallel programming courses as it provides detailed illustration of the architectures of GPU, programming principles of CUDA, CUDA libraries for algebra, and a series of real applications. In addition, it will definitely contribute to the progress of research in CUDA-enabled parallel computing."
    —Professor Ying Liu, School of Computer and Control, University of Chinese Academy of Sciences