1st Edition

Speculative Execution in High Performance Computer Architectures

Edited By David Kaeli, Pen-Chung Yew Copyright 2005
    456 Pages 94 B/W Illustrations
    by Chapman & Hall

    Until now, there were few textbooks that focused on the dynamic subject of speculative execution, a topic that is crucial to the development of high performance computer architectures. Speculative Execution in High Performance Computer Architectures describes many recent advances in speculative execution techniques. It covers cutting-edge research projects, as well as numerous commercial implementations that demonstrate the value of this latency-hiding technique.

    The book begins with a review of control speculation techniques that use instruction cache prefetching, branch prediction and predication, and multi-path execution. It then examines dataflow speculation techniques including data cache prefetching, address value and data value speculation, pre-computation, and coherence speculation. This textbook also explores multithreaded approaches, emphasizing profile-guided speculation, speculative microarchitectures, and compiler techniques.

    INTRODUCTION
    David R. Kaeli, Northeastern University and Pen C. Yew, University of Minnesota

    INSTRUCTION CACHE PREFETCHING
    Glenn Reinman, UCLA Computer Science Department
    Direct Mapped Cache
    Set Associative Cache
    Pseudo Associative Cache
    Way Prediction Cache
    Next Line Prefetching
    Target Prefetching
    Stream Buffers
    Nonblocking Instruction Caches and Out-Of-Order Fetch
    Fetch Directed Instruction Prefetching
    Integrated Prefetching
    Wrong-Path Prefetching
    Compiler Strategies

    BRANCH PREDICTION
    Philip G. Emma, IBM T.J. Watson Research Laboratory
    The von Neumann Programming Model vs. ENIAC
    Dataflow and Control Flow
    The Branch Instruction
    The IAS Machine: A Primitive Stored-Program Architecture
    Virtuality
    Branch Instruction Semantics
    General Instruction-Set Architectures and Extensions
    Memory Consistency and Observable Order
    Branches and Performance
    Pipelining
    Pipeline Disruptions and Their Penalties
    Superscalar Processing
    Multithreading
    Instruction Prefetching and Autonomy
    The Delayed Branch Instruction
    Branch Flow in a Pipeline: The "When" of Branch Prediction
    Predicting Branches at Decode Time
    Predicting Branches at Instruction-Prefetch Time
    Static Branch Prediction
    Dynamic Branch Prediction
    Branch Prediction With Counters
    Predicting by Profiling Branch Actions
    Group Behaviors vs. Predicting Individual Branches
    The Decode History Table (a.k.a. Branch History Table)
    Discriminators
    Using Multiple Discriminators: A Path-Based Approach
    Implementation
    A Timing Caveat
    Hybrid Predictors
    Instruction Prefetching
    The Branch History Table (a.k.a. Branch Target Buffer)
    Operation of the BTB
    Fetch Width and Branch Mispredictions
    The Subroutine Call and Return Structure
    Predicting Return Addresses by Using a Stack
    Recognizing Subroutine Calls and Returns
    Taking Advantage of the BTB Structure
    Eliminating the Stack
    Working Sets and Contexts
    The Size of a BTB Entry
    The BTB and the Instruction Cache: Economies of Size
    More Exotic Prediction for the More Difficult Branches
    Branches and the Operand Space
    Branches and the Operand-Address Space
    Tandem Branch Prediction
    Accuracy and the Updating of Tables
    Predictor Bandwidth and Anomalous Behaviors
    The Importance of Fast Prediction Mechanisms
    Superscalar Processing and the Monolithic Prediction
    of Branch Sequences
    Predicting Branches in a Multithreaded Environment
    Limitations
    Simplicity
    Complexity
    Two Saving Graces
    Implementing Real Branch Prediction Mechanisms

    TRACE CACHES
    Eric Rotenberg North Carolina State University
    Traces
    Core Fetch Unit Based on Instruction Cache
    Trace Cache Operation
    Path Associativity
    Indexing Strategy
    Partial Matching
    Coupling Branch Prediction with the Trace Cache
    Trace Selection Policy
    Multi-Phase Trace Construction
    Managing Overlap between Instruction Cache and Trace
    Cache
    Speculative vs. Non-Speculative Trace Cache Updates
    Powerful vs. Weak Core Fetch Unit
    Parallel vs. Serial Instruction Cache Access
    L1 vs. L2 Instruction Cache
    Loop Caches

    BRANCH PREDICATION
    David August, Princeton University
    Overcoming Branch Problems with Predication
    If-Conversion
    Predicate Optimization and Analysis
    The Predicated Intermediate Representation
    Hewlett-Packard Laboratories PD
    Cydrome Cydra 5
    ARM
    Texas Instruments C6X
    Systems with Limited Predicated Execution Support
    Predication in the Itanium 2 Processor

    MULTIPATH EXECUTION
    Augustus K. Uht, University of Rhode Island
    Branch Tree Geometry
    Branch Path/Instruction ID
    Phases of Operation
    Granularity
    With Predication
    With Data Speculation
    Compiler-Assisted
    Hardware: Classically-Based
    Hardware: Non Classically-Based
    Multiprocessors
    Functional or Logic Language Machines
    Branch Prediction
    Confidence Estimation
    Pipeline Depth
    Implications of Amdahl's Law - ILP Version
    Memory Bandwidth Requirements

    DATA CACHE PREFETCHING
    Yan Solihin, North Carolina State University, and Donald Yeung, University of Maryland at College Park
    Architectural Support
    Array Prefetching
    Pointer Prefetching
    Relationship with Data Locality Optimizations
    Stride and Sequential Prefetching
    Correlation Prefetching
    Content-Based Prefetching

    ADDRESS PREDICTION
    Avi Mendelson, Intel Mobil Micro-Processor Architect
    Terminology and Definitions
    Non-Speculative Address Calculation Techniques
    Speculative Address Calculation Techniques
    Chapter Focus
    Characterization of Address Predictability
    Address Predictability vs. Value Predictability
    Combining Address Prediction with Prefetching Mechanism
    Basic Characterization
    Load Promotion
    Memory Bypassing
    Compiler Based Speculative Load Promotion

    DATA SPECULATION
    Yiannakis Sazeides, University of Cyprus; Pedro Marcuello, Intel-UPC Barcelona Research Center; James E. Smith, University
    of Wisconsin-Madison; and Antonio González, Universitat Polit`ecnica de Catalunya
    Basic Value Predictors
    Value Predictor Alternatives
    Confidence Estimation
    Implementation Issues
    Data Dependence Predictors
    Verification
    Recovery
    Other Microarchitectural Implications of Data Value
    Speculation
    Related Work: Data Value Speculation
    Related Work: Data Dependence Speculation

    INSTRUCTION PRECOMPUTATION: DYNAMICALLY REMOVING REDUNDANT COMPUTATIONS USING PROFILING
    Joshua J. Yi, Freescale Semiconductor Inc.; Resit Sendag, University of Rhode Island; and David J. Lilja, University of Minnesota at Twin Cities
    A Comparison of Instruction Precomputation and Value Reuse
    Upper-Bound - Profile A, Run A
    Different Input Sets - Profile B, Run A
    Combination of Input Sets - Profile AB, Run A
    Frequency versus Frequency and Latency Product
    Performance of Instruction Precomputation versus Value Reuse

    PROFILE-BASED SPECULATION
    Youfeng Wu and Jesse Fang, Intel Microprocessor Technology Labs
    Control Flow Profile
    Memory Profile
    Value Profile
    Static Analysis
    Instrumentation
    Hardware Performance Monitoring
    Special Hardware
    Software-Hardware Collaborative Profiling
    Compile-time Profiling
    Runtime Profiling
    Continuous Profiling
    Trace Scheduling
    Hot-Cold Optimizations
    Code Layout
    Data Layout
    Stride Prefetching
    Hot Data Stream Prefetching
    Mississippi Delta Prefetching
    Java Runtime Parallelizing Machine
    Speculative Parallel Threading
    Speculative Computation Reuse
    Software-Based Speculative Precomputation
    Stability across Multiple Workloads
    Update When Program Changes
    Maintenance during optimizations
    Perturbation by Profiling Code

    COMPILATION AND SPECULATION
    Jin Lin, Wei-Chung Hsu and Pen-Chung Yew University of Minnesota, Minneapolis
    Alias Profiling
    Data Dependence Profiling
    Speculative Alias and Dataflow Analyses
    A Framework for Speculative Alias Analysis and Dataflow Analysis
    Overview
    F-Insertion Step
    Rename Step
    Downsafety Step
    CodeMotion Step
    Recovery Code Generation for General Speculative Optimizations
    Check Instructions and Recovery Code Representation for Multi-Level
    Speculation
    Interaction of the Early Introduced Recovery Code with Later
    Optimizations

    MULTITHREADING AND SPECULATION
    Pedro Marcuello, Jesus Sanchez and Antonio Gonzalez Intel-UPC Barcelona Research Center; Intel Labs; Universitat Politecnica de Catalunya; Barcelona (Spain)
    Building Helper Threads
    Microarchitectural Support for Helper Threads
    Thread Spawning Schemes
    Microarchitectural Support for Speculative Architectural Threads
    References
    Andreas Moshovos, University of Toronto

    EXPLOITING LOAD/STORE PARALLELISM VIA MEMORY DEPENDENCE PREDICTION
    Static Methods
    Hybrid Static/Dynamic Methods
    Dynamic Methods
    Working Example
    Multiple Dependences Per Static Load or Store
    Methodology
    Performance Potential of Load/Store Parallelism
    Performance with Naive Memory Dependence Speculation
    Using Address-Based Scheduling to Extract Load/Store Parallelism
    Speculation/Synchronization

    RESOURCE FLOW MICROARCHITECTURES
    David A. Morano and David R. Kaeli, Northeastern University
    The Operand as a First Class Entity
    Dynamic Dependency Ordering
    Handling Multipath Execution
    Names and Renaming
    The Active Station Idea
    Register and Memory Operand Storage
    Operand Forwarding and Snooping
    Result Forwarding Buses and Operand Filtering
    A Small Resource-Flow Microarchitecture
    A Distributed Scalable Resource-Flow Microarchitecture

    Biography

    David Kaeli, Pen-Chung Yew