1st Edition

Distributed System Design

By Jie Wu Copyright 1999
    496 Pages
    by CRC Press

    488 Pages
    by CRC Press

    Future requirements for computing speed, system reliability, and cost-effectiveness entail the development of alternative computers to replace the traditional von Neumann organization. As computing networks come into being, one of the latest dreams is now possible - distributed computing.
    Distributed computing brings transparent access to as much computer power and data as the user needs for accomplishing any given task - simultaneously achieving high performance and reliability.
    The subject of distributed computing is diverse, and many researchers are investigating various issues concerning the structure of hardware and the design of distributed software. Distributed System Design defines a distributed system as one that looks to its users like an ordinary system, but runs on a set of autonomous processing elements (PEs) where each PE has a separate physical memory space and the message transmission delay is not negligible. With close cooperation among these PEs, the system supports an arbitrary number of processes and dynamic extensions.
    Distributed System Design outlines the main motivations for building a distributed system, including:

  • inherently distributed applications
  • performance/cost
  • resource sharing
  • flexibility and extendibility
  • availability and fault tolerance
  • scalability
    Presenting basic concepts, problems, and possible solutions, this reference serves graduate students in distributed system design as well as computer professionals analyzing and designing distributed/open/parallel systems.
    Chapters discuss:
  • the scope of distributed computing systems
  • general distributed programming languages and a CSP-like distributed control description language (DCDL)
  • expressing parallelism, interprocess communication and synchronization, and fault-tolerant design
  • two approaches describing a distributed system: the time-space view and the interleaving view
  • mutual exclusion and related issues, including election, bidding, and self-stabilization
  • prevention and detection of deadlock
  • reliability, safety, and security as well as various methods of handling node, communication, Byzantine, and software faults
  • efficient interprocessor communication mechanisms as well as these mechanisms without specific constraints, such as adaptiveness, deadlock-freedom, and fault-tolerance
  • virtual channels and virtual networks
  • load distribution problems
  • synchronization of access to shared data while supporting a high degree of concurrency
  • Introduction
    Motivation
    Basic Computer Organizations
    Definition of a Distributed System
    Our Model
    Interconnection Networks
    Applications and Standards
    Scope
    Source of References
    Distributed Programming Languages
    Requirement for Distributed Programming Support
    An Overview of Parallel/Distributed Programming Language
    Expressing Parallelism
    Process Communication and Synchronization
    Remote Procedure Calls
    Robustness
    Formal Approaches to Distributed Systems Design
    Introduction to Models
    Causally Related Events
    Global States
    Logical Clocks
    Applications
    Classification of Distributed Control Algorithms
    The Complexity of Distributed Algorithms
    Mutual Exclusion and Election Algorithms
    Mutual Exclusion
    Non-Token-Based Solutions
    Token-Based Solutions
    Election
    Bidding
    Self-Stabilization
    Prevention, Avoidance, and Detection of Deadlock
    The Deadlock Problem
    Deadlock Prevention
    A Deadlock Prevention Example: Distributed Database Systems
    Deadlock Avoidance
    A Deadlock Prevention Example: Multi-Robot Flexible Assembly Cells
    Deadlock Detection and Recovery
    Deadlock Detection and Recovery Examples
    Distributed Routing Algorithms
    Introduction
    General-Purpose Shortest Path Routing
    Unicasting in Special-Purpose Networks
    Broadcasting in Special-Purpose Networks
    Multicasting in Special-Purpose Networks
    Adaptive Deadlock-Free, and Fault-Tolerant Routing
    Virtual Channels and Virtual Networks
    Fully Adaptive and Deadlock-Free Routing
    Partially Adaptive and Deadlock-Free Routing
    Fault-Tolerant Unicasting: General Approaches
    Fault-Tolerant Unicasting in 2-D Meshes and Tori
    Fault-Tolerant Unicasting in Hypercubes
    Fault-Tolerant Broadcasting
    Fault-Tolerant Multicasting
    Reliability in Distributed Systems
    Basic Models
    Building Blocks of Fault-Tolerant Design
    Handling of Node Faults
    Issues in Backward Recovery
    Handling of Byzantine Faults
    Handling of Communication Faults
    Handling of Storage Faults
    Static Load Distribution
    Classification of Load Distribution
    Static Load Distribution
    An Overview of Different Scheduling Models
    Task Scheduling Based on Task Precedent Graphs
    Case Study: Two Optimal Scheduling Algorithms
    Task Scheduling Based on Task Interaction Graphs
    Case Study: Domain Partition
    Scheduling Using Other Models and Objectives
    Future Directions
    Dynamic Load Distribution
    Dynamic Load Distribution
    Load Balancing Design Decisions
    Migration Policies: Sender-Initiated and Receiver-Initiated
    Parameters Used for Load Balancing
    Other Relevant Issues
    Sample Load Balancing Algorithms
    Case Study: Load Balancing on Hypercube Multicomputers
    Future Directions
    Distributed Data Management
    Basic Concepts
    Serializability Theory
    Concurrency Control
    Replica and Consistency Management
    Distributed Reliability Protocols
    Distributed System Applications
    Distributed Operating Systems
    Distributed File Systems
    Distributed Shared Memory
    Distributed Database Systems
    Heterogeneous Processing
    Future Directions of Distributed Systems

    Biography

    Jie Wu