Building and Maintaining a Data Warehouse

Published:
Author(s):

Purchasing Options

Hardback
$89.95
Add to cart
ISBN 9781420064629
Cat# AU6462
 

Features

Eleven Principles to Guide the Building of Data Warehouses

  • One Thing at a Time
  • Know When to Begin
  • Know When to End
  • Large to Medium to Small
  • Stage Data Integrity
  • Know What You Have
  • Name the Data
  • Own the Data
  • Build the Data
  • Type the Data
  • Land the Data
  • Summary

    As it is with building a house, most of the work necessary to build a data warehouse is neither visible nor obvious when looking at the completed product. While it may be easy to plan for a data warehouse that incorporates all the right concepts, taking the steps needed to create a warehouse that is as functional and user-friendly as it is theoretically sound, is not especially easy. That’s the challenge that Building and Maintaininga Data Warehouse answers.

    Based on a foundation of industry-accepted principles, this work provides an easy-to-follow approach that is cohesive and holistic. By offering the perspective of a successful data warehouse, as well as that of a failed one, this workdetails those factors that must be accomplished and those that are best avoided.

    Organized to logically progress from more general to specific information, this valuable guide:

    • Presents areas of a data warehouse individually and in sequence, showing how each piece becomes a working part of the whole
    • Examines the concepts and principles that are at the foundation of every successful data warehouse
    • Explains how to recognize and attend to problematic gaps in an established data warehouse
    • Provides the big picture perspective that planners and executives require

    Those considering the planning and creation of a data warehouse, as well as those who’ve already built one will profit greatly from the insights garnered by the author during his years of creating and gathering information on state-of-the-art data warehouses that are accessible, convenient, and reliable.

    Table of Contents

    The Big Picture: An Introduction to Data Warehousing
    Decision Support Systems; Dimensional and Third Normal Form Data Models; Storing the Data; Data Availability; Monitoring Data Quality.
    Data Warehouse Philosophy
    Enterprise Data; Subject Orientation; Data Integration; Form; Function; Grain; Nonvolatility; Time Variant; One Version of the Truth; Long-Term Investment.
    Source System Analysis
    Source System Analysis Principles; System of Record; Entity Data Arithmetic Data: Absolute, Relative, Numeric Data That Isn’t Arithmetic; Alphanumeric Data; Granularity; Latency; Transaction Data; Snapshot Data; Source System Analysis Methods; Data Profile; Data Flow Diagram; Data State Diagram; System of Record; Business Rules.
    Relational Database Management System (RDBMS)
    Relational Set Theory; RDBMS Product Offerings; Residual Costs; Licensing; Support and Maintenance; Extensibility; Connective Capacity.
    Database Design
    Data Modeling Methodology; Conceptual and Logical Data Models; Logical (Primary) Key; Attribute; Primary Key/Foreign Key Relation; Cardinality; Super Types and Subtypes; Physical Data Model; Dimensional Data Model; Third Normal Form Data Model; Recursive Data Model; Physical Data Model Summary; Data Architecture; Enterprise Data Warehouse; Data Mart; Operational Data Store; Summaries and Aggregates.
    Data Acquisition and Integration
    Source System and Target System Analysis; Direct and Indirect Requirements; Language; Data Profile; Data State; Data Mapping; Business Rules; Architecture; Extract, Transform, and Load (ETL); Extract, Load, and Transform (ELT); ETL Design and Process Principles.
    Eleven Principles; Staging Principles Conclusion; ETL Functions; Extract Data from a Contiguous Dataset and from a Data Flow; Row-Level and Dataset-Level Transformation.
    Surrogate Key Generation: Intradataset, Data Warehouse-Level Transformation, Intra-Data Warehouse, Changed Data Capture, ETL Key, Universe to Universe and Candidate to Universe, Load Data from a Stable and Contiguous Dataset, Load Data from a Data Flow.
    Transaction Summary; Dimension Aggregation.
    Common Problems: Source Data Anomalies, Incomplete, Redundant, and Misstated Source Data; Business Rule Changes, Obsolete, Redefined, and Unrecorded Data.
    Business Intelligence (BI) Reporting
    Success Factors; Performance; User Interface; Presentation of the Data Architecture; Alignment with the Data Model; Ability to Answer Questions; Mobility; Flexibility; Availability; Customer Success Factors.
    Processes: Proactive, Reactive, Predefined, Ad Hoc; Data, Information, and Analytic Needs; BI Reporting Application and Architecture.
    BI Reporting Methods: Predefined and Interactive Reports, Online Analytical Process (OLAP) Reports, MOLAP, ROLAP, HOLAP; Drilling; Push versus Pull; Printed on Paper; Report Archives; Web-Based BI Reporting; Operational BI Reporting: From an ODS, From an Operational System (Real-Time), EDI, Partnerships, and Data Sharing.
    BI Reporting: Customer Relationship Management (CRM), Business Metrics Measure the Enterprise, Decisions and Decision Making Closer to the Action; Reporting around the Event; BI Search; Sarbanes–Oxley and BI Reporting; Data Mining; Statistics Concepts; Random Error; Statistical Significance.
    Variables: Dependent and Independent.
    Hypothesis; Data Mining Tools and Activities; Data Cleansing; Data Inspection; Compound, Lag, Numeric, and Categorical Variables; Hypothesis; Data Mining Algorithms; Neural Network; Decision Tree; CHAID; Nearest Neighbor; Rule Induction; Genetic Algorithm; Rule Validation and Testing; Overfitting.
    Data Quality
    Deming’s Definition of Quality; Data Quality Service Level Agreement (SLA); Deming’s Statistical Process Control; Process Measurement; Methods and Strategies; Data Stewardship; Post-Load Audit and Report Errant Data.
    Plug in a Default Value and Report Errant Data; Reject a Record and Report the Errant Record; Reject a Dataset and Report the Errant Dataset.
    Recycle the Data: In Place and Report Errant Data, Recycle Wheel and Report Errant Data, Data Quality Repository; Data Quality Fact Table: Dimensional Data Model, Third Normal Form Data Model; Data Quality Reporting.
    Metadata
    Types of Metadata; Static and Dynamic Metadata; Metadata Service Level Agreement (SLA); Metadata Repository; Central Metadata Repository: Dimensional Data Model; Third Normal Form; Distributed Metadata Repository; Real-Time Metadata; Data Quality as Metadata; Make or Buy a Metadata Repository.
    Data Warehouse Customers
    Strategic Decision Makers; Tactical Decision Makers; Knowledge Workers; Operational Applications; External Partners; Electronic Data Interchange (EDI) Partners; Data Warehouse Plan.
    Future of Data Warehousing: An Epilogue
    Scalability and Performance; Real-Time Data Warehousing; Increased Corporate Presence; Back to the Basics; Data Quality.
    Short TOC
    The Big Picture: An Introduction to Data Warehousing
    Data Warehouse Philosophy
    Source System Analysis
    Relational Database Management System (RDBMS)
    Database Design
    Data Acquisition and Integration
    Business Intelligence Reporting
    Data Quality
    Metadata
    Data Warehouse Customers
    Future of Data Warehousing: An Epilogue
    Bibliography
    Index
    Toc to post to abstract
    Preface
    Acknowledgments
    The Author
    Introduction
    The Big Picture: An Introduction to Data Warehousing
    Introduction
    Decision Support Systems
    Dimensional and Third Normal Form Data Models
    Storing the Data
    Data Availability
    Monitoring Data Quality
    Data Warehouse Philosophy
    Introduction
    Enterprise Data
    Subject Orientation
    Data Integration
    Form
    Function
    Grain
    Nonvolatility
    Time Variant
    One Version of the Truth
    Long-Term Investment
    References
    Source System Analysis
    Introduction
    Source System Analysis Principles
    System of Record
    Entity Data
    Arithmetic Data
    Absolute Arithmetic Data
    Relative Arithmetic Data
    Numeric Data That Isn’t Arithmetic
    Alphanumeric Data
    Granularity
    Latency
    Transaction Data
    Snapshot Data
    Source System Analysis Methods
    Data Profile
    Data Flow Diagram
    Data State Diagram
    System of Record
    Business Rules
    Closing Remarks
    References
    Relational Database Management System (RDBMS)
    Introduction
    Relational Set Theory
    RDBMS Product Offerings
    Residual Costs
    Licensing
    Support and Maintenance
    Extensibility
    Connective Capacity
    Closing Remarks
    References
    Database Design
    Introduction
    Data Modeling Methodology
    Conceptual Data Model
    Logical Data Model
    Logical (Primary) Key
    Attribute
    Primary Key/Foreign Key Relation
    Cardinality
    Super Types and Subtypes
    Putting It All Together
    Physical Data Model
    Dimensional Data Model
    Third Normal Form Data Model
    Recursive Data Model
    Physical Data Model Summary
    Data Architecture
    Enterprise Data Warehouse
    Data Mart
    Operational Data Store
    Summaries and Aggregates
    Closing Remarks
    References
    Data Acquisition and Integration
    Introduction
    Source System Analysis
    Target System Analysis
    Direct Requirements
    Indirect Requirements
    Direct and Indirect Requirements
    Language
    Data Profile
    Data State
    Data Mapping
    Business Rules
    Architecture
    Extract, Transform, and Load (ETL)
    Extract, Load, and Transform (ELT)
    ETL Design Principles
    ETL Process Principles
    Principle 01: One Thing at a Time
    Principle 02: Know When to Begin
    Principle 03: Know When to End
    Principle 04: Large to Medium to Small
    Principle 05: Stage Data Integrity
    Principle 06: Know What You Have
    Process Principles Conclusion
    ETL Staging Principles
    Principle 07: Name the Data
    Principle 08: Own the Data
    Principle 09: Build the Data
    Principle 10: Type the Data
    Principle 11: Land the Data
    Staging Principles Conclusion
    ETL Functions
    Extract Data from a Contiguous Dataset
    Extract Data from a Data Flow
    Row-Level Transformation
    Dataset-Level Transformation
    Surrogate Key Generation: Intradataset
    Data Warehouse-Level Transformation
    Surrogate Key Generation: Intra-Data Warehouse
    Look-Up
    Changed Data Capture
    ETL Key
    Universe to Universe and Candidate to Universe
    Load Data from a Stable and Contiguous Dataset
    Load Data from a Data Flow
    Transaction Summary
    Dimension Aggregation
    Common Problems
    Source Data Anomalies
    Incomplete Source Data
    Redundant Source Data
    Misstated Source Data
    Business Rule Changes
    Obsolete Data
    Redefined Data
    Unrecorded Data
    Closing Remarks
    References
    Business Intelligence Reporting
    Introduction
    BI Reporting Success Factors
    Performance
    User Interface
    Presentation of the Data Architecture
    Alignment with the Data Model
    Ability to Answer Questions
    Mobility
    Flexibility
    Availability
    BI Customer Success Factors
    Proactive Processes
    Reactive Processes
    Predefined Processes
    Ad Hoc Processes
    Data Needs.
    Information Needs
    Analytic Needs
    BI Reporting Application
    Architecture
    BI Reporting Methods.
    Predefined Reports
    Interactive Reports
    Online Analytical Process (OLAP) Reports
    MOLAP
    ROLAP
    HOLAP
    Drilling
    Push versus Pull
    Push
    Pull
    Printed on Paper
    Report Archives
    Web-Based BI Reporting
    Operational BI Reporting: From an ODS
    Operational BI Reporting: From an Operational System (Real-Time)
    Operational BI Reporting: EDI, Partnerships, and Data Sharing.
    BI Reporting: Thus Far.
    Customer Relationship Management (CRM)
    Business Metrics Measure the Enterprise
    Decisions and Decision Making Closer to the Action
    BI Reporting: Coming Soon
    Reporting around the Event
    BI Search
    Sarbanes–Oxley and BI Reporting
    Data Mining
    Statistics Concepts
    Random Error
    Statistical Significance
    Variables: Dependent and Independent
    Hypothesis
    Data Mining Tools
    Data Mining Activities
    Data Cleansing
    Data Inspection
    Compound Variables
    Lag Variables
    Numeric Variables
    Categorical Variables
    Hypothesis
    Data Mining Algorithms
    Neural Network
    Decision Tree
    CHAID
    Nearest Neighbor
    Rule Induction
    Genetic Algorithm
    Rule Validation and Testing
    Overfitting
    Closing Remarks
    References
    Data Quality
    Introduction
    Deming’s Definition of Quality
    Data Quality Service Level Agreement (SLA)
    Deming’s Statistical Process Control
    Process Measurement
    Methods and Strategies
    Data Stewardship
    Post-Load Audit and Report Errant Data
    Plug in a Default Value and Report Errant Data
    Reject a Record and Report the Errant Record
    Reject a Dataset and Report the Errant Dataset
    Recycle the Data: In Place and Report Errant Data
    Recycle the Data: Recycle Wheel and Report Errant Data
    Data Quality Repository
    Data Quality Fact Table: Dimensional Data Model
    Data Quality Fact Table: Third Normal Form Data Model
    Data Quality Reporting
    Follow Through
    Closing Remarks
    References
    Metadata
    Introduction
    Types of Metadata
    Static Metadata
    Dynamic Metadata
    Metadata Service Level Agreement (SLA)
    Metadata Repository
    Central Metadata Repository: Dimensional Data Model
    Central Metadata Repository: Third Normal Form
    Distributed Metadata Repository
    Real-Time Metadata
    Data Quality as Metadata
    Make or Buy a Metadata Repository
    Closing Remarks
    References
    Data Warehouse Customers
    Introduction
    Strategic Decision Makers
    Tactical Decision Makers
    Knowledge Workers
    Operational Applications
    External Partners
    Electronic Data Interchange (EDI) Partners
    Data Warehouse Plan
    Strategic Decision Makers
    Tactical Decision Makers
    Knowledge Workers
    Operational Applications
    External Partners
    Electronic Data Interchange (EDI) Partners
    Closing Remarks
    Future of Data Warehousing: An Epilogue
    Introduction
    Scalability and Performance
    Real-Time Data Warehousing
    Increased Corporate Presence
    Back to the Basics
    Data Quality
    Bibliography
    Index