1st Edition

Open Source Data Warehousing and Business Intelligence

By Lakshman Bulusu Copyright 2013
    428 Pages 30 B/W Illustrations
    by CRC Press

    Open Source Data Warehousing and Business Intelligence is an all-in-one reference for developing open source-based data warehousing (DW) and business intelligence (BI) solutions that are business-centric, cross-customer viable, cross-functional, cross-technology based, and enterprise-wide. Considering the entire lifecycle of an open source DW & BI implementation, its comprehensive coverage spans from basic concepts all the way through to customization.

    Highlighting the key differences between open source and vendor DW and BI technologies, the book identifies end-to-end solutions that are scalable, high performance, and stable. It illustrates the practical aspects of implementing and using open source DW and BI technologies to supply you with valuable on-the-project experience that can help you improve implementation and productivity.

    Emphasizing analysis, design, and programming, the text explains best-fit solutions as well as how to maximize ROI. Coverage includes data warehouse design, real-time processing, data integration, presentation services, and real-time reporting. With a focus on real-world applications, the author devotes an entire section to powerful implementation best practices that can help you build customer confidence while saving valuable time, effort, and resources.

    Introduction
    Data Warehousing and Business Intelligence: What, Why, How, When, When Not?
         Taking IT Intelligence to Its Apex
    Open Source DW and BI: Much Ado about Anything-to-Everything DW and BI, When Not, and Why So Much Ado? 
         Taking Business Intelligence to Its Apex: Intelligent Content for Insightful Intent

    Data Warehousing and Business Intelligence: An Open Source Solution
    What Is Open Source DW and BI, and How "Open" Is This Open?
    What’s In, What’s Not: Available and Viable Options for Development and Deployment 
         Semantic Analytics 
         Testing for Optimizing Quality and Automation—Accelerated! 
         Business Rules, Real-World Perspective, Social Context 
         Personalization Through Customizable Measures 
         Leveraging the Cloud for Deployment
    The Foundations Underneath: Architecture, Technologies, and Methodologies 
         Open Source versus Proprietary DW and BI Solutions: Key Differentiators and Integrators
    Open Source DW and BI: Uses and Abuses 
         An Intelligent Query Accelerator Using an Open Cache In, Cache Out Design

    Open Source DW & BI: Successful Players and Products
    Open Source Data Warehousing and Business Intelligence Technology 
         Licensing Models Followed 
         Community versus Commercial Open Source
    The Primary Vendors: Inventors and Presenters 
         Oracle: MySQL Vendor 
         PostgreSQL Vendor 
         Infobright 
         Pentaho: Mondrian Vendor 
         Jedox: Palo Vendor 
         EnterpriseDB Vendor 
         Dynamo BI and Eigenbase: LucidDB Vendor 
         Greenplum Vendor 
         Hadoop Project 
         HadoopDB 
         Talend
    The Primary Products and Tools Set: Inclusions and Exclusions 
         Open Source Databases 
         Open Source Data Integration
         Open Source Business Intelligence 
         Open Source Business Analytics
    The Primary Users: User, End-User, Customer and Intelligent Customer
         MySQL 
         PostgreSQL
         Mondrian Customers
         Palo Customers 
         EnterpriseDB Customers 
         LucidDB Customers 
         Greenplum Customers 
         Talend Customers
    References

    Analysis, Evaluation, and Selection
    Essential Criteria for Requirements Analysis of an Open Source DW and BI solution
    Key and Critical Deciding Factors in Selecting a Solution 
         The Selection-Action Preview 
         Raising your BIQ: Five Things Your Company Can Do Now
    Evaluation Criteria for Choosing a Vendor-Specific Platform and Solution
    The Final Pick: An Information-Driven, Customer-Centric Solution, and a Best-of-Breed Product/Platform and Solution Convergence Key Indicator Checklist
    References

    Design and Architecture: Technologies and Methodologies by Dissection
    The Primary Aspects of DW and BI from a Usability Perspective: Strategic BI, Pervasive BI, Operational BI, and BI On-Demand
    Design and Architecture Considerations for the Primary BI Perspectives 
         The Case for Architecture as a Precedence Factor
    Information-Centric, Business-Centric, and Customer-Centric Architecture: A Three-in-One Convergence, for Better or Worse
    Open Source DW and BI Architecture 
         Pragmatics and Design Patterns 
         Components
    Why and How an Open Source Architecture Delivers a Better Enterprise-wide Solution
    Open Source Data Architecture: Under the Hood
    Open Source Data Warehouse Architecture: Under the Hood
    Open Source BI Architecture: Under the Hood 
    The Vendor/Platform Product(s)/Tool(s) That Fit into the Open DW and BI Architecture 
         Information Integration, Usability and Management (Across Data Sources, Applications and Business Domains) 
         EDW: Models to Management 
         BI: Models to Interaction to Management to Strategic Business Decision Support (via Analytics and Visualization)
    Best Practices: Use and Reuse

    Operational BI and Open Source
    Why a Separate Chapter on Operational BI and Open Source?
    Operational BI by Dissection
    Design and Architecture Considerations for Operational BI
    Operational BI Data Architecture: Under the Hood
    A Reusable Information Integration Model: From Real-Time to Right Time
    Operational BI Architecture: Under the Hood
    Fitting Open Source Vendor/Platform Product(s)/Tool(s) into the Operational BI Architecture 
         Talend Data Integration 
         expressor 3.0 Community Edition 
         Advanced Analytics Engines for Operational BI 
         Astera’s Centerprise Data Integration Platform 
         Actuate BIRT BI Platform 
         JasperSoft Enterprise 
         Pentaho Enterprise BI Suite
         KNIME (Konstanz Information Miner) 
         Pervasive DataRush
         Pervasive DataCloud2
    Best Practices: Use and Reuse

    Development and Deployment
    Development Options, Dissected
    Deployment Options, Dissected
    Integration Options, Dissected
    Multiple Sources, Multiple Dimensions
    DW and BI Usability and Deployment: Best Solution versus Best-Fit Solution
    Leveraging the Best-Fit Solution: Primary Considerations
    Better, Faster, Easier as the Hitchhiker’s Rule 
         Dynamism and Flash—Real Output in Real Time in the Real World 
         Interactivity 
    Better Responsiveness, User Adoptability, and Transparency
    Fitting the Vendor/Platform Product(s)/Tool(s): A Development and Deployment Standpoint
    Best Practices: Use and Reuse

    Best Practices for Data Management
    Best Fit of Open Source in EDW Implementation
    Best Practices for Using Open Source as a BI-Only Methodology for Data/Information Delivery
         Mobile BI and Pervasive BI
    Best Practices for the Data Lifecycle in a Typical EDW Lifecycle
         Data Quality, Data Profiling, and Data Loss Prevention Components
         The Data Integration Component
    Best Practices for the Information Lifecycle as It Moves into the BI Lifecycle 
         The Data Analysis Component: The Dimensions of Data Analysis in Terms of Online Analytics vs. Predictive Analytics vs. Real-Time Analytics vs. Advanced Analytics 
         Data to Information Transformation and Presentation
    Best Practices for Auditing Data Access, as It Makes Its Way via the EDW (and Directly Bypassing the EDW) to the BI Dashboard
    Best Practices for Using XML in the Open Source EDW/BI Space
    Best Practices for a Unified Information Integrity and Security Framework
    Object to Relational Mapping: A Necessity or Just a Convenience? 
         Synchrony Maintenance 
         Dynamic Language Interoperability

    Best Practices for Application Management 
    Using Open Source as an End-to-End Solution Option: How Best a Practice Is It?
    Accelerating Application Development: Choice, Design, and Suitability Aspects 
         Visualization of Content: For Better or Best Fit 
         Best Practices for Autogenerating Code: A Codeless Alternative to Information Presentation 
         Automating Querying: Why and When 
         How Fine Is Fine-Grained? Drawing the Line between Representation of Data at the Lowest Level and a Best-Fit Metadata Design and Presentation
    Best Practices for Application Integrity 
         Sharing Data between EDW and the BI Tiers: Isolation or a Tightrope Methodology 
         Breakthrough BI: Self-Serviceable BI via a Self-Adaptable Solution 
         Data-In, Data-Out Considerations: Data-to-Information I/O 
         Security Inside and Outside Enterprise Parameters: Best Practices for Security beyond User Authentication
    Best Practices for Intra- and Interapplication Integration and Interaction 
         Continuous Activity Monitoring and Event Processing 
         Best Practices to Leverage Cloud-Based Methodologies
    Best Practices for Creative BI Reporting

    Best Practices Beyond Reporting: Driving Business Value
    Advanced Analytics: The Foundation for a Beyond-Reporting Approach (Dynamic KPI, Scorecards, Dynamic Dashboarding, and Adaptive Analytics)
    Large Scale Analytics: Business-centric and Technology-centric Requirements and Solution Options 
         Business-centric Requirements 
         Technology-centric Requirements
    Accelerating Business Analytics: What to Look for, Look at, and Look Beyond
    Delivering Information on Demand and Thereby Performance on Demand
         Design Pragmatics 
         Demo Pragmatics

    EDW/BI Development Frameworks
    From the Big Bang to the Big Data Bang: The Past, Present, and Future
    A Framework for BI Beyond Intelligence 
         Raising the Bar on BI Using Embeddable BI and BI in the Cloud 
         Raising the Bar on BI: Good to Great to Intelligent 
         Raising the Bar on the Social Intelligence Quotient (SIQ) 
         Raising the Bar on BI by Mobilizing BI: BI on the Go
    A Pragmatic Framework for a Customer-Centric EDW/BI Solution
    A Next-Generation BI Framework 
         Taking EDW/BI to the Next Level: An Open Source Model for EDW/BI–EPM 
         Open Source Model for an Open Source DW–BI/EPM Solution Delivering Business Value 
         Open Source Architectural Framework for a Best-Fit Open Source BI/EPM Solution 
         Value Proposition
         The Road Ahead . . .
    A BI Framework for a Reusable Predictive Analytics Model
    A BI Framework for Competitive Intelligence: Time, Technology, and the Evolution of the Intelligent Customer

    Best Practices for Optimization
    Accelerating Application Testing: Choice, Design, and Suitability
    Best Practices for Performance Testing: Online and On Demand Scenarios
    A Fine Tuning Framework for Optimality
    Looking Down the Customer Experience Trail, Leaving the Customer Alone: Customer Feedback Management (CFM)–Driven and APM-Oriented Tuning
    Codeful and Codeless Design Patterns for Business-Savvy and IT-Friendly QOS Measurements and In-Depth Impact Analysis
    Summary

    Open Standards for Open Source: An EDW/BI Outlook
    Summary
    References

    Index

    Each chapter includes an Introduction and Summary

    Biography

    Lakshman Bulusu is a 20-year veteran of the IT industry with specialized expertise and academic experience in the management, supervision, mentoring, review, architectural design, and development of database, data warehousing, and business intelligence-related application development projects encompassing major industry domains such as pharmaceutical/healthcare, telecommunications, news/media, global investment and retail banking, insurance, and retail for clients across the United States, Europe, and Asia. He is well-versed in the primary Oracle technologies through Oracle 11g, including SQL, PL/SQL, and SQL-embedded programming, as well as design and development of Web applications that are cross-platform and open source-based.

    Mr. Bulusu has expertise in data modeling and design of enterprise data warehousing/business intelligence information architectures, with multiple customer implementations to his credit. His design of application development frameworks using PL/SQL, from design to coding to testing to debugging to performance tuning to business intelligence, has been implemented at some major Fortune 500 clients in the United States. He has implemented the Common Data Quality Framework for SQL Server, based on summarization-comparison-discrepancy isolation across disparate multivendor large-scale databases. He is also an educator who has been teaching technical courses for about a decade in the areas of Oracle design, development, and optimization, and he serves on the CNS Advisory Committee of Anthem Institute (affiliated with Anthem Education Group).

    Mr. Bulusu has authored six books on Oracle and more than fifty educational/technical articles in journals and magazines in the United States and the United Kingdom; he has also presented at national and international conferences in the United States and the United Kingdom. He lives in New Jersey and likes to read, write, listen to, and lecture on English poetry and nonfiction when he is not working on IT projects. He can be reached at [email protected].

    "… As a practitioner himself, Mr. Bulusu is able to pinpoint the more critical aspects for consideration that a BI expert would truly appreciate. He does a great job of covering the subject thoroughly by first starting at a high-level, covering every aspect of BI and then breaking down the components to the most granular detail applying open source BI’s feasibility and validity as a viable solution. … Bulusu’s thorough, meticulous approach and in-depth examination of the subject, taken from a standpoint of an experienced author, educator, and technologist, has finally provided justice to this subject. Thank you, Mr. Bulusu. You have opened my eyes to open source BI."
    —Rosendo Abellera, President and CEO, BIS3

    "… coverage spans from basic concepts to customization. It identifies end-to-end solutions and illustrates the practical aspects of implementing and using open source DW and BI technologies. The text explains best-fit solutions and how to maximize ROI. An expert in the field, Lakshman Bulusu offers in this book the best practices that will help you build customer confidence."
    —NeoPopRealism Journal