Notes
Outline
Slide 1
Slide 2
The Old Way:
Fragmented Information Supply Chain
The New Way: Oracle9i
Oracle9i
Complete, Therefore Simple
Oracle9i Database
Single business-intelligence data server
Oracle9i Application Server
Runs All Your Business Intelligence Applications
Oracle Database for Data Warehousing
Continuous Innovation
New Oracle9i RDBMS features
Extending Oracle’s leadership
Automatic Memory Tuning
ETL Infrastructure
Change data capture
External tables
Table functions
Upserts
Multi-table INSERTs
Resumable statements
Transportable tablespace enhancements
List Partitioning
Internal enhancements for:
parallel query
aggregation
cost-based optimization
Bitmap Join Indexes
Analytic SQL fns
Grouping sets
FIRST/LAST aggregates
Inverse distribution
Hypothetical rank
Proactive query governing
Enhancements to MVs
Broader refresh and rewrite capabilities
More sophisticated summary advisor
Full Outer Joins
WITH-clause
Real Application Clusters
(RAC)
Slide 11
Manage Large Volumes of Data
Managing Large Volumes of Data
Partitioning and parallelism are crucial for VLDB
Parallelism for all operations
DBA operations: loading, index-creation, table-creation, data-modification, backup and recovery
End-user operations: Queries
Unbounded scalability: Real Application Clusters
Partitioning provides ‘incremental’ operations for:
Data loading
Indexing
Referential Integrity
Backup and recovery
How Parallel Execution Works
With serial execution only one process is used
With parallel execution
One parllel execution coordinator process
Many parallel execution servers
Table is dynamically partitioned into granules
VLDB Manageability and Performance Constraints
Table availability:
Large tables are more vulnerable to disk failure
It is too costly to have a large table inaccessible for hours due to recovery
Large table manageability
They take to long to be loaded
Indexes take too long to be built
Partial deletes take hours
Performance considerations
Large table and index scans are costly
Scanning a subset improves performance
Benefits of Partioning
Availability
Partions can be independently managed
Backup and restore operations can be done on individual partitions
Partitions that are unavailable do not affect queries on DML operations on other paritions that use the same table or index
Manageability
A partition can be moved from one tablespace to another
A partition can be dropped, truncated, added
A partition can be divided at user-defined value
Benefits of Partioning (2)
Performance
The optimizer eliminates partitions that not have to be scanned
Partitions can be scanned in parallel
Partitions can be load-balanced across physical devices
Join operations can be optimized to “join by the partition”
Partitioning Methods
Range Partitioning
Hash partitioning
List Partioning
Composite Partioning
Partitioned Indexes
Indexes can be partitioned like tables
Partitioned or nonpartioned indexes can be used with partitioned or nonpartitioned tables
Partioned indexes can be
Global or local
Prefixed or nonprefixed
Rolling Window Operations
Rolling Window Operations
Rolling Window Operations
Rolling Window Operations
Rolling Window Operations
List Partitioning
List Partitioning
List Partitioning
Table Compression:
What is it?
Tables can be compressed
Compression can also be specified at the partition level
Indexes and index-organized tables are not compressed
Typical compression ratios are 3:1 - 5:1
Compression is dependent upon the actual data
Compression algorithm based on removing data redundancy
All DDL/DML commands are supported on compressed tables
Table Compression:
What isn’t it?
This is not a generic ‘zip’-style compression
Not all tables will have good compression
Compression algorithm guarantees that compression will never increase size of table
Most large DW tables seem to compress well
Compression happens between column/row values, not within column/row values
Long character strings are not compressed unless the exact same string appear multiple times
LOB/BLOB columns are not compressed
Table Compression:
How it works
<rowid> ‘650-506-7000’ ‘650-123-4567’
<rowid> ‘650-506-7000’ ‘650-506-7001’
<rowid> ‘650-506-7000’ ‘650-456-7890’
<rowid> ‘650-506-7000’ ‘650-098-7654’
<rowid> ‘650-506-7000’ ‘650-123-4567’
<rowid> ‘650-506-7001’ ‘650-123-4567’
<rowid> ‘650-506-7001’ ‘650-123-4567’
…
Table Compression:
Usage
Creating compressed tables:
CREATE TABLE T1(id integer) COMPRESS;
Converting tables to compressed tables:
ALTER TABLE T3 MOVE COMPRESS;
Creating compressed tablespaces:
CREATE TABLESPACE tabspace_2
  DATAFILE 'diska:tabspace_file2.dat' SIZE 20M
  DEFAULT COMPRESS STORAGE ( … );
Table Compression:
Performance Impact
Queries on compressed tables may observe minor performance degradation
Performance impact depends upon the query
Many queries will be faster
Compression reduces IO but increases CPU utilization
For a set of heterogeneous queries, performance should degrade by no more than 5%
Load and direct-path INSERT performance will be slower
Data must be compressed as it is added to the table
Table Compression:
When to Use it
Data warehouses containing large volumes of historical data
Compress all of the older data in a data warehouse
Integrate compression into the ‘rolling window’ paradigm
For example, most recent 3 months of data could be stored uncompressed and the previous 21 months could be stored compressed
Materialized views and other derived data sets
Generally, compression should be applied to data that is infrequently updated
Manage large numbers of
concurrent users
Manage large numbers of users
Key requirements:
Guarantee optimal resource utilization all the time
Provide the appropriate amount of resources to every job or query based on priority and system load
Pro-actively prevent ‘runaway’ queries
Pro-actively prevent system overloading
Managing large numbers of users should be simple and automated
Appropriate Resources to Each Query
CPU
Business-critical processes receive more CPU
Database Resource Manager allows DBA to assign CPU resources to groups of users
Memory
Oracle9i dynamically allocates runtime memory based on current available memory and each query’s requirements
Parallelism
Degree of parallelism is dynamically chosen based on available resources and each query’s requirements
Automatic Runtime Memory Tuning
One parameter:
PGA_AGGREGATE_SIZE = <size>
Dynamic allocation of ‘runtime’ memory based upon each query’s requirements
In data-warehouse environments, >50% of a server’s physical memory is typically used for query ‘runtime’ memory
Benefits:
Reduced overall memory usage
Improved throughput
Simplified tuning
Pro-active management of DW Workloads
Predictive Query Governing and Dynamic Re-prioritization:
Queries which are estimated to take longer than an DBA-specified limit will abort or be ‘de-prioritized’
Automatic Queuing:
A limit can be set on the number of active session for each group of users; queries submitted which exceed this limit will be queued
Via Database Resource Manager
Example Scenario
Power Users
Up to 70% of the CPU resources
Any degree of parallelism
Any query which is expected to take over one hour will be migrated to background
Report Users
Up to 20% of the CPU resources
No parallelism
Limit of 40 concurrent queries
Any query which is expected to take over 20 minutes will be aborted
Background Jobs
Up to 10% of the CPU resources
Any degree of parallelism
Limit of 5 concurrent queries
Fast Query Performance
Bitmap Indexes
The most common index type in Oracle DW environments
Bitmap indexes introduced in Oracle 7.3
Bitmap join indexes introduced in Oracle9i
Oracle has over a dozen patents for bitmap index technology
Oracle provides patented compression technique for bitmap indexes
Bitmap indexes are 3-20x smaller than b-tree indexes
Less storage yields better query performance and more indexed columns
Bitmap indexes introduction
Bitmap indexes characteristics
Columns with Low-to-Medium Cardinality
‘Set-based’ manipulation of data
Especially good for large, complex queries
Orders of magnitude performance improvement
Fully integrated within Oracle9i
Created and managed similar to other Oracle9i indexes
Used to accelerate single-table access, joins, and aggregation
Transparently selected by the query optimizer
Bitmap join indexes
Bitmap join indexes
Bitmap join indexes
Materialized Views
Currently, indexes provide fast path access to specific data
Materialized views work on the same principle
A Materialized view is an instantiation of a SQL statement - a view with data storage
Materialized views can be partitioned, indexed separately
Used for query rewrite to increase performance
Rewrites are transparent to applications
Rewrites do not require any special privileges
SQL support for analytic calculations
Why enhance the RDBMS for analytic calculations?
Benefits
Performance
Scalability
Simpler SQL development
Analytic Functions: Examples
Rank
Top 10 sales-reps in each region
Moving Window
Today’s stock price minus 200-day moving average
Period-over-period comparisons
Percentage growth of  Jan-99 sales over Jan-98
Ratio-to-report
January’s sales as a percentage of the entire year’s
Platform for Business Intelligence:
Data Warehousing
Platform for Business Intelligence:
ETL
Platform for Business Intelligence:
OLAP
Oracle9i OLAP Services
Platform for Business Intelligence:
Data Mining
Slide 55