Slide 1
Slide 2
The Old
Way:
Fragmented Information Supply Chain
The New Way: Oracle9i
Oracle9i
Complete,
Therefore Simple
Oracle9i Database
Single business-intelligence data server
Oracle9i Application
Server
Runs All Your Business Intelligence Applications
Oracle Database for Data
Warehousing
Continuous Innovation
New Oracle9i RDBMS
features
Extending Oracle’s leadership
|
|
|
|
Automatic Memory Tuning |
|
ETL Infrastructure |
|
Change data capture |
|
External tables |
|
Table functions |
|
Upserts |
|
Multi-table INSERTs |
|
Resumable statements |
|
Transportable tablespace enhancements |
|
List Partitioning |
|
Internal enhancements for: |
|
parallel query |
|
aggregation |
|
cost-based optimization |
|
Bitmap Join Indexes |
|
Analytic SQL fns |
|
Grouping sets |
|
FIRST/LAST aggregates |
|
Inverse distribution |
|
Hypothetical rank |
|
Proactive query governing |
|
Enhancements to MVs |
|
Broader refresh and rewrite
capabilities |
|
More sophisticated summary advisor |
|
Full Outer Joins |
|
WITH-clause |
Real Application
Clusters
(RAC)
Slide 11
Manage Large Volumes of
Data
Managing Large Volumes of
Data
|
|
|
|
Partitioning and parallelism are
crucial for VLDB |
|
Parallelism for all operations |
|
DBA operations: loading,
index-creation, table-creation, data-modification, backup and recovery |
|
End-user operations: Queries |
|
Unbounded scalability: Real Application
Clusters |
|
Partitioning provides ‘incremental’
operations for: |
|
Data loading |
|
Indexing |
|
Referential Integrity |
|
Backup and recovery |
How Parallel Execution
Works
|
|
|
|
With serial execution only one process
is used |
|
With parallel execution |
|
One parllel execution coordinator
process |
|
Many parallel execution servers |
|
Table is dynamically partitioned into
granules |
VLDB Manageability and
Performance Constraints
|
|
|
|
Table availability: |
|
Large tables are more vulnerable to
disk failure |
|
It is too costly to have a large table
inaccessible for hours due to recovery |
|
Large table manageability |
|
They take to long to be loaded |
|
Indexes take too long to be built |
|
Partial deletes take hours |
|
Performance considerations |
|
Large table and index scans are costly |
|
Scanning a subset improves performance |
Benefits of Partioning
|
|
|
|
Availability |
|
Partions can be independently managed |
|
Backup and restore operations can be
done on individual partitions |
|
Partitions that are unavailable do not
affect queries on DML operations on other paritions that use the same table
or index |
|
Manageability |
|
A partition can be moved from one
tablespace to another |
|
A partition can be dropped, truncated,
added |
|
A partition can be divided at
user-defined value |
Benefits of Partioning
(2)
|
|
|
|
Performance |
|
The optimizer eliminates partitions
that not have to be scanned |
|
Partitions can be scanned in parallel |
|
Partitions can be load-balanced across
physical devices |
|
Join operations can be optimized to “join
by the partition” |
Partitioning Methods
|
|
|
Range Partitioning |
|
Hash partitioning |
|
List Partioning |
|
Composite Partioning |
Partitioned Indexes
|
|
|
|
Indexes can be partitioned like tables |
|
Partitioned or nonpartioned indexes can
be used with partitioned or nonpartitioned tables |
|
Partioned indexes can be |
|
Global or local |
|
Prefixed or nonprefixed |
Rolling Window Operations
Rolling Window Operations
Rolling Window Operations
Rolling Window Operations
Rolling Window Operations
List Partitioning
List Partitioning
List Partitioning
Table Compression:
What is it?
|
|
|
|
Tables can be compressed |
|
Compression can also be specified at
the partition level |
|
Indexes and index-organized tables are
not compressed |
|
Typical compression ratios are 3:1 -
5:1 |
|
Compression is dependent upon the
actual data |
|
Compression algorithm based on removing
data redundancy |
|
All DDL/DML commands are supported on
compressed tables |
|
|
Table
Compression:
What isn’t it?
|
|
|
|
|
This is not a generic ‘zip’-style
compression |
|
Not all tables will have good
compression |
|
Compression algorithm guarantees that
compression will never increase size of table |
|
Most large DW tables seem to compress
well |
|
Compression happens between column/row
values, not within column/row values |
|
Long character strings are not
compressed unless the exact same string appear multiple times |
|
LOB/BLOB columns are not compressed |
Table
Compression:
How it works
|
|
|
<rowid> ‘650-506-7000’ ‘650-123-4567’ |
|
<rowid> ‘650-506-7000’ ‘650-506-7001’ |
|
<rowid> ‘650-506-7000’ ‘650-456-7890’ |
|
<rowid> ‘650-506-7000’ ‘650-098-7654’ |
|
<rowid> ‘650-506-7000’ ‘650-123-4567’ |
|
<rowid> ‘650-506-7001’ ‘650-123-4567’ |
|
<rowid> ‘650-506-7001’ ‘650-123-4567’ |
|
… |
|
|
Table
Compression:
Usage
|
|
|
|
Creating compressed tables: |
|
CREATE TABLE T1(id integer) COMPRESS; |
|
Converting tables to compressed tables: |
|
ALTER TABLE T3 MOVE COMPRESS; |
|
Creating compressed tablespaces: |
|
CREATE TABLESPACE tabspace_2 |
|
DATAFILE 'diska:tabspace_file2.dat' SIZE 20M |
|
DEFAULT COMPRESS STORAGE ( … ); |
Table Compression:
Performance Impact
|
|
|
|
|
Queries on compressed tables may
observe minor performance degradation |
|
Performance impact depends upon the
query |
|
Many queries will be faster |
|
Compression reduces IO but increases
CPU utilization |
|
For a set of heterogeneous queries,
performance should degrade by no more than 5% |
|
Load and direct-path INSERT performance
will be slower |
|
Data must be compressed as it is added
to the table |
|
|
Table Compression:
When to Use it
|
|
|
|
|
Data warehouses containing large
volumes of historical data |
|
Compress all of the older data in a
data warehouse |
|
Integrate compression into the ‘rolling
window’ paradigm |
|
For example, most recent 3 months of
data could be stored uncompressed and the previous 21 months could be stored
compressed |
|
Materialized views and other derived
data sets |
|
Generally, compression should be
applied to data that is infrequently updated |
|
|
Manage large numbers of
concurrent users
Manage large numbers of
users
|
|
|
|
|
Key requirements: |
|
Guarantee optimal resource utilization
all the time |
|
Provide the appropriate amount of
resources to every job or query based on priority and system load |
|
Pro-actively prevent ‘runaway’ queries |
|
Pro-actively prevent system overloading |
|
Managing large numbers of users should
be simple and automated |
Appropriate Resources to
Each Query
|
|
|
|
|
CPU |
|
Business-critical processes receive
more CPU |
|
Database Resource Manager allows DBA to
assign CPU resources to groups of users |
|
Memory |
|
Oracle9i dynamically allocates runtime
memory based on current available memory and each query’s requirements |
|
Parallelism |
|
Degree of parallelism is dynamically
chosen based on available resources and each query’s requirements |
Automatic Runtime Memory
Tuning
|
|
|
|
One parameter: |
|
PGA_AGGREGATE_SIZE = <size> |
|
Dynamic allocation of ‘runtime’ memory
based upon each query’s requirements |
|
In data-warehouse environments, >50%
of a server’s physical memory is typically used for query ‘runtime’ memory |
|
Benefits: |
|
Reduced overall memory usage |
|
Improved throughput |
|
Simplified tuning |
Pro-active management of
DW Workloads
|
|
|
|
Predictive Query Governing and Dynamic
Re-prioritization: |
|
Queries which are estimated to take
longer than an DBA-specified limit will abort or be ‘de-prioritized’ |
|
Automatic Queuing: |
|
A limit can be set on the number of
active session for each group of users; queries submitted which exceed this
limit will be queued |
|
Via Database Resource Manager |
Example Scenario
|
|
|
|
Power Users |
|
Up to 70% of the CPU resources |
|
Any degree of parallelism |
|
Any query which is expected to take
over one hour will be migrated to background |
|
Report Users |
|
Up to 20% of the CPU resources |
|
No parallelism |
|
Limit of 40 concurrent queries |
|
Any query which is expected to take
over 20 minutes will be aborted |
|
Background Jobs |
|
Up to 10% of the CPU resources |
|
Any degree of parallelism |
|
Limit of 5 concurrent queries |
Fast Query Performance
Bitmap Indexes
|
|
|
|
The most common index type in Oracle DW
environments |
|
Bitmap indexes introduced in Oracle 7.3 |
|
Bitmap join indexes introduced in
Oracle9i |
|
Oracle has over a dozen patents for
bitmap index technology |
|
Oracle provides patented compression
technique for bitmap indexes |
|
Bitmap indexes are 3-20x smaller than
b-tree indexes |
|
Less storage yields better query
performance and more indexed columns |
Bitmap indexes
introduction
Bitmap indexes
characteristics
|
|
|
|
Columns with Low-to-Medium Cardinality |
|
‘Set-based’ manipulation of data |
|
Especially good for large, complex
queries |
|
Orders of magnitude performance
improvement |
|
Fully integrated within Oracle9i |
|
Created and managed similar to other
Oracle9i indexes |
|
Used to accelerate single-table access,
joins, and aggregation |
|
Transparently selected by the query
optimizer |
Bitmap join indexes
Bitmap join indexes
Bitmap join indexes
Materialized Views
|
|
|
Currently, indexes provide fast path
access to specific data |
|
Materialized views work on the same
principle |
|
A Materialized view is an instantiation
of a SQL statement - a view with data storage |
|
Materialized views can be partitioned,
indexed separately |
|
Used for query rewrite to increase
performance |
|
Rewrites are transparent to
applications |
|
Rewrites do not require any special
privileges |
SQL support for analytic
calculations
|
|
|
|
Why enhance the RDBMS for analytic
calculations? |
|
Benefits |
|
Performance |
|
Scalability |
|
Simpler SQL development |
Analytic Functions:
Examples
|
|
|
|
Rank |
|
Top 10 sales-reps in each region |
|
Moving Window |
|
Today’s stock price minus 200-day
moving average |
|
Period-over-period comparisons |
|
Percentage growth of Jan-99 sales over Jan-98 |
|
Ratio-to-report |
|
January’s sales as a percentage of the
entire year’s |
Platform for Business
Intelligence:
Data Warehousing
Platform for Business
Intelligence:
ETL
Platform for Business
Intelligence:
OLAP
Oracle9i OLAP Services
Platform for Business
Intelligence:
Data Mining
Slide 55