New Oracle9i RDBMS features
Extending Oracle’s leadership

Automatic Memory Tuning

ETL Infrastructure

Change data capture

External tables

Table functions

Upserts

Multi-table INSERTs

Resumable statements

Transportable tablespace enhancements

List Partitioning

Internal enhancements for:

parallel query

aggregation

cost-based optimization

Bitmap Join Indexes

Analytic SQL fns

Grouping sets

FIRST/LAST aggregates

Inverse distribution

Hypothetical rank

Proactive query governing

Enhancements to MVs

Broader refresh and rewrite capabilities

More sophisticated summary advisor

Full Outer Joins

WITH-clause

Slide 11

Managing Large Volumes of Data

Partitioning and parallelism are crucial for VLDB

Parallelism for all operations

DBA operations: loading, index-creation, table-creation, data-modification, backup and recovery

End-user operations: Queries

Unbounded scalability: Real Application Clusters

Partitioning provides ‘incremental’ operations for:

Data loading

Indexing

Referential Integrity

Backup and recovery

How Parallel Execution Works

With serial execution only one process is used

With parallel execution

One parllel execution coordinator process

Many parallel execution servers

Table is dynamically partitioned into granules

VLDB Manageability and Performance Constraints

Table availability:

Large tables are more vulnerable to disk failure

It is too costly to have a large table inaccessible for hours due to recovery

Large table manageability

They take to long to be loaded

Indexes take too long to be built

Partial deletes take hours

Performance considerations

Large table and index scans are costly

Scanning a subset improves performance

Benefits of Partioning

Availability

Partions can be independently managed

Backup and restore operations can be done on individual partitions

Partitions that are unavailable do not affect queries on DML operations on other paritions that use the same table or index

Manageability

A partition can be moved from one tablespace to another

A partition can be dropped, truncated, added

A partition can be divided at user-defined value

Benefits of Partioning (2)

Performance

The optimizer eliminates partitions that not have to be scanned

Partitions can be scanned in parallel

Partitions can be load-balanced across physical devices

Join operations can be optimized to “join by the partition”

Partitioning Methods

Range Partitioning

Hash partitioning

List Partioning

Composite Partioning

Partitioned Indexes

Indexes can be partitioned like tables

Partitioned or nonpartioned indexes can be used with partitioned or nonpartitioned tables

Partioned indexes can be

Global or local

Prefixed or nonprefixed

Rolling Window Operations

List Partitioning

Table Compression:
What is it?

Tables can be compressed

Compression can also be specified at the partition level

Indexes and index-organized tables are not compressed

Typical compression ratios are 3:1 - 5:1

Compression is dependent upon the actual data

Compression algorithm based on removing data redundancy

All DDL/DML commands are supported on compressed tables

Table Compression:
What isn’t it?

This is not a generic ‘zip’-style compression

Not all tables will have good compression

Compression algorithm guarantees that compression will never increase size of table

Most large DW tables seem to compress well

Compression happens between column/row values, not within column/row values

Long character strings are not compressed unless the exact same string appear multiple times

LOB/BLOB columns are not compressed

Table Compression:
How it works

<rowid> ‘650-506-7000’ ‘650-123-4567’

<rowid> ‘650-506-7000’ ‘650-506-7001’

<rowid> ‘650-506-7000’ ‘650-456-7890’

<rowid> ‘650-506-7000’ ‘650-098-7654’

<rowid> ‘650-506-7000’ ‘650-123-4567’

<rowid> ‘650-506-7001’ ‘650-123-4567’

<rowid> ‘650-506-7001’ ‘650-123-4567’

…

Table Compression:
Usage

Creating compressed tables:

CREATE TABLE T1(id integer) COMPRESS;

Converting tables to compressed tables:

ALTER TABLE T3 MOVE COMPRESS;

Creating compressed tablespaces:

CREATE TABLESPACE tabspace_2

DATAFILE 'diska:tabspace_file2.dat' SIZE 20M

DEFAULT COMPRESS STORAGE ( … );

Table Compression:
Performance Impact

Queries on compressed tables may observe minor performance degradation

Performance impact depends upon the query

Many queries will be faster

Compression reduces IO but increases CPU utilization

For a set of heterogeneous queries, performance should degrade by no more than 5%

Load and direct-path INSERT performance will be slower

Data must be compressed as it is added to the table

Table Compression:
When to Use it

Data warehouses containing large volumes of historical data

Compress all of the older data in a data warehouse

Integrate compression into the ‘rolling window’ paradigm

For example, most recent 3 months of data could be stored uncompressed and the previous 21 months could be stored compressed

Materialized views and other derived data sets

Generally, compression should be applied to data that is infrequently updated

Manage large numbers of
concurrent users

Manage large numbers of users

Key requirements:

Guarantee optimal resource utilization all the time

Provide the appropriate amount of resources to every job or query based on priority and system load

Pro-actively prevent ‘runaway’ queries

Pro-actively prevent system overloading

Managing large numbers of users should be simple and automated

Appropriate Resources to Each Query

CPU

Business-critical processes receive more CPU

Database Resource Manager allows DBA to assign CPU resources to groups of users

Memory

Oracle9i dynamically allocates runtime memory based on current available memory and each query’s requirements

Parallelism

Degree of parallelism is dynamically chosen based on available resources and each query’s requirements

Automatic Runtime Memory Tuning

One parameter:

PGA_AGGREGATE_SIZE = <size>

Dynamic allocation of ‘runtime’ memory based upon each query’s requirements

In data-warehouse environments, >50% of a server’s physical memory is typically used for query ‘runtime’ memory

Benefits:

Reduced overall memory usage

Improved throughput

Simplified tuning

Pro-active management of DW Workloads

Predictive Query Governing and Dynamic Re-prioritization:

Queries which are estimated to take longer than an DBA-specified limit will abort or be ‘de-prioritized’

Automatic Queuing:

A limit can be set on the number of active session for each group of users; queries submitted which exceed this limit will be queued

Via Database Resource Manager

Example Scenario

Power Users

Up to 70% of the CPU resources

Any degree of parallelism

Any query which is expected to take over one hour will be migrated to background

Report Users

Up to 20% of the CPU resources

No parallelism

Limit of 40 concurrent queries

Any query which is expected to take over 20 minutes will be aborted

Background Jobs

Up to 10% of the CPU resources

Any degree of parallelism

Limit of 5 concurrent queries

Fast Query Performance

Bitmap Indexes

The most common index type in Oracle DW environments

Bitmap indexes introduced in Oracle 7.3

Bitmap join indexes introduced in Oracle9i

Oracle has over a dozen patents for bitmap index technology

Oracle provides patented compression technique for bitmap indexes

Bitmap indexes are 3-20x smaller than b-tree indexes

Less storage yields better query performance and more indexed columns

Bitmap indexes introduction

Bitmap indexes characteristics

Columns with Low-to-Medium Cardinality

‘Set-based’ manipulation of data

Especially good for large, complex queries

Orders of magnitude performance improvement

Fully integrated within Oracle9i

Created and managed similar to other Oracle9i indexes

Used to accelerate single-table access, joins, and aggregation

Transparently selected by the query optimizer

Bitmap join indexes

Materialized Views

Currently, indexes provide fast path access to specific data

Materialized views work on the same principle

A Materialized view is an instantiation of a SQL statement - a view with data storage

Materialized views can be partitioned, indexed separately

Used for query rewrite to increase performance

Rewrites are transparent to applications

Rewrites do not require any special privileges

SQL support for analytic calculations

Why enhance the RDBMS for analytic calculations?

Benefits

Performance

Scalability

Simpler SQL development

Analytic Functions: Examples

Rank

Top 10 sales-reps in each region

Moving Window

Today’s stock price minus 200-day moving average

Period-over-period comparisons

Percentage growth of Jan-99 sales over Jan-98

Ratio-to-report

January’s sales as a percentage of the entire year’s

Platform for Business Intelligence:
Data Warehousing

Platform for Business Intelligence:
ETL

Platform for Business Intelligence:
OLAP

Oracle9i OLAP Services

Platform for Business Intelligence:
Data Mining

Slide 55


	Automatic Memory Tuning
	ETL Infrastructure
		Change data capture
		External tables
		Table functions
		Upserts
		Multi-table INSERTs
		Resumable statements
		Transportable tablespace enhancements
	List Partitioning
	Internal enhancements for:
		parallel query
		aggregation
		cost-based optimization
	Bitmap Join Indexes
	Analytic SQL fns
		Grouping sets
		FIRST/LAST aggregates
		Inverse distribution
		Hypothetical rank
	Proactive query governing
	Enhancements to MVs
		Broader refresh and rewrite capabilities
		More sophisticated summary advisor
	Full Outer Joins
	WITH-clause


	Partitioning and parallelism are crucial for VLDB
	Parallelism for all operations
		DBA operations: loading, index-creation, table-creation, data-modification, backup and recovery
		End-user operations: Queries
		Unbounded scalability: Real Application Clusters
	Partitioning provides ‘incremental’ operations for:
		Data loading
		Indexing
		Referential Integrity
		Backup and recovery


	With serial execution only one process is used
	With parallel execution
		One parllel execution coordinator process
		Many parallel execution servers
		Table is dynamically partitioned into granules


	Table availability:
		Large tables are more vulnerable to disk failure
		It is too costly to have a large table inaccessible for hours due to recovery
	Large table manageability
		They take to long to be loaded
		Indexes take too long to be built
		Partial deletes take hours
	Performance considerations
		Large table and index scans are costly
		Scanning a subset improves performance


	Availability
		Partions can be independently managed
		Backup and restore operations can be done on individual partitions
		Partitions that are unavailable do not affect queries on DML operations on other paritions that use the same table or index
	Manageability
		A partition can be moved from one tablespace to another
		A partition can be dropped, truncated, added
		A partition can be divided at user-defined value


	Performance
		The optimizer eliminates partitions that not have to be scanned
		Partitions can be scanned in parallel
		Partitions can be load-balanced across physical devices
		Join operations can be optimized to “join by the partition”


	Range Partitioning
	Hash partitioning
	List Partioning
	Composite Partioning


	Indexes can be partitioned like tables
	Partitioned or nonpartioned indexes can be used with partitioned or nonpartitioned tables
	Partioned indexes can be
		Global or local
		Prefixed or nonprefixed


	Tables can be compressed
		Compression can also be specified at the partition level
		Indexes and index-organized tables are not compressed
	Typical compression ratios are 3:1 - 5:1
		Compression is dependent upon the actual data
		Compression algorithm based on removing data redundancy
	All DDL/DML commands are supported on compressed tables


This is not a generic ‘zip’-style compression
	Not all tables will have good compression
		Compression algorithm guarantees that compression will never increase size of table
	Most large DW tables seem to compress well
Compression happens between column/row values, not within column/row values
	Long character strings are not compressed unless the exact same string appear multiple times
LOB/BLOB columns are not compressed


	<rowid> ‘650-506-7000’ ‘650-123-4567’
	<rowid> ‘650-506-7000’ ‘650-506-7001’
	<rowid> ‘650-506-7000’ ‘650-456-7890’
	<rowid> ‘650-506-7000’ ‘650-098-7654’
	<rowid> ‘650-506-7000’ ‘650-123-4567’
	<rowid> ‘650-506-7001’ ‘650-123-4567’
	<rowid> ‘650-506-7001’ ‘650-123-4567’
	…


	Creating compressed tables:
		CREATE TABLE T1(id integer) COMPRESS;
	Converting tables to compressed tables:
		ALTER TABLE T3 MOVE COMPRESS;
	Creating compressed tablespaces:
		CREATE TABLESPACE tabspace_2
		DATAFILE 'diska:tabspace_file2.dat' SIZE 20M
		DEFAULT COMPRESS STORAGE ( … );


Queries on compressed tables may observe minor performance degradation
	Performance impact depends upon the query
		Many queries will be faster
	Compression reduces IO but increases CPU utilization
	For a set of heterogeneous queries, performance should degrade by no more than 5%
Load and direct-path INSERT performance will be slower
	Data must be compressed as it is added to the table


Data warehouses containing large volumes of historical data
	Compress all of the older data in a data warehouse
	Integrate compression into the ‘rolling window’ paradigm
		For example, most recent 3 months of data could be stored uncompressed and the previous 21 months could be stored compressed
Materialized views and other derived data sets
Generally, compression should be applied to data that is infrequently updated


Key requirements:
	Guarantee optimal resource utilization all the time
		Provide the appropriate amount of resources to every job or query based on priority and system load
	Pro-actively prevent ‘runaway’ queries
	Pro-actively prevent system overloading
Managing large numbers of users should be simple and automated


CPU
	Business-critical processes receive more CPU
		Database Resource Manager allows DBA to assign CPU resources to groups of users
Memory
	Oracle9i dynamically allocates runtime memory based on current available memory and each query’s requirements
Parallelism
	Degree of parallelism is dynamically chosen based on available resources and each query’s requirements


	One parameter:
		PGA_AGGREGATE_SIZE = <size>
	Dynamic allocation of ‘runtime’ memory based upon each query’s requirements
		In data-warehouse environments, >50% of a server’s physical memory is typically used for query ‘runtime’ memory
	Benefits:
		Reduced overall memory usage
		Improved throughput
		Simplified tuning


	Predictive Query Governing and Dynamic Re-prioritization:
		Queries which are estimated to take longer than an DBA-specified limit will abort or be ‘de-prioritized’
	Automatic Queuing:
		A limit can be set on the number of active session for each group of users; queries submitted which exceed this limit will be queued
	Via Database Resource Manager


	Power Users
		Up to 70% of the CPU resources
		Any degree of parallelism
		Any query which is expected to take over one hour will be migrated to background
	Report Users
		Up to 20% of the CPU resources
		No parallelism
		Limit of 40 concurrent queries
		Any query which is expected to take over 20 minutes will be aborted
	Background Jobs
		Up to 10% of the CPU resources
		Any degree of parallelism
		Limit of 5 concurrent queries


	The most common index type in Oracle DW environments
		Bitmap indexes introduced in Oracle 7.3
		Bitmap join indexes introduced in Oracle9i
		Oracle has over a dozen patents for bitmap index technology
	Oracle provides patented compression technique for bitmap indexes
		Bitmap indexes are 3-20x smaller than b-tree indexes
		Less storage yields better query performance and more indexed columns


	Columns with Low-to-Medium Cardinality
	‘Set-based’ manipulation of data
	Especially good for large, complex queries
		Orders of magnitude performance improvement
	Fully integrated within Oracle9i
		Created and managed similar to other Oracle9i indexes
		Used to accelerate single-table access, joins, and aggregation
		Transparently selected by the query optimizer


	Currently, indexes provide fast path access to specific data
	Materialized views work on the same principle
	A Materialized view is an instantiation of a SQL statement - a view with data storage
	Materialized views can be partitioned, indexed separately
	Used for query rewrite to increase performance
	Rewrites are transparent to applications
	Rewrites do not require any special privileges


	Why enhance the RDBMS for analytic calculations?
	Benefits
		Performance
		Scalability
		Simpler SQL development


	Rank
		Top 10 sales-reps in each region
	Moving Window
		Today’s stock price minus 200-day moving average
	Period-over-period comparisons
		Percentage growth of Jan-99 sales over Jan-98
	Ratio-to-report
		January’s sales as a percentage of the entire year’s