|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Automatic Memory Tuning |
|
ETL Infrastructure |
|
Change data capture |
|
External tables |
|
Table functions |
|
Upserts |
|
Multi-table INSERTs |
|
Resumable statements |
|
Transportable tablespace enhancements |
|
List Partitioning |
|
Internal enhancements for: |
|
parallel query |
|
aggregation |
|
cost-based optimization |
|
Bitmap Join Indexes |
|
Analytic SQL fns |
|
Grouping sets |
|
FIRST/LAST aggregates |
|
Inverse distribution |
|
Hypothetical rank |
|
Proactive query governing |
|
Enhancements to MVs |
|
Broader refresh and rewrite capabilities |
|
More sophisticated summary advisor |
|
Full Outer Joins |
|
WITH-clause |
|
|
|
|
|
|
|
|
|
|
|
Partitioning and parallelism are crucial for
VLDB |
|
Parallelism for all operations |
|
DBA operations: loading, index-creation,
table-creation, data-modification, backup and recovery |
|
End-user operations: Queries |
|
Unbounded scalability: Real Application Clusters |
|
Partitioning provides ‘incremental’ operations
for: |
|
Data loading |
|
Indexing |
|
Referential Integrity |
|
Backup and recovery |
|
|
|
|
|
With serial execution only one process is used |
|
With parallel execution |
|
One parllel execution coordinator process |
|
Many parallel execution servers |
|
Table is dynamically partitioned into granules |
|
|
|
|
|
Table availability: |
|
Large tables are more vulnerable to disk failure |
|
It is too costly to have a large table
inaccessible for hours due to recovery |
|
Large table manageability |
|
They take to long to be loaded |
|
Indexes take too long to be built |
|
Partial deletes take hours |
|
Performance considerations |
|
Large table and index scans are costly |
|
Scanning a subset improves performance |
|
|
|
|
|
Availability |
|
Partions can be independently managed |
|
Backup and restore operations can be done on
individual partitions |
|
Partitions that are unavailable do not affect
queries on DML operations on other paritions that use the same table or
index |
|
Manageability |
|
A partition can be moved from one tablespace to
another |
|
A partition can be dropped, truncated, added |
|
A partition can be divided at user-defined value |
|
|
|
|
|
Performance |
|
The optimizer eliminates partitions that not
have to be scanned |
|
Partitions can be scanned in parallel |
|
Partitions can be load-balanced across physical
devices |
|
Join operations can be optimized to “join by the
partition” |
|
|
|
|
Range Partitioning |
|
Hash partitioning |
|
List Partioning |
|
Composite Partioning |
|
|
|
|
|
Indexes can be partitioned like tables |
|
Partitioned or nonpartioned indexes can be used
with partitioned or nonpartitioned tables |
|
Partioned indexes can be |
|
Global or local |
|
Prefixed or nonprefixed |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Tables can be compressed |
|
Compression can also be specified at the
partition level |
|
Indexes and index-organized tables are not
compressed |
|
Typical compression ratios are 3:1 - 5:1 |
|
Compression is dependent upon the actual data |
|
Compression algorithm based on removing data
redundancy |
|
All DDL/DML commands are supported on compressed
tables |
|
|
|
|
|
|
|
|
This is not a generic ‘zip’-style compression |
|
Not all tables will have good compression |
|
Compression algorithm guarantees that
compression will never increase size of table |
|
Most large DW tables seem to compress well |
|
Compression happens between column/row values, not
within column/row values |
|
Long character strings are not compressed unless
the exact same string appear multiple times |
|
LOB/BLOB columns are not compressed |
|
|
|
|
<rowid> ‘650-506-7000’ ‘650-123-4567’ |
|
<rowid> ‘650-506-7000’ ‘650-506-7001’ |
|
<rowid> ‘650-506-7000’ ‘650-456-7890’ |
|
<rowid> ‘650-506-7000’ ‘650-098-7654’ |
|
<rowid> ‘650-506-7000’ ‘650-123-4567’ |
|
<rowid> ‘650-506-7001’ ‘650-123-4567’ |
|
<rowid> ‘650-506-7001’ ‘650-123-4567’ |
|
… |
|
|
|
|
|
|
|
Creating compressed tables: |
|
CREATE TABLE T1(id integer) COMPRESS; |
|
Converting tables to compressed tables: |
|
ALTER TABLE T3 MOVE COMPRESS; |
|
Creating compressed tablespaces: |
|
CREATE TABLESPACE tabspace_2 |
|
DATAFILE 'diska:tabspace_file2.dat' SIZE 20M |
|
DEFAULT COMPRESS STORAGE ( … ); |
|
|
|
|
|
|
Queries on compressed tables may observe minor
performance degradation |
|
Performance impact depends upon the query |
|
Many queries will be faster |
|
Compression reduces IO but increases CPU
utilization |
|
For a set of heterogeneous queries, performance
should degrade by no more than 5% |
|
Load and direct-path INSERT performance will be
slower |
|
Data must be compressed as it is added to the
table |
|
|
|
|
|
|
|
|
Data warehouses containing large volumes of
historical data |
|
Compress all of the older data in a data
warehouse |
|
Integrate compression into the ‘rolling window’
paradigm |
|
For example, most recent 3 months of data could
be stored uncompressed and the previous 21 months could be stored
compressed |
|
Materialized views and other derived data sets |
|
Generally, compression should be applied to data
that is infrequently updated |
|
|
|
|
|
|
|
|
|
|
Key requirements: |
|
Guarantee optimal resource utilization all the
time |
|
Provide the appropriate amount of resources to
every job or query based on priority and system load |
|
Pro-actively prevent ‘runaway’ queries |
|
Pro-actively prevent system overloading |
|
Managing large numbers of users should be simple
and automated |
|
|
|
|
|
|
CPU |
|
Business-critical processes receive more CPU |
|
Database Resource Manager allows DBA to assign
CPU resources to groups of users |
|
Memory |
|
Oracle9i dynamically allocates runtime memory
based on current available memory and each query’s requirements |
|
Parallelism |
|
Degree of parallelism is dynamically chosen
based on available resources and each query’s requirements |
|
|
|
|
|
One parameter: |
|
PGA_AGGREGATE_SIZE = <size> |
|
Dynamic allocation of ‘runtime’ memory based
upon each query’s requirements |
|
In data-warehouse environments, >50% of a
server’s physical memory is typically used for query ‘runtime’ memory |
|
Benefits: |
|
Reduced overall memory usage |
|
Improved throughput |
|
Simplified tuning |
|
|
|
|
|
Predictive Query Governing and Dynamic
Re-prioritization: |
|
Queries which are estimated to take longer than
an DBA-specified limit will abort or be ‘de-prioritized’ |
|
Automatic Queuing: |
|
A limit can be set on the number of active
session for each group of users; queries submitted which exceed this limit
will be queued |
|
Via Database Resource Manager |
|
|
|
|
|
Power Users |
|
Up to 70% of the CPU resources |
|
Any degree of parallelism |
|
Any query which is expected to take over one
hour will be migrated to background |
|
Report Users |
|
Up to 20% of the CPU resources |
|
No parallelism |
|
Limit of 40 concurrent queries |
|
Any query which is expected to take over 20
minutes will be aborted |
|
Background Jobs |
|
Up to 10% of the CPU resources |
|
Any degree of parallelism |
|
Limit of 5 concurrent queries |
|
|
|
|
|
|
|
The most common index type in Oracle DW
environments |
|
Bitmap indexes introduced in Oracle 7.3 |
|
Bitmap join indexes introduced in Oracle9i |
|
Oracle has over a dozen patents for bitmap index
technology |
|
Oracle provides patented compression technique
for bitmap indexes |
|
Bitmap indexes are 3-20x smaller than b-tree
indexes |
|
Less storage yields better query performance and
more indexed columns |
|
|
|
|
|
|
|
Columns with Low-to-Medium Cardinality |
|
‘Set-based’ manipulation of data |
|
Especially good for large, complex queries |
|
Orders of magnitude performance improvement |
|
Fully integrated within Oracle9i |
|
Created and managed similar to other Oracle9i
indexes |
|
Used to accelerate single-table access, joins,
and aggregation |
|
Transparently selected by the query optimizer |
|
|
|
|
|
|
|
|
|
|
Currently, indexes provide fast path access to
specific data |
|
Materialized views work on the same principle |
|
A Materialized view is an instantiation of a SQL
statement - a view with data storage |
|
Materialized views can be partitioned, indexed
separately |
|
Used for query rewrite to increase performance |
|
Rewrites are transparent to applications |
|
Rewrites do not require any special privileges |
|
|
|
|
|
Why enhance the RDBMS for analytic calculations? |
|
Benefits |
|
Performance |
|
Scalability |
|
Simpler SQL development |
|
|
|
|
|
Rank |
|
Top 10 sales-reps in each region |
|
Moving Window |
|
Today’s stock price minus 200-day moving average |
|
Period-over-period comparisons |
|
Percentage growth of Jan-99 sales over Jan-98 |
|
Ratio-to-report |
|
January’s sales as a percentage of the entire
year’s |
|
|
|
|
|
|
|
|
|
|
|
|
|