In Parts I and II of this article series, you learned the basic concepts, advantages, and architecture of the Exadata Database Machine; its key hardware and software components; the cell storage architecture and file system; and how to configure storage (physical disks, LUNs, grid disks) and ASM disk groups with the mandatory disk group attributes.
In this article, you will learn about the unique capabilities of Exadata (smart scan, Hybrid Columnar Compression, SQL offloading, storage indexes, and so on) that put Exadata a step ahead of its competitors.
One of the toughest challenges every organization and DBA faces is managing storage capacity for large or extremely large databases. Although many database vendors and third parties offer data compression solutions, the performance of accessing the compressed data has remained a key concern.
In addition to the Basic and Advanced Compression features, Oracle Exadata offers a new and unique method of data compression: Hybrid Columnar Compression (hereafter referred to as HCC). HCC provides the highest level of data compression and delivers significant storage savings with remarkable I/O performance. Depending on the HCC type you implement, the storage savings can range from 10x to 50x. HCC stores compressed data differently from traditional compression solutions: it uses a combination of row and columnar methods, storing data in logical structures called compression units. A compression unit spans a set of data blocks, as shown in the picture below:
HCC can be deployed at the table, partition, and tablespace level, with either the warehouse compression or the archive compression algorithm. Each algorithm offers a LOW and a HIGH option to balance storage savings against performance. HCC cannot be deployed on index and LOB segments. HCC is enabled with the COMPRESS FOR clause, either when creating a new table or on existing tables/partitions. HCC also provides the flexibility to apply different compression methods to individual partitions of the same table. However, it is highly advisable not to apply HCC to data that is frequently updated: when compressed data is updated, the entire compression unit is locked and the updated rows are migrated to a lower compression level (OLTP compression). To gain the full benefit of compression, load the data using direct path loads.
· Warehouse compression (COMPRESS FOR QUERY LOW|HIGH) delivers compression with query performance optimization and best suited for warehouse application databases. COMPRESS FOR QUERY HIGH is the default option for warehouse compression mode.
· Archive compression (COMPRESS FOR ARCHIVE LOW|HIGH) provides maximum storage saving benefits for archived data, i.e, infrequent data changes. COMPRESS FOR ARCHIVE LOW is the default option for archive compression mode.
· Compressing an existing table and its data with warehouse query low:
SQL> ALTER TABLE sales MOVE COMPRESS FOR QUERY LOW;
· Compressing an existing table (future data only) with archive low:
SQL> ALTER TABLE tab_bulk COMPRESS FOR ARCHIVE LOW;
· Creating a warehouse query high compressed table from an existing table:
SQL> CREATE TABLE tab_dw_qh COMPRESS FOR QUERY HIGH AS SELECT * FROM tab_bulk;
· Compressing an individual partition with archive low:
SQL> ALTER TABLE sales MODIFY PARTITION sales_2013 COMPRESS FOR ARCHIVE LOW;
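Different compression levels can also be mixed within one table at creation time. The following sketch (the sales_hist table, its columns, and partition bounds are hypothetical) keeps the current partition uncompressed, recent history at query high, and older history at archive high:
SQL> CREATE TABLE sales_hist (sale_id NUMBER, sale_date DATE, amount NUMBER)
     PARTITION BY RANGE (sale_date)
     (PARTITION p_2012 VALUES LESS THAN (DATE '2013-01-01') COMPRESS FOR ARCHIVE HIGH,
      PARTITION p_2013 VALUES LESS THAN (DATE '2014-01-01') COMPRESS FOR QUERY HIGH,
      PARTITION p_cur  VALUES LESS THAN (MAXVALUE) NOCOMPRESS);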
The following syntax is used to disable or remove compression:
SQL> ALTER TABLE sales NOCOMPRESS;       -- future data is stored uncompressed
SQL> ALTER TABLE sales MOVE NOCOMPRESS;  -- existing data is also decompressed
To determine which compression level is applied at the table or partition level, use the following queries respectively:
SQL> SELECT owner,table_name,compress_for FROM dba_tables WHERE compression= 'ENABLED';
SQL> SELECT owner,table_name,partition_name,compress_for FROM dba_tab_partitions WHERE compression= 'ENABLED';
The following procedure can be used to identify whether a given row is compressed and, if so, which compression method is applied. First, fetch the rowids of the table:
SQL> SELECT rowid FROM sales;
· Pick the row of interest and verify whether, and how, it is compressed:
SQL> SELECT dbms_compression.get_compression_type('USER', 'TABLENAME', 'ROW_ID') FROM dual;
If the returned compression type is 1, the row is not compressed. Every other value maps to a specific compression method, as defined by the DBMS_COMPRESSION package constants.
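As a convenience, the type number can be decoded inline. The mapping below follows the DBMS_COMPRESSION constants as documented for 11g Release 2 (verify the values against your release, as they changed later):
SQL> SELECT DECODE(dbms_compression.get_compression_type('USER', 'TABLENAME', 'ROW_ID'),
                   1, 'No compression',
                   2, 'OLTP compression',
                   4, 'HCC query high',
                   8, 'HCC query low',
                  16, 'HCC archive high',
                  32, 'HCC archive low',
                      'Unknown') AS compression_type
     FROM dual;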
We recently ran an Exadata HCC proof of concept (POC) to measure the storage savings we would gain from HCC on one of our new Exadata deployments. We picked a sample of large partitioned tables and applied different compression levels: no compression for the current month's partition, query high for the last three months' partitions, and archive high for the remaining historical partitions. The before-and-after storage statistics below show the savings we managed to gain:
[Table: size before and after compression for the sampled tables (table names changed), compressed with a mix of warehouse (query) high and archive high]
We gained over 90% storage savings, with a performance impact small enough to be safely ignored. HCC is one of the most exciting features for data warehouses and other large databases, where a significant amount of storage can be saved while maintaining great performance.
In the traditional Oracle SQL processing model, when a client submits a SELECT statement with a predicate, the database engine typically goes through the following steps to return the result to the client:
· Maps the request to the file and extents containing the table data
· I/O is issued against all the related data blocks
· All those blocks are read into database memory
· Only the rows satisfying the predicate are returned to the client
This process works well in most circumstances. However, when a query scans a huge table, it ends up reading many blocks into database memory only to discard most of the rows. On the Exadata Database Machine, the process is handled differently: with the SQL/cell offloading (smart scan) feature, filtering is performed in the storage cells, so only the rows that satisfy the request are sent to the database server, avoiding the transfer of unnecessary data blocks.
Smart scan (SQL offloading) is one of the most powerful features of Exadata. With smart scan, the data search and retrieval work is offloaded to the cell (storage) servers, and only the rows that satisfy the client request are transported into database memory. Smart scan is additionally applied to RMAN incremental backups and fast file creation. Smart scan includes the following capabilities:
· Table and index scans are performed inside the cell storage, and only the data of interest is transported to database memory
· Predicate filtering
o SQL> SELECT * FROM emp WHERE sal > 20000;
· Column filtering
o SQL> SELECT eno,ename,sal,dept_no FROM emp;
· Join processing
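For join processing, Exadata can push a Bloom filter built from one table down to the cells, so the rows of the joined table are filtered in storage before they are shipped to the database server. A query of the following shape (using the familiar emp/dept demo tables) is a candidate for this optimization:
SQL> SELECT e.ename, d.dname
     FROM emp e, dept d
     WHERE e.deptno = d.deptno AND d.dname = 'SALES';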
Query the V$SQLFN_METADATA dictionary view to find out which SQL functions support the smart scan (query offloading) feature:
SQL> SELECT * FROM v$sqlfn_metadata WHERE offloadable = 'YES';
Smart scan (SQL offloading) does NOT support the following conditions:
· Clustered tables and index-organized tables
· Fast full scans on compressed or reverse-key indexes
· CREATE INDEX with NOSORT
· Queries involving a LONG or LOB column
· Tables with the CACHE clause
· Queries involving more than 255 columns
· Queries with predicates on virtual columns
To make data eligible for smart scan, ensure the following attribute is set at the ASM disk group level:
· cell.smart_scan_capable = TRUE
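The current setting of this disk group attribute can be checked from an ASM instance, for example:
SQL> SELECT dg.name diskgroup, a.name attribute, a.value
     FROM v$asm_diskgroup dg, v$asm_attribute a
     WHERE dg.group_number = a.group_number
     AND a.name = 'cell.smart_scan_capable';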
Additionally, the following initialization parameters have the influence and control over smart scan capabilities:
· CELL_OFFLOAD_PROCESSING = TRUE|FALSE enables or disables smart scan; it can be set at the system or session level:
SQL> ALTER SESSION SET cell_offload_processing = TRUE;
It can also be controlled per statement through the OPT_PARAM hint:
SQL> SELECT /*+ OPT_PARAM('cell_offload_processing' 'true') */ * FROM emp;
· CELL_OFFLOAD_PLAN_DISPLAY = NEVER|AUTO|ALWAYS controls whether EXPLAIN PLAN output displays the storage-specific operations
· Many _cell_offload* and _kcfis* hidden parameters also contribute to the behavior
The following example demonstrates how to identify whether smart scan is triggered:
SQL> EXPLAIN PLAN FOR SELECT * FROM emp WHERE sal > 20000;
SQL> SELECT * FROM TABLE(dbms_xplan.display);
If you spot TABLE ACCESS STORAGE FULL or INDEX STORAGE FAST FULL SCAN in the query plan, the query is using smart scan capabilities. The predicate section will also show a storage() filter.
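For a statement that has already run, the offload columns of V$SQL give a quick measure of how much work was actually offloaded (the sql_id below is a placeholder to be supplied at runtime):
SQL> SELECT sql_id, io_cell_offload_eligible_bytes, io_cell_offload_returned_bytes
     FROM v$sql
     WHERE sql_id = '&sql_id';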
Exadata also offloads RMAN incremental backup processing. During an RMAN incremental backup, the cell storage filters out the blocks that are not required and sends only the blocks needed for the incremental backup to the database, which greatly improves backup performance.
File creation operations, such as CREATE TABLESPACE, also benefit greatly from smart scan capabilities on Exadata.
You can query the V$SYSSTAT and V$SESSTAT dynamic views for the smart scan statistics to measure the benefit of smart scan on the Exadata machine.
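For example, comparing the bytes eligible for offload with the bytes actually returned over the interconnect shows how much traffic smart scan saved (statistic names as exposed in 11g Release 2):
SQL> SELECT name, value
     FROM v$sysstat
     WHERE name IN ('cell physical IO bytes eligible for predicate offload',
                    'cell physical IO interconnect bytes returned by smart scan');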
Storage indexes, another of Exadata's unique features, are capable of eliminating excessive physical I/O in the cells. A storage index is a memory-based structure that tracks the minimum and maximum values of up to eight columns for each storage region of data on disk; by default, data on each cell disk is divided into 1MB storage regions, and the storage index holds one entry per region. Because a storage index summarizes the data distribution on disk, the cell can skip regions that cannot possibly contain matching rows. Storage indexes are built and maintained automatically in memory, based on the SQL predicates received. Unlike traditional indexes, they are purely memory based and occupy no physical storage, and Exadata decides whether building one would benefit the query.
A storage index is used as part of smart scan filtering on the storage server, helping to transport only the rows that satisfy the client query. A query must meet the following criteria for a storage index to be used:
· The statement has a WHERE clause with at least one predicate
· Only statements that are smart scan candidates can use storage indexes
· The predicate uses operators such as =, <, >, <=, >=, BETWEEN, IN, IS NULL, or IS NOT NULL
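The physical I/O avoided thanks to storage indexes can be measured with a dedicated cell statistic (the statistic name below is as exposed in 11g Release 2):
SQL> SELECT name, value
     FROM v$sysstat
     WHERE name = 'cell physical IO bytes saved by storage index';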
This part of the article series explained the unique capabilities of Exadata (HCC, smart scan, storage indexes, etc.). In Part IV, I will discuss Smart Flash Cache, cell metrics, how to monitor Exadata with OEM Cloud Control, and some RMAN best practices.
Get more insights and a bigger picture of Exadata in our upcoming book, ‘Oracle Exadata Expert’s Handbook’.
Image courtesy: all images in this article are taken from various Oracle documents, articles, and presentations.