Written By Nassyam Basha
We all know well how easy to build standby database and configure Data Guard between Primary and Standby database(s) with predefined steps. But have you ever noticed that the current configuration is suitable to the environment? Oracle provides the default values for each parameter but it may not suits the environment and hence you must consider to optimize the Data Guard configuration after reviewing the current configuration. These set of articles explains How to Optimize Data Guard configuration". The first part of article covers Optimizing Data Guard configuration when Physical standby is in place.
We are going to discuss existing configurations and how they are really impacting the performance and based on that we are going to optimize the configuration. This article written for optimizing Data Guard configuration of physical standby database irrespective of synchronous and asynchronous transport. We may not cover all the attributes/parameters/options in one article, so the remaining parameters will be discussed in Part-II.
Optimizing log buffer comes into picture in case of asynchronous redo but the commit transactions are directly written first into standby database in case of synchronous redo and hence it's really not required to concentrate on optimizing and tuning the log buffer. This section doesn't explains about Data Guard architecture but more into LNS process behavior. As we know from 11g in ASYNC mode the LNS process reads data directly from the redo log buffer. If we have sufficient log buffer size to keep the data until LNS process reads. The below picture represents how the LNS reads data directly from the log buffer zone of SGA.
If in case LNS is unable to keep pace then there is chance of data recycled before it transmit to standby database, in such situations LNS process tries to read data from the online redo log files, which means we are giving more scope for the more I/O' calls? We can see the below picture how actually LNS process works in case of reading data from the Online redo log files.
To know whether the log buffer size configured is sufficient or not, then we have various views and tools in order to measure. I have used ASH report and it is clear that there is log buffer event taken place and indeed ash doesn't provide any recommendations for that we can use view "x$logbuf_readhist" from primary database.
Blocking Sid (Inst)
# Samples Active
11, 1( 1)
log buffer space
16/240 [ 7%]
If you would like to check how much amount of redo was read from the log buffer and how much redo read from disk(online redo log files), the below view explains well. in detail it provides log buffer hit rate histogram.
From the above view we can see various values for buffer size and the actual buffer size we can identify with the BUFINFO='CURRENT' and also we can see the Target log buffer. READMEMBLKS refers to the number of blocks read from the log buffer in-memory and RDDISKBLKS refers to the number of blocks read from the disk(online redo log files). From this advisory it is clear that the more less buffer size and the more I/O calls will occur.
Hit ratio is calculated as "100 * RDMEMBLKS / (RDMEMBLKS + RDDISKBLKS)", So for example if we consider row "1" then the hit ratio will be "100* 572121 / (572121+30305)" = "100* 572121/( 602426)" = " 94.96950662819998" which is same value as "HITRATE" of the view. So finally the low hit ratio refers as LNS is reading from disk(online redo log files) and the more hit ratio means it reading from the log buffer in-memory.
We have seen optimizing the log buffer, there are more things to learn on how to optimize the attributes of the log_archive_dest_n, The below section explains all.
REOPEN parameter is optional but still it plays a considerable role in Data Guard, The REOPEN attribute by default comes with the value 300 and that means Primary does not try to reconnect to the standby database if in case any disconnection or any failed remote archive destination of standby with primary database. So our goal with this attribute is to reduce the time to reconnect to standby database if in case unreachable.
REOPEN attribute of log_archive_dest_n can be configured in primary database or FARSYNC instance if applicable. So basically in the source database which sending redo data to remote/standby destinations.
REOPEN word used as attribute when managing Data Guard with SQL* Plus, If Data Guard configuration included with Broker then we need to adjust REOPENSECS from the broker and of course both attributes representing to same attribute.
DGMGRL> show database verbose india 'ReopenSecs'
ReopenSecs = '300'
SQL> select dest_id,target,reopen_secs from v$archive_dest where dest_id=2;
DEST_ID TARGET REOPEN_SECS
---------- ------- -----------
2 STANDBY 300
DGMGRL> edit database india set property ReopenSecs=60;
Property "reopensecs" updated
ReopenSecs = '60'
Please note that if Data Guard is managed with Broker, then you must consider to update values from broker to avoid inconsistency between Broker and database configuration file.
NET_TIMEOUT has the ability to mark standby destination as failed if the number of seconds exceeds when log writer(LGWR) waits for LNS process. NET_TIMEOUT applicable if the transport is synchronous, which means this will work in maximum availability or maximum protection modes. If we specify asynchronous redo transport then NET_TIMEOUT attribute will be ignored. The default value of the attribute is 30 seconds and we can optimize the attribute as required but Oracle recommends to set minimum value from 8 to 10 seconds and of course we can set even for less value but Oracle also needs some time to take breath in order to reestablish all the process. Specifically we need to take care of this attribute settings in case of maximum protection because if any failure over standby then primary instance will be forced to shut down if any of the standby databases are not in contact. This attribute we have to configure at source database and it can be either Primary database or Far sync Instance.
When we specify NetTimeout attribute from Data Guard broker and ultimately it refers to the NET_TIMOUT attribute of the log_archive_dest_n.
SQL> select dest_id,target,net_timeout from v$archive_dest where dest_id=2;
DEST_ID TARGET NET_TIMEOUT
2 STANDBY 30
DGMGRL> show database india NetTimeout;
NetTimeout = '30'
DGMGRL> edit database india set property NetTimeout=12;
Property "nettimeout" updated
NetTimeout = '12'
2 STANDBY 12
This attribute required if the redo transport is with ARCH, hence setting this attribute is no longer required because we are in the generation of Real-time apply and Real time query with Active Data Guard and customers are focusing on the data availability on standby database sooner and faster. Of course still we talk about this attribute how it helpful in case of 10g, If there is any gap between primary and standby databases by setting maxconnections to the value more than 1 then the number of archive processes will work in parallel to send the archive log to remote destinations and the received redo will be consolidated at standby location.
With MaxConnections attribute it conflicts with the LOG_ARCHIVE_MAX_PROCESSES parameter and we need to ensure there is always greater value of LOG_ARCHIVE_MAX_PROCESSES rather than MaxConnections. If Broker is using then we need to update the value for the property "MaxConnections" and if managed using SQL then we need to add attribute max_connections in log_archive_dest_n. Now we will go through with the brief demo how the archive processes at primary and RFS processes at standby works with each sequence if in case of any GAP.
The default value of attribute max_coonnections will be 1, We will increase the value of archive processes and also max_connections.
SQL> select dest_id,target,max_connections from v$archive_dest where dest_id=2;
DEST_ID TARGET MAX_CONNECTIONS
---------- ------- ---------------
2 STANDBY 1
SQL> select program from v$session where program like '%ARC%';
SQL> show parameter log_archive_max_process
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_max_processes integer 4
DGMGRL> edit database india set property maxconnections=8;
Property "maxconnections" updated
DGMGRL> show database india maxconnections;
MaxConnections = '8'
DGMGRL> show database canada LogArchiveMaxProcesses;
LogArchiveMaxProcesses = '4'
DGMGRL> edit database canada set property LogArchiveMaxProcesses=10;
Property "logarchivemaxprocesses" updated
LogArchiveMaxProcesses = '10'
10 rows selected.
[root@CKPT-ORA-03 trace]# date
Sat Aug 15 15:43:10 IST 2015
[root@CKPT-ORA-03 trace]# ls -ltr *arc*trc
-rw-r----- 1 oracle oinstall 2102 Aug 15 15:40 mcdb_arc0_25903.trc
-rw-r----- 1 oracle oinstall 1489 Aug 15 15:43 mcdb_arc9_20439.trc
-rw-r----- 1 oracle oinstall 1664 Aug 15 15:43 mcdb_arc8_20437.trc
-rw-r----- 1 oracle oinstall 1664 Aug 15 15:43 mcdb_arc7_20435.trc
-rw-r----- 1 oracle oinstall 1575 Aug 15 15:43 mcdb_arc6_20433.trc
-rw-r----- 1 oracle oinstall 1664 Aug 15 15:43 mcdb_arc5_20431.trc
-rw-r----- 1 oracle oinstall 1577 Aug 15 15:43 mcdb_arc4_20429.trc
-rw-r----- 1 oracle oinstall 4568 Aug 15 15:43 mcdb_arc3_25918.trc
-rw-r----- 1 oracle oinstall 4517 Aug 15 15:43 mcdb_arc2_25916.trc
-rw-r----- 1 oracle oinstall 22045 Aug 15 15:43 mcdb_arc1_25912.trc
So the max_connections now will be 8 and the log_archive_max_processes value will be 10. Now we have forced to have gap on standby by deferring the destination and performed few log switches and hence the GAP will taken place. So whenever we enable the remote destination then the archive process will work in parallel to speed up the transport to the standby site and parallel RFS processes also work from standby site. Again its depends on the attributes/parameter values configured.
[root@CKPT-ORA-04 trace]# ls -ltr *rfs*113*trc
-rw-r----- 1 oracle oinstall 1414 Aug 15 15:43 drmcdb_rfs_11338.trc
-rw-r----- 1 oracle oinstall 1119 Aug 15 15:43 drmcdb_rfs_11347.trc
-rw-r----- 1 oracle oinstall 1540 Aug 15 15:43 drmcdb_rfs_11345.trc
-rw-r----- 1 oracle oinstall 1529 Aug 15 15:43 drmcdb_rfs_11343.trc
-rw-r----- 1 oracle oinstall 1676 Aug 15 15:43 drmcdb_rfs_11340.trc
-rw-r----- 1 oracle oinstall 1800 Aug 15 15:43 drmcdb_rfs_11336.trc
-rw-r----- 1 oracle oinstall 1415 Aug 15 15:43 drmcdb_rfs_11334.trc
-rw-r----- 1 oracle oinstall 1842 Aug 15 15:43 drmcdb_rfs_11332.trc
-rw-r----- 1 oracle oinstall 1800 Aug 15 15:43 drmcdb_rfs_11330.trc
-rw-r----- 1 oracle oinstall 3962 Aug 15 15:43 drmcdb_rfs_11328.trc
[root@CKPT-ORA-04 trace]# grep "sequence 295" *rfs*113*trc
drmcdb_rfs_11330.trc:Archivelog header information for thread 1 sequence 295 has
drmcdb_rfs_11332.trc:RFS: Archival completed for log 0 thread 1 sequence 295 dbid 3793852408 branch 854451579
drmcdb_rfs_11334.trc:Archivelog header information for thread 1 sequence 295 has
drmcdb_rfs_11336.trc:Archivelog header information for thread 1 sequence 295 has
drmcdb_rfs_11338.trc:Archivelog header information for thread 1 sequence 295 has
drmcdb_rfs_11340.trc:Archivelog header information for thread 1 sequence 295 has
drmcdb_rfs_11343.trc:Archivelog header information for thread 1 sequence 295 has
drmcdb_rfs_11345.trc:Archivelog header information for thread 1 sequence 295 has
From the above various trace files of total 8, the work was done on same sequence 295 by different processes. By thus the archive transport will be very quick in case of GAP and this is benefited after configuring the maxconnections and log_archive_max_processes attributes/parameters. Finally it's not recommended to use the same settings from 11g onwards so that it should not impact the redo transport performance.
we've seen how to optimizing few attributes and parameters of Data Guard configuration and so that how they are going to impact performance and availability of databases. Especially how the log buffer will play key role and what changes to be considered in order to I/O with log buffer and also other various configurations. More to come in second part of Optimizing Data Guard configuration when Physical standby database is in place.