This article focuses on monitoring a Data Guard configuration with the traditional method, i.e. SQL. Here we learn what needs to be verified, and what information to gather, to confirm whether the current Data Guard and Active Data Guard configuration is valid. This article does not cover monitoring with Data Guard broker or Enterprise Manager; it is purely the traditional method.
From the initial release of Data Guard, known as the standby database in 8i, many features have been introduced up to 12c (the current release), but the technique for evaluating the configuration has not changed enormously. We do, however, need to review the few parameters that are newly added or deprecated in each release or version. For example, if you have configured Real-Time Apply, we need to ensure it is working as expected; if not, we are ultimately responsible for the data loss in case of a disaster. This article applies from version 11g onwards, because it also touches on Active Data Guard.
This article works through the configuration parameters and the views you have to use to judge whether the configuration is valid. With every version and release, many features are introduced, and at the same time a few Data Guard parameters are deprecated and no longer have any effect.
If someone asked you to check a Data Guard configuration, there are several ways to do it, such as Enterprise Manager or Data Guard broker. These methods are easy to use and can confirm that your configuration is in good shape. The verification can cover anything: the complete Data Guard status, whether Real-Time Apply is enabled, or, with the ADG feature, whether transactions are being applied on the standby as expected. But with the traditional method you have to put in much more effort, reviewing the configuration using views and parameter values.
On another note, monitoring an existing configuration is one thing, while troubleshooting the configuration or its various features (e.g. COMPRESSION) is entirely different; troubleshooting issues with physical and logical standby databases will be covered in upcoming articles.
In earlier releases Data Guard had more parameters to configure, several of which are now deprecated: STANDBY_ARCHIVE_DEST, FAL_CLIENT, LOG_ARCHIVE_START, REMOTE_ARCHIVE_ENABLE, LOCK_NAME_SPACE and more. Things are now much more simplified; with very little effort we can build a standby database for a primary, and it can also be managed easily.
The very first check is to review the destination settings, and the most important part is the VALID_FOR attribute; I see a lot of confusion in various forums about which values to use and which are accurate. The explanation is easier if we see how it works in practice. I have a primary database, a Far Sync instance and a physical standby database: the redo does not travel from the primary to the standby directly; instead it is first received synchronously by the Far Sync instance, which then ships the redo asynchronously to the physical standby database.
If there is no problem in the standby configuration, VALID_NOW shows YES, which means the destination is fully valid for archival operations. You should treat it as a severe issue if VALID_NOW is not YES.
Example 1: In the image below, taken from the primary database, the VALID_NOW status is "WRONG VALID_ROLE". This means the destination is not appropriately defined for the current database role: the current role is "PRIMARY_ROLE", but the log archive destination has been configured with "STANDBY_ROLE".
Example 2: The next image belongs to the Far Sync (12c) instance. Here the VALID_NOW output is "WRONG VALID_TYPE": the VALID_TYPE in the output is "ONLINE_LOGFILE", but a remote destination is valid only when archiving standby redo log files. These things need to be considered when configuring the destination settings, based on the current database role/valid role and the destination logfile type.
If we do not mention any value for VALID_FOR in the destination settings, Oracle assumes the default, i.e. (ALL_LOGFILES, ALL_ROLES), for any database role, whether primary, standby or far sync. LOG_ARCHIVE_DEST_2 is the main parameter whose values change based on the protection mode settings, database features (compression) and various options (DELAY mins); the remaining Data Guard parameters are almost static unless there is maintenance, a change in DB_UNIQUE_NAME, or a change of Oracle Net service.
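The VALID_NOW, VALID_TYPE and VALID_ROLE values shown in the examples above can be checked on each database with a query like the following (a minimal sketch; V$ARCHIVE_DEST has many more columns worth reviewing):

SQL> select dest_id, destination, valid_type, valid_role, valid_now
  2  from v$archive_dest
  3  where destination is not null;

Run it on the primary, the Far Sync instance and the standby: the VALID_NOW column should show YES on every configured destination.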
Validating Real-Time Apply is very necessary; I will describe why with an example of the legacy non-Real-Time Apply behaviour. Suppose the online redo log file size is 5 GB. Prior to 10g, a sequence was shipped to the standby database(s) only after a complete log switch, with the ARCH process acting as the redo transport service. So if a disaster hit the primary database while the 5 GB log was current, there was a chance of huge data loss when you think in terms of transactions per second. From 10g Oracle introduced Real-Time Apply, where LGWR works as the redo transport service: it writes simultaneously to the primary's online redo log files and to the standby redo logs of the standby, with the coordination of the LNS and RFS processes. Hence there is no loss of committed transactions in case of a disaster on either the primary or the standby. From 11g, Real-Time Apply is the default unless we explicitly mention the ARCH process in LOG_ARCHIVE_DEST_n; apart from that, we need to configure a few settings. So we need to identify whether Real-Time Apply is working as expected, instead of assuming it is.
To enable Real-Time Apply, the key is to enforce the LGWR process in the remote destinations, and we must also have configured standby redo log files on the standby database. For example, if the online redo log file size is 50 MB with 3 groups, then we must have standby redo log groups of the same size (50 MB) and the same number of groups (3) in Maximum Performance mode. With Maximum Protection/Availability we must have one additional standby redo log group. It is also highly recommended to have standby redo log files on the primary, so that they are available in case of a role transition/switchover.
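As a sketch, standby redo log groups matching the 50 MB online logs described above could be added like this (the group numbers are illustrative and must not clash with existing groups):

SQL> alter database add standby logfile group 4 size 50m;
SQL> alter database add standby logfile group 5 size 50m;
SQL> alter database add standby logfile group 6 size 50m;
SQL> alter database add standby logfile group 7 size 50m;

The fourth group covers the "one additional group" requirement for Maximum Protection/Availability.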
In the image below, the online redo log file size of each group is 50 MB. CON_ID is an extra column introduced with the multitenant architecture; the value 0 applies to the entire CDB and is also used for rows in non-CDBs. This applies to both the online redo logs and the standby redo log groups.
We can get information about the standby redo log groups from the two views below; here we have 4 standby redo log groups, each of size 50 MB.
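The two views typically used for this are V$STANDBY_LOG and V$LOGFILE (an assumption based on standard practice, since the views are not named above); for example:

SQL> select group#, thread#, bytes/1024/1024 mb, status, archived
  2  from v$standby_log;

SQL> select group#, type, member from v$logfile where type = 'STANDBY';

The first query shows the size and state of each standby redo log group; the second shows the underlying files.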
This view shows the status of the standby redo logs, and the key column is STATUS. If group 4's STATUS is ACTIVE and ARCHIVED is NO, the standby redo log is currently being written to and is not yet ready to be archived. If STATUS is ACTIVE and ARCHIVED is YES, the standby redo log is complete and has been archived. If STATUS is UNASSIGNED and ARCHIVED is YES, the standby redo log group was either already archived or was never used; in both cases it is available for use.
Once the configuration of the redo logs and standby redo logs meets the requirements for Real-Time Apply, we can change the destination's redo transport attribute to LGWR if it was ARCH; LGWR can work either synchronously or asynchronously, depending on the protection mode requirements.
Example: log_archive_dest_2='service=india lgwr async valid_for=(online_logfiles,primary_role) db_unique_name=india'
If the RDBMS version is prior to 11g, we must use "alter database recover managed standby database using current logfile disconnect from session". Now that we are done reviewing the Real-Time Apply configuration and considerations, we need to verify whether Real-Time Apply is working as expected. We can review several views for this. The view v$managed_standby provides much of the information: if we see sequence number 220 with the process MRP0 applying, we can also see how many blocks have been received; as soon as any transactions are transmitted from primary to standby, the values are updated.
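A sketch of the usual query on the standby, filtering for the receiving and applying processes:

SQL> select process, status, thread#, sequence#, block#, blocks
  2  from v$managed_standby
  3  where process in ('MRP0', 'RFS');

With Real-Time Apply working, MRP0 should show the current sequence being applied, and the BLOCK# values should keep advancing as redo arrives.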
This view remains a good way to judge whether the physical standby database is in Real-Time Apply, based on how the block numbers are changing.
Verifying redo: the view v$dataguard_stats helps in estimating the redo received from the primary database and the redo which is not yet available on the standby. Transport lag refers to the gap between the redo transported and the redo generated on the primary database. Apply lag refers to how far the data on the standby is behind the primary database; it plays a critical role in estimating the data loss in case of a disaster. Apply finish time estimates how much time is required to recover all the transactions already received.
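These three metrics can be pulled from the view on the standby with a query along these lines:

SQL> select name, value, time_computed
  2  from v$dataguard_stats
  3  where name in ('transport lag', 'apply lag', 'apply finish time');

TIME_COMPUTED tells you when each value was calculated, which matters when judging how fresh the lag figures are.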
When it comes to a logical standby database, we can monitor whether transactions are being actively processed using the view v$logstdby_transaction.
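For example, a quick check of the transactions SQL Apply is currently working on (kept deliberately simple, in the same style as the other checks in this article):

SQL> select count(*) from v$logstdby_transaction;

SQL> select * from v$logstdby_transaction;

A changing row count over repeated executions indicates that transactions are actively being mined and applied.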
Validating the configuration of Data Guard is the default check; at the same time, it is very necessary to validate Active Data Guard if it is licensed. The basic requirements for Active Data Guard are having the additional license, ensuring the standby is in Real-Time Apply mode, and opening the database read-only, in short Real-Time Query mode.
To cross-check Real-Time Query mode, we need to ensure that the standby database is in "READ ONLY WITH APPLY" mode.
SQL> select open_mode from v$database;
READ ONLY WITH APPLY
SQL> select * from v$standby_event_histogram where name='apply lag' and count > 0;
NAME TIME UNIT COUNT LAST_TIME_UPDATED CON_ID
---------- ---------- ---------------- ---------- -------------------- ----------
apply lag 0 seconds 104 05/10/2015 12:44:32 0
If there is any real lag between the standby and the primary, we will see various apply lag values, with the count expressed in terms of buckets and the number of occurrences in each.
We have seen how to monitor the configuration of Data Guard, how to identify wrong configuration settings that can impact the standby database, and how to monitor and validate the Real-Time Apply configuration so that it functions as expected and avoids data loss in the case of huge log files. Finally, we also covered how to check whether Real-Time Query mode is enabled, and how to monitor the apply lag of the standby against the primary, to avoid misleading users with stale data.