Written By Nassyam Basha
To achieve zero data LAG to the applications with the Active Data Guard. Apart from disaster recovery purpose there are many things we can capture the features with Active Data Guard by running the applications on standby database but we need to configure few more additional things to achieve zero data lag on standby under any circumstances. This article explains all.
We know well that starting from 11g, Standby database can be opened read only mode and same time recovery can be started together with Active Data Guard feature/license which is providing a large scope to the applications to use for any reporting jobs which uses only Select queries. From here many customers preferring to use Active Data Guard so that we can offload the Reporting jobs, backups and much more to the standby database which can free up the resources on primary/production database.
In the earlier days of Active Data Guard, most of them are in assumption that Physical standby is in Sync with primary database and applications are using up to date data. It is true if the configuration is proper and the things are going fine but it's a blunder if something gone wrong on standby database. So let's suppose a Banking application running for the quarterly report which is very critical and same time due to some issue standby was behind the primary for 30 minutes or 40 minutes but still reports executed successfully but without fresh data which misleads our assumptions. Usually such reports were configured on primary for the guarantee of data but they completely rely on Active Data Guard as it meets the jobs requirement. But this is true if you have configured Data Guard/Active Data Guard in such a way to achieve Zero Data Loss. Let me provide small example on data lag and how to identify the Lag in simple way.
The configuration looks good and no issues found with it.
DGMGRL> show configuration;
Configuration - ckpt12c
Protection Mode: MaxPerformance
canada - Primary database
india - Physical standby database
Fast-Start Failover: DISABLED
From Primary database when I ran query on primary database, it returned 5 rows.
SQL> select db_unique_name,database_role from v$database;
SQL> select * from adg_tab;
QNAME QID QTYPE
---------- ---------- ----------
AAAA 1001 CURRENT
BBBB 1002 SAVINGS
CCCC 1003 CREDIT
CCCC 1004 SAVINGS
DDDD 1005 SAVINGS
But from standby database the same table returned only 3 rows. So there is something wrong but you still able to fetch database which is historical and no matter whether its 1 minute past or 30 minutes. Still even one transaction is also a big data loss if it's not present on Standby database.
INDIA PHYSICAL STANDBY
So maybe the past reporting jobs are lying that standby having the updated information same as primary database? In order to justify we can use simple queries from Primary and Standby database. Of course primary SCN will be bit ahead to standby but we have to ensure that standby SCN is advancing.
SQL> select current_scn from v$database;
Apart from that we can use various queries to estimate LAG between Primary and Standby database. There are few very basic things we must consider for Zero Data LAG they are Real-Time apply and Database protection mode. Of course Oracle recommends strongly to use Real-Time apply in any protection modes.
To achieve "zero data loss", the protection mode of database is a crucial role, With the Maximum Protection mode by default behavior is Zero Data loss, but when it comes to Maximum Availability or Maximum Performance we need to take care of configuring parameters and settings. Regardless of any protection mode , the standby should be in real time query mode and of course which can achieved only through Real-Time apply. In order to configure Zero Data Loss ensure that there is no DELAY_MINS configured on primary which points to the remote destination. DELAY_MINS basically the data in standby will be received from primary with delay, The DELAY_MINS parameter is of course good feature/parameter so that any changes done on primary will not immediately applied on standby, hence if any truncated tables, dropped schemas from primary but still we can retrieve them from standby because the changes were not applied because of the parameter DELAY_MINS. So before configuring zero data lag we must check the primary database remote destinations does not have configured. You can also check through views whether configured or not.
SQL> select dest_id,delay_mins from v$archive_dest where dest_id in (2,3,4);
This parameter is an session level parameter , by setting this parameter to a value for example 10 seconds, then it means whenever the application starts querying the data and it can tolerate maximum of 10 seconds. We can also configure parameter to "0" hence there will be no chance for tolerance and finally session will be ended without error but it won't results the stale data.
STANDBY_MAX_DATA_DELAY parameter default value is "NONE" that means the parameter will not be applicable and if the use query the data it will not check whether the data is stale or not.
SQL> show parameter STANDBY_MAX_DATA_DELAY
The parameter will not be visible at database level, because its designed for session level to configure. Still if you try to set this parameter and it ends up with warning as this parameter is applicable only for non-SYS users.
SQL> alter session set standby_max_data_delay=30;
ORA-03174: STANDBY_MAX_DATA_DELAY does not apply to SYS users
We will see demonstration how this parameter actually works when actual user queries the data. Before proceeding with that we will check broker status of standby database and it performing Real-Time query and also there is no apply lag as well.
DGMGRL> show database india
Database - india
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 1 second ago)
Apply Rate: 0 Byte/s
Real Time Query: ON
Now Connect to the normal user and set the parameter "standby_max_data_delay" to 30 seconds. As soon as we set this parameter the data lag tolerance is acceptable only for 30 seconds, If staleness of data exceeded 30 seconds then it will results us error but never produce the stale/old data which can mislead the things.
SQL> conn c##nassyam
SQL> show user
USER is "C##NASSYAM"
SQL> select * from std_lag;
After setting the parameter standby_max_data_delay to 30 seconds then to interpret the error we have forced to stop the MRP and then inserted a new row from primary database. Then the new rows are visible in primary.
DGMGRL> edit database india set state='apply-off';
SQL> insert into std_lag values (104,'DDDD');
1 row created.
104 DDDD <-- New Row
If you see above the new row inserted is visible on primary but when we run the select query across the same table then it results an error but not given old data without new row. So this example promises how the zero data lag will be ensured with the parameter.
SQL> show parameter standby_max_data_delay
select * from std_lag
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-03172: STANDBY_MAX_DATA_DELAY of 30 seconds exceeded
We can crosscheck how much lag is standby with the primary database using Broker.
Intended State: APPLY-OFF
Apply Lag: 10 minutes 17 seconds (computed 0 seconds ago)
Apply Rate: (unknown)
Real Time Query: OFF
It's not necessary that we should check from Broker but the results will be viewable through below view.
SQL> select name,value from v$dataguard_stats;
transport lag +00 00:00:00
apply lag +00 00:10:25
apply finish time +00 00:00:00.052
estimated startup time 18
Finally, it's not mandatory to use when connecting to SQL prompt but we can also configure triggers so that whenever applicable tries to access application it will verify the standby delay with primary.
1 CREATE OR REPLACE TRIGGER STD_LAG
2 AFTER LOGON ON database
4 IF (SYS_CONTEXT('USERENV','DATABASE_ROLE') in ('PHYSICAL STANDBY'))
6 execute immediate 'ALTER SESSION SET STANDBY_MAX_DATA_DELAY=20';
7 END IF;
With the above trigger the staleness of data will be allowed only until 20 seconds and after that the user cannot query the data and it will be ended up with the errors. Again we have stopped MRP and inserted one more row in the table.
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 25 seconds (computed 1 second ago)
So we have now lag of 25 seconds which should raise exception with trigger.
When the lag is reached more than 20 seconds and the session won't even allow you to connect the session. Of course the trigger settings depend on you how to configure.
SQL> conn c##nassyam/oracle
ORA-00604: error occurred at recursive SQL level 2
ORA-03172: STANDBY_MAX_DATA_DELAY of 20 seconds exceeded
Warning: You are no longer connected to ORACLE.
Of course the information we can also track from the alert log as below
Sat Jun 20 06:24:23 2015
opidcl aborting process unknown ospid (13546) as a result of ORA-3172
Sat Jun 20 06:24:29 2015
opidcl aborting process unknown ospid (13554) as a result of ORA-3172
Sat Jun 20 06:24:35 2015
opidcl aborting process unknown ospid (13558) as a result of ORA-3172
Sat Jun 20 06:24:37 2015
opidcl aborting process unknown ospid (13566) as a result of ORA-3172
Sat Jun 20 06:24:44 2015
opidcl aborting process unknown ospid (13569) as a result of ORA-3172
-bash-3.2$ oerr ora 3172
03172, 00000, "STANDBY_MAX_DATA_DELAY of %s seconds exceeded"
// *Cause: Standby recovery fell behind the STANDBY_MAX_DATA_DELAY
// *Action: Tune recovery and retry the query later, or switch to another
// standby database within the data delay requirement.
By thus feature of standby_max_data_delay it will ensure there is no stale data.
Related to data lag issues over Active Data Guard, Oracle introduced one more command "alter session sync with primary" so that after connection to the user first it tries to sync the standby with primary and the it leaves to you for further use. This parameter indirectly checks and confirms there is no data lag if any issues with MRP of standby and unable to synchronizes the standby with primary then session will be available to query the tables without waiting for the redo apply. We will see how it works.
DGMGRL> show database india;
Apply Lag: 9 seconds (computed 0 seconds ago)
From the above command of DGMGRL we ensured that MRP is not running. Now we will connect to the standby using application user and we can check whether the standby is in sync with primary using the command. This command can be used by any user and there are no restrictions with it.
SQL> alter session sync with primary;
ORA-03173: Standby may not be synced with primary
SQL> !oerr ora 3173
03173, 00000, "Standby may not be synced with primary"
// *Cause: ALTER SESSION SYNC WITH PRIMARY did not work because either the
// standby was not synchronous or in a data-loss mode with regard
// to the primary, or the standby recovery was terminated.
// *Action: Make the standby synchronous and no-data-loss with regard
// to the primary. Make the standby recovery active.
Finally, the command can be enforced from the trigger so that whenever user login, before querying data first it will check the standby synchronization with the primary and then it proceeds to access tables/data.
1 CREATE OR REPLACE TRIGGER STD_SYNC
2 AFTER LOGON ON DATABASE
6 execute immediate 'ALTER SESSION SYNC WITH PRIMARY';
If in case there is any LAG on standby with primary database, then first it interprets or enforce the command to make standby synchronize before leaving to applications.
Apply Lag: 17 seconds (computed 0 seconds ago)
After having GAP on standby, when we connect to standby user then of course it connects but it will wait until the trigger is completely executed.
-bash-3.2$ sqlplus c##nassyam/oracle
SQL*Plus: Release 184.108.40.206.0 Production on Sat Jun 20 06:50:34 2015
Copyright (c) 1982, 2013, Oracle. All rights reserved.
-- waited ---
Last Successful login time: Sat Jun 20 2015 06:47:42 +05:30
Oracle Database 12c Enterprise Edition Release 220.127.116.11.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Of course there will be no errors will appear at session level but you can track them from alert log that standby may not be synced.
Sat Jun 20 06:50:39 2015
Errors in file /u01/app/oracle/diag/rdbms/india/drmcdb/trace/drmcdb_ora_14743.trc:
ORA-06512: at line 4
Sat Jun 20 06:50:44 2015
*** 2015-06-20 06:50:39.340
*** SESSION ID:(39.87) 2015-06-20 06:50:39.340
*** CLIENT ID:() 2015-06-20 06:50:39.340
*** SERVICE NAME:(SYS$USERS) 2015-06-20 06:50:39.340
*** MODULE NAME:(sqlplus@CKPT-ORA-04 (TNS V1-V3)) 2015-06-20 06:50:39.340
*** ACTION NAME:() 2015-06-20 06:50:39.340
*** CONTAINER ID:(1) 2015-06-20 06:50:39.340
Skipped error 604 during the execution of SYS.STD_SYNC
*** 2015-06-20 06:50:39.341
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=0, mask=0x0)
----- Error Stack Dump -----
*** 2015-06-20 06:50:44.374
Skipped error 604 during the execution of C##NASSYAM.STD_SYNC
The standby lag reason whatever it may be but to avoid the misconception or staleness of data to the applications, we can use above the techniques mentioned. Hence there is large scope to provide accuracy of DATA instead of assumption that our standby is in complete SYNC.
Good Article Basha!
Thank You for the feedback Deiby Gomez