well, oracle dataguard in general is rock solid (if configured well – but you will find a lot of guides in the internet how to do so)… but as always, sometimes something is not working as it should and the standby is out of sync afterwards (i.e. a network outage between the servers, and to avoid archiver stuck on the primary site you are forcing a delete of the archive logs).
OK, the network is working again, so how to solve the gap? If only a few archivelogs are missing, my first choice would be a restore of the archivelogs from backup. But what if thousends of archivelogs are missing? restore them all from tape? Based on my experiences, this could be a very time consuming procedure, in worst case the tape drive is not fast enough to catch up with the primary. In this case you may consider rebuilding the standby (rman> duplicate database for standby from active database;) … have you ever tried to do this with a 50 TB database? The solution might be an incremental recovery! If you consider some peculiarities, it’s not so complicated…
- stop redo apply (it’s not working anyway)
you cannot apply an incremental backup while the apply is running, so let’s stop it:
dgmgrl / edit database 'ORCLRB' set state='APPLY-OFF';
- evaluate what’s missing
Then we have to figure out where our incremental backup shoud start… the easy way is to query the v$database view to get the current scn – but this only returns the information stored in the controlfile, sometimes the headers of the datafiles do have a lower scn and rman refuses to start the recovery. so the sql must be like this:
# execute on standby! sqlplus / as sysdba set linesize 200 SELECT 'BACKUP INCREMENTAL FROM SCN '||min(SCN)||' DATABASE FORMAT ''/tmp/recover_standby_%U'';' FROM ( select current_scn SCN from V$DATABASE union all select min(checkpoint_change#) from v$datafile_header where file# not in (select file# from v$datafile where enabled = 'READ ONLY') );
- start backup
well, just execute the output of the above sql on the primary database… and when it is finished, you also need a fresh standby controlfile:
# execute on primary! sqlplus / as sysdba ALTER DATABASE CREATE STANDBY CONTROLFILE AS '/tmp/stby.ctl';
- start recover
First you have to make the backup availeble on the standby site… either mount as NFS device or copy the backup files via scp to the standby site. Then catalog the files and start recovery…
# execute on standby! rman target / catalog start with '/tmp/recover_standby'; recover database noredo;
- restore controlfile
now we have to “update” the controlfile… since this is not possible, we are doing a full restore.
# execute on standby! rman target / shutdown immediate; startup nomount; restore controlfile from '/tmp/stby.ctl'; alter database mount; exit;
if the structure of the database is not exactly the same as on the primary site, you have to populate the the new file and folder names:
rman target / catalog start with '+U01/ORCLRB/DATAFILE'; switch database to copy;
- enable redo apply
almost finished, now we only have to enable the dataguard process again…
dgmgrl / edit database 'ORCLRB' set state='APPLY-ON';
done… just check the alert.log if the standby is receiving and applying the last few archivelogs.