Oracle Dataguard in 5 Minutes

ok, although I wrote that there are plenty of websites and blogs describing how to create a dataguard standby, I'm doing one on my own… especially because a colleague asked me to write a manual for them, so I decided to do it here.

In my example I'm using two servers with Oracle 12.2 clusterware (Oracle Restart) including an ASM instance and a 12.2 db home. On the first server – let's call it the primary node – I've also installed a non-CDB database with the database configuration assistant (dbca). The name of the database is ORCL, the db_unique_name is ORCLRA (the db_unique_name is quite important for dataguard to distinguish the different sites).

  • Prepare the source

there are some mandatory things that must be checked or adjusted before we can start creating the standby… the database must be in archivelog mode (dataguard is a log shipping technology, so this is obvious) and we must enable force logging, so that changes executed with "nologging" are transported as well.

SQL> shutdown immediate;
SQL> startup mount;
SQL> alter database archivelog;
SQL> alter database force logging;
SQL> alter database open;
  • add standby redo logs

for those who are asking what the hell standby redo logs are – here's a short explanation: the primary database transfers the redo information to the standby via SQL*Net… this means there must be an oracle instance receiving this information and storing it somewhere. And because oracle wants to separate standby redo from local redo information, there is a separate log type called standby redo log. The easiest way is to create them on the primary site – when the primary becomes a standby, the standby logs are already there… and if the logs are present before duplicating the database, they will be created automatically during the copy πŸ˜‰ We create one more standby log group than we have redo log groups – this is because there is one extra source that could ship data from the primary: there can be a stream from each redo log group plus one directly from the log buffer. For further information you may want to visit this site…

SQL> select count(*) from v$log;
  COUNT(*)
----------
         3

SQL> select distinct bytes from v$log;
     BYTES
----------
 209715200

SQL> alter database add standby logfile group 11 size 209715200;
SQL> alter database add standby logfile group 12 size 209715200;
SQL> alter database add standby logfile group 13 size 209715200;
SQL> alter database add standby logfile group 14 size 209715200;
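
a quick way to double-check the result is to query v$standby_log afterwards:

SQL> select group#, bytes, status from v$standby_log;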
  • enable standby_file_management

often forgotten, but in my opinion an important parameter. The default of this parameter is “manual”, which means when adding a datafile on the primary site, you have to do the same on the standby site. If you forget to do so, the replication will stop… and we are humans and humans do forget things sometimes. So just set the parameter to “auto” and oracle will take care of it.

SQL> alter system set STANDBY_FILE_MANAGEMENT=AUTO;
  • modify listener.ora

OK, now we need to modify the listener.ora file – oracle dataguard needs access to the database via SQL*Net under all circumstances, even when the database is down. To solve this, we have to add a static entry to the listener.ora…

vi $GRID_HOME/network/admin/listener.ora
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = primary)(PORT = 1521))
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
    )
  )

ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON              # line added by Agent
VALID_NODE_CHECKING_REGISTRATION_LISTENER=SUBNET        # line added by Agent

# add static listener entry starts here:
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = ORCL)
      (GLOBAL_DBNAME=ORCLRA_DGMGRL)
      (ORACLE_HOME = /u00/app/oracle/product/12.2.0/db_1)
    )
  )

the same must be done on the standby site, too – only change the "GLOBAL_DBNAME" to a proper value – the "_DGMGRL" suffix is mandatory for oracle dataguard… regarding the db_unique_name: I usually use the db_name with the suffix RA (for "recovery site A"), RB, RC, etc… and finally restart or reload the listener to enable the new configuration.
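
for reference, the static entry on the standby site might look like this (assuming the same ORACLE_HOME path on the standby server – adjust if yours differs), followed by a reload:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = ORCL)
      (GLOBAL_DBNAME=ORCLRB_DGMGRL)
      (ORACLE_HOME = /u00/app/oracle/product/12.2.0/db_1)
    )
  )

BASH> lsnrctl reload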

  • init and password file

now we are doing the last few steps of preparation… to start the standby instance, we need a pfile… so we just create one from the primary:

SQL> create pfile='/tmp/initORCL.ora' from spfile;

to be able to log on as sys via SQL*Net, we need the password file on the standby site… so we just copy it together with the pfile to the other server:

BASH> scp /tmp/init${ORACLE_SID}.ora standby:${ORACLE_HOME}/dbs/
BASH> scp ${ORACLE_HOME}/dbs/orapw${ORACLE_SID} standby:${ORACLE_HOME}/dbs/

And now just edit the pfile and replace all values containing the db_unique_name with the new value.

# do on standby site!
BASH> sed -i 's/ORCLRA/ORCLRB/g' ${ORACLE_HOME}/dbs/init${ORACLE_SID}.ora

to start up the instance, you have to create the audit file destination manually:

BASH> grep audit_file_dest ${ORACLE_HOME}/dbs/init${ORACLE_SID}.ora
BASH> mkdir -p /u00/app/oracle/admin/ORCLRB/adump
  • create spfile

Oracle dataguard requires an spfile, so we have to create it…

BASH> echo "ORCL:/u00/app/oracle/product/12.2.0/db_1:N" >> /etc/oratab
BASH> . oraenv <<< ORCL
SQL> startup nomount;
SQL> create spfile='+DATA/ORCLRB/spfileORCL.ora' from pfile;
SQL> shutdown;
BASH> mv ${ORACLE_HOME}/dbs/init${ORACLE_SID}.ora ${ORACLE_HOME}/dbs/init${ORACLE_SID}.ora.bak
BASH> echo "spfile='+DATA/ORCLRB/spfileORCL.ora'&amp;amp;quot; ${ORACLE_HOME}/dbs/init${ORACLE_SID}.ora
SQL> startup nomount;
SQL> show parameter spfile
NAME                                 TYPE        VALUE
------------------------------------ ----------- -----------------------------------------------
spfile                               string      +DATA/ORCLRB/spfileorcl.ora
  • final checks before copy

ok, now we have got a primary database in archivelog mode with force logging enabled, and a standby instance in nomount state with the same password file, using an spfile. Furthermore, both instances are accessible via SQL*Net… so let's check if it's working!
the easiest way is just to logon to the database via sqlplus:

sqlplus sys/oracle@primary:1521/orclra_dgmgrl as sysdba
sqlplus sys/oracle@standby:1521/orclrb_dgmgrl as sysdba

if both are working, we can continue, otherwise check the password file and listener entries again. Checking the sqlnet.ora to make sure EZCONNECT is allowed is also a good idea.
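
a quick sketch of that check – the file may live in the grid home or the db home depending on your setup, and the entries shown are just an example of what you want to see:

BASH> grep -i names.directory_path ${ORACLE_HOME}/network/admin/sqlnet.ora
NAMES.DIRECTORY_PATH = (TNSNAMES, EZCONNECT)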

  • duplicate

the next step is the easiest one, although it might take a while, depending on the size of your database… we do an rman duplicate:

rman target sys/oracle@primary:1521/orclra_dgmgrl auxiliary sys/oracle@standby:1521/orclrb_dgmgrl
RMAN> duplicate target database for standby from active database;

when done, you should also add the database to the clusterware, so it will be started automatically after a reboot etc.

BASH> srvctl add database -db ORCLRB -oraclehome /u00/app/oracle/product/12.2.0/db_1 -role physical_standby -instance ORCL -startoption MOUNT
  • enable log shipping

ok, and now the last step: configure the dataguard broker. First we have to enable the broker on both sites…

SQL> alter system set dg_broker_start=true;

and now (maybe wait a few seconds until the broker process is really up) we have to configure it… once you understand the syntax, it's really a no-brainer.

BASH> dgmgrl /
DGMGRL> create configuration 'DG' as primary database is 'ORCLRA' connect identifier is '192.168.56.21:1521/ORCLRA_DGMGRL';
DGMGRL> add database 'ORCLRB' as connect identifier is '192.168.56.22:1521/ORCLRB_DGMGRL';
DGMGRL> enable configuration;

that's it… you can check your success via the broker (check the lines "Transport Lag" and "Apply Lag") or by checking the alert.log of the standby database (just check if the media recovery process is waiting for an archivelog in transit)… suspicious people may do both πŸ˜‰

DGMGRL> show database 'ORCLRB'
Database - ORCLRB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 4.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    ORCL

BASH> tail /u00/app/oracle/diag/rdbms/orclrb/ORCL/trace/alert_ORCL.log
Media Recovery Waiting for thread 1 sequence 15 (in transit)
2018-09-06T12:59:27.551097+02:00
Recovery of Online Redo Log: Thread 1 Group 12 Seq 15 Reading mem 0
  Mem# 0: +DATA/ORCLRB/ONLINELOG/group_12.274.986057281
  Mem# 1: +DATA/ORCLRB/ONLINELOG/group_12.275.986057283

just tell me if I have missed something…

so far
Benjamin


fast provisioning with snapshot clones

Every DBA knows the situation where your boss or customer asks you to copy fresh data into the test database… since this blog is about oracle databases, we usually solve this either with "datapump" or "rman duplicate" – and there is nothing wrong with that. But now imagine we are talking about more than 400 test environments. To satisfy all testers, the production is exported every day (although not all test databases are refreshed daily), which takes several hours every day. And then there are the costs for the expensive SAN storage… So I was asked if there is a better, faster and cheaper solution. As you may suppose, after some research I found the 12c feature of doing snapshot clones… and it's really nice, so I want to show you how it works.

Advantages

  • really fast provisioning (a few seconds)
  • very efficient storage usage

Disadvantages

  • it’s based on pluggable database, so you need the multitenant option license
  • only 3 storage types/vendors are certified so far: ZFS, ACFS and NetApp

since I do not have a netapp or zfs appliance, I'm using ACFS for my test, but there are some things to consider:

  • it’s slower than ASM
  • you have to install a RAC – oracle restart is not sufficient… nevertheless it's OK to run a one-node "cluster" if you don't want to buy the RAC option.

OK, let's start… first – as mentioned above – we need a rac. If you don't know how to install one, I recommend visiting this site: RAC Attack.
Next we are going to create an ACFS volume… this can be done easily with asmcmd:

bash> asmcmd volcreate -G <diskgroup> -s 10G <volume_name>

of course you have to replace the variables in the brackets… in my test environment I created a dedicated diskgroup called ACFS with a volume named oradata.
then we have to check which device name was assigned to this new volume:

bash> asmcmd volinfo -G ACFS oradata
Diskgroup Name: ACFS

         Volume Name: ORADATA
         Volume Device: /dev/asm/oradata-30
         State: ENABLED
     ... 

Great! now we have to create a filesystem on that virtual block device:

mkfs.acfs /dev/asm/oradata-30

and last step: mount the volume

sudo /sbin/acfsutil registry -a /dev/asm/oradata-30 /u00/app/oracle/oradata

that’s it – now the clusterware should detect the new entry in the OCR (the check is running every 30 seconds) and automount the volume… if not, you can do it manually:

sudo mkdir /u00/app/oracle/oradata
sudo /bin/mount -t acfs /dev/asm/oradata-30 /u00/app/oracle/oradata

next chapter: create the databases. Since I want to simulate the complete workflow, I created two databases: one is the source (=production) called ORCL, the other the target (=test) database named CDB. Both databases are located on ASM disk groups (not the ACFS volume!). The source is installed the classical way (no multi- or singletenant) while the target is installed as CDB but without any PDB.
To use the PDB-clone-feature, we have to create a database link from the target to the source first:

SYS@CDB1>CREATE DATABASE LINK orcl CONNECT TO system IDENTIFIED BY <password> USING 'rac01-scan:1521/orcl';

I just used the user “system”, but any other user with the “create session” and “create pluggable database” privilege is sufficient.
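
for illustration, such a dedicated clone user on the source could look like this (user name and password are just placeholders):

SQL> CREATE USER clone_user IDENTIFIED BY <password>;
SQL> GRANT CREATE SESSION, CREATE PLUGGABLE DATABASE TO clone_user;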
next we have to prepare the directory and create the master pdb

bash> mkdir -p /u00/app/oracle/oradata/CDB/prod_seed
sql>  CREATE PLUGGABLE DATABASE prod_seed FROM NON$CDB@orcl CREATE_FILE_DEST='/u00/app/oracle/oradata/CDB/prod_seed';

depending on the size of the database and the speed of your storage, this will take some minutes…
when finished, you have a seed pdb with the internal structure of a non-cdb… so you have to convert it first.

SQL> ALTER SESSION SET CONTAINER=prod_seed;
SQL> @$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql
SQL> ALTER PLUGGABLE DATABASE prod_seed OPEN;
SQL> ALTER PLUGGABLE DATABASE prod_seed OPEN READ ONLY FORCE;
SQL> ALTER PLUGGABLE DATABASE prod_seed SAVE STATE;

please note that you have to open the database first so it can do some kind of instance recovery; afterwards you can bring the pdb into the correct state to act as a seed database.
And now: create the clone

bash> mkdir -p /u00/app/oracle/oradata/CDB/clone1
SQL> CREATE PLUGGABLE DATABASE clone1 FROM prod_seed CREATE_FILE_DEST='/u00/app/oracle/oradata/CDB/clone1' SNAPSHOT COPY;
SQL> ALTER PLUGGABLE DATABASE clone1 OPEN;

P.S. in some guidelines you will be advised to set the parameter clonedb to true – don't do it, at least not if you are using ACFS! It's only necessary if your storage can't do snapshots, so oracle tries to do some magic instead – I don't know if that works. For further details, just read the manual (search for "SNAPSHOT COPY").
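
if you want to be sure, a quick check that the parameter is still at its default (it should show FALSE):

SQL> show parameter clonedb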
OK, and now? Finished! … and where is the magic? the last 'create pluggable database…' just took a few seconds, regardless of the size of the seed database, and it consumes no space at all (at least for the moment)…

bash> du -sh *
228K    clone1
3.6G    prod_seed

and what’s inside?

ls -lhr clone1/CDB/72C5EDFE41727F36E0531F01100AB18B/datafile/
total 20K
lrwxrwxrwx. 1 oracle dba 154 Aug  6 16:17 o1_mf_users_fpjp2wdt_.dbf -> /u00/app/oracle/oradata/.ACFS/snaps/72C5EDFE41727F36E0531F01100AB18B/CDB/prod_seed/CDB/72C340E0CEE355E1E0531F01100AFEB9/datafile/o1_mf_users_fpjbvo7t_.dbf
lrwxrwxrwx. 1 oracle dba 157 Aug  6 16:17 o1_mf_undotbs2_fpjp2wdg_.dbf -> /u00/app/oracle/oradata/.ACFS/snaps/72C5EDFE41727F36E0531F01100AB18B/CDB/prod_seed/CDB/72C340E0CEE355E1E0531F01100AFEB9/datafile/o1_mf_undotbs2_fpjbvo7g_.dbf
lrwxrwxrwx. 1 oracle dba 157 Aug  6 16:17 o1_mf_undotbs1_fpjp2wco_.dbf -> /u00/app/oracle/oradata/.ACFS/snaps/72C5EDFE41727F36E0531F01100AB18B/CDB/prod_seed/CDB/72C340E0CEE355E1E0531F01100AFEB9/datafile/o1_mf_undotbs1_fpjbvo4c_.dbf
lrwxrwxrwx. 1 oracle dba 155 Aug  6 16:17 o1_mf_system_fpjp2w7z_.dbf -> /u00/app/oracle/oradata/.ACFS/snaps/72C5EDFE41727F36E0531F01100AB18B/CDB/prod_seed/CDB/72C340E0CEE355E1E0531F01100AFEB9/datafile/o1_mf_system_fpjbvnyh_.dbf
lrwxrwxrwx. 1 oracle dba 155 Aug  6 16:17 o1_mf_sysaux_fpjp2wc4_.dbf -> /u00/app/oracle/oradata/.ACFS/snaps/72C5EDFE41727F36E0531F01100AB18B/CDB/prod_seed/CDB/72C340E0CEE355E1E0531F01100AFEB9/datafile/o1_mf_sysaux_fpjbvo3z_.dbf

so oracle "tells" the storage (in this case the ACFS driver) to do a snapshot of these data files (and only these files – if there are other databases on this acfs volume, they won't be inside the snapshot), afterwards these snapshots are linked to the location where the data files of the new clone database should be – that's it… very easy, and the DBA doesn't have to worry about storage snapshots, it's all done automatically. If you drop the clone, the snapshot will also be deleted automatically.
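
dropping a clone – and with it the underlying snapshot – is just as simple (using the pdb from the example above):

SQL> ALTER PLUGGABLE DATABASE clone1 CLOSE;
SQL> DROP PLUGGABLE DATABASE clone1 INCLUDING DATAFILES;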

just tell me if you have any questions…

Benjamin


Performance drilldown

well, every dba knows these kinds of phone calls, where a user is complaining about poor performance of the database… but if you ask questions like when or what part of the application is slow, you don't get useful answers.
so you have to play “detective” and find the guy who murdered the performance πŸ˜‰
But where to start? You might generate some AWR reports and try to find something nasty – but in my opinion this is like finding a needle in a haystack; you need something covering a metric over time… my advice: use DBA_HIST_SYSMETRIC_SUMMARY (as long as you have the proper license!)

if you are in a hurry and need something handy now, you may use this:

set linesize 200
col "CPU USAGE %" for a12 heading "CPU|USAGE %"
col "LOGICAL READ IO/s" for a12 heading "LOGICAL|READ IO/s"
col "PHYSICAL READ IO/s" for a12 heading "PHYSICAL|READ IO/s"
col "PHYSICAL READ MB/s" for a12 heading "PHYSICAL|READ MB/s"
col "PHYSICAL WRITE IO/s" for a12 heading "PHYSICAL|WRITE IO/s"
col "PHYSICAL WRITE MB/s" for a12 heading "PHYSICAL|WRITE MB/s"
col "REDO MB/s" for a12 heading "REDO|MB/s"
col "BUFFER HIT RATIO" for a12 heading "BUFFER|HIT RATIO"
select 
    snap_id
   ,to_char(begin_interval_time, 'YYYY-MM-DD HH24:MI') "DATE"
   ,round(cpu_value,2) "CPU USAGE %"
   ,round(logical_read_value) "LOGICAL READ IO/s"
   ,round(physical_read_value) "PHYSICAL READ IO/s"
   ,round(physical_read_byte_value/1024/1024) "PHYSICAL READ MB/s"
   ,round(physical_write_value) "PHYSICAL WRITE IO/s"
   ,round(physical_write_byte_value/1024/1024) "PHYSICAL WRITE MB/s"
   ,round(redo_byte_value/1024/1024,2) "REDO MB/s"
   ,round(100-(physical_read_value/logical_read_value*100),2) "BUFFER HIT RATIO"
from (
   select 
       a.snap_id
      ,b.begin_interval_time
      ,a.metric_name
      ,a.average 
   from dba_hist_sysmetric_summary a
   inner join dba_hist_snapshot b on a.snap_id = b.snap_id
   )
pivot (sum(average) as value for (metric_name) in (
          'Logical Reads Per Sec' as logical_read
         ,'Physical Reads Per Sec' as physical_read
         ,'Physical Writes Per Sec' as physical_write
         ,'Physical Read Total Bytes Per Sec' as physical_read_byte
         ,'Physical Write Total Bytes Per Sec' as physical_write_byte
         ,'Redo Generated Per Sec' as redo_byte
         ,'Host CPU Utilization (%)' as cpu))
order by snap_id;

Keep in mind: this SQL gives you the average values between two snapshots; the dba_hist_sysmetric_summary view can also provide you with the peak values…
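
a small sketch of how to pull the peak values for a single metric (here the host CPU – pick any metric_name you like from the view):

select
    snap_id
   ,to_char(begin_time, 'YYYY-MM-DD HH24:MI') "DATE"
   ,round(average,2) avg_value
   ,round(maxval,2)  peak_value
from dba_hist_sysmetric_summary
where metric_name = 'Host CPU Utilization (%)'
order by snap_id;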

so far
Benjamin


recover standby database

well, oracle dataguard in general is rock solid (if configured well – and you will find a lot of guides on the internet on how to do so)… but as always, sometimes something is not working as it should and the standby is out of sync afterwards (e.g. a network outage between the servers, and to avoid a stuck archiver on the primary site you forced a delete of the archive logs).

OK, the network is working again, so how do we close the gap? If only a few archivelogs are missing, my first choice would be a restore of the archivelogs from backup. But what if thousands of archivelogs are missing? Restore them all from tape? Based on my experience, this can be a very time consuming procedure; in the worst case the tape drive is not fast enough to catch up with the primary. In this case you may consider rebuilding the standby (rman> duplicate database for standby from active database;) … have you ever tried to do this with a 50 TB database? The solution might be an incremental recovery! If you consider some peculiarities, it's not that complicated…

  1. stop redo apply (it’s not working anyway)

you cannot apply an incremental backup while the apply is running, so let’s stop it:

dgmgrl /
edit database 'ORCLRB' set state='APPLY-OFF';
  2. evaluate what's missing

Then we have to figure out where our incremental backup should start… the easy way is to query the v$database view to get the current scn – but this only returns the information stored in the controlfile; sometimes the headers of the datafiles have a lower scn and rman refuses to start the recovery. So the sql must look like this:

# execute on standby!
sqlplus / as sysdba
set linesize 200
SELECT 'BACKUP INCREMENTAL FROM SCN '||min(SCN)||' DATABASE FORMAT ''/tmp/recover_standby_%U'';' FROM (
   select current_scn SCN from V$DATABASE
   union all
   select min(checkpoint_change#) from v$datafile_header 
   where file# not in (select file# from v$datafile where enabled = 'READ ONLY')
);
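
the query produces a ready-to-run RMAN command – assuming it returned SCN 4711 (just an example value), the output would look like this:

BACKUP INCREMENTAL FROM SCN 4711 DATABASE FORMAT '/tmp/recover_standby_%U';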
  3. start backup

well, just execute the output of the above sql on the primary database… and when it is finished, you also need a fresh standby controlfile:

# execute on primary!
sqlplus / as sysdba
ALTER DATABASE CREATE STANDBY CONTROLFILE AS '/tmp/stby.ctl';
  4. start recover

First you have to make the backup available on the standby site… either mount it as an NFS device or copy the backup files via scp to the standby site. Then catalog the files and start the recovery…

# execute on standby!
rman target /
catalog start with '/tmp/recover_standby';
recover database noredo;
  5. restore controlfile

now we have to “update” the controlfile… since this is not possible, we are doing a full restore.

# execute on standby!
rman target /
shutdown immediate;
startup nomount;
restore controlfile from '/tmp/stby.ctl';
alter database mount;
exit;

if the structure of the database is not exactly the same as on the primary site, you have to catalog the new file and folder names and switch to them:

rman target /
catalog start with '+U01/ORCLRB/DATAFILE';
switch database to copy;
  6. enable redo apply

almost finished, now we only have to enable the dataguard process again…

dgmgrl /
edit database 'ORCLRB' set state='APPLY-ON';

done… just check the alert.log if the standby is receiving and applying the last few archivelogs.


auditing – what the hell is going on in my database

When I came into the office today, a colleague of mine was already waiting for me – one of our daily datapump exports has failed again. When looking into the log, I found this error message:

ORA-01466: unable to read data - table definition has changed

ok, this shouldn't happen since the application is not doing ddl (hopefully), so who else could it be? But from ancient times I could remember that oracle is auditing several activities… but when I started googling to get the name of the audit table, I was confused: it seems in Oracle 12c there are 2 different types of auditing.

  1. old style
     this is maybe the method you already know… for auditing all ddl statements on all tables, you just have to enter

     audit table

     … very simple – but boring πŸ˜‰

  2. new style
     the new style is a bit more complex – you have to define a policy first and then activate it. I'm no audit expert, but as far as I could gather, the new style is very powerful and can be configured more precisely, but some features of the old method are missing.

Ok, but back to my problem… to make a long story short, by default ddl statements are not captured, but this can be enabled with both methods. So I decided to use the new way…

so I first checked what is already in my audit table…

SELECT * FROM unified_audit_trail WHERE event_timestamp > sysdate-1 ;

… and waited. I found several posts complaining about slow performance of the audit table when there are too many entries. So the first step is a cleanup of the current table. You can do it manually, but I decided to create a scheduler job doing it regularly… and while I'm at it, the job also cleans up the old style audit logs… so after some trial and error my job looks like this:

BEGIN
  DBMS_SCHEDULER.CREATE_JOB
    (
       job_name        => 'PURGE_ALL_AUDIT_LOGS'
      ,start_date      => sysdate
      ,repeat_interval => 'freq=daily; byhour=1; byminute=0; bysecond=0'
      ,job_type        => 'PLSQL_BLOCK'
      ,enabled         => TRUE  -- create the job enabled so it really runs every night
      ,job_action      => 'BEGIN
  DBMS_AUDIT_MGMT.INIT_CLEANUP(AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_ALL, DEFAULT_CLEANUP_INTERVAL => 24);
  DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_UNIFIED, LAST_ARCHIVE_TIME => SYSDATE-90);
  DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD, LAST_ARCHIVE_TIME => SYSDATE-90);
  DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_FGA_STD, LAST_ARCHIVE_TIME => SYSDATE-90);
  DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_OS,      LAST_ARCHIVE_TIME => SYSDATE-90);
  DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_XML,     LAST_ARCHIVE_TIME => SYSDATE-90);
  DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL(DBMS_AUDIT_MGMT.AUDIT_TRAIL_ALL, TRUE);
END;');
END;
/

first we have to initialize the cleanup… this is not necessary for the new unified audit trail, but for the old style. Then we have to define the retention time for all audit types separately, and last but not least do the purge itself… easy, isn't it? After running the job once I was able to query the audit table… ok, next step: implement DDL auditing.
First of all you should know that the audit rules are read during login… so if you are wondering why the audit is not working: try logging out and in again πŸ˜‰ I have no idea if shared server processes behave differently…
So, back to the audit policy… in general there are 3 different methods to define when a command should be logged.

  1. defining the used privileges
     when someone does not own a table but is able to read or modify it because of special system privileges (like 'select any table')

  2. defining actions
     you can define the action that should be logged. An action could be 'create table' or 'update table'

  3. defining the used role
     when the right to do a special operation is granted via a role, you can just specify that role.

Since I want to audit all DDL regardless why the user is allowed to do it, my policy is like this:

CREATE audit policy ddl ACTIONS CREATE TABLE,CREATE INDEX,ALTER TABLE,ALTER INDEX,DROP INDEX,DROP TABLE,TRUNCATE TABLE,RENAME,LOCK TABLE;

if you want to narrow it a bit, you can specify the object that should be monitored…

CREATE audit policy test ACTIONS delete on hr.employees;

And last but not least it’s possible to add custom filter criteria…

CREATE audit policy test ACTIONS delete on hr.employees WHEN 'SYS_CONTEXT(''USERENV'', ''USERNAME'') != ''HR''';

Finally you have to enable the policy!

audit policy ddl;
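
to verify everything, you can check that the policy is really enabled and – after running a test DDL in a fresh session – look at the captured entries (a quick sketch; the policy name is stored in upper case):

SQL> select * from audit_unified_enabled_policies;
SQL> select event_timestamp, dbusername, action_name, object_name
     from unified_audit_trail
     where unified_audit_policies like '%DDL%'
     order by event_timestamp;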

that's it… hope I could give you a simple and easy-to-understand introduction to the auditing feature of oracle.

Benjamin


how to use SQL tuning advisor with SQL*Plus

because I've worked on a lot of environments where I only have ssh access, I cannot use graphical tools like Toad or SQL Developer… so I had to find a suitable way to tune sql statements with sql*plus… I'm just describing how to use the sql tuning advisor; how to find the correct sql_id will be another article.

TAKE CARE: you need the oracle diagnostic & tuning pack license!

ok, now the technical part… there are 3 steps: create a tuning task, where you define what oracle should do, execute the tuning task, and show the result of the tuning task. Let's start with #1

define sql_id=abcdefghijkl
define begin_snap=1234
define end_snap=1235
set serveroutput on

DECLARE
  l_sql_tune_task_id  VARCHAR2(100);
BEGIN
  l_sql_tune_task_id := DBMS_SQLTUNE.create_tuning_task (                   
                          begin_snap  => &begin_snap,
                          end_snap    => &end_snap,
                          sql_id      => '&sql_id',
                          scope       => DBMS_SQLTUNE.scope_comprehensive,
                          time_limit  => 7200, -- max runtime in seconds... so this means 2 hours
                          task_name   => '&sql_id._AWR_tuning_task');
  DBMS_OUTPUT.put_line('l_sql_tune_task_id: ' || l_sql_tune_task_id);
END;
/

ok, now you have to execute the tuning task

EXEC DBMS_SQLTUNE.execute_tuning_task(task_name => '&sql_id._AWR_tuning_task');

and when the task has finished, you can view the report with this sql statement:

SET LONG 10000;
SET PAGESIZE 1000
SET LINESIZE 200
column RECOMMENDATIONS format a200
-- optionally you can spool the output
-- spool sql_advisor_&sql_id._AWR_tuning_task.txt
SELECT DBMS_SQLTUNE.report_tuning_task('&sql_id._AWR_tuning_task') AS recommendations FROM dual;
-- spool off

if you want to accept a sql profile, the syntax to do so is within the report, but it's always the same…

exec dbms_sqltune.accept_sql_profile(task_name => '&sql_id._AWR_tuning_task', replace => TRUE);

if you want to undo (=remove) the sql profile, you just have to execute this:

exec DBMS_SQLTUNE.drop_sql_profile (name => '&sql_id._AWR_tuning_task');

you can drop the tuning task including the report (but without the profile) with this:

exec dbms_sqltune.drop_tuning_task(task_name => '&sql_id._AWR_tuning_task');

if you want to see which sql tuning tasks exist, you can query the related dba view…

select task_name from DBA_ADVISOR_TASKS where advisor_name='SQL Tuning Advisor';

small hint: The automatic sql tuning advisor is running every night (if you haven’t disabled it). Internally it’s just a regular sql tuning task called ‘SYS_AUTO_SQL_TUNING_TASK’… you can get the report as described above (you just have to replace the task name) and see the recommendations of all automatically analysed sql statements within your AWR retention time… pretty cool to get an overview of slow sql.
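
for example, the report of last night's automatic run can be pulled the same way – the task name is fixed:

SELECT DBMS_SQLTUNE.report_tuning_task('SYS_AUTO_SQL_TUNING_TASK') AS recommendations FROM dual;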


alter table … shrink space – and its progress

well, there are a lot of good tutorials regarding when and how to use the “alter table … shrink space” command – and don’t panic, I won’t do another one πŸ˜‰

just a few words about the mechanism behind it: the shrink is done in 2 steps

  1. the rows from the "end" of a table will be moved to the "beginning". Internally, this is done by "select – insert – delete" statements, row by row. So there are only locks on row level and a lot of small transactions. Therefore, this part is completely online and has just a small performance impact on running transactions (I've seen somewhere between 2% and 10%, depending on your application)
  2. The “high water mark” will be reset to the latest block of the table. Oracle has to lock the table exclusively – so it’s not really online.

and a short summary of the different "shrink space" commands (because I always forget them myself) – a short worked example follows the list

  • alter table … shrink space:
    • do both steps successively
  • alter table … shrink space cascade:
    • do both steps successively for the table and all related objects like indexes and lobs
    • if you want to see all related objects, just execute this:
      select * from (table(DBMS_SPACE.OBJECT_DEPENDENT_SEGMENTS('<OWNER>','<TABLE>',null, 1)));
      
  • alter table … shrink space compact:
    • do only step 1
  • alter table … modify lob (…) (shrink space):
    • do steps 1 & 2 for the lob column

And now: the tricky part… how to monitor the progress… and the short answer right at the beginning: it's not really possible. If you think of using v$session_longops – as I mentioned above, there are just many short transactions, so it won't help…

but there are some other indications you can use to check progress, although you can’t calculate the end time properly… nevertheless it’s enough to see the process is ongoing πŸ˜‰

select
   b.sid
  ,a.event
  ,a.WAIT_TIME
  ,round(d.read_value/1024/1024/1024,2) read_gb
  ,round(d.write_value/1024/1024/1024,2) write_gb
  ,round(d.undo_value/1024/1024/1024,2) undo_gb
  ,e.current_undo
  ,c.SQL_TEXT
from v$session_wait a
join v$session b on a.sid = b.sid
join v$sqlarea c on b.SQL_ID = c.SQL_ID
join (select * from (select s.sid, t.name, s.value from v$sesstat s inner join v$statname t on s.statistic#=t.statistic#)
              pivot (sum(value) as value for name in (
                 'undo change vector size' UNDO
                ,'physical read total bytes' READ
                ,'physical write total bytes' WRITE
))) d on a.sid=d.sid
join (select b.sid, sum(a.USED_UBLK) current_undo from V$TRANSACTION a join v$session b on a.SES_ADDR=b.saddr group by b.sid) e on e.sid=a.sid
where upper(c.sql_text) like 'ALTER TABLE%SHRINK SPACE%'
and b.sid != (select sys_context('USERENV','SID') from dual)
;

Just to explain the output:

  • SID: the user SID executing the shrink command
  • EVENT: the current wait event… as long as it's "direct path read" or "db file … read" you are still in step 1
  • WAIT_TIME: should be nearly 0, since we have just small transactions.
  • READ_GB: amount of data the session has read from disk
  • WRITE_GB: amount of data the session has written to disk
  • UNDO_GB: amount of data the session has written to undo
  • CURRENT_UNDO: number of undo blocks currently used… should be very low (<20) – see “small transactions”
  • SQL_TEXT: the SQL text we are monitoring…

ORA-01652: Who is using the temp tablespace?

if you are getting the error "ORA-01652: unable to extend temp segment by 128 in tablespace TEMP" you could just increase the temp tablespace, e.g. by adding an additional tempfile… the smarter way would be to check who is consuming so much temp tablespace…
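
for completeness, the quick (but not so smart) fix would be something like this – file location and size are just placeholders for your environment:

SQL> ALTER TABLESPACE temp ADD TEMPFILE '+DATA' SIZE 10G;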

The following SQL is giving you a live view:

SELECT
	 s.sid
	,s.serial#  
	,s.osuser
	,s.user#
	,s.username 
	,s.sql_id
	,s.module
	,s.program 
	,round(su.blocks * tbs.block_size / 1024 / 1024 /1024) gb_used 
	,su.tablespace
	--,sa.sql_text     -- uncomment if you want to see the SQL text
	--,sa.sql_fulltext -- uncomment if your Client can handle CLOB's (i.e. SQL Developer)
FROM  v$sort_usage su   
inner join v$session s on su.session_addr=s.saddr
inner join dba_tablespaces tbs on su.tablespace=tbs.tablespace_name
inner join v$sqlarea sa on s.sql_id=sa.sql_id
where su.blocks*tbs.block_size > 1024*1024*1024 -- more than 1 GB TEMP
ORDER BY gb_used desc; 

and if you have to check what happened some time ago (take care: Diagnostics & Tuning pack required!)

SELECT
   snap_id
  ,sample_time
  ,session_id
  ,session_serial#
  ,sql_id
  ,round(temp_space_allocated/1024/1024/1024) gb_used
  FROM dba_hist_active_sess_history
WHERE temp_space_allocated > 1024*1024*1024
ORDER BY sample_time;

mystery performance issue with background process

recently I've got some serious performance issues on an 11.2 database. The mystery: neither the AWR nor the ADDM report showed any obvious problems, but the linux system showed a very high load. So I tried to get some kind of live view… Cloud Control was not available yet, so I used the great and free tool Mumbai

And it seemed quite usual… but wait – what are these resources at the bottom of the chart? Resource group "other"… great, now I know exactly what it is πŸ™ But in the lower right corner there is the list of top consuming processes, and there are some "E00x" processes among them.

Just to be sure, I checked with plain SQL whether the chart is correct:

select
   count(*) count
  ,trunc(SAMPLE_TIME, 'HH24') hour
  ,session_id
  ,session_serial#
  ,program
-- and whatever might be interesting, like EVENT, P1 & P1TEXT, etc.
from DBA_HIST_ACTIVE_SESS_HISTORY
where session_type='BACKGROUND'      -- just get background processes
  and wait_class != 'Idle'           -- ignore all idle wait events
  and program not like 'rman%'       -- rman is a background process that is allowed to be busy for a longer time
  and sample_time > sysdate-1        -- only the last 24 hours
group by trunc(SAMPLE_TIME, 'HH24'), session_id  ,session_serial#  ,program -- don't forget to add any columns you added above
having count(*) > 180                -- only show processes that were active at least half the time
order by 2 desc
;

the output was like this:

COUNT   HOUR                    SESSION_ID      SESSION_SERIAL#   PROGRAM
360     201x-xx-xx 02:00:00     1282            21723             oracle@orcl (E002)
359     201x-xx-xx 01:00:00     1282            21723             oracle@orcl (E002)
360     201x-xx-xx 00:00:00     1282            21723             oracle@orcl (E002)

wow – what's this? Just to explain the column "count": we are querying an ASH table. ASH is captured every second, and every 10th capture is flushed to disk – so we have 6 snapshots per minute or 360 per hour… meaning in the above query the background process was running the whole time.

So next step: figure out what the hell the E00x process is doing. But thanks to Oracle, the documentation is (as usual) quite good. It's the EMON slave process, which "performs database event management and notifications". Ok, but why are they – or at least some of these processes – running all the time? After some research I found oracle bug 9735536 and a corresponding bugfix… and as a workaround the suggestion to kill the Ennn processes together with the EMNC (EMON coordinator process) and voila – the system ran smoothly again… at least for some hours, until the next EMON process ran wild.
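
in case you have to use the kill workaround yourself, a small sketch of how one might find the OS PIDs of those processes (v$process.pname should be available in 11.2 – killing background processes is at your own risk):

SQL>  select spid, pname from v$process where pname like 'E0%' or pname = 'EMNC';
BASH> kill <spid>    -- repeat for each spid returned above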

after installing the patch the issue was solved, so we could continue planning the 12c upgrade…

Benjamin
