Friday, May 17, 2013

ORA-00600: internal error code, arguments: [kxfrVMapUpdate_helper_2]


While adding new nodes on 11.2, we saw the ORA-600 error with [kxfrVMapUpdate_helper_2]

Problem:
ORA-00600: internal error code, arguments: [kxfrVMapUpdate_helper_2], [13], [12], [], [], [], [], [], [], [], [], []

The statement causing the errors reported in trace is:
SELECT "SID","BLOCKING_INSTANCE","BLOCKING_SESSION" FROM "GV$SESSION" "VLOCK"

Cause:
In a generic scenario,cluster_database_instances are same at database startup. So, when adding the nodes, these numbers may cause to hit such bug.

Solution:
Restart after each node addition, so all instances are aware of the addition or set the high value for the cluster_database_instances and add the nodes.

Ex..


alter system set cluster_database_instances=20 scope=spfile sid='*';

Happy DBA... 

Thursday, May 9, 2013

All about AWR


Today, I am NOT going to make you boring  Going to tell you the same story. Which all the DBA's knows about it, but with interesting stuff.

First thing when there is a Problem is database, we check the server stats and go directly to AWR report. Did you run the AWR report? Okay, here we go.

A number of different statistics are collected by the AWR including wait events, time model statistics, active session history statistics, various system and session level statistics, object usage statistics. AWR consists of number of tables owned by SYS and stored in SYSAUX tablespace. All AWR table names start with identified WR, three different type designations

1) Metadata (WRM$)
2) Historical Data (WRH$)
3) AWR tables related to ad-visor functions (WRI$)

In some Oracle technical documents you will see the AWR tables also refereed to as the select workload repository (SWRF) tables.

We see the following repository views in the database.

V$ACTIVE_SESSION_HISTORY - Displays the active session history (ASH) sampled every second
V$METRIC - Displays metric information
V$METRICNAME - Displays the metrics associated with each metric group
V$METRIC_HISTORY - Displays historical metrics
V$METRICGROUP - Displays all metrics groups
DBA_HIST_SNAPSHOT - The table holds AWR snapshots information for all snapshots.
DBA_HIST_SQLSTAT - The information and statistics in this table are taken from V$SQL with each AWR snapshot, it chooses the top SQL statements from the dynamic view. The number of statements depends on the value of STATISTICS_LEVEL parameter.
DBA_HIST_SQL_PLAN - This table is pretty much the same as V$SQL_PLAN. the greatest thing about it is that it keeps execution plans of all statements and the information is not purged when snapshots are deleted.
DBA_HIST_SQLTEXT - Same as DBA_HIST_SQL_PLAN but with V$SQL_TEXT.
DBA_HIST_SEG_STAT - Same as DBA_HIST_SQLSTAT but with V$SEGMENT_STATISTICS, the columns are slightly different from the V$, the HIST table has a column for each statistic instead of the STATISTIC_NAME column that the V$ has. This table also holds information of the top segments of each snapshot and the number is also controlled by the STATISTIC_LEVEL parameter.
DBA_HIST_SYSTEM_EVENT - The HIST of V$SYSTEM_EVENT, holds the wait events information for every snapshot.
DBA_HIST_ACTIVE_SESS_HISTORY - The HIST table of V$ACTIVE_SESSION_HISTORY.
DBA_HIST_BASELINE - Displays baseline information
DBA_HIST_DATABASE_INSTANCE - Displays database environment information

AWR Script Usage
awrrpt.sql Displays various statistics for a range of snapshots Ids.
awrrpti.sql Displays statistics for a range of snapshot Ids on a specified database and instance.
awrsqrpt.sql Displays statistics of a particular SQL statement for a range of snapshot Ids. Run this report to inspect or debug the performance of a particular SQL statement.
awrsqrpi.sql Displays statistics of a particular SQL statement for a range of snapshot Ids on a specified SQL.
awrddrpt.sql Compares detailed performance attributes and configuration settings between two selected time periods.
awrddrpi.sql Compares detailed performance attributes and configuration settings between two selected time periods on a specific database and instance.

Interestingly we can use these views in many ways to find the database activities, sometimes we might be interested in what are the top 10 sql's running during that particular period. AWR generates those sql's but you don't want to run the whole AWR to find that. You can run this query to find. Don't forget to change the database id and number of instance in your DB.

ttitle center 'AWR Top SQL Report' skip 2
set pagesize 50000
set linesize 300

col snap_id     format 9999999            heading "Snap|ID"
col tm          format a15              heading "Snap|Start|Time"
col inst        format 90               heading "i|n|s|t|#"
col dur         format 990.00          heading "Snap|Dur|(m)"
col sql_id      format a15              heading "SQL|ID"
col phv         format 99999999999      heading "Plan|Hash|Value"
col module      format a20              heading "Module"
col elap        format 999990.00        heading "Elapsed|Time|(s)"
col elapexec    format 999990.00        heading "Elapsed|Time|per exec|(s)"
col cput        format 999990.00        heading "CPU|Time|(s)"
col clwait      format 999999990        heading "Cluster|Wait"
col bget        format 99999999990      heading "LIO"
col dskr        format 99999999990      heading "PIO"
col rowp        format 99999999990      heading "Rows"
col exec        format 9999990          heading "Exec"
col prsc        format 999999990        heading "Parse|Count"
col pxexec      format 9999990          heading "PX|Exec"
col pctdbt      format 990              heading "DB Time|%"
col aas         format 990.00           heading "A|A|S"
col time_rank   format 90               heading "Time|Rank"
col sql_text    format a40              heading "SQL|Text"


select *
       from (
             select
                  sqt.snap_id snap_id,
                  TO_CHAR(sqt.tm,'YY/MM/DD HH24:MI') tm,
                  sqt.inst inst,
                  sqt.dur dur,
                  sqt.sql_id sql_id,
                  sqt.phv phv,
                  to_clob(decode(sqt.module, null, null, sqt.module)) module,
                  nvl((sqt.elap), to_number(null)) elap,
                  nvl((sqt.elapexec), to_number(null)) elapexec,
                  nvl((sqt.cput), to_number(null)) cput,
                  sqt.clwait clwait,
                  sqt.bget bget,
                  sqt.dskr dskr,
                  sqt.rowp rowp,
                  sqt.exec exec,
                  sqt.prsc prsc,
                  sqt.pxexec pxexec,
                  sqt.aas aas,
                  sqt.time_rank time_rank
                  , nvl(st.sql_text, to_clob('** SQL Text Not Available **')) sql_text     -- PUT/REMOVE COMMENT TO HIDE/SHOW THE SQL_TEXT
             from        (
                          select snap_id, tm, inst, dur, sql_id, phv, module, elap, elapexec, cput, clwait, bget, dskr, rowp, exec, prsc, pxexec, aas, time_rank
                          from
                                             (
                                               select
                                                      s0.snap_id snap_id,
                                                      s0.END_INTERVAL_TIME tm,
                                                      s0.instance_number inst,
                                                      round(EXTRACT(DAY FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 1440
                                                              + EXTRACT(HOUR FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 60
                                                              + EXTRACT(MINUTE FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME)
                                                              + EXTRACT(SECOND FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) / 60, 2) dur,
                                                      e.sql_id sql_id,
                                                      e.plan_hash_value phv,
                                                      max(e.module) module,
                                                      sum(e.elapsed_time_delta)/1000000 elap,
                                                      decode((sum(e.executions_delta)), 0, to_number(null), ((sum(e.elapsed_time_delta)) / (sum(e.executions_delta)) / 1000000)) elapexec,
                                                      sum(e.cpu_time_delta)/1000000     cput,
                                                      sum(e.clwait_delta)/1000000 clwait,
                                                      sum(e.buffer_gets_delta) bget,
                                                      sum(e.disk_reads_delta) dskr,
                                                      sum(e.rows_processed_delta) rowp,
                                                      sum(e.executions_delta)   exec,
                                                      sum(e.parse_calls_delta) prsc,
                                                      sum(px_servers_execs_delta) pxexec,
                                                      (sum(e.elapsed_time_delta)/1000000) / ((round(EXTRACT(DAY FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 1440
                                                                                            + EXTRACT(HOUR FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 60
                                                                                            + EXTRACT(MINUTE FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME)
                                                                                            + EXTRACT(SECOND FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) / 60, 2))*60) aas,
                                                      DENSE_RANK() OVER (
                                                      PARTITION BY s0.snap_id ORDER BY e.elapsed_time_delta DESC) time_rank
                                               from
                                                   dba_hist_snapshot s0,
                                                   dba_hist_snapshot s1,
                                                   dba_hist_sqlstat e
                                                   where
                                                    s0.dbid                   = 3615614091                -- CHANGE THE DBID HERE!
                                                    AND s1.dbid               = s0.dbid
                                                    and e.dbid                = s0.dbid
                                                    AND s0.instance_number    in (1,2,3,4,5)      -- CHANGE THE INSTANCE_NUMBER HERE!
                                                    AND s1.instance_number    = s0.instance_number
                                                    and e.instance_number     = s0.instance_number
                                                    AND s1.snap_id            = s0.snap_id + 1
                                                    and e.snap_id             = s0.snap_id + 1
                                               group by
                                                    s0.snap_id, s0.END_INTERVAL_TIME, s0.instance_number, e.sql_id, e.plan_hash_value, e.elapsed_time_delta, s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME
                                             )
                          where
                          time_rank <= 10                                     -- GET TOP 5 SQL ACROSS SNAP_IDs... YOU CAN ALTER THIS TO HAVE MORE DATA POINTS
                         )
                        sqt,
                        dba_hist_sqltext st
             where st.sql_id(+)             = sqt.sql_id
             and st.dbid(+)                 = 3615614091 -- CHANGE THE DBID HERE!
 -- AND TO_CHAR(tm,'D') = 1                                                  -- Day of week: 1=Sunday 7=Saturday
 -- AND TO_CHAR(tm,'D') <= 7
 -- AND TO_CHAR(tm,'HH24MI') = 0900                                          -- Hour
 -- AND TO_CHAR(tm,'HH24MI') <= 1800
 --AND tm >= TO_DATE('2013-apr-08 12:00:00','yyyy-mon-dd hh24:mi:ss')     -- Data range
 --AND tm <= TO_DATE('2013-apr-10 12:00:09','yyyy-mon-dd hh24:mi:ss')
 -- AND snap_id in (338,339)
 -- AND snap_id = 335 and snap_id <= 339
 -- AND snap_id = 3172
 and sqt.sql_id in ('42pk03umxrfqa', '3dhz8tx744mh5', 'ah322609g2yrt', '5md8gkw13s2qx', '49a9zwxzc1rh4', 'gqa75mwncy0h3', '3p9nzusr2j2jd', '3xktjahmc1ava')
 -- AND lower(st.sql_text) like 'select%'
 -- AND lower(st.sql_text) like 'insert%'
 -- AND lower(st.sql_text) like 'update%'
 -- AND lower(st.sql_text) like 'merge%'
 -- AND pxexec  0
 -- AND aas  .5
             order by
             -- snap_id                             -- TO GET SQL OUTPUT ACROSS SNAP_IDs SEQUENTIALLY AND ASC
             nvl(sqt.elap, -1) desc, sqt.sql_id     -- TO GET SQL OUTPUT BY ELAPSED TIME
             )
-- where rownum <= 20
 ;

next time, when I have time will explore more options with ASH until then have fun...

Wednesday, May 1, 2013

Toady my work started with problem, when I woke up from bed with paging alert.

Two of the nodes were down on five node cluster... How wonderful it is.

ASM Log on Evicted Node 2:

NOTE: ASMB process exiting, either shutdown is in progress
NOTE: or foreground connected to ASMB was killed.
Thu Mar 14 07:48:20 2013
NOTE: client exited [17438] w
NOTE: force a map free for map id 2
Thu Mar 14 07:48:23 2013
PMON (ospid: 17255): terminating the instance due to error 481
Instance terminated by PMON, pid = 17255

ASM Log on Evicted Node 4:

Thu Mar 14 07:49:32 2013
NOTE: ASM client PROD2:PROD disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /opt/app/oracle/product/diag/asm/+asm/+ASM2/trace/+ASM2_ora_32689.trc
Thu Mar 14 07:50:19 2013
PMON (ospid: 31211): terminating the instance due to error 481
Thu Mar 14 07:50:19 2013
System state dump requested by (instance=2, osid=31211 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /opt/app/oracle/product/diag/asm/+asm/+ASM2/trace/+ASM2_diag_31229.trc
Dumping diagnostic data in directory=[cdmp_20130314075019], requested by (instance=2, osid=31211 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 31211

For some reason PMON was not able to start on the evicted nodes. By checking the metalink, we found "terminating the instance due to error 481" can happen because of interconnect IP issue. On further debugging the PMON process is killed by the Primary node, seems like there was OS Patching was carried out on the evicted node. We tried to rollback the OS patch and the problem resolved. So to conclude OS Patching on one Node was the problem, the theory is when eviction happens you need to look at all the nodes of ASM Logs to find the Root cause.

Have fun...

Thursday, March 7, 2013

OERR - Source location


Today, I found something interesting. All DBA's would have been more familiar with OERR command, If you have any Oracle error it gives unique error code and if you pass the error code, we will get the cause and action, like below.

$ oerr ora 12521
12521, 00000, "TNS:listener does not currently know of instance requested in connect descriptor"
// *Cause:  The listener received a request to establish a connection to a
// database or other service. The connect descriptor received by the listener
// specified in addition to the service name an instance name for an instance
// (usually a database instance) that either has not yet dynamically registered
// with the listener or has not been statically configured for the listener.
// This may be a temporary condition such as after the listener has started,
// but before the database instance has registered with the listener.
// *Action:
//  - Wait a moment and try to connect a second time.
//  - Check which instances are currently known by the listener by executing:
//    lsnrctl services <listener name>
//  - Check that the INSTANCE_NAME parameter in the connect descriptor specifies
//    an instance name known by the listener.
//  - Check for an event in the listener.log file.

But today, I was eager to find out where it is trying to get all these message. OERR is an awk command, which fetches the error from oracle file and it is available only for UNIX for windows, there are many perl programmes.

I started with strace command to find out what are the libraries it is invoking.

strace -f -o abc01.log oerr ora 12521

Now you might see abc01.log with the output of Oracle libraries files it is trying to use

First line of the file show that we are using oerr with the parameters what we are passing.

16189 execve("/opt/app/oracle/product/11203/bin/oerr", ["oerr", "ora", "12521"], [/* 61 vars */]) = 0

Somewhere in the log you can see the awk command is used...

16197 execve("/bin/awk", ["awk", "-F:", "{\n\t\tif (index ($3, \"*\") == 0)\n\t\t"], [/* 59 vars */]) = 0

Now say Hello to this line oraus.msg file, try opening this file BINGO! you will see all the errors used by Oracle.

25344 stat("/opt/app/oracle/product/11203/rdbms/mesg/oraus.msg", {st_mode=S_IFREG|0644, st_size=4923359, ...}) = 0


All Oracle Errors from oraus.msg file... ;)

00000, 00000, "normal, successful completion"
// *Cause:  Normal exit.
// *Action: None.
00001, 00000, "unique constraint (%s.%s) violated"
// *Cause: An UPDATE or INSERT statement attempted to insert a duplicate key.
//         For Trusted Oracle configured in DBMS MAC mode, you may see
//         this message if a duplicate entry exists at a different level.
// *Action: Either remove the unique restriction or do not insert the key.
/0002        reserved for v2 compatibility (null column)
/0003        reserved for v2 compatibility (column value truncated)
/0004        reserved for v2 compatibility (end-of-fetch)
/0009        reserved for v2 compatibility
/
/ 10 - 49 user session and session switching errors
/
00017, 00000, "session requested to set trace event"
// *Cause:  The current session was requested to set a trace event by another
//          session.
// *Action: This is used internally; no action is required.
00018, 00000, "maximum number of sessions exceeded"
// *Cause:  All session state objects are in use.
// *Action: Increase the value of the SESSIONS initialization parameter.

Evil's are also part of these error code inside Oracle...


10668, 00000, "Inject Evil Identifiers"
// *Cause:  event 10668 is set to some number > 0, causing 1/(value-1) of all
//          identifiers to be replaced by a maximum amount of x's.  It is
//          common for an identifier to be parsed once with a max of 30 bytes,
//          then reparsed later with a max of 4000, so it may not be possible
//          to inject such an identifier without the aid of this event.  A
//          value of 1 causes no identifiers to be corrupted.
// *Action: never set this event



Happy DBA... ;)

Monday, February 4, 2013

ORA-29516: Aurora assertion failure


One fine day application users called me for Oracle error, while they are running below package. They were complaining it could be memory issue, but I disagreed and continued my search process for the Error. While going through the metalink these are oracle bug and gave had a workaround to set the session variable and run the package again, there was no complaints after setting the variable and job completed without any error.

ORA-29516: Aurora assertion failure: Assertion failure at joez.c:3377
Bulk load of method java/lang/Object.<initfailed; insufficient shm-object space
ORA-06512: at "BASE.ADMIN_KEY_LOADER", line 885
ORA-06512: at line 2

Session Level Variable:

Alter session set JAVA_JIT_ENABLED=FALSE;

Happy DBA... ;)

Thursday, January 24, 2013

Delete RAC Node


Recently I had a chance to delete Oracle RAC Node, this document is reference to delete Node in 5 Node RAC cluster, uses ASM (11R2). It will show you example of removing the cluster node “demo-server-db40”.

Before starting the node removal, take backup of OCR manually using ocrconfig –export. This is just for our safety, the OCR files automatically ships to other available nodes after node delete. Check the output after 7hrs of node delete using ocrconfig -showbackup.
ocrconfig -export /backup/OCRFILE_export.dmp

Step 1: Remove Oracle RDBMS Home and Binaries

1.1 To remove Oracle RBMS software from the server you need to disable and remove the listener from the server.
Execute the following from any node in the cluster:
$ srvctl disable listener -l LISTENER -n demo-server-db40
$ srvctl stop listener -l LISTENER -n demo-server-db40

Example Output:

[13:49]oracle@demo-server-db40[+ASM1]$ srvctl disable listener -l LISTENER -n demo-server-db40
[13:50]oracle@demo-server-db40[+ASM1]$ srvctl stop listener -l LISTENER -n demo-server-db40
1.2 Run the following command on the node that you are deleting to update the inventory on that node
i.e Here it is going to be “demo-server-db40”
. SDLFMAP1
cd $ORACLE_HOME/oui/bin
./runInstaller –updateNodeList ORACLE_HOME=/opt/app/oracle/product/11203 "CLUSTER_NODES={demo-server-db40}" -local
CAUTION: Don't forget to supply local here and also you should have X-Windows 11 enabled

Example Output:

If you don’t enable X-windows you will get bellow error.

[13:58]oracle@demo-server-db40[SDLFMAP1]$ ./runInstaller .updateNodeList ORACLE_HOME=/opt/app/oracle/product/11203 "CLUSTER_NODES={demo-server-db40}" -local
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 15725 MB    Passed
Checking monitor: must be configured to display at least 256 colors
    >>> Could not execute auto check for display colors using command /usr/bin/xdpyinfo. Check if the DISPLAY variable is set.    Failed <<<<
Some requirement checks failed. You must fulfill these requirements before continuing with the installation,
Continue? (y/n) [n]
After Enabling the X-Windows:
[13:59]oracle@demo-server-db40[SDLFMAP1]$ ./runInstaller -updateNodeList ORACLE_HOME=/opt/app/oracle/product/11203 "CLUSTER_NODES={demo-server-db40}" -local
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 15725 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /opt/app/oracle/product/oraInventory
'UpdateNodeList' was successful.

1.3 Following steps are for 11gR2, for older versions use OUI in $ORACLE_HOME/oui/bin with GUI
Now de-install the oracle home on the node that you are deleting as:

i.e Here it is going to be “demo-server-db40”

. SDLFMAP1
cd $ORACLE_HOME/deinstall
./deinstall -local
CAUTION - Don't forget to supply local here.

Example Output:

[14:00]oracle@demo-server-db40[SDLFMAP1]$ ./deinstall -local
Checking for required files and bootstrapping ...
Please wait ...
Location of logs /opt/app/oracle/product/oraInventory/logs/
############ ORACLE DEINSTALL & DECONFIG TOOL START ############
######################### CHECK OPERATION START #########################
## [START] Install check configuration ##
Checking for existence of the Oracle home location /opt/app/oracle/product/11203
Oracle Home type selected for deinstall is: Oracle Real Application Cluster Database
Oracle Base selected for deinstall is: /opt/app/oracle/product
Checking for existence of central inventory location /opt/app/oracle/product/oraInventory
Checking for existence of the Oracle Grid Infrastructure home /opt/app/oracle/grid11203
The following nodes are part of this cluster: demo-server-db40
Checking for sufficient temp space availability on node(s) : 'demo-server-db40'
## [END] Install check configuration ##
Network Configuration check config START
Network de-configuration trace file location: /opt/app/oracle/product/oraInventory/logs/netdc_check2013-01-22_02-00-43-PM.log
Specify all Single Instance listeners that are to be de-configured [LISTENER_RMAN_DUP]:
Network Configuration check config END
Database Check Configuration START
Database de-configuration trace file location: /opt/app/oracle/product/oraInventory/logs/databasedc_check2013-01-22_02-01-24-PM.log
ERROR: The option -local will not modify any database configuration for this Oracle home.
Following databases have instances configured on local node : 'SDLFMAP'. Remove these database instances using dbca before de-installing the local Oracle home.
Enterprise Manager Configuration Assistant START
EMCA de-configuration trace file location: /opt/app/oracle/product/oraInventory/logs/emcadc_check2013-01-22_02-01-28-PM.log
Enterprise Manager Configuration Assistant END
Oracle Configuration Manager check START
OCM check log file location : /opt/app/oracle/product/oraInventory/logs//ocm_check5846.log
Oracle Configuration Manager check END
######################### CHECK OPERATION END #########################
####################### CHECK OPERATION SUMMARY #######################
Oracle Grid Infrastructure Home is: /opt/app/oracle/grid11203
The cluster node(s) on which the Oracle home deinstallation will be performed are:demo-server-db40
Since -local option has been specified, the Oracle home will be deinstalled only on the local node, 'demo-server-db40', and the global configuration will be removed.
Oracle Home selected for deinstall is: /opt/app/oracle/product/11203
Inventory Location where the Oracle home registered is: /opt/app/oracle/product/oraInventory
Following Single Instance listener(s) will be de-configured: LISTENER_RMAN_DUP
No Enterprise Manager configuration to be updated for any database(s)
No Enterprise Manager ASM targets to update
No Enterprise Manager listener targets to migrate
Checking the config status for CCR
Oracle Home exists with CCR directory, but CCR is not configured
CCR check is finished
A log of this session will be written to: '/opt/app/oracle/product/oraInventory/logs/deinstall_deconfig2013-01-22_02-00-40-PM.out'
Any error messages from this session will be written to: '/opt/app/oracle/product/oraInventory/logs/deinstall_deconfig2013-01-22_02-00-40-PM.err'
############# ORACLE DEINSTALL & DECONFIG TOOL END #############

1.4 Now run the following command from any server that you are not deleting from RDBMS home to update the orainvantory of the existing nodes in the cluster
i.e Here it is going to be “demo-server-db41”

. SDLFMAP2
cd $ORACLE_HOME/oui/bin
./runInstaller -updateNodeList ORACLE_HOME=/opt/app/oracle/product/11203 "CLUSTER_NODES={demo-server-db41,demo-server-db10,demo-server-db11,demo-server-db12}"
After this we are done with removing the RDBMS software from the node we are deleting i.e. demo-server-db40. and now we will move to the next step to removing crs and node from the grind infrastructure cluster.

Example Output:

[14:12]oracle@demo-server-db41[SDLFMAP2]$ ./runInstaller -updateNodeList ORACLE_HOME=/opt/app/oracle/product/11203 "CLUSTER_NODES={demo-server-db41,demo-server-db10,demo-server-db11,demo-server-db12}"
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 15922 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /opt/app/oracle/product/oraInventory
'UpdateNodeList' was successful.

Step 2: Remove ASM Binaries and CRS
2.1 Issue "olsnodes -s -t" to see if the node demo-server-db40 is pinned.
If the node demo-server-db40 is pinned, then you need to unpin the node before deleting the node.
i.e Here it is going to be “demo-server-db40”

Example Output:

14:18]oracle@demo-server-db40[+ASM1]$ olsnodes -s -t
demo-server-db40     Active  Unpinned
demo-server-db41     Active  Unpinned
demo-server-db10    Active  Unpinned
demo-server-db11    Active  Unpinned
demo-server-db12    Active  Unpinned
To unpin the node, issue following command after setting oracle home and path to crs.
. ASM1
crsctl unpin css -n demo-server-db40

2.2 As root on the node that you are deleting from the grid, demo-server-db40
. ASM1
cd $ORACLE_HOME/crs/install
./rootcrs.pl -deconfig –force
i.e Here it is going to be “demo-server-db40”

Example Output:

[14:19]oracle@demo-server-db40[+ASM1]$ sudo /opt/app/oracle/grid11203/crs/install/rootcrs.pl -deconfig -force
Using configuration parameter file: /opt/app/oracle/grid11203/crs/install/crsconfig_params
Network exists: 1/10.211.68.0/255.255.255.0/bond1, type static
VIP exists: /demo-server-db40-vip/10.211.68.212/10.211.68.0/255.255.255.0/bond1, hosting node demo-server-db40
VIP exists: /demo-server-db41-vip/10.211.68.213/10.211.68.0/255.255.255.0/bond1, hosting node demo-server-db41
VIP exists: /demo-server-db10-vip/10.211.68.90/10.211.68.0/255.255.255.0/bond1, hosting node demo-server-db10
VIP exists: /demo-server-db11-vip/10.211.68.91/10.211.68.0/255.255.255.0/bond1, hosting node demo-server-db11
VIP exists: /demo-server-db12-vip/10.211.68.92/10.211.68.0/255.255.255.0/bond1, hosting node demo-server-db12
GSD exists
ONS exists: Local port 6100, remote port 6200, EM port 2016
CRS-2673: Attempting to stop 'ora.registry.acfs' on 'demo-server-db40'
CRS-2677: Stop of 'ora.registry.acfs' on 'demo-server-db40' succeeded
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'demo-server-db40'
CRS-2673: Attempting to stop 'ora.crsd' on 'demo-server-db40'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'demo-server-db40'
CRS-2673: Attempting to stop 'ora.REDO.dg' on 'demo-server-db40'
CRS-2673: Attempting to stop 'ora.DATA1_T0.dg' on 'demo-server-db40'
CRS-2673: Attempting to stop 'ora.DATA1_T1.dg' on 'demo-server-db40'
CRS-2673: Attempting to stop 'ora.FLASH.dg' on 'demo-server-db40'
CRS-2673: Attempting to stop 'ora.SYSTEMDG.dg' on 'demo-server-db40'
CRS-2673: Attempting to stop 'ora.oc4j' on 'demo-server-db40'
CRS-2677: Stop of 'ora.DATA1_T0.dg' on 'demo-server-db40' succeeded
CRS-2677: Stop of 'ora.DATA1_T1.dg' on 'demo-server-db40' succeeded
CRS-2677: Stop of 'ora.REDO.dg' on 'demo-server-db40' succeeded
CRS-2677: Stop of 'ora.FLASH.dg' on 'demo-server-db40' succeeded
CRS-2677: Stop of 'ora.oc4j' on 'demo-server-db40' succeeded
CRS-2672: Attempting to start 'ora.oc4j' on 'demo-server-db41'
CRS-2676: Start of 'ora.oc4j' on 'demo-server-db41' succeeded
CRS-2677: Stop of 'ora.SYSTEMDG.dg' on 'demo-server-db40' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'demo-server-db40'
CRS-2677: Stop of 'ora.asm' on 'demo-server-db40' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'demo-server-db40' has completed
CRS-2677: Stop of 'ora.crsd' on 'demo-server-db40' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'demo-server-db40'
CRS-2673: Attempting to stop 'ora.crf' on 'demo-server-db40'
CRS-2673: Attempting to stop 'ora.ctssd' on 'demo-server-db40'
CRS-2673: Attempting to stop 'ora.evmd' on 'demo-server-db40'
CRS-2673: Attempting to stop 'ora.asm' on 'demo-server-db40'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'demo-server-db40'
CRS-2677: Stop of 'ora.crf' on 'demo-server-db40' succeeded
CRS-2677: Stop of 'ora.evmd' on 'demo-server-db40' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'demo-server-db40' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'demo-server-db40' succeeded
CRS-2677: Stop of 'ora.asm' on 'demo-server-db40' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'demo-server-db40'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'demo-server-db40' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'demo-server-db40'
CRS-2677: Stop of 'ora.cssd' on 'demo-server-db40' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'demo-server-db40'
CRS-2677: Stop of 'ora.drivers.acfs' on 'demo-server-db40' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'demo-server-db40' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'demo-server-db40'
CRS-2677: Stop of 'ora.gpnpd' on 'demo-server-db40' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'lv
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node
[14:23]oracle@demo-server-db40[+ASM1]$
2.3 As root from any node that you are not deleting, for example, demo-server-db41

i.e Here it is going to be “demo-server-db41”
. ASM2
cd $ORACLE_HOME/bin
./crsctl delete node -n demo-server-db40

Example Output:

[14:26]oracle@demo-server-db41[+ASM2]$ sudo /opt/app/oracle/grid11203/bin/crsctl delete node -n demo-server-db40
CRS-4661: Node demo-server-db40 successfully deleted.

2.4 On the node you want to delete i.e. demo-server-db40, run the following command where is the name of the node that you are deleting:
i.e Here it is going to be “demo-server-db40”
. ASM1
cd $ORACLE_HOME/oui/bin
./runInstaller -updateNodeList ORACLE_HOME=/opt/app/oracle/grid11203 "CLUSTER_NODES={demo-server-db40}" CRS=TRUE -local

Note: Do not remove the "{" and "}" in above command.
This is very important step.
If you do not execute this step then deinstall utility will deinstall Grid homes from all the nodes.
So to avoid this we are removing all the other nodes from the inventory of demo-server-db40.

Example Output:

[14:47]oracle@demo-server-db40[+ASM1]$ ./runInstaller -updateNodeList ORACLE_HOME=/opt/app/oracle/grid11203 "CLUSTER_NODES={demo-server-db40}" CRS=TRUE -local
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 16373 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /opt/app/oracle/product/oraInventory
'UpdateNodeList' was successful.

2.5 As grid user on the node that you are deleting i.e. demo-server-db40 execute the following to remove the Grid home

# in demo-server-db40, in the middle it will prompt to run perl script
. ASM1
cd $ORACLE_HOME/deinstall
./deinstall –local

***Caution***:
If you do not specify the -local flag, then the command removes the grid infrastructure homes from every node in the cluster.

Example Output:

[14:48]oracle@demo-server-db40[+ASM1]$ ./deinstall -local
Checking for required files and bootstrapping ...
Please wait ...
Location of logs /opt/app/oracle/product/oraInventory/logs/
############ ORACLE DEINSTALL & DECONFIG TOOL START ############
######################### CHECK OPERATION START #########################
## [START] Install check configuration ##
Checking for existence of the Oracle home location /opt/app/oracle/grid11203
Oracle Home type selected for deinstall is: Oracle Grid Infrastructure for a Cluster
Oracle Base selected for deinstall is: /opt/app/oracle/product
Checking for existence of central inventory location /opt/app/oracle/product/oraInventory
Checking for existence of the Oracle Grid Infrastructure home
The following nodes are part of this cluster: demo-server-db40
Checking for sufficient temp space availability on node(s) : 'demo-server-db40'

## [END] Install check configuration ##
Traces log file: /opt/app/oracle/product/oraInventory/logs//crsdc.log
Enter an address or the name of the virtual IP used on node "demo-server-db40"[demo-server-db40-vip]
 >
The following information can be collected by running "/sbin/ifconfig -a" on node "demo-server-db40"
Enter the IP netmask of Virtual IP "10.211.68.212" on node "demo-server-db40"[255.255.255.0]
 >
Enter the network interface name on which the virtual IP address "10.211.68.212" is active
 >
Enter an address or the name of the virtual IP[]
 >
Network Configuration check config START
Network de-configuration trace file location: /opt/app/oracle/product/oraInventory/logs/netdc_check2013-01-22_02-50-10-PM.log
Network Configuration check config END
Asm Check Configuration START
ASM de-configuration trace file location: /opt/app/oracle/product/oraInventory/logs/asmcadc_check2013-01-22_02-50-11-PM.log
######################### CHECK OPERATION END #########################
####################### CHECK OPERATION SUMMARY #######################
Oracle Grid Infrastructure Home is:
The cluster node(s) on which the Oracle home deinstallation will be performed are:demo-server-db40
Since -local option has been specified, the Oracle home will be deinstalled only on the local node, 'demo-server-db40', and the global configuration will be removed.
Oracle Home selected for deinstall is: /opt/app/oracle/grid11203
Inventory Location where the Oracle home registered is: /opt/app/oracle/product/oraInventory
Option -local will not modify any ASM configuration.
Do you want to continue (y - yes, n - no)? [n]: y
A log of this session will be written to: '/opt/app/oracle/product/oraInventory/logs/deinstall_deconfig2013-01-22_02-49-13-PM.out'
Any error messages from this session will be written to: '/opt/app/oracle/product/oraInventory/logs/deinstall_deconfig2013-01-22_02-49-13-PM.err'

######################## CLEAN OPERATION START ########################
ASM de-configuration trace file location: /opt/app/oracle/product/oraInventory/logs/asmcadc_clean2013-01-22_02-50-52-PM.log
ASM Clean Configuration END
Network Configuration clean config START
Network de-configuration trace file location: /opt/app/oracle/product/oraInventory/logs/netdc_clean2013-01-22_02-50-52-PM.log
De-configuring Naming Methods configuration file...
Naming Methods configuration file de-configured successfully.
De-configuring backup files...
Backup files de-configured successfully.
The network configuration has been cleaned up successfully.
Network Configuration clean config END
---------------------------------------->
The deconfig command below can be executed in parallel on all the remote nodes. Execute the command on  the local node after the execution completes on all the remote nodes.
Run the following command as the root user or the administrator on node "demo-server-db40".
/tmp/deinstall2013-01-22_02-48-57PM/perl/bin/perl -I/tmp/deinstall2013-01-22_02-48-57PM/perl/lib -I/tmp/deinstall2013-01-22_02-48-57PM/crs/install /tmp/deinstall2013-01-22_02-48-57PM/crs/install/rootcrs.pl -force  -deconfig -paramfile "/mp/deinstall2013-01-22_02-48-57PM/response/deinstall_Ora11g_gridinfrahome1.rsp"
Press Enter after you finish running the above commands
<----------------------------------------
Remove the directory: /tmp/deinstall2013-01-22_02-48-57PM on node:
Setting the force flag to false
Setting the force flag to cleanup the Oracle Base
Oracle Universal Installer clean START
Detach Oracle home '/opt/app/oracle/grid11203' from the central inventory on the local node : Done
Delete directory '/opt/app/oracle/grid11203' on the local node : Done
The Oracle Base directory '/opt/app/oracle/product' will not be removed on local node. The directory is in use by Oracle Home '/opt/app/oracle/product/11203'.
The Oracle Base directory '/opt/app/oracle/product' will not be removed on local node. The directory is in use by central inventory.
Oracle Universal Installer cleanup was successful.
Oracle Universal Installer clean END

## [START] Oracle install clean ##
Clean install operation removing temporary directory '/tmp/deinstall2013-01-22_02-48-57PM' on node 'demo-server-db40'
## [END] Oracle install clean ##
######################### CLEAN OPERATION END #########################
####################### CLEAN OPERATION SUMMARY #######################
Oracle Clusterware is stopped and successfully de-configured on node "demo-server-db40"
Oracle Clusterware is stopped and de-configured successfully.
Successfully detached Oracle home '/opt/app/oracle/grid11203' from the central inventory on the local node.
Successfully deleted directory '/opt/app/oracle/grid11203' on the local node.
Oracle Universal Installer cleanup was successful.
Oracle deinstall tool successfully cleaned up temporary directories.
#######################################################################
############# ORACLE DEINSTALL & DECONFIG TOOL END #############

Output of Perl Execution:
%sudo /tmp/deinstall2013-01-22_02-48-57PM/perl/bin/perl -I/tmp/deinstall2013-01-22_02-48-57PM/perl/lib -I/tmp/deinstall2013-01-22_02-48-57PM/crs/install /tmp/deinstall2013-01-22_02-48-57PM/crs/install/rootcrs.pl -force  -deconfig -paramfile "/tmp/deinstall2013-01-22_02-48-57PM/response/deinstall_Ora11g_gridinfrahome1.rsp"
Using configuration parameter file: /tmp/deinstall2013-01-22_02-48-57PM/response/deinstall_Ora11g_gridinfrahome1.rsp
****Unable to retrieve Oracle Clusterware home.
Start Oracle Clusterware stack and try again.
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Stop failed, or completed with errors.
Either /etc/oracle/ocr.loc does not exist or is not readable Make sure the file exists and it has read and execute access Either /etc/oracle/ocr.loc does not exist or is not readable Make sure the file exists and it has read and execute access
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Modify failed, or completed with errors.
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Delete failed, or completed with errors.
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Stop failed, or completed with errors.
################################################################
# You must kill processes or reboot the system to properly #
# cleanup the processes started by Oracle clusterware          #
################################################################
ACFS-9313: No ADVM/ACFS installation detected.
Either /etc/oracle/olr.loc does not exist or is not readable Make sure the file exists and it has read and execute access Either /etc/oracle/olr.loc does not exist or is not readable Make sure the file exists and it has read and execute access Failure in execution (rc=-1, 256, No such file or directory) for command /etc/init.d/ohasd deinstall
error: package cvuqdisk is not installed Successfully deconfigured Oracle clusterware stack on this node

2.6 Execute following on the node_to_be_deleted i.e. demo-server-db40 as root.

rm -rf /etc/oraInst.loc

2.7 On any node other than the node you are not deleting i.e. demo-server-db41, as grid user to update the inventory of the remaining nodes in the grid infrastructure

i.e Here it is going to be “demo-server-db41”
. ASM2
cd $ORACLE_HOME/oui/bin
./runInstaller -updateNodeList ORACLE_HOME=/opt/app/oracle/grid11203 "CLUSTER_NODES={demo-server-db41,demo-server-db10,demo-server-db11,demo-server-db12}" CRS=TRUE
Do not remove the "{" and "}" in above command.

Example Output:
[15:16]oracle@demo-server-db41[+ASM2]$ ./runInstaller -updateNodeList ORACLE_HOME=/opt/app/oracle/grid11203 "CLUSTER_NODE
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 15923 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /opt/app/oracle/product/oraInventory
'UpdateNodeList' was successful.

Step 3: Verify from any node that you are not deleting i.e. demo-server-db41
i.e. Here it is going to be “demo-server-db41”
. ASM2
cd $ORACLE_HOME/bin

./cluvfy stage -post nodedel -n demo-server-db40 –verbose

Example Output:
[15:22]oracle@demo-server-db41[+ASM2]$ ./cluvfy stage -post nodedel -n demo-server-db40 -verbose
Performing post-checks for node removal
Checking CRS integrity...
Clusterware version consistency passed
The Oracle Clusterware is healthy on node "demo-server-db12"
The Oracle Clusterware is healthy on node "demo-server-db11"
The Oracle Clusterware is healthy on node "demo-server-db10"
The Oracle Clusterware is healthy on node "demo-server-db41"
CRS integrity check passed
Result:
Node removal check passed
Post-check for node removal was successful.

Tuesday, January 8, 2013

Creating New Oracle Services


Once you added new nodes or you wanted to create a new service this testing scenario will help you. In my case, I have added new nodes to the database wanted to check connections are passing through.

Add this TNS to your tnsnames.ora with correct host and port.

test =
  (description =
    (address_list =
      (address = (protocol = tcp)(host = yourserver-scan.sonynei.net)(port = 1521))
    )
    (connect_data =
      (service_name = test.sonynei.net)
      (server = dedicated)
    )
  )

This command will create new service on the TESTDB with the service name test and runs only on the TESTDB3 ("New Instance")

# Create services:
srvctl add service -d TESTDB -s test -r TESTDB3

# Start the service
srvctl start service -d TESTDB -s test

# Try login to the service using the new service, If the connection fails check why it is failing
sqlplus dbaccount/<password>@test


# To modify the service to different node
srvctl modify service -d TESTDB -s test -i TESTDB3 -t TESTDB2

# Start the service
srvctl start service -d TESTDB -s test

# Status of services running in database
srvctl status service -d TESTDB

# Stop the service running on the node
srvctl stop service -d TESTDB -s test

# Remove the services running on the database
srvctl remove service -d TESTDB -s test -i TESTDB2

This small test can reduce the production outage before relasing the nodes to the Production. Also additionally, I will try relocating the scan listener to new nodes and check the services are getting registered properly for zero downtime in Production. This command will be used to relocate the scan_listener2 to 2nd node.

srvctl relocate scan -i 2 -n TESTDB2

Happy DBA... :)