Troubleshooting ora-02437

Last Saturday I worked overtime for an important production system Oracle 11.2.0.4.0 with upgrading, migrating and then putting on the new production environment. Fortunately the whole process is pretty smooth from the beginning to end because I’ve done plenty of test in advance.

Here I need to introduce the method I’ve used about how to upgrade from oracle 11.2.0.4.0 to 19.13.0.0.0 and how to migrate business data (less than 150 GB) to the new production system.

Upgrade (on the NEW pre-production)

  • Installing oracle 19.13 (the golden image)
  • Creating oracle database instance by shell script

Migrate (on the NEW production)

  • Expdp business data from the source oracle 11.2.0.4.0
  • Impdp business data to the target oracle 19.13.0.0.0

I’ll share the issue I’ve encountered and how to solve it with you when doing the test for expdp and impdp in this blog note.

Continue reading “Troubleshooting ora-02437”

Troubleshooting ORA-01555

A couple of days ago the app batch tasks that resides in product oracle 19.11 with the architecture of primary and secondary ran failed and subsequently reported a level 1 warning for application at 09:03, I went to log into that oracle database to look into it immediately when the project group sent that warning message to me.

Based on the advice of the app developer I found out the log file recording app batch tasks result then checked the last some lines of log file, it has expected to show ORA-01555 rollback segment “_SYS_XXXXXXX” too small for the PL/SQL package.procedure name. It’s difficult to make sure what stuff there has in those procedure and I have to ask the developer to check that procedure so that he is able to find some clue. Waiting for a while he checked a sql (due to the security reason I don’t put it on here) and at the same time I’ve also found that sql (including sql id, sql text and the error ORA-01555) in oracle alert log.

Next I went to check the history execution situation of that sql, OMG, it has produced two different execution plans becaus it has shown 2 number of plan_hash_value(s) and the current plan_hash_value has the more elapsed time (1800 seconds) but the previous plan_hash_value has just shorter elapsed time (200 seconds), obviously, the good execution plan used the three indexed and the bad one used only one index. I talked the developer whether he can rewrite the sql, he said it’s really hard for him to rewrite it, if so, I have to bind the good execution to the sql by sqlprofile approach. I am able to discover the sql profile of sql_id of the sql from dba_sql_profiles when I finished binding sql execution plan.

Furthermore, I’ve also checked the used ratio of the historical UNDO tablespace, it’s always 95% – 99% at 8:00-11:00 every day but other time periods it’s only 10% – 30%. By the way, the used ratio of UNDO tablespace has contained the sum(bytes) for the status of undo is in ACTIVE, UNEXPIRED. Unfortunately there has also occupied plenty of unexpired usage (not much too usage in active) in UNDO tablespace when I observed the used ratio of UNDO at that moment. If the sql stuff doesn’t exist in the unexpired of undo and oracle goes to fetch them as possible as it can now it will report the ORA-01555. Afterwards I’ve doubled the total size of undo tablespace.

The parameter undo_retention is also a considerable factor reflecting undo entry. Therefore, I’ve also adjusted it from the original 3600s to the 5400s.

That’s all stuff I shared with you, happy Sunday!

Startup issue

The project manager rang me up to report that their oracle test environment for single instance 19.13 by HA architecture (primary and secondary) but secondary has never been started up after he went to shut down primary.

Then I joined his Tencent meeting to guide him to do some troubleshooting work. Firstly I’ve noticed that the issue has happened on the following screenshot.

As we can see from the previous ORA-09925 oracle is unable to create audit file and the next description “No such file or directory” reveals that there would be probably no audit directory on oracle secondary server.

He checked the audit parameter value from oracle spfile and used that parameter to go to audit directory server has also reported the same error “no such file or directory”. If going to the subdir of that parameter, which is ok, it’s really no audit dir, so have to manually create it.

Then he re-started up the secondary oracle instance ORA-09925 error has disappeared this time but has shown “ORA-00600 [kccsbck_first], [1], [3777804532] …”. Most of us are fearful/scared/afraid to see ora600 becuase we always thought oracle raised the bug due to our behaviour. Now google helped me, it is very clear to point me that the other oracle (primary) instance has already mounted.

Afterwards he went to the primary instance and found that it was really running and then shutdown it and returned to the secondary to re-start up it, took 10 seconds, it has started successfully.

Hope to help you as well and have fun with the weekend!

sec_case_sensitive_logon

Recently I helped one of my colleagues troubleshooting the issue about never log into oracle database 19.15 even if he has inputted the correct username and password. I’ve discovered an interesting blog note from Mike Dietrich on Google. Wow, very great! It’s the same issue as encountered by my colleague.

Now let us simulate the senario from my test environment of oracle 19.3.

SQL> show parameter sec_case_sensitive_logon;

NAME                                 TYPE                      VALUE
------------------------------------ ----------------------    ------------------------------
sec_case_sensitive_logon             boolean                   TRUE

SQL> create user c##test identified by "1QAZ2WSX!";

SQL> grant connect, resource to c##test;

SQL> conn c##test
Enter password:    (no prompt)
connected.
SQL> show user;
USER is "C##TEST"
SQL> conn / as sysdba
SQL> alter system set sec_case_sensitive_logon=false;

SQL> conn c##test
Enter password:   (no prompt, used the correct password)
Error:
ORA-01017: invalid username/password; logon denied

SQL> conn c##test
Enter password:   (no prompt, use all of the lower case character for password)
Error:
ORA-01017: invalid username/password; logon denied
SQL> show parameter sec_case_sensitive_logon;

NAME                                 TYPE                      VALUE
------------------------------------ ----------------------    ------------------------------
sec_case_sensitive_logon             boolean                   FALSE

SQL> alter system set sec_case_sensitive_logon=true;

SQL> show parameter sec_case_sensitive_logon;

NAME                                 TYPE                      VALUE
------------------------------------ ----------------------    ------------------------------
sec_case_sensitive_logon             boolean                   TRUE

SQL> conn c##test
Enter password:    (no prompt)
connected.

Hope to help you as well …

Best wishes for you Mid Autumn Festival in China :-).

Oracle Listener issue

Last month one of my colleagues asked me help him troubleshooting a weird issue for oracle listener whatever on data guard with primary and physical standby at that moment he was deploying oracle data guard. As known as the oracle listener configuration has commonly included three number of parameters, host info (hosname or ip address), port number and service name (offered by static listener).

Continue reading “Oracle Listener issue”

Be careful when silently installing oracle database software by response file

Sometimes we’re not able to call the GUI (Graphical User Interface) to manually install oracle database step by step then it should be better method using response file edited to install it by silent mode.

However, we install oracle single instance or rac environment the first BIG step is to install database or grid software oracle offers the similar response file template for us thus there’re three number of critical steps (aka, keys) have been filled up, I listed them as below.

INVENTORY_LOCATION=
ORACLE_HOME=
ORACLE_BASE=

Usually the values of above keys are like this.

INVENTORY_LOCATION=/u01/app/oraInventory
ORACLE_HOME=/u01/app/oracle/product/19.3.0/db_1
ORACLE_BASE=/u01/app/oracle
Continue reading “Be careful when silently installing oracle database software by response file”

The starting time of rman backup is different from setting time of OS cron job

Recently I helped one of my customers deploying oracle data guard 11gR2 primary and physical standby utilising two numer of database servers that reside on China Telecom Cloud. Yes, the customer purchased cloud Virtual Machine Hosts that are able to fully satisfy his business requirement.

Because he doesn’t pay attention to the TIMEZONE setting after installing Linux OS successfully but the current date is correct on OS, afterwards I certainly installed oracle software and data guard environment.

The next day the business program has been reported the time is always interval 8 hours than now by application developer, the Linux administrator is suddenly aware of the problem of OS timezone. After he adjusted the timezone of OS from “+00:00” to “+08:00” the program looks like to be normal, at that moment it doesn’t seem like to have any time error on oracle database.

After the rman backup task lasts for running several days the actual start running time of rman backup attracted my attention because which is apparently different from the beginning time of OS cron job about rman backup, such as, cron job is starting from 22:30 every day but the rman backup log has shown starting from 14:30 every day. OMG it exists interval 8 hours as well and the issue indicates that it comes from timezone of oracle database then I went to check the current timezone and the current date via SQL*Plus.

Continue reading “The starting time of rman backup is different from setting time of OS cron job”

[INS-06006] Passwordless SSH connectivity not set up between the following node(s) and PRVF-5311

Last week one of my customers encountered a weird issue about INS-06006 and the subsequent PRVF-5311 when she deployed oracle rac 19.3 with two nodes on CentOS 7.6. Actually creating OCR & DATA disks and installing GI & DB softwares on the 1st node is sucessfull and that issue has occurred on executing addnode.sh for GI on the 1st node so that pushing GI software to the 2nd node.

Based on the prompt message from INS-06006 “Passwordless SSH connectivity not set up between …” firstly I thought SSH mutual trust setting between two nodes is probably failed. I went to check the result of SSH mutual trust using “ssh hostname of the other node” (vice versa) whatever on user grid and oracle they’re not any problems on one hand, on the other hand I have also taken a looking at the generated log file after running sshUserSetup.sh that is a best and most simple solution of setting SSH mutual trust on user grid and oracle and the log file is normal.

Next I have to search the keyword INS-06006 on Google the top two number of articles are both from Oracle MoS and “INS-06006 GI RunInstaller Fails If OpenSSH Is Upgraded to 8.x (Doc ID 2555697.1)” attracts my eyes by the way you need an MoS account to be able to download it. As we can see from the topic of that article if OpenSSH is upgraded to 8.x on your Linux server then INS-06006 GI runinstaller will appear. So I continued to check the version of openssh whether it is 8.x or not on her Linux server.

[root@xxxx ~] ssh -V
OpenSSH_8.8p1, OpenSSL 1.0.2k-fips   26 Jan 2017

[root@xxxx ~] rpm -qa | grep openssh
openssh-7.4p1-22.el7_9.x86_64
openssh-clients-7.4p1-22.el7_9.x86_64
openssh-server-7.4p1-22.el7_9.x86_64
Continue reading “[INS-06006] Passwordless SSH connectivity not set up between the following node(s) and PRVF-5311”

OS Audit file could not be created; failing after 6 retries

Recently two of my clients have encountered the same issue on oracle 11gR2, it never logged into the oracle dagtabase server by SQL*Plus CLI (Command Line Interface). The screenshot is as below.

Continue reading “OS Audit file could not be created; failing after 6 retries”

Oracle RAC 19.15 node 2 has never been started

Last week one of my colleague asked me to help him troubleshooting why oracle rac 19.15 node 2 has never been started when the database server of node 2 is rebooting due to system engineer changed a hard disk after the server shut down.

Of course, the reason of occurring this issue is not related to change hard disk. Waiting for a while he reported to me there has frequently shown an error “CRS-42216: No interface are configured on the local node for interface definition bond1(: . *) ?: 172.16.1.0: available interface definitions are [bond1(: . *) ? : [ fe80:0:0:0:0:0:0:0 ]].” on alert log file of grid user. He replied to me that bond1 is the network card of private ip address of oracle rac node 2.

Continue reading “Oracle RAC 19.15 node 2 has never been started”