Installing RAC on AIX: errors 0509-150 and 0509-022

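I saw someone on PUB venting about the bugs he hit while installing Oracle 11.2.0.1 RAC on AIX. Luckily we hit no bugs when installing 11.2.0.2, but we did run into quite a few other problems, which I am recording and summarizing here.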

After installing Clusterware (p10098816_112020_AIX64-5L_3of7), running the root.sh script failed:

# /oracle/app/11.2.0/grid/root.sh
Running Oracle 11g root script...

The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /oracle/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /oracle/app/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
User grid has the required capabilities to run CSSD in realtime mode
exec(): 0509-036 Cannot load program /oracle/app/11.2.0/grid/bin/ocrconfig.bin because of the following errors:
0509-150 Dependent module libskgxn2.so could not be loaded.
0509-022 Cannot load module libskgxn2.so.
0509-026 System error: A file or directory in the path name does not exist.
Failed to create or upgrade OLR
Failed to create or upgrade OLR at /oracle/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 6744.
/oracle/app/11.2.0/grid/perl/bin/perl -I/oracle/app/11.2.0/grid/perl/lib -I/oracle/app/11.2.0/grid/crs/install /oracle/app/11.2.0/grid/crs/install/rootcrs.pl execution failed

The script failed with the following errors:

0509-150 Dependent module libskgxn2.so could not be loaded.
0509-022 Cannot load module libskgxn2.so.
0509-026 System error: A file or directory in the path name does not exist.
A Metalink search turned up the note CANNOT ADD NODE DETAILS IN “SPECIFY CLUSTER CONFIGURATION” OUI SCREEN [ID 754906.1], which describes this case:

The cause: HACMP had been installed.

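Before the installation we had already talked to the consulting system engineer and asked him not to install HACMP, so we did not expect this problem. When we asked again, the SA said he had installed it but not configured it. Awkward. Had it not been installed at all, the Oracle installation would not have failed. Had it been installed and properly configured, it probably would not have failed either. It was precisely the installed-but-unconfigured state that caused the failure. The warning can be found in the rootpre.sh log: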

Checking if group services should be configured....
ODMDIR=/etc/objrepos, isHACMP= 14
CMD: /bin/chmod +x /usr/sbin/cluster/utilities/cldomain
CMD: /usr/sbin/lsgroup hagsuser
Creating required group for group services: hagsuser
Please add your Oracle userid to the group: hagsuser
CMD: /bin/mkgroup -A hagsuser
Configuring HACMP group services socket for possible use by Oracle.
/var/ha/soc/grpsvcsdsocket.No HACMPcluster class found: No such file or directory.
Please make sure that the group services subsystem is active.
Aborting pre-installation procedure. Installations of Oracle may fail.

isHACMP= 14 means Oracle had detected an HACMP installation. Because we overlooked it too, we missed the problem the first time it was flagged, which cost us more than an hour.
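Since the warning was sitting in the rootpre.sh log all along, scanning that log right after running rootpre.sh would have caught it. Below is a minimal POSIX sh sketch of such a check. The function names are hypothetical, the log file path is passed in by the caller because its location depends on your environment, and the warning patterns are simply taken from the log excerpt above rather than from any official list.

```shell
#!/bin/sh
# Sketch: scan a rootpre.sh log for HACMP detection and abort warnings.
# Function names are hypothetical; patterns come from the log excerpt in this post.

# Print the first isHACMP value recorded in the log (empty if the line is absent).
rootpre_ishacmp() {
    sed -n 's/.*isHACMP= *\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Print warning lines suggesting the pre-install step did not finish cleanly.
# Returns 1 if any such line was found, 0 if the log looks clean.
rootpre_warnings() {
    if grep -E 'Aborting pre-installation|may fail|No such file or directory' "$1"; then
        return 1
    fi
    return 0
}
```

For example, rootpre_ishacmp /path/to/rootpre.log prints the detected value, and rootpre_warnings /path/to/rootpre.log lists the lines worth reading before moving on to the Clusterware installation.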

Solution:

Oracle's recommended solution is:

Symptom I :-

Install the missing fileset, i.e. the cluster.es.clvm.rte LPP, on all the nodes.

Run the command below to check that the fileset is installed:

lslpp -l cluster.es.clvm.rte

and re-run rootpre.sh to check that HACMP is detected.
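Because the fileset has to be present on all the nodes, it helps to run the check on every node in one go instead of logging in to each. A minimal sketch, assuming passwordless ssh as root to each node and assuming lslpp -l exits with a nonzero status when the fileset is not installed (verify that on your AIX level). The node names and the run_on_node helper are hypothetical; run_on_node is split out only so the remote call can be swapped out for local testing.

```shell
#!/bin/sh
# Sketch: confirm that a fileset is installed on every cluster node.
# Assumes passwordless ssh to each node; node names are placeholders.

FILESET=${FILESET:-cluster.es.clvm.rte}

# Run a command on one node over ssh (hypothetical helper).
run_on_node() {
    node=$1; shift
    ssh "$node" "$@"
}

# Check FILESET on each node given as an argument.
# Prints one line per node; returns nonzero if any node is missing it.
check_fileset_on_nodes() {
    missing=0
    for node in "$@"; do
        if run_on_node "$node" lslpp -l "$FILESET" > /dev/null 2>&1; then
            echo "$node: $FILESET installed"
        else
            echo "$node: $FILESET MISSING"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Usage: check_fileset_on_nodes node1 node2, then re-run rootpre.sh on each node once every line reports the fileset as installed.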

Oracle provides Cluster Verification Utility (CVU) to perform system checks in preparation for installation, patch updates, or other system changes.

This will ensure that you have completed the required system configuration and preinstallation steps so that your installation, update, or patch operation completes successfully.

Run cluvfy to check that all prerequisites are configured properly:

./runcluvfy.sh stage -pre crsinst -n <node1>,<node2> -verbose

Make sure all OS patchsets are installed for HACMP to support Oracle Clusterware.

Please go through the Note below for prerequisites and certification details for installing Oracle Clusterware on IBM AIX systems.

Note 282036.1 Minimum Software Versions and Patches Required to Support Oracle Products on IBM pSeries

For HACMP 5.4, you need to download the patch below before installing CRS.

i) Download the Patch 6718715

Steps to apply the patch
————————
1–> Login as root user
2–> Unpack the files shipped in this patch in a temporary directory
3–> Run the rootpre.sh script
./rootpre.sh

We chose a more direct approach instead: remove HACMP and reinstall Clusterware. After HACMP was removed, the problem went away.

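Lessons learned: before installation, agree clearly with the SA which packages must be installed and which must absolutely not be installed, how the storage will be carved up (possibly together with the storage engineer), and so on. After the SA finishes the configuration, the DBA must verify it once more, to catch problems caused by human error or misoperation. And of course the DBA should read the logs carefully and fix problems at the first warning, rather than waiting for an outright error.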
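One way to make the "DBA verifies the SA's work" step concrete is to turn the agreed package list into a checklist that is run on each node before the installer is started. A minimal sketch, again assuming lslpp -l returns a nonzero status for a fileset that is not installed. The REQUIRED and FORBIDDEN lists are hypothetical examples (bos.adt.base and bos.perf.libperfstat as typical Oracle prerequisites, cluster.es.server.rte as an HACMP fileset you agreed not to install); replace them with whatever you actually agreed with the SA.

```shell
#!/bin/sh
# Sketch: verify a node against the package checklist agreed with the SA.
# REQUIRED/FORBIDDEN are hypothetical example lists; edit to match your agreement.

REQUIRED="bos.adt.base bos.perf.libperfstat"
FORBIDDEN="cluster.es.server.rte"

# Return 0 if the fileset is installed (relies on lslpp's exit status).
is_installed() {
    lslpp -l "$1" > /dev/null 2>&1
}

# Report every deviation from the checklist; return 1 if there is any.
verify_checklist() {
    rc=0
    for f in $REQUIRED; do
        is_installed "$f" || { echo "MISSING required fileset: $f"; rc=1; }
    done
    for f in $FORBIDDEN; do
        is_installed "$f" && { echo "FOUND forbidden fileset: $f"; rc=1; }
    done
    return $rc
}
```

Running verify_checklist on each node and refusing to start the installer until it returns 0 would have flagged the unwanted HACMP installation before root.sh ever ran.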
