Aug 15 2011

Book Review: Oracle 11g R1/R2 Real Application Clusters Essentials by Ben Prusinski and Syed Jaffer Hussain (Packt Publishing)

Category: 11g,Databaseittichai @ 9:31 pm

Oracle 11g R1/R2 Real Application Clusters EssentialsOracle Real Application Cluster (RAC)  becomes the technology of choice for highly available and highly scalable deployment of enterprise Oracle database environment. With its innovative technology it comes with no surprise its complexity. The complexity is not only with the underlying database technology itself, but also how well you design and implement it to work with other components including operating system, storage subsystem, etc. In order to setup and maintain a successful RAC environment, it not only requires technical knowledge of database administrator (DBA), but also well collaboration and extended planning between all IT and business partners.

Few years back, there has been a movement of DBA 2.0 which says that, in addition to database knowledge,  an extended understanding of operating system, storage and/or networking is encouraged due to more tightly integrated of Oracle database technologies such as Clusterware, RAC, ASM with the underlying subsystems. I agree with this notion but not to the extent that DBAs should perform all works in those areas. (Yes, I’m aware that some DBAs do it all.) However, I think at least they should have enough understandings of all-related technologies to collaborate with those who are responsible for in order to achieve the desired setup. In many cases especially in the large companies where traditional organization division of the job description of administrators is clear cut and no overlapping, being well-rounded with all-related technologies will allow DBA to better discuss the requirements and designs and even assist with the implementation details.

Most Oracle RAC books in the market today tend to focus only on the detailed technical aspects of this database technology. This mainly is good for those Oracle professionals who at least have worked with RAC before. Few books really provide a good foundation for those who are new to this technology. One of those few which I’ve just read recently is Oracle 11g R1/R2 Real Application Clusters Essentials authored by Ben Prusinski and Syed Jaffer Hussain published by Packt Publishing.

What I really like about this book is the fact that it does not jump into the nuts and bolts of this Oracle technology right away but instead it is trying to build up good foundation starting from concept (especially high availability), architecture (of all components not just of database), design, implementation and then even some of the real-world samples. This stepping-stone approach helps readers to understand what is involved and how they should deploy this technology for the high available database.

In the chapter 1, the emphasis of the high availability concept is crucial. The authors precisely points out that high availability should be looked from users’ perspective. Many DBAs or system administrators especially novice ones have a tendency to focus solely on their respective areas. The provided sample of the availability percentage in the “high availability interpretations” section is a good reminder for a need to have an end-to-end high availability design to avoid a single point of failure (SPOF). The chapter goes on to give the definitions of many acronyms (BIA, BCP, DRP, etc.) commonly-used during the planning phase. It is quite useful to be aware of them especially when discussing with business end-users. The chapter ends with providing a whole picture of the high availability solutions (RAC, Data Guard, Streams, Application Server Clustering) from Oracle as a part of the Maximum Availability Architecture (MAA). It is worth to point out that RAC is just one (not the only) of key components for a complete high availability solution.

Chapter 2 looks into the hardware components of Oracle RAC architecture including network, storage, etc. Again, I think the authors reinforce a need to understand all these underlying technologies (at least at high level) even though DBAs might not involve in the actual implementation. But it is crucial to have this knowledge under their belts in order to have the productive discussions of the requirements and design with system administrators.

Chapter 3 goes into the clusterware installation of 11g R1/R2. Even though readers may to be able to find similar information on many web sites and blogs, this book has some extra information which makes an attempt to explain what each action does, for example, what happen when the orainstRoot.sh and root.sh run?

Chapter 4 dedicates to the Automatic Storage Management (ASM) which becomes the only clear choice for storage management in the 11g deployment. The chapter covers all aspects of ASM in both R1 and R2 including new features, configurations, management, etc.

Chapter 5 covers Clusterware management and (some) troubleshooting samples. Instead of showing all possible troubleshooting scenarios (which is impossible to include everything in a chapter or even in a book), the authors did an excellent job of giving an overview of all necessary Clusterware utilities in order to troubleshoot the issues.

Chapter 6 turns our attention to RAC database administration. The structure is similar to that of Clusterware in the chapter 3. The latter part of the chapter also covers the automatic workload management including the topic of the service which is highly recommended to utilize it (not using the default database service.)

Chapter 7 covers all aspects of backup and recovery of Oracle database including the best practices. In addition, new features of RMAN in Oracle 11g R1 and 11g R2 along with OCR and voting disk backup and recovery are discussed.

Chapter 8 tries to give an overview of performance tuning. The intention here seems to just give an overview of the concept and methods of performance tuning as well as new 11g performance features, for example, Real Application Testing (RAT) which has the Database Replay and SQL Performance Analyzer (SPA). This high-level overview focus is understandable because the scope of performance tuning is generally vast. Some of tuning metrics especially relate to the wait events are explained here.

Chapter 9 focuses on the upgrade scenarios. As we all know it is impossible to cover all possible scenarios. But the authors gives sufficient information about overall upgrade processes from 10g R2 to 11g R1 and then to 11g R2 in this chapter. Definitely consulting with Oracle support for more details on individual upgrade case is still highly recommended.

Chapter 10 is very informative on the node addition and removal. The chapter steam-lines the steps pretty well (comparing to the official Oracle document which is very confusing at the first read with the flow of information). (We just went through adding two nodes into the existing two nodes of Oracle 11g R1 two months ago. That’s why I can say that.)

Chapter 11 shows the readers the importance of the end-to-end high availability implementation. The high availability should not stop at the Oracle RAC database, but it must extend to application layer on top of it. In addition to an overview of Oracle EBS (E-Business Suite), this chapter shows the readers how to take full advantage of high availability and performance features in Oracle RAC database.

The last chapter wraps up nicely with information about other tools in the MAA tool suite namely Streams and Data Guard. It covers all various topics of thes tools including concepts, new features, best practices, and overview configuration.

The last section of this book on the additional resources and tools is a nice read and very informative. It is just like a nice dessert at the end of a good meal.

In conclusion, if you’re new to Oracle RAC technology, this is THE book for you. It should give the readers enough understanding to jump on Oracle RAC bandwagon. Advanced DBAs may find that most of the contents in the book are too familiar, but they might be attracted to the new features, comparisons between R1 and R1 as well as some of real-world samples mentioned in this book. Overall I appreciate a very well effort of Ben ad Syed for one of the most complete informative books on Oracle RAC in the market today. Well-deserved kudos to them.

Thanks to Packt Publishing for the opportunity to review this book.

Tags: , , , , , , , , , , ,


Feb 14 2011

Easy Connect (EZCONNECT) Naming Method with Instance Name

Category: 11g,Network,SQLPlusittichai @ 7:11 pm

Oftentimes I use the Easy Connect (EZCONNECT) to connect to a new database just to quickly verify database connectivity. The syntax I usually use is as follows:

CONNECT username/password@[//]host[:port][/service_name]
CONNECT scott/tiger@host1-vip:1522/RACDB

In RAC, this will connect to whatever instance based on the service’s load balancing goal.

I just learned that with the 11g net services client software, you can also specify the instance name to connect to. This may be useful especially for RAC setup if you have the need to connect to a specific instance.

CONNECT username@[//]host[:port][/[service_name][:server]][/instance_name]]

-- Connect to the instance name RACDB1
CONNECT scott/tiger@host1-vip:1522/RACDB/RACDB1

-- Connect to the instance name RACDB2
CONNECT scott/tiger@host1-vip:1522/RACDB/RACDB2

Also, the optional [:server] is either dedicated or shared.

CONNECT scott/tiger@host1-vip:1522/RACDB:dedicated/RACDB1

Reference: http://download.oracle.com/docs/cd/B28359_01/network.111/b28316/naming.htm#NETAG255

Tags: , , , , , , , , ,


Oct 05 2009

Online SAN Storage Migration for Oracle 11g RAC database with ASM

Category: 11g,Database,Migrationittichai @ 7:30 am

It has been awhile since my last post. My pathetic excuses are all pretty much mentioned here. :-)

Last month we’ve worked with the storage team to migrate the SAN storage of our Oracle 11gR1 database to a new one. The drive of migration is mainly for SAN consolidation which is, of course, ultimately for cost saving. In addition to migrating the ASM disk groups storing database’s data files, all clusterware files (OCR and voting disk) must be migrated too. The rebalance feature in Oracle ASM makes data migration very easy and seamless. And since the clusterware files have redundancy, they can be seamlessly migrated as well. With 11gR1, all migration tasks can be performed online.

Prerequisites/Assumptions:

- New SAN LUNs/disks are already visible to all RAC nodes. In case of the disks for ASM diskgroups, they are already discovered by ASM. The minimum numbers and permissions of the OCR and voting disks must be met.

- It is recommended to perform the migration tasks during off-peak hours or even better if during planned maintenance window period.

Note that the sample shown here is specific to my environment (11.1.0.7 on Solaris 10 with dual-pathing to Hitachi SAN, and OCR and voting disks are on raw devices).

SAN Migration of the ASM diskgroups

If you’re more comfortable with GUI, all tasks here can be accomplished using the Enterprise Manager.

1. Add new disks to ASM diskgroups.

ALTER DISKGROUP PMDW_DG1 ADD DISK
  '/dev/rdsk/c4t60060E80056FB30000006FB300000823d0s6' NAME PMDW_DG1_0003,
  '/dev/rdsk/c4t60060E80056FB30000006FB300000826d0s6' NAME PMDW_DG1_0004,
  '/dev/rdsk/c4t60060E80056FB30000006FB300000829d0s6' NAME PMDW_DG1_0005
REBALANCE POWER 11;

We go with the rebalance power of 11 which is full throttle because it is planned maintenance.

2. Check rebalance status from Enterprise Manager or v$ASM_OPERATION.

3. When rebalance completes, drop the old disks.

ALTER DISKGROUP PMDW_DG1 DROP DISK
   PMDW_DG1_0000,
   PMDW_DG1_0001,
   PMDW_DG1_0002
REBALANCE POWER 11;

When adding or removing several disks, it is recommend to add or remove all disks at once. This is to reduce the number of the rebalance operations that are needed for storage changes.

SAN Migration of the OCR Files

1. Backup all OCR-related files.

# {CRS_HOME}/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     921332
         Used space (kbytes)      :       4548
         Available space (kbytes) :     916784
         ID                       :  776278942
         Device/File Name         : /dev/rdsk/c4t50060E800000000000002892000003F8d0s6
                      Device/File integrity check succeeded
         Device/File Name         : /dev/rdsk/c4t50060E800000000000002892000003F9d0s6
                       Device/File integrity check succeeded

Backup the /var/opt/oracle/ocr.loc file:

# cp ocr.loc ocr.loc.old

Manually backup OCR:

# {CRS_HOME}/bin/ocrconfig -manualbackup

2. As root, run the following commands to replace OCR files. This change can be performed on-line, and will be reflected across the entire cluster.

# {CRS_HOME}/bin/ocrconfig -replace ocr /dev/rdsk/c4t60060E80056FB30000006FB300001014d0s6

# {CRS_HOME}/bin/ocrconfig -replace ocrmirror /dev/rdsk/c4t60060E80056FB30000006FB300001015d0s6

3. Verify the new configuration.

Check new ocr.loc file updated:

# cat /var/opt/oracle/ocr.loc
#Device/file /dev/rdsk/c4t50060E800000000000002892000003F9d0s6 getting replaced by device /dev/rdsk/c4t60060E80056FB30000006FB300001015d0s6
ocrconfig_loc=/dev/rdsk/c4t60060E80056FB30000006FB300001014d0s6
ocrmirrorconfig_loc=/dev/rdsk/c4t60060E80056FB30000006FB300001015d0s6

Check OCR:

# {CRS_HOME}/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  :          2
Total space (kbytes)     :     921332
Used space (kbytes)      :       4548
Available space (kbytes) :     916784
ID                       :  776278942
Device/File Name         : /dev/rdsk/c4t60060E80056FB30000006FB300001014d0s6
Device/File integrity check succeeded
Device/File Name         : /dev/rdsk/c4t60060E80056FB30000006FB300001015d0s6
Device/File integrity check succeeded

Cluster registry integrity check succeeded

Logical corruption check succeeded

SAN Migration of the voting disks

1. Backup the voting disks.

Query the original locations:

# /opt/oracrs/bin/crsctl query css votedisk
0.     0    /dev/rdsk/c4t50060E800000000000002892000003FBd0s6
1.     0    /dev/rdsk/c4t50060E800000000000002892000003FCd0s6
2.     0    /dev/rdsk/c4t50060E800000000000002892000003FFd0s6

Backup voting disks using dd:

dd if={voting_disk_name} of={backup_file_name}

Example,
dd if=/dev/rdsk/c4t50060E800000000000002892000003FBd0s6 of=/tmp/voting1

2. Move voting disks.

Starting with 11.1 onwards, the voting disk migration can be performed on-line.

# /opt/oracrs/bin/crsctl delete css votedisk /dev/rdsk/c4t50060E800000000000002892000003FBd0s6
# /opt/oracrs/bin/crsctl add  css votedisk /dev/rdsk/c4t60060E80056FB30000006FB300001017d0s6

# /opt/oracrs/bin/crsctl delete css votedisk  /dev/rdsk/c4t50060E800000000000002892000003FCd0s6
# /opt/oracrs/bin/crsctl add css votedisk  /dev/rdsk/c4t60060E80056FB30000006FB300001018d0s6

# /opt/oracrs/bin/crsctl delete css votedisk  /dev/rdsk/c4t50060E800000000000002892000003FFd0s6
# /opt/oracrs/bin/crsctl add css votedisk  /dev/rdsk/c4t60060E80056FB30000006FB300001019d0s6

3. Verify the new configuration.

# /opt/oracrs/bin/crsctl query css votedisk
0.     0    /dev/rdsk/c4t60060E80056FB30000006FB300001017d0s6
1.     0    /dev/rdsk/c4t60060E80056FB30000006FB300001018d0s6
2.     0    /dev/rdsk/c4t60060E80056FB30000006FB300001019d0s6

Reference:
Metalink #428681.1:  OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE), including moving from RAW Devices to Block Devices

Tags: , , , , , ,


May 15 2009

RAC Assessment from Oracle

Category: 11g,Databaseittichai @ 7:30 am

In our first large-scale Oracle 11g RAC with ASM deployment, we started this project by engaging Oracle. The primary goal of this engagement is to get as much technical input as possible especially when it comes to the areas of best practices. We’ve been advised by our Oracle account manager to use free RAC assessment service provided by a special unit called Oracle RAC Assurance Team. That was really the first time I’ve heard of this kind of service even though we’ve worked with Oracle since our first Oracle 9i RAC many years back. Please note that this assessment service is not available to all customers as a general offering – business justification is required.

The focus of the review is to validate system components, pertinent to designed functionality and implementation, in order to assess capabilities needed to sustain and support an Oracle RAC environment. The goal is to determine whether the configuration has the potential to cause availability or integrity problems.

The assessment does not imply certifications by Oracle of any hardware and software related to infrastructure.

We’ve been asked to provide the following:

  • Architecture diagrams depicting RAC and application components
  • RDA from each node (Metalink Note 314422.1)
  • Oracle Clusterware information
       $ env
       $ id
       $ cluvfy stage -post crsinst -n all -verbose
       $ crs_stat –t
       $ crsctl query css votedisk
       $ ocrcheck
  • Cluster diagnostic information (using diacollection.pl from Metalink Note 330358.1)
  • System logs
  • Public and Private interconnection definitions
       $ oifcfg getif
       $ oifcfg iflist
  • Interconnection traffic
       select * from gv$configured_interconnects;
  • Service definition
       set pages 60 space 2 num 8 lines 132 verify off feedback off
       column name format a15
       column network_name format a12
       column failover_method format a10
       column failover_type format a10

       select service_id, name, network_name, failover_method,
        failover_type, goal,dtp, enabled, AQ_HA_NOTIFICATIONS, CLB_GOAL
       from dba_services;
  • Opatch information
       opatch lsinventory -oh $RDBMS_HOME
       opatch lsinventory -oh $ASM_HOME
       opatch lsinventory -oh $CRS_HOME
  • System definitions (/etc/system)
  • Layer 3/Network Configuration
       # dladm show-link
  • Adaptor summary statistics for public, VIP and interconnect
       kstat -n <NIC name>
       netstat -in
       ifconfig –a
  • UDP parameter settings
       # ndd /dev/udp udp_xmit_hiwat
       # ndd /dev/udp udp_recv_hiwat
  • ASM instance information
       set linesize 1500
       set pagesize 1000
       column name format a20
       column path format a40
       column failgroup format a20

       select GROUP_NUMBER, NAME, SECTOR_SIZE, BLOCK_SIZE,
         ALLOCATION_UNIT_SIZE, STATE, TYPE, TOTAL_MB, FREE_MB
       from v$asm_diskgroup
         order by GROUP_NUMBER;

       select GROUP_NUMBER, DISK_NUMBER, NAME, MOUNT_STATUS,
         MODE_STATUS, STATE, REDUNDANCY, FAILGROUP, PATH,
         TOTAL_MB, FREE_MB
       from v$asm_disk
       order by GROUP_NUMBER, DISK_NUMBER;
  • AWR/Statspack report for peak load period with
  • OSWatcher or OS statistics (top, mpstat, iostat, vmstat, SAR for same corresponding period as AWR/Statspack data) – See Metalink note 301137.1
  • Alert and others logs

After a few weeks, Oracle came back with recommendations which have been evaluated against best practices compiled from its experience with its global RAC customer base. I found these recommendations very helpful. In fact, in some cases, I would not have known of some of the recommendations without this analysis. I’d like to share with you some recommendations given for our specific environment (Oracle 11.1.0.7 with ASM on Solaris 10). Please keep in mind that this may not apply to your environment. So it is advisable to consult Oracle support and/or the provided Metalink Notes if you have any questions.

These are some best practices and recommendations:

  • Whenever possible use a non-shared Oracle Home for ASM and the RDBMS. In most cases the non-shared Oracle Home is the preferred solution due to the ability to patch in a rolling upgrade fashion with zero downtime and the elimination of the single point of failure and binary dependency issues that the shared Oracle Home approach introduces.
  • Do not install the CRS Oracle Home using a shared file system location.
  • Ensure that run-time files in directories /var/tmp/.oracle, or /tmp/.oracle are not removed while Oracle background, or user processes are active. These directories contains a number of “special” socket files that are used by local clients to connect via the IPC protocol (sqlnet) to various Oracle processes including the TNS listener, the CSS, CRS & EVM daemons or even the database instance.
  • Add  slewalways yes and disable pll in /etc/inet/ntp.conf to avoid node reboot due to a leap second event. Refer to Metalink Note: 759143.1.
  • On Solaris, set the UDP buffer settings (udp_xmit_hiwat and udp_recv_hiwat) to 65536. These should improve cluster interconnect throughput. Refer to Metalink Note: 181489.1. Please create an RC script so that it will set udp_xmit_hiwat and udp_recv_hiwat to 65536 everytime the server rebooted.
        ndd -set /dev/udp udp_xmit_hiwat 65536
        ndd -set /dev/udp udp_recv_hiwat 65536
  • Ensure that both OCR and Voting disks are backed up periodically. The OCR backups are taken automatically, and stored on the Master RAC node. Voting disks can be backed up manually using dd with a 4k block size (hot backup).
  • Do not use crs_unregister to remove default CRS resources (nodeapps) unless instructed by Oracle Support.
  • Ensure that the CSS DIAGWAIT parameter is set to 13 seconds on all platforms. DIAGWAIT is a CSS parameter built-into 10.1.0.5+, 10gR2, 11g which will allow enough time for the failed node to flush final trace files to better help debug a node failure. Always keep CSS misscount greater than the setting of DIAGWAIT. Refer to MetaLink Note:559365.1.
  • Increase PARALLEL_EXECUTION_MESSAGE_SIZE from the default (2k), to 4k or 8k in order to provide better recovery slave performance.
  • Set PARALLEL_MIN_SERVERS to CPU_COUNT-1, in order to pre-spawn recovery slaves at startup, instead of when recovery is required.
  • On each RDBMS instance supported by ASM, add the following SHARED_POOL allocations, based on each Disk Group type in use:
  • - For disk groups using external redundancy, every 100GB of space needs 1MB of extra shared pool plus 2MB
    - For disk groups using normal redundancy, every 50GB of space needs 1MB of extra shared pool plus 4MB
    - For disk groups using high redundancy, every 33GB of space needs 1MB of extra shared pool plus 6MB

  • Avoid bad writes to ASM disks by avoiding Raid 5. See NOTE: 30286.1 – I/O Tuning with Different RAID Configurations.
  • Ensure the Network Listener is configured to use an address-list that specifies an IPC connection before the TCP connections. This will avoid fail-over delays associated with TCP timeouts. Refer to Metalink Note: 403743.1.

In conclusion, I’m very impressed with the information and the kind of professionalism Oracle RAC Assessment team has provided. This has definitely given us (as well as management) more confidence with our new setup.

Tags: , ,