Database failover

A clustered database, such as Oracle RAC, can couple one or more separate database servers into one database instance to gain resilience and scalability. A KCML process can connect to any of the nodes in the database using KI_CONNECT and nominate another secondary node to be used in the event of a failure of the primary server.

If KCML detects that the database server has failed it can switch to the secondary server and carry on as KCML has full knowledge of the application state. The application and the user will be unaware of the failure.

If the program is executing inside a transaction then it will restart the WHILE TRANS at the top. The old transaction will not have been committed on the failed server so this is safe provided the application is written to allow WHILE TRANS blocks to be restarted.

If outside a transaction a failed random read can just be repeated but a failed sequential read will need to reissue the START and skip forward to the same position in the result set. While this is done transparently, it may take some time.

If the program was executing any sort of complex DDL statement that involves several committing SQL statements then it may error and produce a PANIC.

Oracle errors treated as failover

The following Oracle error codes are treated as a cause of failover. If other error Oracle errors need to be added to this list they can be done at run-time. Use KI_CONNECT_GET_ATTR_STR to fetch the existing list as a comma separated list of decimal numbers, add new errors and set using KI_CONNECT_SET_ATTR_STR, using the constant _KDB_CONN_FAILOVER_ERRORS. The list is automatically sorted and de-duplicated.

1012not logged on to Oracle
1033Oracle initialization or shutdown in progress
1034ORACLE not available
1089immediate shutdown in progress, no operations are permitted
1092ORACLE instance terminated. Disconnection forced (since 07.18.01.24198)
3113end-of-file on communication channel
3114not connected to ORACLE
3135connection lost contact
12203TNS: unable to connect to destination
12500TNS: listener failed to start a dedicated server process
12514TNS: listener could not resolve SERVICE_NAME given in connect descriptor
12528TNS: listener: all appropriate instances are blocking new connection (since 07.18.01.24198)
12541TNS: no listener
12543TNS: destination host unreachable (since 07.18.01.24198)
12560TNS: protocol adaptor error
12571TNS: packet writer failure
65535failed to get Oracle error