TITLE    : Heartbeat fail-over of IP addresses
OS LEVEL : Linux (SLES9)
DATE     : 19/05/2005
VERSION  : 1.0
AUTHOR   : Hubertus A. Haniel (hubba@unixcook.com)
-------------------------------------------------------------------------------

This is a quick guide on how to set up IP fail-over between two systems using
the heartbeat subsystem for High-Availability Linux.

As a first step, ensure that the heartbeat package is installed. Only the basic
heartbeat package is required; the yast2-heartbeat package can be installed as
well if you want to set this configuration up through YaST. This document,
however, only covers the manual setup.
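
A minimal sketch of the package check, assuming an RPM-based system such as
SLES. The helper name check_pkg is made up for this example; only "rpm -q" is
the real command.

```shell
# Report whether a package is present in the RPM database.
# check_pkg is a hypothetical helper name, not part of heartbeat.
check_pkg() {
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not found"
        return 2
    fi
    if rpm -q "$1" >/dev/null 2>&1; then
        echo "$1 installed"
    else
        echo "$1 missing"
        return 1
    fi
}
```

Run "check_pkg heartbeat" on both nodes before continuing.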

The files that need to be modified or created all live in /etc/ha.d. Examples
of the files to be created can be found in /usr/share/doc/packages/heartbeat.

In the following example we will refer to system-a and system-b, with the IP
addresses 192.168.0.1 and 192.168.0.2 respectively. The service address we want
to fail over will be 192.168.0.3.

The files to be created are as follows:


Defining the basic cluster configuration with the ha.cf file.
-------------------------------------------------------------

/etc/ha.d/ha.cf on system-a:
debugfile /var/log/ha-debug  # Debug information gets logged here
logfile /var/log/ha-log      # Normal informational messages are here.
logfacility local0           # Also log to syslog using local0
keepalive 2                  # Send heartbeats every 2 seconds
deadtime 30                  # Declare node dead after 30 seconds
warntime 10                  # Issue a late heartbeat warning after 10 seconds
initdead 120                 # Deadtime after boot (allow for things to come up)
ucast eth0 192.168.0.2       # Heartbeat to the other node
auto_failback on             # Fail resources back to their primary node
node system-a                # Define the nodes in the cluster.
node system-b

/etc/ha.d/ha.cf on system-b:
debugfile /var/log/ha-debug  # Debug information gets logged here
logfile /var/log/ha-log      # Normal informational messages are here.
logfacility local0           # Also log to syslog using local0
keepalive 2                  # Send heartbeats every 2 seconds
deadtime 30                  # Declare node dead after 30 seconds
warntime 10                  # Issue a late heartbeat warning after 10 seconds
initdead 120                 # Deadtime after boot (allow for things to come up)
ucast eth0 192.168.0.1       # Heartbeat to the other node
auto_failback on             # Fail resources back to their primary node
node system-a                # Define the nodes in the cluster.
node system-b
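
The timing values above depend on each other, so a quick sanity check can help
before starting the daemon. The sketch below assumes whole-second values (as
used in this example) and checks that keepalive < warntime < deadtime and that
initdead is at least twice deadtime. The latter is a rule of thumb, not a hard
requirement. The function names are made up for this example.

```shell
# Print the value of a single-valued ha.cf directive (first match wins).
# Trailing "# comments" are ignored because awk splits on whitespace.
hacf_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Check keepalive < warntime < deadtime and initdead >= 2 * deadtime.
check_hacf_timing() {
    cf=$1
    keepalive=$(hacf_value "$cf" keepalive)
    warntime=$(hacf_value "$cf" warntime)
    deadtime=$(hacf_value "$cf" deadtime)
    initdead=$(hacf_value "$cf" initdead)
    # This sketch only understands plain integers (seconds).
    for v in "$keepalive" "$warntime" "$deadtime" "$initdead"; do
        case $v in
            ''|*[!0-9]*)
                echo "missing or non-integer timing value in $cf"
                return 2 ;;
        esac
    done
    if [ "$keepalive" -lt "$warntime" ] && [ "$warntime" -lt "$deadtime" ] \
       && [ "$initdead" -ge $((deadtime * 2)) ]; then
        echo "timing ok"
    else
        echo "timing suspicious"
        return 1
    fi
}
```

With the files shown above, "check_hacf_timing /etc/ha.d/ha.cf" should report
"timing ok" on both nodes.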


Defining the authentication between the nodes and their password.
-----------------------------------------------------------------
On both systems the /etc/ha.d/authkeys file must exist with root ownership and
0600 permissions.  This file defines the authentication used between the HA 
daemons on the nodes. The file should look something like this:
auth 1            # Sign outgoing packets with key number 1
1 sha1 password   # Key number, signing method and shared secret
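
Rather than a literal word like "password", a random key is preferable. The
following is a minimal sketch that writes such a file with the required
permissions. The function name make_authkeys is made up for this example, and
the 40-character hex key length is an arbitrary choice.

```shell
# Write an authkeys file containing a random sha1 key; $1 is the target path.
make_authkeys() {
    target=$1
    # 20 random bytes rendered as 40 hex characters
    key=$(od -An -N20 -tx1 /dev/urandom | tr -d ' \n')
    (
        umask 077    # a newly created file gets mode 0600
        printf 'auth 1\n1 sha1 %s\n' "$key" > "$target"
    )
    chmod 600 "$target"    # also covers a pre-existing file
}
```

Run "make_authkeys /etc/ha.d/authkeys" as root on one node only, then copy the
resulting file to the other node, since both nodes must use the same key.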


Defining the resources
----------------------
The /etc/ha.d/haresources file is used to define the actual resources in the
cluster. This file should be the same on each node in the cluster and has the
format of:

<primary node of resource> <ip address of resource> <init script for resource>

For our purposes the following line would bring up eth0:0 with the service
address:
system-a 192.168.0.3/24/eth0/192.168.0.255
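
The slash-separated fields of the address are the IP address, the netmask as a
prefix length, the interface and the broadcast address. As a small
illustration, the sketch below splits such a specification into labelled
fields. The function name describe_ipaddr is made up for this example.

```shell
# Split an "ip/prefix/interface/broadcast" resource spec into labelled fields.
describe_ipaddr() {
    echo "$1" | awk -F/ '{
        printf "address=%s prefix=%s interface=%s broadcast=%s\n", $1, $2, $3, $4
    }'
}

describe_ipaddr 192.168.0.3/24/eth0/192.168.0.255
# prints: address=192.168.0.3 prefix=24 interface=eth0 broadcast=192.168.0.255
```

Once heartbeat is running on both nodes (on SLES presumably started through
/etc/init.d/heartbeat), the service address should show up on eth0 of system-a,
for example in the output of "ip addr show eth0".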


Considerations for the above setup
----------------------------------
- In an ideal setup one should have multiple separate heartbeat links to avoid
  node isolation.
- One should investigate to see if it is possible to combine the bonding driver
  with this setup to provide more redundancy on each node before failing the IP.
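
As a sketch of the first point, additional heartbeat links can be declared in
ha.cf next to the existing one. The fragment below is for system-a. The second
network (eth1, 10.0.0.2) and the serial port are assumptions made up for this
example and need to match the actual hardware.

```text
ucast eth0 192.168.0.2       # Existing heartbeat link
ucast eth1 10.0.0.2          # Second link over a dedicated network (assumed)
baud 19200                   # Speed of the serial link below
serial /dev/ttyS0            # Third link over a null-modem cable (assumed)
```

On system-b the ucast lines would point at the corresponding addresses of
system-a instead.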