Red Hat Cluster howto
Introduction
Here is a short tutorial on configuring a standard RHEL cluster. Configuring a RHEL cluster is quite easy, but the documentation is sparse and not well organized. We will configure a 4-node cluster with shared storage and heartbeat over a separate NIC (not the main data link).
Cluster configuration goals
- Shared storage
- HA-LVM: LVM failover configuration (like HP ServiceGuard), which is different from the clustered logical volume manager (CLVM)!
- Bonded main data link (eg. bond0 –> eth0 + eth1)
- Heartbeat on a separate data link (eg. eth2)
Cluster installation steps
OS installation
First we performed a full CentOS 5.5 installation using kickstart. We also installed the cluster packages, such as:
- cman
- rgmanager
- qdiskd
- ccs_tools
or
- @clustering (kickstart group)
Networking configuration
We configure two different data links:
- Main data link (for applications)
- Heartbeat data link (for cluster communication)
The main data link (bond0) uses Ethernet bonding over two physical NICs (eth0, eth1). This keeps the network available when one of the network paths fails.
Cluster communication (heartbeat) uses a dedicated Ethernet link (eth2), configured on a different network and VLAN.
To obtain this configuration, create the file /etc/sysconfig/network-scripts/ifcfg-bond0 from scratch and fill it in as below:
DEVICE=bond0
IPADDR=<your server main IP address (eg. 10.200.56.41)>
NETMASK=<your server main network mask (eg. 255.255.255.0)>
NETWORK=<your server main network (eg. 10.200.56.0)>
BROADCAST=<your server main network broadcast (eg. 10.200.56.255)>
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BONDING_OPTS='miimon=100 mode=1'
GATEWAY=<your server main default gateway (eg. 10.200.56.1)>
TYPE=Ethernet
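You can customize BONDING_OPTS: mode=1 selects active-backup bonding, and miimon=100 checks the link state every 100 ms. See the Linux bonding driver documentation for the other modes and options.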
Modify /etc/sysconfig/network-scripts/ifcfg-eth{0,1}:
DEVICE=<eth0 or eth1, etc...>
USERCTL=no
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
HWADDR=<your eth MAC address (eg. 00:23:7d:3c:18:40)>
ONBOOT=yes
TYPE=Ethernet
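Since eth0 and eth1 differ only in DEVICE and HWADDR, the two slave files can also be generated. The function below is a hypothetical helper (not part of initscripts) that writes the same keys shown above; the target directory is an optional third argument, defaulting to /etc/sysconfig/network-scripts, so you can try it in a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: write an ifcfg file that enslaves a NIC to bond0.
# Arguments: device name, MAC address, optional target directory.
write_slave_ifcfg() {
    dev=$1
    mac=$2
    dir=${3:-/etc/sysconfig/network-scripts}
    # Same keys as the hand-written ifcfg-eth{0,1} example.
    cat > "$dir/ifcfg-$dev" <<EOF
DEVICE=$dev
USERCTL=no
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
HWADDR=$mac
ONBOOT=yes
TYPE=Ethernet
EOF
}
```

For example, `write_slave_ifcfg eth0 00:23:7d:3c:18:40` writes /etc/sysconfig/network-scripts/ifcfg-eth0. Read the real MAC addresses (e.g. with `ip link show eth0`) before the NICs are enslaved: in active-backup mode the bonding driver may rewrite the slaves' addresses once the bond is up.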
Modify the heartbeat NIC file /etc/sysconfig/network-scripts/ifcfg-eth2:
DEVICE=eth2
HWADDR=<your eth MAC address (eg. 00:23:7D:3C:CE:96)>
ONBOOT=yes
BOOTPROTO=none
TYPE=Ethernet
NETMASK=<your server heartbeat network mask (eg. 255.255.255.0)>
IPADDR=<your server heartbeat IP address (eg. 192.168.133.41)>
Note that the heartbeat NIC eth2 has no default gateway configured. A gateway is not needed unless a node sits outside the other nodes' heartbeat network and no specific static routes exist.
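Whether eth2 needs a gateway comes down to whether the other nodes' heartbeat addresses fall inside this node's heartbeat network. The hypothetical helpers below make that check explicit: each address is masked with the netmask and the two network parts are compared. It is only a sketch for IPv4 dotted-quad addresses and does no input validation.

```shell
#!/bin/sh
# Hypothetical helpers: decide whether two IPv4 addresses share a network.

# Convert a dotted-quad address (e.g. 192.168.133.41) to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split "a.b.c.d" into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit 0) if $1 and $2 are in the same network under netmask $3.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}
```

With the example addresses, `same_subnet 192.168.133.41 192.168.133.44 255.255.255.0` succeeds, so h-artu reaches h-lancelot over eth2 without any gateway.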
Add this line to /etc/modprobe.conf:
alias bond0 bonding
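After adding the alias and restarting the network (service network restart), the bonding driver exposes its state in /proc/net/bonding/bond0. The helpers below are a hypothetical sketch (the function names are mine) that print the bonding mode and the slave currently carrying traffic; the status file path is an optional argument, so they can also read a saved copy.

```shell
#!/bin/sh
# Hypothetical helpers: summarize the bonding driver's status file.
# The default path is the kernel's status file for bond0.

# Print the "Bonding Mode" value, e.g. "fault-tolerance (active-backup)" for mode=1.
bond_mode() {
    sed -n 's/^Bonding Mode: //p' "${1:-/proc/net/bonding/bond0}"
}

# Print the slave interface currently carrying traffic (active-backup mode).
bond_active_slave() {
    sed -n 's/^Currently Active Slave: //p' "${1:-/proc/net/bonding/bond0}"
}
```

`bond_active_slave` prints the active NIC (for example eth0); unplugging that cable should make it switch to the other slave, which is a quick way to test the failover.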
Add the information about each cluster node to /etc/hosts and replicate the file on all nodes:
# These are examples!
10.200.56.41 artu.yourdomain.com artu
192.168.133.41 h-artu.yourdomain.com h-artu
10.200.56.42 ginevra.yourdomain.com ginevra
192.168.133.42 h-ginevra.yourdomain.com h-ginevra
10.200.56.43 morgana.yourdomain.com morgana
192.168.133.43 h-morgana.yourdomain.com h-morgana
10.200.56.44 lancelot.yourdomain.com lancelot
192.168.133.44 h-lancelot.yourdomain.com h-lancelot
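Since the entries follow a fixed pattern (the same last octet on the main and heartbeat networks, and an h- prefix for the heartbeat name), they can be generated instead of typed. The function below is a hypothetical sketch; the 10.200.56 and 192.168.133 prefixes are the example networks of this howto and must be adapted.

```shell
#!/bin/sh
# Hypothetical helper: print main and heartbeat /etc/hosts lines for each node.
# Arguments: domain, then one name:last_octet pair per node.
# The network prefixes below are this howto's example networks.
cluster_hosts() {
    domain=$1
    shift
    for node in "$@"; do
        name=${node%%:*}     # part before the colon
        octet=${node##*:}    # part after the colon
        printf '10.200.56.%s %s.%s %s\n' "$octet" "$name" "$domain" "$name"
        printf '192.168.133.%s h-%s.%s h-%s\n' "$octet" "$name" "$domain" "$name"
    done
}
```

For instance, on the first node run `cluster_hosts yourdomain.com artu:41 ginevra:42 morgana:43 lancelot:44 >> /etc/hosts`, check the result, then copy the file to the other nodes, e.g. `for h in ginevra morgana lancelot; do scp /etc/hosts root@$h:/etc/hosts; done`.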
Logical Volume Manager configuration
We chose not to use the clustered logical volume manager (clvmd, sometimes called LVMFailover) and to use HA-LVM instead. HA-LVM is completely different from clvmd and behaves much like HP ServiceGuard.
HA-LVM features
- No need to run any extra daemon (such as clvmd aka LVMFailover)
- Each volume group can be activated exclusively on one node at a time
- Volume group configuration is not replicated automatically among the nodes (you need to run vgscan on the other nodes)
- Implementation independent of the cluster status (it can work without the cluster running at all)
HA-LVM howto
Configure /etc/lvm/lvm.conf as below:
Substitute existing filter with:
filter = [ "a/dev/mpath/.*/", "a/c[0-9]d[0-9]p[0-9]$/", "a/sd*/", "r/.*/" ]
Check locking_type (1 means local, file-based locking, which is what HA-LVM uses; clvmd would need 3):
locking_type = 1
Substitute the existing volume_list with:
volume_list = [ "vg00", "<quorum disk volume group>", "@<hostname related to heartbeat nic>" ]
Where:
- vg00 is the name of the root volume group (always active)
- <quorum disk volume group> is the name of the quorum disk volume group (always active)
- @<hostname related to heartbeat nic> is a tag. The cluster lvm resource agent tags a volume group with the node's hostname (as written in the cluster configuration) before activating it, and LVM activates only the volume groups that match an entry in volume_list. Since each volume group carries one node tag at a time, a tagged volume group can be activated and accessed by only one node at a time.
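The interplay between tags and volume_list can be illustrated with a toy model. The function below is a simplified sketch of the matching rule described above, not LVM's actual code: a volume group is allowed if volume_list names it directly or contains @tag for one of its tags. Real LVM accepts other entry forms as well (for example vg/lv), which this model ignores.

```shell
#!/bin/sh
# Simplified model (not LVM's code) of the volume_list activation check.
# $1 = volume group name, $2 = space-separated VG tags,
# remaining arguments = volume_list entries.
# Succeeds if the volume group would be activated.
vg_allowed() {
    vg=$1
    tags=$2
    shift 2
    for entry in "$@"; do
        # Entry naming the volume group directly (e.g. vg00).
        [ "$entry" = "$vg" ] && return 0
        # Entry of the form @tag matching one of the VG's tags.
        for t in $tags; do
            [ "$entry" = "@$t" ] && return 0
        done
    done
    return 1
}
```

With volume_list = [ "vg00", "vg_qdisk", "@h-artu.yourdomain.com" ] on node artu, vg_subversion_apps is activated only while it carries the h-artu.yourdomain.com tag; on the other nodes it stays inactive.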
Finally, remember to regenerate the initrd!
# mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
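This step is easy to forget. As far as I know, rgmanager's lvm resource agent checks for exactly this and refuses to manage HA-LVM volume groups when the initrd is older than lvm.conf. The function below is a hypothetical pre-flight check that compares the two modification times; the default paths assume the running kernel's initrd and the standard lvm.conf location.

```shell
#!/bin/sh
# Hypothetical check: succeed if the initrd was rebuilt after the last lvm.conf edit.
# Arguments (optional): initrd path, lvm.conf path.
initrd_is_newer() {
    initrd=${1:-/boot/initrd-$(uname -r).img}
    conf=${2:-/etc/lvm/lvm.conf}
    if [ "$initrd" -nt "$conf" ]; then
        echo "OK: $initrd is newer than $conf"
    else
        echo "WARNING: rebuild $initrd (it is older than $conf)" >&2
        return 1
    fi
}
```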
Storage configuration
Depending on your storage system, configure multipath so that every node can access the same LUNs.
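A simple way to confirm that all nodes really see the same LUNs is to compare WWIDs rather than map names. The hypothetical helper below reads `multipath -ll` output on stdin and prints one "name WWID" pair per map. It assumes user_friendly_names is enabled and the RHEL 5 style header line shown in the comment; other multipath versions may format this line differently.

```shell
#!/bin/sh
# Hypothetical helper: print "<map name> <WWID>" for each multipath map.
# Reads `multipath -ll` output on stdin; assumes header lines such as:
#   mpath1 (360a98000486e2f34576f2f51715a714d) dm-2 NETAPP,LUN
mpath_wwids() {
    awk '$2 ~ /^\(.*\)$/ && $3 ~ /^dm-[0-9]+$/ {
        wwid = $2
        gsub(/[()]/, "", wwid)   # strip the surrounding parentheses
        print $1, wwid
    }'
}
```

Run `multipath -ll | mpath_wwids | sort -k2` on every node and compare the WWID column. Friendly names such as mpath1 are assigned per node and may differ between nodes; the WWIDs must match.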
Quorum disk
The quorum disk is a 20MB LUN that the storage shares with all cluster nodes. The cluster uses it as a tie-breaker in case of split-brain events: each node periodically writes its own status to the quorum disk, so if some nodes experience network problems, the quorum disk ensures that only the right group of nodes forms the cluster, never both groups (split-brain)!
Quorum disk creation
First be sure that each node can see the same 20MB LUN. Then, on the first node, create a physical volume:
# pvcreate /dev/mpath1
Create a dedicated volume group (with 8MB physical extents):
# vgcreate -s 8 vg_qdisk /dev/mpath1
Create a logical volume spanning the whole volume group:
# lvcreate -l <max_vg_pe> -n lv_qdisk vg_qdisk
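The <max_vg_pe> value is the number of free physical extents in vg_qdisk, which vgs can report. The helper below is a hypothetical convenience wrapper. Note the arithmetic: with 8MB extents (-s 8), a 20MB LUN typically holds only two whole extents once the LVM metadata area is taken into account, so the logical volume ends up at about 16MB.

```shell
#!/bin/sh
# Hypothetical helper: print the number of free physical extents in a volume
# group, i.e. the value to pass as <max_vg_pe> to lvcreate -l.
vg_free_pe() {
    vgs --noheadings -o vg_free_count "$1" | tr -d ' '
}
```

Usage: `lvcreate -l "$(vg_free_pe vg_qdisk)" -n lv_qdisk vg_qdisk`.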
Make sure that this volume group is listed in volume_list in /etc/lvm/lvm.conf. It must be active on all nodes!
On the other nodes run:
# vgscan
The quorum disk volume group should appear.
Quorum disk configuration
Now populate the quorum disk with the right information. To do this, run:
# mkqdisk -c /dev/vg_qdisk/lv_qdisk -l <your_cluster_name>
Note that it is not required to use your cluster name as the quorum disk label, but it is recommended.
You also need to create a heuristic script to help qdiskd act as a tie-breaker. Create /usr/share/cluster/check_eth_link.sh and make it executable (chmod +x):
#!/bin/sh
# Network link status checker: exits 0 if the NIC given as $1 reports a link
ethtool "$1" | grep -q "Link detected.*yes"
exit $?
Now activate the quorum disk:
# service qdiskd start
# chkconfig qdiskd on
Logging configuration
To keep the logs tidy, you can send rgmanager messages to a dedicated file.
Add these lines to /etc/syslog.conf:
# Red Hat Cluster
local4.* /var/log/rgmanager
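Editing the file by hand is fine; if you script the setup, a helper that appends a line only when it is not already present keeps re-runs from duplicating it. The function below is a hypothetical sketch, not part of any package. It compares whole lines literally, so a variant with different whitespace is not detected.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line is missing.
# Creates the file if it does not exist yet.
ensure_line() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For example, `ensure_line /etc/syslog.conf 'local4.* /var/log/rgmanager'`; then restart syslog (`service syslog restart`) and send a test message with `logger -p local4.notice "rgmanager log test"`, which should show up in /var/log/rgmanager.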
Add /var/log/rgmanager to the syslog logrotate settings in /etc/logrotate.d/syslog:
/var/log/messages /var/log/secure /var/log/maillog /var/log/spooler /var/log/boot.log /var/log/cron /var/log/rgmanager {
sharedscripts
postrotate
/bin/kill -HUP `cat /var/run/syslogd.pid 2> /dev/null` 2> /dev/null || true
/bin/kill -HUP `cat /var/run/rsyslogd.pid 2> /dev/null` 2> /dev/null || true
endscript
}
Modify this line in /etc/cluster/cluster.conf:
<rm log_facility="local4" log_level="5">
Increment the config_version in /etc/cluster/cluster.conf and propagate the file to all nodes:
# ccs_tool update /etc/cluster/cluster.conf
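Because every update must carry a higher config_version, the increment is worth scripting. The function below is a hypothetical helper: it reads the current config_version from the file, writes back that value plus one and prints the new number. It relies on GNU sed's -i option, as shipped with RHEL/CentOS.

```shell
#!/bin/sh
# Hypothetical helper: increment config_version in cluster.conf and print the new value.
bump_config_version() {
    conf=${1:-/etc/cluster/cluster.conf}
    # Extract the number inside config_version="..." (first match only).
    cur=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)
    if [ -z "$cur" ]; then
        echo "no config_version found in $conf" >&2
        return 1
    fi
    new=$((cur + 1))
    sed -i "s/config_version=\"$cur\"/config_version=\"$new\"/" "$conf"
    echo "$new"
}
```

Usage: `bump_config_version && ccs_tool update /etc/cluster/cluster.conf`.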
Cluster configuration
To configure the cluster you can use either:
- Luci web interface
- Manual xml configuration
Configuring cluster using luci
To use the luci web interface, enable the ricci service on all nodes and luci on one node only:
(on all nodes)
# chkconfig ricci on
# service ricci start
(on one node only)
# chkconfig luci on
# luci_admin init
# service luci restart
Please note that luci_admin init must be run only once, before starting the luci service for the first time; otherwise luci will be unusable.
Now connect to luci at https://node_with_luci.mydomain.com:8084. There you can create a cluster, add nodes, create services, failover domains, etc.
Configuring cluster editing the XML
You can also configure the cluster manually by editing its main config file, /etc/cluster/cluster.conf. To create the config skeleton use:
# ccs_tool create <your_cluster_name>
The newly created config file is not usable yet: you still have to configure the cluster settings, add nodes, create services, failover domains, etc.
When the config file is complete, copy it to all nodes and start the cluster as follows:
(on all nodes)
# chkconfig cman on
# chkconfig rgmanager on
# service cman start
# service rgmanager start
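With four nodes, copying the file and starting the services by hand gets repetitive. The function below is a hypothetical sketch that does both over ssh. It assumes that root can ssh and scp between the nodes without a password and that NODES lists your heartbeat hostnames; neither is set up anywhere in this howto. cman is started on all nodes in parallel, because its init script may wait for quorum, which a single node of a four-node cluster cannot reach alone.

```shell
#!/bin/sh
# Hypothetical helper: push cluster.conf to every node, then enable and start
# cman and rgmanager there. NODES is an assumption; list your own hosts.
NODES=${NODES:-"h-artu h-ginevra h-morgana h-lancelot"}

deploy_and_start() {
    conf=${1:-/etc/cluster/cluster.conf}
    for n in $NODES; do
        scp -q "$conf" "root@$n:/etc/cluster/cluster.conf" || return 1
    done
    # Start cman everywhere at once so the nodes can reach quorum together.
    for n in $NODES; do
        ssh "root@$n" 'chkconfig cman on && service cman start' &
    done
    wait
    # rgmanager needs a running cman, so it is started afterwards.
    for n in $NODES; do
        ssh "root@$n" 'chkconfig rgmanager on && service rgmanager start' || return 1
    done
}
```

Call `deploy_and_start` on the node holding the finished cluster.conf; `clustat` should then show all members online.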
Recommended cluster configuration
Below is the /etc/cluster/cluster.conf file of a fully configured cluster.
For the sake of commentary, the file is split into several consecutive parts:
<?xml version="1.0"?>
<cluster alias="jcaps_prd" config_version="26" name="jcaps_prd">
<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="h-lancelot.yourdomain.com" …>
<fence/>
</clusternode>
<clusternode name="h-artu.yourdomain.com" nodeid="2" votes="1">
<fence/>
</clusternode>
<clusternode name="h-morgana.yourdomain.com" …>
<fence/>
</clusternode>
</clusternodes>
<cman expected_votes="4"/>
<fencedevices/>
This is the first part of the XML cluster config file.
- The first line sets the cluster name and the config_version. Each time you modify the XML you must increment config_version by 1 before updating the config on all nodes.
- The fence daemon line is the default one.
- The clusternodes stanza lists the nodes of the cluster. Note that the name attribute contains the node's FQDN, and this name determines which NIC is used for cluster communication. In this example we don't use the main hostname but the hostname bound to the NIC chosen as the cluster communication channel.
- Note also that the <fence/> line is required, even though here we do not use any fence device. Due to the nature of HA-LVM, access to the data should be exclusive to one node at a time.
- cman expected_votes is 4 because each node gives 1 vote.
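The vote count determines when the cluster is quorate. cman requires more than half of the expected votes, i.e. floor(expected_votes / 2) + 1; the hypothetical helper below just spells out that arithmetic (it ignores cman's special two-node mode).

```shell
#!/bin/sh
# Hypothetical helper: cman quorum threshold for a given number of expected votes.
# Integer division gives floor(expected_votes / 2) + 1.
cman_quorum() {
    echo $(( $1 / 2 + 1 ))
}
```

With expected_votes="4" the quorum is 3, so a 2/2 network split leaves neither half quorate on node votes alone, which is the tie the quorum disk is meant to break.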
<rm log_facility="local4" log_level="5">
<failoverdomains>
<failoverdomain name="jcaps_prd" nofailback="0" ordered="0" restricted="1">
<failoverdomainnode name="h-lancelot.yourdomain.com" priority="1"/>
<failoverdomainnode name="h-artu.yourdomain.com" priority="1"/>
<failoverdomainnode name="h-morgana.yourdomain.com" priority="1"/>
</failoverdomain>
</failoverdomains>
<resources/>
This section begins the resource manager configuration (<rm ...>).
- The resource manager section can be configured for logging. rgmanager logs to syslog; here we set the log_facility and the logging level. The facility we specified lets us log to a separate file (see Logging configuration).
- We also configured a failover domain containing all cluster nodes, since we want a service to be able to switch to any cluster node. You can configure different behaviours here as well.
<service autostart="1" domain="jcaps_prd" exclusive="0" name="subversion" recovery="relocate">
<ip address="10.200.56.60" monitor_link="1"/>
<lvm name="vg_subversion_apps" vg_name="vg_subversion_apps"/>
<lvm name="vg_subversion_data" vg_name="vg_subversion_data"/>
<fs device="/dev/vg_subversion_apps/lv_apps" force_fsck="1" force_unmount="1" fsid