Configure NFS Server Clustering with Pacemaker on CentOS 7 / RHEL 7

NFS (Network File System) is the most widely used solution for sharing files over a network. With an NFS server we can share directories over the network, and permitted clients or systems can access those shared directories and use them in their applications. In a production environment, the NFS server should be configured for high availability to rule out a single point of failure.

In this article we will discuss how to configure NFS server high availability clustering (active-passive) with Pacemaker on CentOS 7 or RHEL 7.

Following are the lab details that I have used for this article,

  • NFS Server 1 (nfs1.example.com) – 192.168.1.40 – Minimal CentOS 7 / RHEL 7
  • NFS Server 2 (nfs2.example.com) – 192.168.1.50 – Minimal CentOS 7 / RHEL 7
  • NFS Server VIP – 192.168.1.51
  • Firewall enabled
  • SELinux enabled

Refer to the below steps to configure NFS Server active-passive clustering on CentOS 7 / RHEL 7.

Step 1) Set Host name on both nfs servers and update /etc/hosts file

Log in to both nfs servers and set the hostnames to "nfs1.example.com" and "nfs2.example.com" respectively using the hostnamectl command. An example is shown below,

~]# hostnamectl set-hostname "nfs1.example.com"
~]# exec bash

Update the /etc/hosts file on both nfs servers,

192.168.1.40  nfs1.example.com
192.168.1.50  nfs2.example.com
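
Once the entries are in place, it is worth confirming that each node can resolve and reach the other by name. A quick sanity check, assuming ICMP is allowed between the nodes,

[root@nfs1 ~]# ping -c 2 nfs2.example.com
[root@nfs2 ~]# ping -c 2 nfs1.example.com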

Step 2) Update both nfs servers and install pcs packages

Use the below 'yum update' command to apply all updates on both nfs servers and then reboot once.

~]# yum update && reboot

Install pcs and fence-agent packages on both nfs servers,

[root@nfs1 ~]# yum install -y pcs fence-agents-all
[root@nfs2 ~]# yum install -y pcs fence-agents-all

Once the pcs and fence-agents packages are installed, allow the pcs-related ports in the OS firewall on both nfs servers,

~]# firewall-cmd --permanent --add-service=high-availability
~]# firewall-cmd --reload

Now start and enable the pcsd service on both nfs nodes using the below commands,

~]# systemctl enable pcsd
~]# systemctl start  pcsd
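
To confirm that pcsd came up cleanly, you can check its service status on both nodes,

~]# systemctl status pcsd --no-pager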

Step 3) Authenticate nfs nodes and form a cluster

Set a password for the hacluster user; the pcsd service uses this user to authenticate the cluster nodes. Let's first set the hacluster password on both nodes,

[root@nfs1 ~]# echo "enter_password" | passwd --stdin hacluster
[root@nfs2 ~]# echo "enter_password" | passwd --stdin hacluster

Now authenticate the cluster nodes. In our case the nodes will be authenticated from nfs1.example.com, so run the below pcs cluster command on "nfs1",

[root@nfs1 ~]# pcs cluster auth nfs1.example.com nfs2.example.com
Username: hacluster
Password:
nfs1.example.com: Authorized
nfs2.example.com: Authorized
[root@nfs1 ~]#

Now it's time to form a cluster with the name "nfs_cluster" and add both nfs nodes to it. Run the below "pcs cluster setup" command from either nfs node,

[root@nfs1 ~]# pcs cluster setup --start --name nfs_cluster nfs1.example.com \
 nfs2.example.com

Enable the cluster service on both nodes so that they will join the cluster automatically after a reboot. Execute the below command from either nfs node,

[root@nfs1 ~]# pcs cluster enable --all
nfs1.example.com: Cluster Enabled
nfs2.example.com: Cluster Enabled
[root@nfs1 ~]#
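
At this point the cluster membership can be verified from either node, for example,

[root@nfs1 ~]# pcs cluster status
[root@nfs1 ~]# pcs status corosync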

Step 4) Define Fencing device for each cluster node

Fencing is the most important part of a cluster; if any node goes faulty, the fencing device will remove that node from the cluster. In Pacemaker, fencing is defined using a STONITH (Shoot The Other Node In The Head) resource.

In this tutorial we are using a 1 GB shared disk (/dev/sdc) as the fencing device. Let's first find out the ID of the /dev/sdc disk,

[root@nfs1 ~]# ls -l /dev/disk/by-id/

[Output of 'ls -l /dev/disk/by-id/' showing the disk IDs]

Note down the ID of disk /dev/sdc, as we will use it in the "pcs stonith" command.
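
If you prefer not to copy the ID by hand, a small shell sketch like the one below can capture the /dev/disk/by-id path that points at /dev/sdc into a variable (the awk match on the sdc symlink target is an assumption based on this lab's device naming),

[root@nfs1 ~]# DISK_ID=$(ls -l /dev/disk/by-id/ | awk '/\/sdc$/ {print "/dev/disk/by-id/" $9; exit}')
[root@nfs1 ~]# echo $DISK_ID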

Now run the below "pcs stonith" command from either node to create the fencing device (disk_fencing),

[root@nfs1 ~]# pcs stonith create disk_fencing fence_scsi \
 pcmk_host_list="nfs1.example.com nfs2.example.com" \
 pcmk_monitor_action="metadata" pcmk_reboot_action="off" \
 devices="/dev/disk/by-id/wwn-0x6001405e49919dad5824dc2af5fb3ca0" \
 meta provides="unfencing"
[root@nfs1 ~]#

Verify the status of the stonith resource using the below command,

[root@nfs1 ~]# pcs stonith show
 disk_fencing   (stonith:fence_scsi):   Started nfs1.example.com
[root@nfs1 ~]#

Run the "pcs status" command to view the status of the cluster,

[root@nfs1 ~]# pcs status
Cluster name: nfs_cluster
Stack: corosync
Current DC: nfs2.example.com (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum
Last updated: Sun Mar  4 03:18:47 2018
Last change: Sun Mar  4 03:16:09 2018 by root via cibadmin on nfs1.example.com

2 nodes configured
1 resource configured
Online: [ nfs1.example.com nfs2.example.com ]
Full list of resources:
 disk_fencing   (stonith:fence_scsi):   Started nfs1.example.com
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@nfs1 ~]#

Note: If your cluster nodes are virtual machines hosted on VMware, then you can use the "fence_vmware_soap" fencing agent. To configure "fence_vmware_soap" as the fencing agent, refer to the below logical steps:

1) Verify whether your cluster nodes can reach the VMware hypervisor or vCenter

# fence_vmware_soap -a <vCenter_IP_address> -l <user_name> -p <password> \
 --ssl -z -v -o list |egrep "(nfs1.example.com|nfs2.example.com)"
or
# fence_vmware_soap -a <vCenter_IP_address> -l <user_name> -p <password> \
 --ssl -z -o list |egrep "(nfs1.example.com|nfs2.example.com)"

If you are able to see the VM names in the output then everything is fine; otherwise you need to check why the cluster nodes are not able to connect to ESXi or vCenter.

2) Define the fencing device using the below command,

# pcs stonith create vmware_fence fence_vmware_soap \
 pcmk_host_map="node1:nfs1.example.com;node2:nfs2.example.com" \
 ipaddr=<vCenter_IP_address> ssl=1 login=<user_name> passwd=<password>

3) Check the stonith status using the below command,

# pcs stonith show
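
Whichever fencing agent you end up with, it is a good idea to test it once before relying on it. A manual fencing test against the passive node would look like the following (be aware that the fenced node will be rebooted or cut off from the shared disk, depending on the agent),

# pcs stonith fence nfs2.example.com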

Step 5) Install nfs and format nfs shared disk

Install the 'nfs-utils' package on both nfs servers,

[root@nfs1 ~]# yum install nfs-utils -y
[root@nfs2 ~]# yum install nfs-utils -y

Stop and disable the local "nfs-lock" service on both nodes, as this service will be controlled by pacemaker,

[root@nfs1 ~]# systemctl stop nfs-lock &&  systemctl disable nfs-lock
[root@nfs2 ~]# systemctl stop nfs-lock &&  systemctl disable nfs-lock

Let's assume we have a 10 GB shared disk "/dev/sdb" between the two cluster nodes. Create a partition on it and format it as an xfs file system,

[root@nfs1 ~]# fdisk /dev/sdb

[Interactive fdisk session creating the partition /dev/sdb1]
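
If you would rather avoid the interactive fdisk session, a non-interactive sketch using parted creates the same single partition spanning the disk (this assumes the parted package is installed; install it with yum if it is not),

[root@nfs1 ~]# parted -s /dev/sdb mklabel msdos mkpart primary xfs 1MiB 100%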

Run the partprobe command on both nodes and reboot once.

~]# partprobe

Now format "/dev/sdb1" as an xfs file system,

[root@nfs1 ~]# mkfs.xfs /dev/sdb1
meta-data=/dev/sdb1              isize=256    agcount=4, agsize=655296 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=2621184, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@nfs1 ~]#

Create a mount point for this file system on both nodes,

[root@nfs1 ~]# mkdir /nfsshare
[root@nfs2 ~]# mkdir /nfsshare

Step 6) Configure all required NFS resources on Cluster Nodes

Following are the required NFS resources:

  • Filesystem resource
  • nfsserver resource
  • exportfs resource
  • IPaddr2 floating IP address resource

For the Filesystem resource we need shared storage among the cluster nodes. We have already created a partition on the shared disk (/dev/sdb1) in the steps above, so we will use that partition. Use the below "pcs resource create" command to define the Filesystem resource from either node,

[root@nfs1 ~]# pcs resource create nfsshare Filesystem device=/dev/sdb1 \
  directory=/nfsshare fstype=xfs --group nfsgrp
[root@nfs1 ~]#

In the above command we have defined the NFS filesystem resource as "nfsshare" under the group "nfsgrp". From now on, all nfs resources will be created under the group nfsgrp.
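
A Pacemaker resource group also gives its members implicit colocation and start ordering, so everything we add to nfsgrp will run on the same node and start in the order the resources are listed. To see what is currently in the group, you can list the configured resources (pcs 0.9 syntax, as shipped with CentOS 7),

[root@nfs1 ~]# pcs resource show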

Create the nfsserver resource with the name 'nfsd' using the below command,

[root@nfs1 ~]# pcs resource create nfsd nfsserver \
 nfs_shared_infodir=/nfsshare/nfsinfo --group nfsgrp
[root@nfs1 ~]#

Create the exportfs resource with the name "nfsroot",

[root@nfs1 ~]# pcs resource create nfsroot exportfs clientspec="192.168.1.0/24" \
 options=rw,sync,no_root_squash directory=/nfsshare fsid=0 --group nfsgrp
[root@nfs1 ~]#

In the above command, clientspec indicates the allowed clients which can access the nfsshare.
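
Once the exportfs resource is running, you can double-check what is actually exported by running the standard exportfs tool from nfs-utils on whichever node currently holds the nfsgrp group,

[root@nfs1 ~]# exportfs -v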

Create the NFS IPaddr2 resource using the below command,

[root@nfs1 ~]# pcs resource create nfsip IPaddr2 ip=192.168.1.51 \
 cidr_netmask=24 --group nfsgrp
[root@nfs1 ~]#

Now view and verify the cluster using the pcs status command,

[root@nfs1 ~]# pcs status

[Output of the 'pcs status' command]

Once you are done with the NFS resources, allow the nfs server ports in the OS firewall on both nfs servers,

~]# firewall-cmd --permanent --add-service=nfs
~]# firewall-cmd --permanent --add-service=mountd
~]# firewall-cmd --permanent --add-service=rpc-bind
~]# firewall-cmd --reload

Step 7) Try Mounting NFS share on Clients

Now try mounting the nfs share using the mount command; an example is shown below,

[root@localhost ~]# mkdir /mnt/nfsshare
[root@localhost ~]# mount 192.168.1.51:/ /mnt/nfsshare/
[root@localhost ~]# df -Th /mnt/nfsshare
Filesystem     Type  Size  Used Avail Use% Mounted on
192.168.1.51:/ nfs4   10G   32M   10G   1% /mnt/nfsshare
[root@localhost ~]#
[root@localhost ~]# cd /mnt/nfsshare/
[root@localhost nfsshare]# ls
nfsinfo
[root@localhost nfsshare]#
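
If the client should mount the share automatically at boot, an /etc/fstab entry against the VIP could look like the sketch below (the _netdev option is an assumption, added so the mount waits for networking to come up),

192.168.1.51:/   /mnt/nfsshare   nfs4   defaults,_netdev   0 0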

For cluster testing, stop the cluster service on one of the nodes and see whether the nfsshare is still accessible. Let's assume I am going to stop the cluster service on "nfs1.example.com",

[root@nfs1 ~]# pcs cluster stop
Stopping Cluster (pacemaker)...
Stopping Cluster (corosync)...
[root@nfs1 ~]#

Now go to the client machine and see whether the nfsshare is still accessible. In my case I am still able to access it and create files on it.

[root@localhost nfsshare]# touch test
[root@localhost nfsshare]#
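
To confirm that the resource group really moved, check the cluster from the surviving node; all members of nfsgrp should now be shown as started on nfs2.example.com,

[root@nfs2 ~]# pcs status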

Now start the cluster service again on "nfs1.example.com" using the below command,

[root@nfs1 ~]# pcs cluster start
Starting Cluster...
[root@nfs1 ~]#

That's all from this article; it confirms that we have successfully configured NFS active-passive clustering using Pacemaker. Please do share your feedback and comments in the comments section below.

Read Also: Configure Two Node Squid Cluster using Pacemaker on CentOS 7 / RHEL 7

24 thoughts on “Configure NFS Server Clustering with Pacemaker on CentOS 7 / RHEL 7”

      • Thanks to this article, I’ve managed to get the whole NFSv4 pacemaker cluster deployed via Puppet.

        One small thing, you did group the resources together, but you didn’t set any order. I had a problem where the nfsroot resource failed because the nfsshare was not yet available. I’ve configured ordering constraints to resolve it.

  1. I just noticed that you use a shared disk on VirtualBox.

    What happens when you actually try to fence a node manually? For example:

    # pcs stonith fence nfs2

    I cannot see anything here showing that you tested it, and that it worked. I don’t use VirtualBox therefore genuinely curious.

  2. Great article ! Exactly my need.
    Just a question regarding resource configuration how much memory and cpu would you dedicate for each cluster node, for almost 50 users using a shared disk of almost 500GB?

  3. I just added a disk as /dev/sdb, but when I issue the command ls -l /dev/disk/by-id I could not see what wwn-0x6001405e49919dad5824dc2af5fb3ca0 related to sdb, so I could not configure by your hints any any further !!! Any hints could provide ?

  4. i have question : if i have one lun that is HA NFS on 3 clusters nodes , x ,y and z . if we suppose that cluster is mounted on server x , can we add nfs mount point on servers y z using the VIP ?
    so the lun will be direct mounted to x and NFS mounted to Y and Z .
    if yes ? then if server x went down will the lun will be mounted on Y or Z even if the NFS mount exist ?

    • Using Pacemaker we usually configure Active-Passive NFS cluster, All the services including VIP and NFS LUN will be available on active Node, let’s say x node, if due to some reasons this node went down then all services ( including NFS LUN and VIP) will be migrated to either y or z node.

  5. I find also by this one “pcs resource create nfsshare Filesystem device=/dev/sdb1 directory=/nfsshare fstype=xfs –group nfsgrp”.
    Because modern Linux will interchange the device name so I got /dev/sdb1 and /dev/sdc1 interchanged on reboot at some time and I could not find pcs create will accept UUID so once the NFS node is down whether it could resume is an unknown factor by this method !!!

  6. perfect article for FC San storage and two physical Node Nfs cluster ,
    i`ve completed one of my project with the help of this article

    thank you 🙂

  7. Hi Pradeep,
    I have got this error after applying step4: Define Fencing device for each cluster node.
    Error: Error: Agent ‘disk_fencing’ is not installed or does not provide valid metadata: Agent disk_fencing not found or does not support meta-data: Invalid argument (22)
    Metadata query for stonith:disk_fencing failed: Input/output error

  8. Hi,
    I tried, everything went smoothly except the last step – moving NFS shares on client.
    I just could mount it on the nfs1.example.com, not another node.
    The “ip a” show that the virtual floating IP was linked to the nfs1 node, not the nfs2 node, and can not ping from nfs2.
    Any suggestions?

  9. Hi, thanks for the artical, i’m not able to mount the nfs using the vip:
    [root@ibm-cli ~]# mount 192.168.56.100:/ /mnt/nfsshare/
    mount: wrong fs type, bad option, bad superblock on 192.168.56.100:/,
    missing codepage or helper program, or other error
    (for several filesystems (e.g. nfs, cifs) you might
    need a /sbin/mount. helper program)

    In some cases useful info is found in syslog – try
    dmesg | tail or so.

    Here is the result of lsblk, sdb1 as fencing device and sdc1 shared storage:
    [root@nfs1 ~]# lsblk -f
    NAME FSTYPE LABEL UUID MOUNTPOINT
    sda
    ├─sda1 xfs a13287c6-e902-4162-8c96-fdbc42d487a0 /boot
    └─sda2 LVM2_me M4KyTZ-1BPU-aWgE-8Um0-0gKf-lPDK-dDUdhS
    ├─centos-root
    xfs ddf9c038-feec-45e0-83fe-be5b62843499 /
    └─centos-swap
    swap 15a46aa8-2cf1-4c94-ae6b-982b7a38c249 [SWAP]
    sdb
    └─sdb1 xfs 800d010e-369c-4b33-a159-be1fab3064eb
    sdc
    └─sdc1 xfs 1ea91760-f20a-4358-a51a-7fce3760b72e
    sr0 iso9660 CentOS 7 x86_64
    2019-09-11-18-50-31-00

  10. How are these shared disks created for the fencing and the nfs share? “In this tutorial we are using a shared disk of size 1 GB (/dev/sdc) as a fencing device.” How is this shared disk created / mounted?

    Isn’t the shared disk still a single point of failure?

