OVM – Remove stale cluster entries
Intro
Oracle VM (OVM) is Oracle's Xen-based server virtualization hypervisor, and version 3.4.6.3 is the latest release available. Oracle has announced extended support for OVM: the support period began in March 2021 and will end on March 31, 2024.
If you need more information about OVM support, read the article below:
https://blogs.oracle.com/virtualization/post/announcing-oracle-vm-3-extended-support/
There will be no new major release for OVM; Oracle's virtualization technology is moving to Oracle Linux KVM. Oracle KVM is much more stable than OVM and gives you more flexibility in the virtualization environment. If you are still planning on staying on-premises, I would say this is the right time to plan your journey to KVM.
In this article, I will cover an issue we faced recently in an OVM environment.
Overview of the issue
Our OVM cluster went down due to a sudden data center power outage. Once everything was back online, we were not able to start the OVM hypervisor on one of the nodes, so we had to perform a complete reinstallation of that node.
After the reinstallation, the node was unable to mount the repositories. The next option was to remove the node from the cluster and add it back again; this action was performed via the GUI.
While trying to add node02 back, we found that there were some stale entries on both the master node and node02. The following My Oracle Support note describes how to remove these stale entries:
OVM – How To Remove Stale Entry of the Oracle VM Server which was Removed
from The Pool (Doc ID 2418834.1)
How to identify stale entries?
The o2cb service status output gives all the information about the cluster; the "Nodes in O2CB cluster" line shows the registered node slots.
[root@ovm-node02 ~]# service o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster "f6f6b47b38e288e0": Online
Heartbeat dead threshold: 61
Network idle timeout: 60000
Network keepalive delay: 2000
Network reconnect delay: 2000
Heartbeat mode: Global
Checking O2CB heartbeat: Active
0004FB0000050000B705B4397850AAD6 /dev/dm-2
Nodes in O2CB cluster: 0 1
Debug file system at /sys/kernel/debug: mounted
Since node02 was removed from the pool, you should see only one entry in the cluster configuration, but here there are two entries.
[root@ovm-node02 ovm-node02]# ls -lrth /sys/kernel/config/cluster/f6f6b47b38e288e0/node/
total 0
drwxr-xr-x 2 root root 0 Jun 23 09:28 ovm-node02
drwxr-xr-x 2 root root 0 Jun 23 09:33 ovm-node01
[root@ovm-node02 ovm-node02]#
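The check above can be scripted. This is a minimal sketch (not from the article): it parses the "Nodes in O2CB cluster" line to count registered node slots. The sample string below stands in for the captured output; on a live node you would pipe `service o2cb status` instead.

```shell
# Sample line captured from `service o2cb status`; replace with the live
# command output on a real node.
sample='Nodes in O2CB cluster: 0 1'
node_count=$(printf '%s\n' "$sample" \
  | awk -F': ' '/Nodes in O2CB cluster/ {print split($2, a, " ")}')
echo "node slots registered: $node_count"
# More than one slot after a node removal points to a stale entry.
if [ "$node_count" -gt 1 ]; then
  echo "WARNING: possible stale node entry"
fi
```

With the two-slot sample above, the script reports two registered slots and prints the warning.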
The next step is to validate the OVS agent database entries from the master node (ovm-node01). The dump shows two IP addresses in pool_member_ip_list.
[root@ovm-node01]# ovs-agent-db dump_db server
{'cluster_state': 'DLM_Ready',
'clustered': True,
'fs_stat_uuid_list': ['0004fb000005000015c1fb14ef761f40',
'0004fb000005000079ae03177c3edc7e',
'0004fb000005000065985109f8834e8b'],
'is_master': True,
'manager_event_url': 'https://192.168.85.152:7002/ovm/core/wsapi/rest/internal/Server/08:00:20:ff:ff:ff:ff:ff:ff:ff:00:10:e0:ef:de:6a/Event',
'manager_ip': '192.168.85.152',
'manager_statistic_url': 'https://192.168.85.152:7002/ovm/core/wsapi/rest/internal/Server/08:00:20:ff:ff:ff:ff:ff:ff:ff:00:10:e0:ef:de:6a/Statistic',
'manager_uuid': '0004fb0000010000c8ecbd219dc6b1ee',
'node_number': 0,
'pool_alias': 'EclipsysOVM',
'pool_master_ip': '192.168.85.177',
'pool_member_ip_list': ['192.168.85.177', '192.168.85.178'],
'pool_uuid': '0004fb0000020000f6f6b47b38e288e0',
'poolfs_nfsbase_uuid': '',
'poolfs_target': '/dev/mapper/36861a6fddaa0481ec0dd3584514a8d62',
'poolfs_type': 'lun',
'poolfs_uuid': '0004fb0000050000b705b4397850aad6',
'registered_hostname': 'ovm-node01',
'registered_ip': '192.168.85.177',
'roles': set(['utility', 'xen'])}
[root@ovm-node01]#
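Counting the members in that dump can also be automated. A minimal sketch, assuming you have captured the `ovs-agent-db dump_db server` output (the sample line below is taken from the dump above):

```shell
# One line captured from the agent database dump; on a live master you would
# pipe `ovs-agent-db dump_db server` instead of this sample.
dump="'pool_member_ip_list': ['192.168.85.177', '192.168.85.178'],"
# Count the IPv4 addresses listed as pool members.
members=$(printf '%s\n' "$dump" \
  | grep -o '[0-9]\{1,3\}\(\.[0-9]\{1,3\}\)\{3\}' | wc -l | tr -d ' ')
echo "pool members listed: $members"
```

Two members after a node removal confirms the stale entry in the agent database.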
Remove the node from the cluster via the command line
Now we can remove ovm-node02 from the cluster using the master node.
[root@ovm-node01]# o2cb remove-node f6f6b47b38e288e0 ovm-node02
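If more than one stale node has to be cleaned up, the removal can be looped. This is a dry-run sketch (it only echoes the commands); `CLUSTER`, `KEEP`, and the hard-coded node list are assumptions for this example. On a live master you would list `/sys/kernel/config/cluster/$CLUSTER/node/` instead, and drop the `echo` to execute.

```shell
CLUSTER=f6f6b47b38e288e0   # cluster UUID from `service o2cb status`
KEEP=ovm-node01            # the node that should remain registered
# Dry run: print the o2cb command for every node except the one to keep.
cmds=$(for node in ovm-node01 ovm-node02; do
  [ "$node" = "$KEEP" ] && continue
  echo o2cb remove-node "$CLUSTER" "$node"
done)
echo "$cmds"
```

Reviewing the echoed commands before executing them is a cheap safeguard against removing the surviving node by mistake.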
Validate node entries
After removing node02, we can see only one entry in the cluster configuration.
[root@ovm-node01]# ls /sys/kernel/config/cluster/f6f6b47b38e288e0/node/
ovm-node01
[root@ovm-node01]#
Validate using O2CB
First, restart the ovs-agent on both nodes and validate the o2cb cluster status from node01.
[root@ovm-node01]# service ovs-agent restart
Stopping Oracle VM Agent: [ OK ]
Starting Oracle VM Agent: [ OK ]
[root@ovm-node01 ~]# service ovs-agent status
log server (pid 32442) is running...
notificationserver server (pid 32458) is running...
remaster server (pid 32464) is running...
monitor server (pid 32466) is running...
ha server (pid 32468) is running...
stats server (pid 32470) is running...
xmlrpc server (pid 32474) is running...
fsstats server (pid 32476) is running...
apparentsize server (pid 32477) is running...
[root@ovm-node01 ~]#
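A quick way to confirm the agent came back cleanly is to check that every subprocess in the status output reports "is running". A minimal sketch; the sample text below is an assumption standing in for the live `service ovs-agent status` output:

```shell
# A few lines captured from `service ovs-agent status`; pipe the live command
# output instead on a real node.
status='log server (pid 32442) is running...
monitor server (pid 32466) is running...
xmlrpc server (pid 32474) is running...'
# Count lines that do NOT report "is running".
not_running=$(printf '%s\n' "$status" | grep -vc 'is running')
echo "subprocesses not running: $not_running"
```

Anything other than zero means one of the agent subprocesses failed to start and is worth investigating before re-adding the node.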
I would also recommend restarting node02 after the node removal. Once the node is back online, validate /etc/ocfs2/cluster.conf:
[root@ovm-node01 ~]# cat /etc/ocfs2/cluster.conf
cluster:
        heartbeat_mode = global
        node_count = 1
        name = f6f6b47b38e288e0

node:
        number = 0
        cluster = f6f6b47b38e288e0
        ip_port = 7777
        ip_address = 10.110.110.101
        name = ovm-node01

heartbeat:
        cluster = f6f6b47b38e288e0
        region = 0004FB0000050000B705B4397850AAD6
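The cluster.conf check can be made mechanical by comparing the declared node_count against the number of node: stanzas. A minimal sketch; the inline sample is an assumption standing in for the real file, which you would read from /etc/ocfs2/cluster.conf on a live node:

```shell
# Trimmed sample of cluster.conf; on a live node use:
#   conf=$(cat /etc/ocfs2/cluster.conf)
conf='cluster:
        node_count = 1
        name = f6f6b47b38e288e0
node:
        name = ovm-node01'
# Declared count from the cluster: stanza.
declared=$(printf '%s\n' "$conf" | awk '/node_count/ {print $3}')
# Actual number of node: stanzas in the file.
stanzas=$(printf '%s\n' "$conf" | grep -c '^node:')
echo "declared=$declared stanzas=$stanzas"
```

If the two numbers disagree, the file still carries a stale node stanza and the cleanup is not complete.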
Note: an ovs-agent restart won't have any impact on running VMs. Finally, dump the agent database again from the master node; pool_member_ip_list now shows only one member.
[root@ovm-node01]# ovs-agent-db dump_db server
{'cluster_state': 'DLM_Ready',
'clustered': True,
'fs_stat_uuid_list': ['0004fb000005000015c1fb14ef761f40',
'0004fb000005000079ae03177c3edc7e',
'0004fb000005000065985109f8834e8b'],
'is_master': True,
'manager_event_url': 'https://192.168.85.152:7002/ovm/core/wsapi/rest/internal/Server/08:00:20:ff:ff:ff:ff:ff:ff:ff:00:10:e0:ef:de:6a/Event',
'manager_ip': '192.168.85.152',
'manager_statistic_url': 'https://192.168.85.152:7002/ovm/core/wsapi/rest/internal/Server/08:00:20:ff:ff:ff:ff:ff:ff:ff:00:10:e0:ef:de:6a/Statistic',
'manager_uuid': '0004fb0000010000c8ecbd219dc6b1ee',
'node_number': 0,
'pool_alias': 'EclipsysOVM',
'pool_master_ip': '192.168.85.177',
'pool_member_ip_list': ['192.168.85.177'],
'pool_uuid': '0004fb0000020000f6f6b47b38e288e0',
'poolfs_nfsbase_uuid': '',
'poolfs_target': '/dev/mapper/36861a6fddaa0481ec0dd3584514a8d62',
'poolfs_type': 'lun',
'poolfs_uuid': '0004fb0000050000b705b4397850aad6',
'registered_hostname': 'ovm-node01',
'registered_ip': '192.168.85.177',
'roles': set(['utility', 'xen'])}
[root@ovm-node01]#
Conclusion
There can be situations where the GUI will not remove the entries from the OVM hypervisor. Always validate the OVM database entries before retrying the node addition to the cluster, and make sure the cluster-shared repositories are mounting automatically.