Too many RDMs and too many Microsoft Clusters are locking VMFS partitions on the SAN. The ESX hosts spend too much time trying to read these locked devices, vCenter times out and the datastores don't get added.
NB: this whole post is irrelevant in ESXi 5, because there you can mark the RDM LUNs as perennially reserved.
Fix
(Update: when this post was first written, the fix was the LUN masking described below. In ESXi 5, however, all you have to do is this:
To mark the MSCS LUNs as perennially reserved on an already upgraded ESXi 5.1 host, set the perennially reserved flag in Host Profiles. For more information, see the vSphere documentation.
You can also use the esxcli command to mark a device as perennially reserved:
esxcli storage core device setconfig -d naa.id --perennially-reserved=true
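If there are several MSCS RDM LUNs, the same command can simply be repeated per device. A minimal sketch, assuming a hypothetical helper name (perennial_cmds) and placeholder device IDs; it only prints the commands so they can be reviewed before anything is run on the host:

```shell
#!/bin/sh
# perennial_cmds is a hypothetical helper name, not part of esxcli.
# It prints one setconfig command per device ID passed as an argument.
perennial_cmds() {
    for dev in "$@"; do
        echo "esxcli storage core device setconfig -d $dev --perennially-reserved=true"
    done
}

# naa.id1 and naa.id2 are placeholders for the real device IDs.
perennial_cmds naa.id1 naa.id2
```

Once the printed commands look right, they can be piped to sh on the ESXi 5 host.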
The above only works on ESXi 5. For ESX 4, do the LUN masking as detailed below.)
LUN MASKING
The solution is to mask the LUNs or RDMs from the ESX hosts.
This can be done in 2 ways: at the SAN level or at the ESX host level. In this case it was not possible to do it at the SAN level, due to SAN limitations regarding sharing RDMs between cluster groups, so the masking was done on the ESX hosts.
Docs referenced
Masking a LUN from ESX and ESXi using the MASK_PATH plug-in: VMware KB Article 1009449
Unable to claim the LUN back after unmasking it: VMware KB Article 1015252
Unpresenting a LUN containing a datastore from ESX 4.x and ESXi 4.x: VMware KB Article 1015084
The steps to MASK
LUN masking has to be done on the ESX command line using several different commands. The following example shows how to mask one RDM on one ESX host.
Towards the end of this document are scripts that have been implemented to mask off many RDMs from several ESX hosts and scripts to unmask the RDMs.
It may be necessary to unmask the RDMs or LUNs if it’s decided to run the VM on a host which currently has the MASKing rules applied to it.
1 Multipath Plug-ins
Look at the multipath plug-ins currently installed on your ESX host with the command:
# esxcfg-mpath -G
EG
[root@site1-intra-esx01 ~]# esxcfg-mpath -G
MASK_PATH
NMP
Verify that the MASK_PATH plugin is present
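In a script this check can be made explicit instead of being done by eye. A small sketch, assuming a hypothetical function name (has_mask_path); the sample plug-in list from above is fed in so the snippet runs anywhere:

```shell
#!/bin/sh
# has_mask_path (hypothetical name) reads a plug-in list on stdin and
# succeeds only if one line is exactly MASK_PATH.
has_mask_path() {
    grep -qx 'MASK_PATH'
}

# On the host this would be: esxcfg-mpath -G | has_mask_path
if printf 'MASK_PATH\nNMP\n' | has_mask_path; then
    echo "MASK_PATH plug-in present"
else
    echo "MASK_PATH plug-in missing, do not continue"
fi
```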
2 Claimrules
List all the claimrules currently on the ESX host with the command:
# esxcli corestorage claimrule list
For an unadulterated ESX host the output looks like:
EG
[root@site1-intra-esx01 ~]# esxcli corestorage claimrule list
Rule Class  Rule   Class    Type       Plugin     Matches
MP          0      runtime  transport  NMP        transport=usb
MP          1      runtime  transport  NMP        transport=sata
MP          2      runtime  transport  NMP        transport=ide
MP          3      runtime  transport  NMP        transport=block
MP          4      runtime  transport  NMP        transport=unknown
MP          101    runtime  vendor     MASK_PATH  vendor=DELL model=Universal Xport
MP          101    file     vendor     MASK_PATH  vendor=DELL model=Universal Xport
MP          65535  runtime  vendor     NMP        vendor=* model=*
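Since new rules need numbers that are not yet taken, it helps to pull the used rule numbers out of that listing. A rough sketch, assuming the column layout shown above (rule number in the second column) and a hypothetical function name (used_rules):

```shell
#!/bin/sh
# used_rules (hypothetical name) reads "claimrule list" output on stdin and
# prints each MP rule number once, in ascending order.
used_rules() {
    awk '$1 == "MP" { print $2 }' | sort -n -u
}

# On the host this would be: esxcli corestorage claimrule list | used_rules
# Here a shortened copy of the listing above is used instead.
used_rules <<'EOF'
Rule Class  Rule   Class    Type       Plugin     Matches
MP          0      runtime  transport  NMP        transport=usb
MP          101    runtime  vendor     MASK_PATH  vendor=DELL model=Universal Xport
MP          101    file     vendor     MASK_PATH  vendor=DELL model=Universal Xport
MP          65535  runtime  vendor     NMP        vendor=* model=*
EOF
```

Any number that does not appear in the output is free to use for a new rule.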
3 Identify eui numbers
Identify the euid of the RDMs that you want to mask. This can be done from the GUI or the command line.
To do it from the GUI, right-click the VM and go to "Edit Settings", highlight the RDM disk in the VM properties, click the "Manage Paths" button and look for the euid (IBM XIVs use eui identifiers; other SANs use naa).
From the command line there are 2 steps to find the euid:
First query the VM's rdm mapping file to find the vml identifier:
vmkfstools -q vmfilename.vmdk
EG
[root@site1-intra-esx03 site1-INTRA-SQL04]# vmkfstools -q site1-INTRA-SQL04_6.vmdk
Disk site1-INTRA-SQL04_6.vmdk is a Passthrough Raw Device Mapping
Maps to: vml.01005c00003738303042354530413934323831305849
Next, list the vml to euid mapping:
ls -l /vmfs/devices/disks/ | grep -i vml.number_from_above
EG
[root@site1-intra-esx03 site1-INTRA-SQL04]# ls -l /vmfs/devices/disks/ | grep -i vml.01005c00003738303042354530413934323831305849
lrwxrwxrwx 1 root root 20 Feb 15 09:42 vml.01005c00003738303042354530413934323831305849 -> eui.001738000b5e0a94
lrwxrwxrwx 1 root root 22 Feb 15 09:42 vml.01005c00003738303042354530413934323831305849:1 -> eui.001738000b5e0a94:1
eui.001738000b5e0a94 is the euid needed
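The two lookups can be chained so that nothing has to be copied by hand. A sketch only, assuming the output formats shown in the examples above; vml_of and eui_of are hypothetical function names:

```shell
#!/bin/sh
# vml_of (hypothetical name): pull the vml identifier out of "vmkfstools -q" output.
vml_of() {
    sed -n 's/^[[:space:]]*Maps to: \(vml\.[0-9a-fA-F]*\).*$/\1/p'
}

# eui_of (hypothetical name): stdin = "ls -l /vmfs/devices/disks/" output,
# $1 = vml identifier. Prints the target of the whole-disk symlink and
# skips the partition entries (vml...:1).
eui_of() {
    awk -v vml="$1" '$(NF-2) == vml && $(NF-1) == "->" { print $NF }'
}

# On the host this would be:
#   vml=$(vmkfstools -q site1-INTRA-SQL04_6.vmdk | vml_of)
#   ls -l /vmfs/devices/disks/ | eui_of "$vml"
# Below, the sample output from above stands in for the real commands.
vml=$(vml_of <<'EOF'
Disk site1-INTRA-SQL04_6.vmdk is a Passthrough Raw Device Mapping
Maps to: vml.01005c00003738303042354530413934323831305849
EOF
)
eui_of "$vml" <<'EOF'
lrwxrwxrwx 1 root root 20 Feb 15 09:42 vml.01005c00003738303042354530413934323831305849 -> eui.001738000b5e0a94
lrwxrwxrwx 1 root root 22 Feb 15 09:42 vml.01005c00003738303042354530413934323831305849:1 -> eui.001738000b5e0a94:1
EOF
```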
4 Eui paths
Check all of the paths that the euid device has (vmhbaX:C0:TX:L92):
# esxcfg-mpath -L | grep euid
EG
[root@site1-intra-esx01 ~]#esxcfg-mpath -L | grep 001738000b5e0a94
vmhba5:C0:T0:L92 state:active eui.001738000b5e0a94 vmhba5 0 0 92 NMP active san fc.200000051e8bf5d2:100000051e8bf5d2 fc.500173800b5e0000:500173800b5e0142
vmhba5:C0:T1:L92 state:active eui.001738000b5e0a94 vmhba5 0 1 92 NMP active san fc.200000051e8bf5d2:100000051e8bf5d2 fc.500173800b5e0000:500173800b5e0152
vmhba4:C0:T0:L92 state:active eui.001738000b5e0a94 vmhba4 0 0 92 NMP active san fc.200000051e8bf5d1:100000051e8bf5d1 fc.500173800b5e0000:500173800b5e0180
vmhba4:C0:T1:L92 state:active eui.001738000b5e0a94 vmhba4 0 1 92 NMP active san fc.200000051e8bf5d1:100000051e8bf5d1 fc.500173800b5e0000:500173800b5e0170
vmhba3:C0:T0:L92 state:active eui.001738000b5e0a94 vmhba3 0 0 92 NMP active san fc.200000051e8bebb5:100000051e8bebb5 fc.500173800b5e0000:500173800b5e0172
vmhba3:C0:T1:L92 state:active eui.001738000b5e0a94 vmhba3 0 1 92 NMP active san fc.200000051e8bebb5:100000051e8bebb5 fc.500173800b5e0000:500173800b5e0182
vmhba2:C0:T0:L92 state:active eui.001738000b5e0a94 vmhba2 0 0 92 NMP active san fc.200000051e8bebb4:100000051e8bebb4 fc.500173800b5e0000:500173800b5e0140
vmhba2:C0:T1:L92 state:active eui.001738000b5e0a94 vmhba2 0 1 92 NMP active san fc.200000051e8bebb4:100000051e8bebb4 fc.500173800b5e0000:500173800b5e0150
Note that the output of this command also shows the LUN ID of the RDM (L92) and the HBAs it is seen through (vmhba2 to vmhba5).
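The adapter, channel and LUN values needed for the masking rules can be extracted from this listing. A minimal sketch, assuming the column layout of the sample output (device in column 3, adapter in 4, channel in 5, LUN in 7); hba_paths is a hypothetical function name:

```shell
#!/bin/sh
# hba_paths (hypothetical name): stdin = "esxcfg-mpath -L" output, $1 = device id.
# Prints one "adapter channel lun" line per HBA; the targets are collapsed.
hba_paths() {
    awk -v dev="$1" '$3 == dev { print $4, $5, $7 }' | sort -u
}

# On the host this would be: esxcfg-mpath -L | hba_paths eui.001738000b5e0a94
# Three of the sample lines from above are used here.
hba_paths eui.001738000b5e0a94 <<'EOF'
vmhba5:C0:T0:L92 state:active eui.001738000b5e0a94 vmhba5 0 0 92 NMP active san fc.200000051e8bf5d2:100000051e8bf5d2 fc.500173800b5e0000:500173800b5e0142
vmhba5:C0:T1:L92 state:active eui.001738000b5e0a94 vmhba5 0 1 92 NMP active san fc.200000051e8bf5d2:100000051e8bf5d2 fc.500173800b5e0000:500173800b5e0152
vmhba2:C0:T0:L92 state:active eui.001738000b5e0a94 vmhba2 0 0 92 NMP active san fc.200000051e8bebb4:100000051e8bebb4 fc.500173800b5e0000:500173800b5e0140
EOF
```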
5 Device Paths
Check that no other devices are using the same parameters:
# esxcfg-mpath -L | egrep "vmhba X X X"
Because the rule will be applied with -A vmhbaX -C 0 -L 92, this verifies that no other device matches those parameters. You can use the regular expression "vmhba.*L92" (. means any character and * means zero or more times).
EG
[root@site1-intra-esx01 ~]#esxcfg-mpath -L | egrep "vmhba.*L92"
vmhba5:C0:T0:L92 state:active eui.001738000b5e0a94 vmhba5 0 0 92 NMP active san fc.200000051e8bf5d2:100000051e8bf5d2 fc.500173800b5e0000:500173800b5e0142
vmhba5:C0:T1:L92 state:active eui.001738000b5e0a94 vmhba5 0 1 92 NMP active san fc.200000051e8bf5d2:100000051e8bf5d2 fc.500173800b5e0000:500173800b5e0152
vmhba4:C0:T0:L92 state:active eui.001738000b5e0a94 vmhba4 0 0 92 NMP active san fc.200000051e8bf5d1:100000051e8bf5d1 fc.500173800b5e0000:500173800b5e0180
vmhba4:C0:T1:L92 state:active eui.001738000b5e0a94 vmhba4 0 1 92 NMP active san fc.200000051e8bf5d1:100000051e8bf5d1 fc.500173800b5e0000:500173800b5e0170
vmhba3:C0:T0:L92 state:active eui.001738000b5e0a94 vmhba3 0 0 92 NMP active san fc.200000051e8bebb5:100000051e8bebb5 fc.500173800b5e0000:500173800b5e0172
vmhba3:C0:T1:L92 state:active eui.001738000b5e0a94 vmhba3 0 1 92 NMP active san fc.200000051e8bebb5:100000051e8bebb5 fc.500173800b5e0000:500173800b5e0182
vmhba2:C0:T0:L92 state:active eui.001738000b5e0a94 vmhba2 0 0 92 NMP active san fc.200000051e8bebb4:100000051e8bebb4 fc.500173800b5e0000:500173800b5e0140
vmhba2:C0:T1:L92 state:active eui.001738000b5e0a94 vmhba2 0 1 92 NMP active san fc.200000051e8bebb4:100000051e8bebb4 fc.500173800b5e0000:500173800b5e0150
This shows that eui.001738000b5e0a94 is the only device using these paths, so masking them off hides this euid and nothing else.
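One thing to be aware of: the pattern "vmhba.*L92" also matches a path on L920, or any line that has L92 somewhere later in it. A stricter sketch compares the LUN column exactly; it assumes the column layout of the sample output (device in column 3, LUN in column 7), and devices_on_lun is a hypothetical function name:

```shell
#!/bin/sh
# devices_on_lun (hypothetical name): stdin = "esxcfg-mpath -L" output, $1 = LUN ID.
# Prints every distinct device that is presented with exactly that LUN ID.
devices_on_lun() {
    awk -v lun="$1" '$7 == lun { print $3 }' | sort -u
}

# On the host this would be: esxcfg-mpath -L | devices_on_lun 92
# Two of the sample lines from above are used here.
devices_on_lun 92 <<'EOF'
vmhba5:C0:T0:L92 state:active eui.001738000b5e0a94 vmhba5 0 0 92 NMP active san fc.200000051e8bf5d2:100000051e8bf5d2 fc.500173800b5e0000:500173800b5e0142
vmhba4:C0:T0:L92 state:active eui.001738000b5e0a94 vmhba4 0 0 92 NMP active san fc.200000051e8bf5d1:100000051e8bf5d1 fc.500173800b5e0000:500173800b5e0180
EOF
```

A single line of output means only one device uses that LUN ID, so the location rule cannot hide anything else.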
6 Rules
Add a rule to hide the LUN with the command:
# esxcli corestorage claimrule add --rule <number> -t location -A <hba_adapter> -C <channel> -T <target> -L <lun> -P MASK_PATH
Note that a rule has to be added for every HBA. Check the ESX HBA numbering first: in the example below there is no vmhba1, only vmhba2 to vmhba5 exist, as discovered in section 4.
Each rule needs its own number, between 101 and 200. Rule 101 is already in use by the DELL entry (see section 2), so the examples start at 102.
Rule to mask HARD DISK 2 scsi1:1 site1-INTRA-SQL04_6.vmdk (RDM) LUN ID 92 eui.001738000b5e0a94
EG
[root@site1-intra-esx01 ~]#esxcli corestorage claimrule add --rule 102 -t location -A vmhba2 -C 0 -L 92 -P MASK_PATH
[root@site1-intra-esx01 ~]#esxcli corestorage claimrule add --rule 103 -t location -A vmhba3 -C 0 -L 92 -P MASK_PATH
[root@site1-intra-esx01 ~]#esxcli corestorage claimrule add --rule 104 -t location -A vmhba4 -C 0 -L 92 -P MASK_PATH
[root@site1-intra-esx01 ~]#esxcli corestorage claimrule add --rule 105 -t location -A vmhba5 -C 0 -L 92 -P MASK_PATH
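Typing one rule per HBA gets tedious, so the commands can be generated. A sketch under the assumptions of this post (channel 0, rule numbers from 102 to 200 because 101 is taken on this host); mask_rule_cmds is a hypothetical function name and it only prints the commands:

```shell
#!/bin/sh
# mask_rule_cmds (hypothetical name)
#   $1 = first rule number, $2 = LUN ID, remaining arguments = HBA names.
# Prints one "claimrule add" command per HBA with consecutive rule numbers.
mask_rule_cmds() {
    rule=$1
    lun=$2
    shift 2
    for hba in "$@"; do
        # Stay inside the range used in this post (101 is already in use here).
        if [ "$rule" -lt 102 ] || [ "$rule" -gt 200 ]; then
            echo "rule number $rule is outside 102-200" >&2
            return 1
        fi
        echo "esxcli corestorage claimrule add --rule $rule -t location -A $hba -C 0 -L $lun -P MASK_PATH"
        rule=$((rule + 1))
    done
}

# Reproduces the four commands from the example above.
mask_rule_cmds 102 92 vmhba2 vmhba3 vmhba4 vmhba5
```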
7 Reload Rules
Reload your claimrules with the command:
EG
[root@site1-intra-esx01 ~]#esxcli corestorage claimrule load
8 Unclaim Paths
Unclaim all paths to a device and then run the loaded claimrules on each of the paths to reclaim them.
EG
[root@site1-intra-esx01 ~]#esxcli corestorage claiming reclaim -d eui.001738000b5e0a94
9 Verification of masked device
Verify that the masked device is no longer used by the ESX host:
EG
[root@site1-intra-esx01 ~]#esxcfg-mpath -L | grep eui.001738000b5e0a94
Empty output indicates that the LUN is not active.
Refresh storage from the GUI - the LUN or RDM should disappear from there as well
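For use in a script, the "empty output" check can be turned into a count. A small sketch, assuming the same column layout as in section 4 (device in column 3); remaining_paths is a hypothetical function name:

```shell
#!/bin/sh
# remaining_paths (hypothetical name): stdin = "esxcfg-mpath -L" output,
# $1 = device id. Prints how many paths to that device are still listed.
remaining_paths() {
    awk -v dev="$1" '$3 == dev { n++ } END { print n + 0 }'
}

# On the host this would be: esxcfg-mpath -L | remaining_paths eui.001738000b5e0a94
# Empty input stands in here for a host where the mask has taken effect.
printf '' | remaining_paths eui.001738000b5e0a94
```

A count of 0 corresponds to the empty grep output above; anything else means some paths are still claimed.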
Script to MASK RDMs of SQL03 & 04 cluster - applied to site1-INTRA-ESX01 & 02
#!/bin/bash
# Guy Cowie
# 16 Feb 2012
# Script to mask site1-intra-sql03 rdms from ESX host
date
hostname
# Display the claim rules
esxcli corestorage claimrule list
# Rules to hide the RDMs
# Rule to mask HARD DISK 2 scsi1:1 site1-INTRA-SQL04_6.vmdk LUN ID 92 eui.001849000b5e0a94
esxcli corestorage claimrule add --rule 102 -t location -A vmhba2 -C 0 -L 92 -P MASK_PATH
esxcli corestorage claimrule add --rule 103 -t location -A vmhba3 -C 0 -L 92 -P MASK_PATH
esxcli corestorage claimrule add --rule 104 -t location -A vmhba4 -C 0 -L 92 -P MASK_PATH
esxcli corestorage claimrule add --rule 105 -t location -A vmhba5 -C 0 -L 92 -P MASK_PATH
# Rule to mask HARD DISK 3 scsi1:10 site1-INTRA-SQL04_5.vmdk LUN ID 93 eui.001849000b5e0a95
esxcli corestorage claimrule add --rule 106 -t location -A vmhba2 -C 0 -L 93 -P MASK_PATH
esxcli corestorage claimrule add --rule 107 -t location -A vmhba3 -C 0 -L 93 -P MASK_PATH
esxcli corestorage claimrule add --rule 108 -t location -A vmhba4 -C 0 -L 93 -P MASK_PATH
esxcli corestorage claimrule add --rule 109 -t location -A vmhba5 -C 0 -L 93 -P MASK_PATH
# Rule to mask HARD DISK 4 scsi1:11 site1-INTRA-SQL04_4.vmdk LUN ID 94 eui.001849000b5e0a96
esxcli corestorage claimrule add --rule 110 -t location -A vmhba2 -C 0 -L 94 -P MASK_PATH
esxcli corestorage claimrule add --rule 111 -t location -A vmhba3 -C 0 -L 94 -P MASK_PATH
esxcli corestorage claimrule add --rule 112 -t location -A vmhba4 -C 0 -L 94 -P MASK_PATH
esxcli corestorage claimrule add --rule 113 -t location -A vmhba5 -C 0 -L 94 -P MASK_PATH
# Rule to mask HARD DISK 5 scsi1:13 site1-INTRA-SQL03_13.vmdk LUN ID 86 eui.001849000b5e055d
esxcli corestorage claimrule add --rule 114 -t location -A vmhba2 -C 0 -L 86 -P MASK_PATH
esxcli corestorage claimrule add --rule 115 -t location -A vmhba3 -C 0 -L 86 -P MASK_PATH
esxcli corestorage claimrule add --rule 116 -t location -A vmhba4 -C 0 -L 86 -P MASK_PATH
esxcli corestorage claimrule add --rule 117 -t location -A vmhba5 -C 0 -L 86 -P MASK_PATH
# Rule to mask HARD DISK 6 scsi1:14 LUN ID 90 eui.001849000b5e0a98
esxcli corestorage claimrule add --rule 118 -t location -A vmhba2 -C 0 -L 90 -P MASK_PATH
esxcli corestorage claimrule add --rule 119 -t location -A vmhba3 -C 0 -L 90 -P MASK_PATH
esxcli corestorage claimrule add --rule 120 -t location -A vmhba4 -C 0 -L 90 -P MASK_PATH
esxcli corestorage claimrule add --rule 121 -t location -A vmhba5 -C 0 -L 90 -P MASK_PATH
# Rule to mask HARD DISK 7 scsi1:15 LUN ID 91 eui.001849000b5e0a97
esxcli corestorage claimrule add --rule 122 -t location -A vmhba2 -C 0 -L 91 -P MASK_PATH
esxcli corestorage claimrule add --rule 123 -t location -A vmhba3 -C 0 -L 91 -P MASK_PATH
esxcli corestorage claimrule add --rule 124 -t location -A vmhba4 -C 0 -L 91 -P MASK_PATH
esxcli corestorage claimrule add --rule 125 -t location -A vmhba5 -C 0 -L 91 -P MASK_PATH
# Load the claim rule into the PSA
esxcli corestorage claimrule load
# Unclaim and reclaim the RDMs using their eui numbers
esxcli corestorage claiming reclaim -d eui.001849000b5e0a94
esxcli corestorage claiming reclaim -d eui.001849000b5e0a95
esxcli corestorage claiming reclaim -d eui.001849000b5e0a96
esxcli corestorage claiming reclaim -d eui.001849000b5e055d
esxcli corestorage claiming reclaim -d eui.001849000b5e0a98
esxcli corestorage claiming reclaim -d eui.001849000b5e0a97
# Display the claim rules
esxcli corestorage claimrule list
Script to unMASK RDMs SQL03 & 04 cluster
#!/bin/bash
# Guy Cowie
# 16 Feb 2012
# Script to unmask site1-intra-sql03 rdms from ESX host
date
hostname
# Display the claim rules
esxcli corestorage claimrule list
sleep 5s
# Rules to unmask the RDMs
# Rule to unmask HARD DISK 2 scsi1:1 site1-INTRA-SQL04_6.vmdk LUN ID 92 eui.001849000b5e0a94
esxcli corestorage claimrule delete --rule 102
esxcli corestorage claimrule delete --rule 103
esxcli corestorage claimrule delete --rule 104
esxcli corestorage claimrule delete --rule 105
# Rule to unmask HARD DISK 3 scsi1:10 site1-INTRA-SQL04_5.vmdk LUN ID 93 eui.001849000b5e0a95
esxcli corestorage claimrule delete --rule 106
esxcli corestorage claimrule delete --rule 107
esxcli corestorage claimrule delete --rule 108
esxcli corestorage claimrule delete --rule 109
# Rule to unmask HARD DISK 4 scsi1:11 site1-INTRA-SQL04_4.vmdk LUN ID 94 eui.001849000b5e0a96
esxcli corestorage claimrule delete --rule 110
esxcli corestorage claimrule delete --rule 111
esxcli corestorage claimrule delete --rule 112
esxcli corestorage claimrule delete --rule 113
# Rule to unmask HARD DISK 5 scsi1:13 site1-INTRA-SQL03_13.vmdk LUN ID 86 eui.001849000b5e055d
esxcli corestorage claimrule delete --rule 114
esxcli corestorage claimrule delete --rule 115
esxcli corestorage claimrule delete --rule 116
esxcli corestorage claimrule delete --rule 117
# Rule to unmask HARD DISK 6 scsi1:14 LUN ID 90 eui.001849000b5e0a98
esxcli corestorage claimrule delete --rule 118
esxcli corestorage claimrule delete --rule 119
esxcli corestorage claimrule delete --rule 120
esxcli corestorage claimrule delete --rule 121
# Rule to unmask HARD DISK 7 scsi1:15 LUN ID 91 eui.001849000b5e0a97
esxcli corestorage claimrule delete --rule 122
esxcli corestorage claimrule delete --rule 123
esxcli corestorage claimrule delete --rule 124
esxcli corestorage claimrule delete --rule 125
sleep 5s
# Load the claim rule into the PSA
esxcli corestorage claimrule load
sleep 5s
# Rule to unmask HARD DISK 2 scsi1:1 site1-INTRA-SQL04_6.vmdk LUN ID 92 eui.001849000b5e0a94
esxcli corestorage claiming unclaim -t location -A vmhba5 -C 0 -L 92
esxcli corestorage claiming unclaim -t location -A vmhba2 -C 0 -L 92
esxcli corestorage claiming unclaim -t location -A vmhba3 -C 0 -L 92
esxcli corestorage claiming unclaim -t location -A vmhba4 -C 0 -L 92
# Rule to unmask HARD DISK 3 scsi1:10 site1-INTRA-SQL04_5.vmdk LUN ID 93 eui.001849000b5e0a95
esxcli corestorage claiming unclaim -t location -A vmhba5 -C 0 -L 93
esxcli corestorage claiming unclaim -t location -A vmhba2 -C 0 -L 93
esxcli corestorage claiming unclaim -t location -A vmhba3 -C 0 -L 93
esxcli corestorage claiming unclaim -t location -A vmhba4 -C 0 -L 93
# Rule to unmask HARD DISK 4 scsi1:11 site1-INTRA-SQL04_4.vmdk LUN ID 94 eui.001849000b5e0a96
esxcli corestorage claiming unclaim -t location -A vmhba5 -C 0 -L 94
esxcli corestorage claiming unclaim -t location -A vmhba2 -C 0 -L 94
esxcli corestorage claiming unclaim -t location -A vmhba3 -C 0 -L 94
esxcli corestorage claiming unclaim -t location -A vmhba4 -C 0 -L 94
# Rule to unmask HARD DISK 5 scsi1:13 site1-INTRA-SQL03_13.vmdk LUN ID 86 eui.001849000b5e055d
esxcli corestorage claiming unclaim -t location -A vmhba5 -C 0 -L 86
esxcli corestorage claiming unclaim -t location -A vmhba2 -C 0 -L 86
esxcli corestorage claiming unclaim -t location -A vmhba3 -C 0 -L 86
esxcli corestorage claiming unclaim -t location -A vmhba4 -C 0 -L 86
# Rule to unmask HARD DISK 6 scsi1:14 LUN ID 90 eui.001849000b5e0a98
esxcli corestorage claiming unclaim -t location -A vmhba5 -C 0 -L 90
esxcli corestorage claiming unclaim -t location -A vmhba2 -C 0 -L 90
esxcli corestorage claiming unclaim -t location -A vmhba3 -C 0 -L 90
esxcli corestorage claiming unclaim -t location -A vmhba4 -C 0 -L 90
# Rule to unmask HARD DISK 7 scsi1:15 LUN ID 91 eui.001849000b5e0a97
esxcli corestorage claiming unclaim -t location -A vmhba5 -C 0 -L 91
esxcli corestorage claiming unclaim -t location -A vmhba2 -C 0 -L 91
esxcli corestorage claiming unclaim -t location -A vmhba3 -C 0 -L 91
esxcli corestorage claiming unclaim -t location -A vmhba4 -C 0 -L 91
esxcfg-rescan -A
sleep 5s
# Unclaim and reclaim the RDMs using their eui numbers
esxcli corestorage claiming reclaim -d eui.001849000b5e0a94
esxcli corestorage claiming reclaim -d eui.001849000b5e0a95
esxcli corestorage claiming reclaim -d eui.001849000b5e0a96
esxcli corestorage claiming reclaim -d eui.001849000b5e055d
esxcli corestorage claiming reclaim -d eui.001849000b5e0a98
esxcli corestorage claiming reclaim -d eui.001849000b5e0a97
# Display the claim rules
esxcli corestorage claimrule list
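The long runs of near-identical lines in the unmask script could also be generated with loops. A sketch only, using the rule numbers, LUN IDs and HBAs from the scripts above; unmask_cmds is a hypothetical function name, it only prints the commands, and it leaves out the final reclaim and claimrule list steps:

```shell
#!/bin/bash
# Values taken from the scripts above.
hbas="vmhba2 vmhba3 vmhba4 vmhba5"
luns="92 93 94 86 90 91"

# unmask_cmds (hypothetical name) prints the unmask commands in order.
unmask_cmds() {
    # 1. Delete the masking rules (102 to 125 in the scripts above).
    rule=102
    while [ "$rule" -le 125 ]; do
        echo "esxcli corestorage claimrule delete --rule $rule"
        rule=$((rule + 1))
    done
    # 2. Reload the claim rules into the PSA.
    echo "esxcli corestorage claimrule load"
    # 3. Unclaim every previously masked path, then rescan all adapters.
    for lun in $luns; do
        for hba in $hbas; do
            echo "esxcli corestorage claiming unclaim -t location -A $hba -C 0 -L $lun"
        done
    done
    echo "esxcfg-rescan -A"
}

unmask_cmds
```

The only things to maintain are the two lists and the rule range, which makes it harder to miss a line when an RDM is added or removed.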