Show vSAN objects and components count on vSAN enabled clusters

Introduction

In this short blog we will show how to get the counts of vSAN objects and components. This can be useful to see which components are causing an issue when you are close to hitting the limit.

Limits
vSAN OSA has a component limit of 9000 per host.
vSAN ESA has a component limit of 27000 per host.

These limits can be reached, for example, with large VMs or VCDA protection replications.
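To get a feel for the numbers: vSAN OSA splits objects into components of at most 255 GB, so a single 2 TB VMDK with FTT=1 (RAID-1) already needs roughly ceil(2048/255) = 9 components per mirror, about 19 components including the witness, and that is before snapshots and swap objects are counted.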

Getting the component count

First we need to start an SSH session to our vCenter.

1. Start an SSH session to the vCenter.
2. Log in as “root”.
3. Now we run the following command:

					rvc 127.0.0.1
				

4. Now fill in the password of “administrator@vsphere.local”.
5. Once it is loaded you will see the following:

0 /
1 127.0.0.1/

If you don’t see this, run the following command:

					ls
				

6. We want to go to the localhost (127.0.0.1). The number can be different for you; for me it is number 1, so I will type the following:

					cd 1
				

7. Once we are in the localhost we can type the following to show the directories:

					ls
				

8. Here we will see all the datacenters. Choose the correct datacenter. For me it is number 0, so I type the following:

					cd 0
				

9. Again we will show the directories by typing: 

					ls
				

10. Here we will see different directories. In our case we need the computers [host] directory. For me it is number 1. So I will type the following:

					cd 1
				

11. Again we will show the directories by typing: 

					ls
				

12. Here we can see all our clusters. Choose the cluster you want to see the objects and components of. For me this is number 8 so I will type the following:

					cd 8
				

13. We should now be at:

/127.0.0.1/t01pz03-s01-dc01/computers/t01pz03-s01-cl09

14. This is the location we need to be in. Now we run the following command:

					vsan.obj_status_report . -t
				

15. This command will show us all the objects and the components. The results will look something like:

2024-12-03 12:18:22 +0100: Querying all VMs on vSAN ...
2024-12-03 12:18:24 +0100: Querying DOM_OBJECT in the system from h001.demo.dashcenter.blog ...
2024-12-03 12:18:24 +0100: Querying DOM_OBJECT in the system from h002.demo.dashcenter.blog ...
2024-12-03 12:18:25 +0100: Querying DOM_OBJECT in the system from h003.demo.dashcenter.blog ...
2024-12-03 12:18:25 +0100: Querying DOM_OBJECT in the system from h004.demo.dashcenter.blog ...
2024-12-03 12:18:26 +0100: Querying DOM_OBJECT in the system from h005.demo.dashcenter.blog ...
2024-12-03 12:18:26 +0100: Querying DOM_OBJECT in the system from h006.demo.dashcenter.blog ...
2024-12-03 12:18:26 +0100: Querying DOM_OBJECT in the system from h007.demo.dashcenter.blog ...
2024-12-03 12:18:27 +0100: Querying DOM_OBJECT in the system from h008.demo.dashcenter.blog ...
2024-12-03 12:18:27 +0100: Querying DOM_OBJECT in the system from h009.demo.dashcenter.blog ...
2024-12-03 12:18:28 +0100: Querying all disks in the system from h010.demo.dashcenter.blog ...
2024-12-03 12:18:30 +0100: Querying all object versions in the system ...
/opt/vmware/rvc/lib/rvc/lib/vsanupgrade.rb:1052: warning: calling URI.open via Kernel#open is deprecated, call URI.open directly or use URI#open
2024-12-03 12:18:35 +0100: Got all the info, computing table ...
Histogram of component health for non-orphaned objects
+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 4/4 (OK)                            |  203                         |
| 3/3 (OK)                            |  8640                        |
| 8/8 (OK)                            |  111                         |
| 24/24 (OK)                          |  1                           |
| 16/16 (OK)                          |  8                           |
| 5/5 (OK)                            |  104                         |
| 9/9 (OK)                            |  6                           |
| 7/7 (OK)                            |  4                           |
| 10/10 (OK)                          |  18                          |
| 6/6 (OK)                            |  21                          |
| 26/26 (OK)                          |  1                           |
| 32/32 (OK)                          |  1                           |
| 36/36 (OK)                          |  12                          |
| 76/76 (OK)                          |  1                           |
+-------------------------------------+------------------------------+
Total non-orphans: 9131
Histogram of component health for possibly orphaned objects
+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
+-------------------------------------+------------------------------+
Total orphans: 0
Total v19 objects: 22290
+-------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------------+
| VM/Object                                                                                                                                       | objects | num healthy / total comps |
+-------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------------+
| *********-7CNs                                                                                                                              | 2       |                           |
|    [*********-vsan01] 53101667-5cb7-0132-e6fb-78ac44b55210/*********-7CNs.vmx                                                     |         | 3/3                       |
|    [*********-vsan01] 53101667-5cb7-0132-e6fb-78ac44b55210/*********-7CNs.vmdk                                                    |         | 3/3                       |
| *********-pTVE                                                                                                                              | 1       |                           |
|    [*********-vsan01] b9741667-1899-cc2b-84fe-78ac44b57670/*********-pTVE.vmx                                                     |         | 3/3                       |
| *********-6M5Z                                                                                                                              | 1       |                           |
|    [*********-vsan01] 52981767-f422-12a4-e270-78ac44b55210/*********-6M5Z-28c9bcf8.vswp                                           |         | 3/3                       |
| *********-UBx2                                                                                                                              | 2       |                           |
|    [*********-vsan01] 13991767-5c15-63e5-3c9a-78ac44b57670/*********-UBx2.vmdk                                                    |         | 3/3                       |
|    [*********-vsan01] 13991767-5c15-63e5-3c9a-78ac44b57670/*********-UBx2-c623421f.vswp                                           |         | 3/3                       |
| *********-Uavb                                                                                                                              | 2       |                           |
+-------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------------+
+-------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------------+
| Secondary namespaces                                                                                                                            | 1       |                           |
+-------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------------+
| Unassociated objects                                                                                                                            | 8058    |                           |
|    e1e84e67-f4a9-2100-67e9-78ac44b7e060                                                                                                         |         | 4/4                       |
|    72bd4d67-e81f-6300-137e-78ac44b7e060                                                                                                         |         | 8/8                       |
|    0eda4d67-0625-b200-448f-78ac44b7e060                                                                                                         |         | 3/3                       |
|    37be4d67-3483-b200-a52b-78ac44b7e060                                                                                                         |         | 3/3                       |
|    beaf4d67-3eb4-3c01-bdba-78ac44b7e060                                                                                                         |         | 3/3                       |
|    d9e84e67-aa76-5101-cb8e-78ac44b7e060                                                                                                         |         | 3/3                       |
|    d8824e67-ca7a-a301-3052-78ac44b7e060                                                                                                         |         | 3/3                       |
|    f0114e67-1e94-d601-4401-78ac44b54fb0                                                                                                         |         | 3/3                       |
|    7dc94e67-9471-5d02-27a6-78ac44b7e060                                                                                                         |         | 3/3                       |
|    71bd4d67-d6db-7702-e122-78ac44b57670                                                                                                         |         | 3/3                       |
|    b4204e67-36e4-7a02-4c0d-78ac44b7e060                                                                                                         |         | 3/3                       |
|    cd9e4e67-0697-8202-9af3-78ac44b57670                                                                                                         |         | 3/3                       |
|    cdbd4d67-d206-ab02-ab73-78ac44b80710                                                                                                         |         | 3/3                       |
|    d8cb4d67-e8ff-de02-33dc-78ac44b7e060                                                                                                         |         | 3/3                       |
+-------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------------+
WARNING: Unassociated does NOT necessarily mean unused/unneeded. Deleting unassociated objects may cause data loss!!!
You must read the following KB before you delete any unassociated objects!!!
https://kb.vmware.com/s/article/70726
+------------------------------------------------------------------+
| Legend: * = all unhealthy comps were deleted (disks present)     |
|         - = some unhealthy comps deleted, some not or can't tell |
|         no symbol = We cannot conclude any comps were deleted    |
+------------------------------------------------------------------+

I have shortened the output because it was too long, but this should give you a better understanding of the results.
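As a quicker alternative, RVC also has the vsan.check_limits command, which reports the per-host component count against the limit without listing every object. From the same cluster path you can run:

					vsan.check_limits .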

Awid Dashtgoli

Understanding and Configuring Disaster Recovery on Cloud Director Availability

Introduction

In this blog we will go through Disaster Recovery (DR) with Cloud Director Availability. We will configure Disaster Recovery and also show the possibilities and options.

Our Scenario

I have created a vApp with 3 virtual machines. We want to set up Disaster Recovery (DR) to our second datacenter. We want the Disaster Recovery to have an RPO of 5 minutes and a retention policy of 12 instances created every two hours.

This will give us 12 instances over 24 hours. After 24 hours I won’t need Disaster Recovery anymore, because I have backups in place.

Understanding and Configuring Disaster Recovery

Let’s start with configuring our Disaster Recovery.

1. Login to your Cloud Director portal.
2. On the left side under “More” you will see your Cloud Director Availability instances.
3. Click on it.
4. Now you will see Cloud Director Availability.
5. As you can see we don’t have any DR configured.
6. Click on “Outgoing Replications”.

7. Click on “Outgoing Replications”.
8. Select your “Destination site”.
9. Make sure you are on “vApp” and click on the “New Protection” icon.

10. Now you will see the “New Outgoing Protection” screen.
11. Select your “Destination site”.
12. Make sure you are on “vApp” and choose the vApp you want to protect.
13. Click on “Next”.

14. On the next screen you can choose the Destination VDC but also the Storage Tier you want to use.
15. Choose the correct VDC and Tier storage.
16. Click on “Next”.

Information! The “Advanced Datastore Settings” can be used to choose different storage tiers for every VM or even every Hard disk on the VM’s.

On this screen we can select a predefined SLA profile provided by your Cloud Provider or manually configure the SLA settings. In our case we will configure our settings manually.

17. Select “Configure settings manually”.
18. Now we go through the “RPO” and “Retention Policy”.

Recovery Point Objective (RPO)
RPO stands for Recovery Point Objective. With an RPO of 5 minutes you can go back at most 5 minutes in case of a disaster. Cloud Director Availability will create an instance every 5 minutes and remove the old one.

After it has created the instance it will consolidate the previous 5-minute instance. This ensures only the deltas need to be replicated instead of a full copy.

Retention Policy
The Retention Policy keeps extra instances in addition to the single 5-minute instance. You can select the number of instances, the distance between instances, and the unit in which the distance is configured.

19. For the RPO we choose 5 minutes, because we want the most recent possible instance in case of a DR.
20. For the Retention Policy we choose 12 instances with a distance of 2 hours. This makes it possible to go back in steps of 2 hours over a total of 24 hours with 12 instances.
21. Once you have configured the RPO and Retention Policy we can click on “Next”.

Information! Other options can be enabled as well. These extra settings can help you create the DR you need. The “i” icon next to each setting gives more information about it.

22. Now we see a summary of all the settings we have chosen.
23. If everything is correct click on “Finish”.
24. It will start configuring the replication/DR.

25. Once the configuration is completed we can see our “Protection”.
26. Here you can see the configuration of the DR/Protection.

Status
On every vApp and VM you will see a green, blue or red dot.
A green dot means everything is healthy.
A blue dot means an instance is currently being created (5-minute RPO).
A red dot means there is something wrong with the replication.

Information! Replications won’t work if the VM is “Shut Down”. The VM needs to be running to be able to replicate.

Testing the Protection

We can test our DR/Protection by running a test. Keep in mind this will cause a short outage, because the VM will be migrated to the other Datacenter.

If you run a “Test”, Cloud Director Availability will first sync the vApp and VMs before it performs the test. This makes sure you won’t lose any data (because of the 5-minute RPO) while testing.

Awid Dashtgoli

How to change vSAN service subnet with Zero downtime

Introduction

In this blog we go through changing our vSAN service subnet without any downtime. In our case we also want to keep the vSAN VLAN we are currently using.

vSAN is a sensitive part of a cluster. With a small cluster (3-4 hosts) or fault domains it can be difficult to change the subnet of a production vSAN environment.

In my environment we have a total of 3 hosts in the cluster. All three hosts are in use, because vSAN uses a structure of 2 data components and 1 witness. This means we cannot simply change the IP address or portgroup of the vSAN network.

Solution

Fortunately we have a solution for this. We can create a second VLAN and subnet on our switch and create a temporary portgroup. After the portgroup is created we can create one extra VMkernel adapter on every host.
This VMkernel adapter will also be used for the vSAN service, but on the new VLAN/temporary subnet.

With this structure we can move our hosts to the temporary portgroup and reconfigure our primary portgroup. After the primary portgroup has been changed we move the hosts back to it and clean up the temporary portgroup.

Preparation

Let’s begin with the preparation of the following parts:

  • Create a new temporary VLAN
  • Create a new temporary portgroup
  • Configure a new VMkernel adapter on all hosts in the cluster

Create new vlan and subnet on Switch

1. Login to your switch and create a new VLAN with a new subnet. The size of the subnet does not matter, as long as it has an address for every host.
2. After creating the VLAN, tag it on the interfaces where the hosts are connected.
3. Once the VLAN is tagged we can create a portgroup on the DVS switch in vCenter.

Create temporary portgroup

4. Login to vCenter and create a new portgroup.

5. Give the portgroup a name. Personally I name the portgroup something with “Temp”; this makes the cleanup afterwards easier.
6. Click on “Next”

7. Configure the VLAN part. Use the VLAN configured on the switches in step 1.
8. Click “Next”.

9. Click “Finish”

Create VMKernel port on the hosts in the cluster

10. Navigate to the hosts in the cluster.
11. Choose one host and go to “Configure” tab.
12. Under networking you will see “VMkernel adapters” click on it.
13. Click on “Add Networking”.

14. For the connection type choose “VMkernel Network Adapter” and click “Next”.
15. For the target device search for the temporary portgroup you just created.
16. Select the temporary portgroup and click “Next”.

17. On the Portgroup properties choose the vSAN service and click “Next”.

18. On the IPv4 settings page choose “Use static IPv4 settings”.
19. Fill in the new subnet IP Address, Subnet mask and Default gateway for the first host and click “Next”.

20. Click on “Finish”.
21. Repeat the steps for the other hosts in the cluster. Every host should have a unique IP address.
22. Once we have all the hosts configured with the new VMKernel we can proceed with the actual move.
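Before the actual move it is worth verifying from each host that the new VMkernel adapter is tagged for vSAN and reachable. A minimal check over SSH to a host could look like this (vmk2 and the peer IP are placeholders for your temporary VMkernel adapter and another host's temporary address):

					# list the VMkernel interfaces tagged for vSAN traffic
					esxcli vsan network list
					# ping another host's temporary vSAN IP over the temporary VMkernel
					vmkping -I vmk2 192.168.50.12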

Move hosts to temporary VMKernel

Warning! Before starting with the solution, always make sure to put the hosts via vCenter in maintenance mode!

1. Click on the last host in the cluster.
2. Put the host in maintenance mode and choose “Ensure accessibility”.
3. Once the host is in maintenance mode go to “Configure” -> “VMkernel adapters”
4. Now we will disable the vSAN service on our primary vSAN VMKernel adapter.
5. Click the three dots and choose “Edit”.
6. Under “Enabled services” uncheck “vSAN”.
7. Click on “OK”.
8. Take the host out of maintenance mode.
9. Repeat these steps for the other hosts, from the last to the first host.

Warning! In the process you will see some alarms. You can ignore those.

10. Once we have disabled the vSAN service on the primary VMkernel adapter on all hosts, we can start reconfiguring our primary VMkernel adapter with the new subnet.
11. Reconfigure the primary subnet and/or VLAN on the switches.

Move hosts back to primary VMKernel

Enable primary VMKernel on all hosts

1. Click on the last host in the cluster.
2. Go to “Configure” -> “VMkernel adapters”.
3. Now we will enable the vSAN service and reconfigure the IPv4 settings of the primary vSAN VMkernel adapter with our new subnet.
4. Click the three dots and choose “Edit”.
5. Under “Enabled services” check “vSAN”.
6. On the left side click on “IPv4 Settings”.
7. Reconfigure the IPv4 settings with our new subnet.
8. Click on “OK”.
9. Repeat these steps for the other hosts, from the last to the first host.
10. Now that the primary VMkernel adapter has vSAN enabled again, we can disable the vSAN service on the temporary VMkernel adapter. Before doing so, see the verification below.
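Before moving on, you can confirm from any host that the vSAN cluster is complete again on the new subnet. A quick check over SSH:

					# the Sub-Cluster Member Count should equal the number of hosts (3 in my case)
					esxcli vsan cluster get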

Disable temporary VMKernel on all hosts

Warning! Before starting with the solution, always make sure to put the hosts via vCenter in maintenance mode!

1. Click on the last host in the cluster.
2. Put the host in maintenance mode and choose “Ensure accessibility”.
3. Once the host is in maintenance mode go to “Configure” -> “VMkernel adapters”
4. Now we will disable the vSAN service on our temporary vSAN VMKernel adapter.
5. Click the three dots and choose “Edit”.
6. Under “Enabled services” uncheck “vSAN”.
7. Click on “OK”.
8. Take the host out of maintenance mode.
9. Repeat these steps for the other hosts, from the last to the first host.

Warning! In the process you will see some alarms. You can ignore those.

Cleanup

After we have moved all hosts back to the primary vSAN VMkernel adapter and everything is running without issues, we can start the cleanup.

We will clean up the following components:

  • The temporary VMkernel adapter on our hosts.
  • The temporary portgroup.
  • The temporary VLAN on our switches (including the VLAN tags on the interfaces).
Awid Dashtgoli

Understanding and Configuring Inter VRF Routing on NSX

Introduction

In this blog we will go through understanding and configuring Inter VRF Routing for specific use cases on NSX.

Inter VRF Routing is used for communication between different VRF Lite instances on an Edge Cluster. This can be useful for a couple of scenarios.

One of the scenarios is having one VRF specifically for Internet routes and subnets. One of the benefits of this scenario is that only the Internet VRF needs BGP connections to our physical environment; from there the routes are distributed to the other “Customer VRFs/Edges” within the Edge Cluster.

In the design below you will see an example of the construction:

In this design we see a BGP connection with a /24 prefix from our physical routers to our T0 Internet VRF. From there we split the /24 into multiple subnets and give customers a /27 and a /28 prefix via Inter VRF Routing.
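Purely as an illustration of the split (203.0.113.0/24 is a documentation range, not an address from a real environment), the allocation could look like this:

					203.0.113.0/24   -> advertised northbound by the T0 Internet VRF (route aggregation)
					203.0.113.0/27   -> leaked to Customer 1 VRF via Inter VRF Routing
					203.0.113.32/28  -> leaked to Customer 2 VRF via Inter VRF Routing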

Let’s start the configuration and show you the design within NSX.

Configuration

Create T0 Internet VRF

First we need to create our T0 Internet VRF on NSX.

1. Login to your NSX manager and go to “Networking” -> “Tier-0 Gateways”
2. Click on “Add Gateway” and choose “VRF”

3. Choose a name and connect it to a T0.
4. Click on “Save”.

5. You will see a message “Do you want to continue configuring?”. Choose “Yes”.

Configure Interfaces, Prefixes and BGP on T0 Internet VRF

Now we start with the configuration of the T0 Internet VRF. In this blog we will not go too deep into BGP connections and how to set them up, but we will prepare the prefixes.

I will have a separate blog about BGP connections.

1. First create the interfaces for the BGP connections.
2. Now we will create the prefixes for the BGP connections to allow the /24 to be advertised from the inside.
3. Under “Routing” click on “IP Prefix Lists”.

4. Click on “Add IP Prefix List” and choose a name.
5. Click on “Set”.

6. I will start with the inbound prefixes. In this case I only add and permit the default route (0.0.0.0/0).
7. Fill in the network and put the Action on “Permit”.

8. Click on “Apply”.

9. Click on “Save”.

10. Next we will create the outbound prefix.
11. Follow the same steps and create a prefix list.
12. In this case we add our /24 public address.

13. Now that we have created the prefixes we can create the BGP connections. The BGP connections will use the prefixes we just made for inbound and outbound.
14. Lastly, we add our /24 to “Route Aggregation”. Click on “Route Aggregation”.

15. Click on “Add Prefix” and fill in your /24 public subnet. For “Summary” choose “Yes”.
16. Click on “Add” and after that on “Apply”.

17. Now we can click on “Close Edit”

Create and Configure Customer VRF

For the Customer VRF we follow the same steps as for the Internet VRF, but we do not set up the interfaces and BGP connections; in this case only the prefix lists for inbound and outbound.

1. After creating the Customer VRF we create the prefix lists for inbound and outbound again. Only in this case we do not add the /24 to the outbound prefix, but only the /27 (Customer 1) or /28 (Customer 2).

Configure Inter VRF

Now we will start the configuration of the Inter VRF Routing.

Create Route Maps

First we need to create Route Maps on all the VRFs: in this case the Internet VRF, the Customer 1 VRF and the Customer 2 VRF.

1. Start with the Internet VRF Route Maps.
2. Go to the Internet VRF, click on the three dots and click on “Edit”.

3. Click on “Route Maps”

4. Click on “Add Route Map”
5. Choose a name for the inbound Route Map and click on “Set”

6. Click on “Add Match Criteria”
7. Put Action on “Permit” and click on “Set”

8. Search for the prefix you created in the previous steps. In this case the inbound prefix.
9. Select the prefix and click on “Apply”

10. Click on “Add” and after that on “Apply”

11. Click on “Save” to save the Inbound Route Map.
12. Now follow the same steps for the Outbound Route Map and don’t forget to choose the Outbound Prefix in step 8.
13. Now follow the same steps for the Customer VRFs.

Create Inter VRF Routing

The last part is to create the Inter VRF Routing. This part is pretty easy.

1. Let’s again start with the Internet VRF. Click on “Inter VRF Routing”

2. Click on “Add Inter VRF Routing”
3. Choose the “Connected Gateway”, in this case Customer 1.
4. Click on “Set”.

5. Click on the three dots and click on “Edit”.

6. Enable “BGP Route Leaking” and choose the IN and OUT filters.
7. Click on “Add”

8. Verify the settings and click on “Apply”

9. Click on “Save”.
10. Now the Inter VRF Routing is set up on the Internet VRF.
11. Click on “Close”.

12. Follow the same steps for the Customer 2 VRF.

13. Now that the Internet VRF side is ready, we need to configure the Customer VRFs as well. Follow the same steps, only in this case the “Connected Gateway” is the Internet VRF and not the Customer VRF.
14. Go to the Customer 1 VRF and click on “Edit”
15. Choose “Inter VRF Routing” and follow the same steps to create the Inter VRF Routing.
16. Repeat the same steps for Customer 2 VRF.

Validation

After all Inter VRF Routing has been configured we can validate everything. There are two steps:

1. First we can check the Inter VRF Routing on the interfaces of the VRFs. You will see an extra interface named INTERVRF.
2. After we have advertised the /27 and /28 from the Customer VRFs, we can check on the Internet VRF whether the routes are also visible in the routing table.
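If you also want to verify this from the NSX Edge node CLI, a rough sketch looks like the following (the VRF number is taken from the get logical-routers output in your own environment):

					get logical-routers
					vrf 3
					get route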

Awid Dashtgoli

Fix alarm “Registration of third-party IO filter storage providers fails”

Introduction

In this blog we will show how to fix the alarm “Registration of third-party IO filter storage providers fails” on ESXi hosts in VMware vCenter.

Issue

If we go to a cluster within VMware vCenter we will see the following alarm: “Registration/unregistration of third-party IO filter storage provider fails on a host.”.
To look further into the alarm we need to go to “Monitor” and “Triggered Alarms”.

Understanding the Cause

This alarm is caused by the IO Filter provider on the ESXi host not being correctly registered with the ESXi host certificate.
To get a better understanding we can look into the logs.

The log can be found at /var/log/iofiltervpd.log on the host, or under /var/run/log/ in a log bundle.

We will look for something like:
iofiltervpd[67088]: run:159:SSL Connection error 30 : SSL_ERROR_SSL error:14094416:SSL routines:ssl3_read_bytes:sslv3 alert certificate unknown
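To pull these entries out of the log quickly, a simple grep should do (run on the ESXi host):

					grep -i "SSL Connection error" /var/log/iofiltervpd.log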

Solution

To solve this issue we need to refresh the IO Filter registration on the ESXi host, but before we can do this let’s first verify that the host certificate and certificate chain are OK on the ESXi host.

Verify Certificates on ESXi Host

To verify that the certificate and certificate chain of the ESXi host are OK we need to follow these steps:

1. Login to vCenter
2. Click on the ESXi host with the alarm
3. Click on “Configure”
4. Scroll down; under “System” you will find “Certificate”. Click on it.
5. Now you will see the “Status”. This should be “Good”.

Enable SSH on ESXi Host (Optional)

1. Go to vCenter and login.

2. Choose the host and go to “Configure” -> “Services”. Find “SSH” and click on “START”

Refreshing the IO Filter Registration

After we have validated the Certificate and Certificate Chain on the ESXi Host we can proceed with refreshing the IO Filter Registration.

1. Open an SSH session to your ESXi host.
2. Run the following command:

					/usr/lib/vmware/iofilter/bin/iofvp-ctrl-app -r
				

3. After running the command the alarm should be resolved. If this is not the case you can click “Reset to Green”. The alarm should not return anymore.

Awid Dashtgoli

Fix Cloud Director 10.6 upgrade fails with error Failed dependencies

Introduction

In this short blog we will show how to fix an error when upgrading to Cloud Director 10.6.

Issue

When upgrading from Cloud Director version 10.4 or 10.5 to version 10.6 you can encounter an error with the following message: “Failed dependencies”. This issue prevents Cloud Director from upgrading.

If we look at /opt/vmware/var/log/vami/updatecli.log we will see the following:

error: Failed dependencies:
        libcrypto.so.1.0.0()(64bit) is needed by (installed) xml-security-c-1.7.3-4.ph2.x86_64
        libssl.so.1.0.0()(64bit) is needed by (installed) xml-security-c-1.7.3-4.ph2.x86_64
05/07/2024 05:56:45 [ERROR] Failed with exit code 65024
05/07/2024 05:56:45 [INFO] Update status: Running post-install scripts
Failed with status of 2 while installing version 10.6.0.11510

Understanding the Cause

This issue is caused by an earlier Photon OS upgrade that was not properly cleaned up.

In Cloud Director 10.6 the version of Photon OS is 4.0, while in Cloud Director 10.4/10.5 version 3.0 of Photon OS is used.

Solution

To solve this issue we first need to find the dependencies mentioned in the updatecli.log. After verification we need to delete the dependency and rerun the upgrade.

Warning! Before starting with the solution, always make sure to make a snapshot of all Cloud Director cells!

Verify package exists on Cloud Director cells

Let’s start with verifying the “xml-security-c-1.7.3-4.ph2.x86_64” package.

1. Open an SSH session to each Cloud Director cell.
2. Log in with the root account.
3. To verify if the package is present run the following command:

					rpm -qa | grep xml-security-c-1.7.3-4.ph2.x86_64
				

4. You will see output similar to this:

5. Run this command on every Cloud Director cell.
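If you have several cells, a small loop from a management host can save some typing. This is just a sketch; cell01 through cell03 are placeholder hostnames for your own cells:

					# check each cell for the offending package (placeholder hostnames)
					for cell in cell01 cell02 cell03; do
					  echo "== $cell =="
					  ssh root@"$cell" 'rpm -qa | grep xml-security-c-1.7.3-4.ph2.x86_64'
					done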

Remove the package from Cloud Director cells

Now that we have verified the package is present, we can start removing it from all Cloud Director cells.

1. To remove the package run the following command:

					rpm -e xml-security-c-1.7.3-4.ph2.x86_64
				

2. Run this command on every Cloud Director cell.
3. Lastly, let’s verify that the package is no longer present by running the following command:

					rpm -qa | grep xml-security-c-1.7.3-4.ph2.x86_64
				

4. There should be no package named “xml-security-c-1.7.3-4.ph2.x86_64” anymore.
5. Now we can start the upgrade of Cloud Director 10.6 without any issues.
Make sure to remove the snapshots again after the upgrade is complete.

Awid Dashtgoli

Fix “Host TPM attestation alarm” VMware vCenter

Introduction

In this blog we will show how to fix the alarm “Host TPM attestation” on ESXi hosts in VMware vCenter.

Issue

If we go to a cluster within VMware vCenter we will see the following alarm: “Host TPM attestation alarm”.
To look further into the alarm we need to go to “Monitor” and “Triggered Alarms”.

Understanding the Cause

Under “Monitor” -> “Triggered Alarms” you will see all the current alarms. The one we are looking for is also showing here.

Now we click on the arrow to expand the alarm.
Here we will see information about the alarm, but also the cause of the alarm.

In our case the alarm is caused by not having “Secure Boot” enabled.

Solution

Enable Secure Boot on host

First we need to enable secure boot on our host. This can be done in either ILO/IDRAC or via the BIOS.

For the HPE ProLiant DL Series we will use BIOS.
For the DELL PowerEdge R Series we will use IDRAC.

Warning! Before starting with the solution, always make sure to put the hosts via vCenter in maintenance mode!

HPE ProLiant DL Series (BIOS)

To enter the bios of our HPE server we need to access the console via ILO.

1. Browse to the ILO of the host and log in.

2. Look at the information and make sure “Trusted Platform Module” (TPM) is “Present: Enabled” and the “Module Type” is present (in my case “TPM 2.0”).

3. Click on the console and choose “HTML5 Console” (I am using ILO 6, your interface may vary)

4. Head back to the vCenter to give the host a reboot.

5. Log a reason for the reboot and click on “OK”.

6. Now go back to the ILO and watch the console.

7. Keep an eye on the console. Once you see the “F9: System Utilities” option, press “F9”.

8. Once the “System Utilities” has started we choose “System Configuration”.

9. Next we go in to “BIOS/Platform Configuration (RBSU)”.

10. Now we choose “Server Security”.

11. Now we have two options.
If in step 2 the “Trusted Platform Module” (TPM) was not “Present: Enabled” but “Present: Disabled”, we can enable it under “Trusted Platform Module Options”.

If in step 2 the “Trusted Platform Module” (TPM) was “Present: Enabled”, select “Secure Boot Settings” and go to step 14.

12. Enable the TPM and click “F10: Save”.

13. Next we will enable “Secure Boot”. Go to “Secure Boot Settings”.

14. Now click on the “Attempt Secure Boot” option and choose “Enabled”.

15. You will get the following message. Click on “OK”

16. Now you will see “Reboot Required”. Click on “Exit”.

17. Choose “F12: Save and Exit”.

18. You will get a message. Click on “OK”.

19. You will get a reboot message. Click on “Reboot”.

20. Go back to vCenter and wait until the server has rebooted.

DELL PowerEdge R Series (IDRAC)

In this part we will change the “Secure Boot” via IDRAC.

1. Browse to the IDRAC of the host and log in.

2. Go to “Configuration” and choose “BIOS Settings”.

3. Now go to “System Security”.

4. Make sure “TPM Security” is “On”.

5. Now change “Secure Boot” to “Enabled” and click on “Apply”.

6. Now click on “At Next Reboot”.

7. Now we will go back to vCenter and reboot the host.

8. Log a reason for the reboot and click on “OK”.

9. The host will reboot and you will see the BIOS changes through IDRAC.

Enable Secure Boot on ESXi OS

In this part we will enable “Secure Boot” on the ESXi host. First we need to enable SSH to connect to the host. After that we can enable “Secure Boot”.

Enable SSH on ESXi Host

1. Go to vCenter and login.

2. Choose the host and go to “Configure” -> “Services”. Find “SSH” and click on “START”

Enable Secure Boot through CLI

1. Start a SSH session to the ESXi Host.

2. List the current settings by running:

					esxcli system settings encryption get
				

3. You will get a result like this:

					   Mode: TPM
   Require Executables Only From Installed VIBs: false
   Require Secure Boot: false
				

4. If “Mode” appears as “NONE” run the following command:

					esxcli system settings encryption set --mode=TPM
				

5. Now enable “Secure Boot” by running:

					esxcli system settings encryption set --require-secure-boot=T
				

6. Verify the change and confirm that “Require Secure Boot” displays true by running:

					esxcli system settings encryption get
				

7. You will get a result like this:

					   Mode: TPM
   Require Executables Only From Installed VIBs: false
   Require Secure Boot: true
				

8. Now we need to save the settings by running:

					/bin/backup.sh 0
				

9. Reboot the server to make sure settings are applied and in place. Run the command:

					reboot
				
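After the host is back up, you can optionally double-check from the ESXi shell that Secure Boot is really active; the secureBoot.py helper that ships with ESXi reports the current state:

					/usr/lib/vmware/secureboot/bin/secureBoot.py -s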
Awid Dashtgoli

Fix “Index: 0, Size: 0” VMware Cloud Director Error

Introduction

In this blog we will show how to fix the “Index: 0, Size: 0” error in VMware Cloud Director. This issue prevents users from making changes to the Organization tenant VMs.

Issue

If we go to an organization within VMware Cloud Director and we want to make changes (CPU, RAM, hard disks or anything else) to a VM, we will receive the error “[ f14cfa58-9afa-4730-98ce-0f69347775f9 ] Index 0 out of bounds for length 0”.

Solution

Warning! Before starting with the solution, always make sure to make a snapshot of all the VMware Cloud Director Cells!

Obtaining "dstore_moref" information from impacted VM's

First we need to find all the impacted VMs. We can do this by querying the Cloud Director database. Impacted VMs will have an empty “dstore_moref” column.

1. Start an SSH session to the Primary Cloud Director Cell.

2. Switch to the “Postgres” user by running:

					su - postgres
				

3. Start psql and open the “vcloud” database by running:

					psql vcloud
				

4. Obtain all the VMs that have an empty “dstore_moref” value by running the query below.
You will get a list of all the VMs with their associated IDs.

					select vm.id,vm.name,datastore_inv.moref  FROM vm
inner join vapp_vm on vapp_vm.svm_id = vm.id
inner join vm_inv on vm_inv.moref = vm.moref
inner join datastore_inv on datastore_inv.vc_display_name = (substring(vm.location_path,2,(POSITION(']' in vm.location_path))-2))
where vm.dstore_moref is NULL and vm_inv.is_deleted is false;
				

5. Now we have a complete list of all the impacted VMs with their ID, name and datastore moref.

Here is a output example:

					
                  id                  |                       name                       |      moref
--------------------------------------+--------------------------------------------------+-----------------
 d6259566a-9826-8845-98b9-9a0b445b803c | dash-demovm-m6g4               | datastore-1935
				

Fixing the issue on a per-VM basis.

In this part we will fix the issue on a per-VM basis. This can be useful in cases where a faulty outcome cannot be tolerated or if you want to test the fix for the first time.

1. Start an SSH session to the Primary Cloud Director Cell.

2. Switch to the “Postgres” user by running:

					su - postgres
				

3. Start psql and open the “vcloud” database by running:

					psql vcloud
				

4. If the list of impacted VMs from the previous step is too big to find your VM, you can filter on one specific VM by running the query below.
Replace the ‘%dash-demo%’ with your VM name.

					select 'update vm set dstore_moref = ' || '''' || datastore_inv.moref || '''' || ' where id = ' || '''' || vm.id || '''' || ';' from vm
inner join vapp_vm on vapp_vm.svm_id = vm.id
inner join vm_inv on vm_inv.moref = vm.moref
inner join datastore_inv on datastore_inv.vc_display_name = (substring(vm.location_path,2,(POSITION(']' in vm.location_path))-2))
where vm.dstore_moref is NULL and vm_inv.is_deleted is false and vm.name like '%dash-demo%';
				

5. Now we receive the command to implement the fix.
It should look something like this:

					update vm set dstore_moref = 'datastore-1935' where id = 'd6259566a-9826-8845-98b9-9a0b445b803c';
				

An alternative method is to run the following command, which allows you to create the fix command manually.
Replace the ‘%dash-demo%’ with your VM name.

					select vm.id,vm.name,datastore_inv.moref  FROM vm
inner join vapp_vm on vapp_vm.svm_id = vm.id
inner join vm_inv on vm_inv.moref = vm.moref
inner join datastore_inv on datastore_inv.vc_display_name = (substring(vm.location_path,2,(POSITION(']' in vm.location_path))-2))
where vm.dstore_moref is NULL and vm_inv.is_deleted is false and vm.name like  '%dash-demo%';
				

Now that we have all the information, we can create the fix command:
Copy the datastore moref from the previous result and replace the ‘datastore-1935’ below.
Copy the ID from the previous result and replace the ‘d6259566a-9826-8845-98b9-9a0b445b803c’ below.

					update vm set dstore_moref = 'datastore-1935' where id = 'd6259566a-9826-8845-98b9-9a0b445b803c';
				

6. Copy the command and paste it into the command line.

7. Run this command.

8. You will get a response like “UPDATE 1”. This means there is 1 update performed in the database.

9. Go back to the Cloud Director Organization and try to change CPU, RAM or hard disk settings on the VM.

10. Everything should update fine now without any errors.

Fixing the issue for all VMs at once.

In this part we will fix the issue for all VMs at once. After you have tested one VM and it is safe to perform this on all VMs, you can use this part of the guide.

1. Start an SSH session to the Primary Cloud Director Cell.

2. Switch to the “Postgres” user by running:

					su - postgres
				

3. Start psql and open the “vcloud” database by running:

					psql vcloud
				

4. Get all the commands at once by running:

					select 'update vm set dstore_moref = ' || '''' || datastore_inv.moref || '''' || ' where id = ' || '''' || vm.id || '''' || ';' from vm
inner join vapp_vm on vapp_vm.svm_id = vm.id
inner join vm_inv on vm_inv.moref = vm.moref
inner join datastore_inv on datastore_inv.vc_display_name = (substring(vm.location_path,2,(POSITION(']' in vm.location_path))-2))
where vm.dstore_moref is NULL and vm_inv.is_deleted is false;
				

5. Now we receive all the commands to implement the fix for all VMs at once.

					update vm set dstore_moref = 'datastore-1935' where id = 'd6259566a-9826-8845-98b9-9a0b445b803c';
update vm set dstore_moref = 'datastore-1935' where id = 'd6259566a-9826-8845-98b9-9a0b445b803c';
update vm set dstore_moref = 'datastore-1935' where id = 'd6259566a-9826-8845-98b9-9a0b445b803c';
				

6. Copy the complete output.

7. Run these commands (or see the psql tip after this list to skip the copy-paste).

8. You will get a response like “UPDATE” with a number behind it. The number represents the number of rows that have been updated in the database.

9. Go back to the Cloud Director Organization and try to change CPU, RAM or hard disk settings on one of the VMs.

10. Everything should update fine now without any errors.
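As a small convenience, psql can also execute the generated statements directly instead of you copy-pasting them: end the generating SELECT with \gexec instead of a semicolon, and each returned row is run as a query. A sketch, using the same query as step 4:

					select 'update vm set dstore_moref = ' || '''' || datastore_inv.moref || '''' || ' where id = ' || '''' || vm.id || '''' || ';' from vm
inner join vapp_vm on vapp_vm.svm_id = vm.id
inner join vm_inv on vm_inv.moref = vm.moref
inner join datastore_inv on datastore_inv.vc_display_name = (substring(vm.location_path,2,(POSITION(']' in vm.location_path))-2))
where vm.dstore_moref is NULL and vm_inv.is_deleted is false \gexec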

Awid Dashtgoli

Commission & Decommission a host with VMware Cloud Foundation

Introduction

In this blog we will show how to commission and decommission hosts with VMware Cloud Foundation.

Commissioning a host

In this section we start with commissioning a host. For host commissioning we need to have our ESXi host prepared with the correct ESXi version and configuration. The setup and configuration should be identical to the current hosts in the VMware Cloud Foundation cluster.

Prerequisites

Before we can commission the new host we need to make sure the host has the correct ESXi version installed. Sometimes the version is not available as an ISO; in that case we need to create a custom ESXi ISO.

I have a blog where I explain step-by-step how to create a custom ESXi ISO from a depot. Click here to see the blog.

Another very important part is to assign the license to the ESXi host before commissioning the host to the VMware Cloud Foundation SDDC Manager.

1. Login to the ESXI host via the WEB GUI.

2. Go to “Manage” and “Licensing”.

3. Click “Assign License”.

4. Paste the license key and click “Check License”.

5. If you see the message “License key is valid for …” you can click “Assign License”.

6. Your license has been assigned.

Add a host to VMware Cloud Foundation

1. Open VMware Cloud Foundation SDDC manager.

2. Under inventory go to “Hosts” and choose “Commission Hosts”.

3. You will see a checklist. Check all the boxes and click “Proceed”

4. Fill in all the fields and click on “Add”.

5. After the host has been added to the list. Check the “Confirm FingerPrint” icon and click on “Validate All”.

6. After the validation is complete you will see the “Validation Status” as “Valid”. Click “Next”.
If you run into any errors, make sure to check the checklist again and solve the issues.

7. Make sure all the information is correct and click “Commission”.

8. You will see a task under “Tasks”. Wait until the task is finished.

9. After the task is “Successful” you have successfully added the new host to VMware Cloud Foundation. 

Add a host to the cluster of VMware Cloud Foundation

Now we have commissioned the host to VMware Cloud Foundation, we can add the host to a cluster.

Add a host to the cluster of VMware Cloud Foundation via GUI

1. Open VMware Cloud Foundation SDDC manager.

2. Under inventory go to “Workload Domains” and choose the domain you want to add the host to.

3. Click on “Actions” and “Add Host”.

4. You can add the available hosts to the cluster.

5. After you have added the hosts, you will see a task running. Wait until the task is done.

6. Your host has been successfully added to the cluster.

Add a host to the cluster of VMware Cloud Foundation via API Explorer

In some cases it is necessary to use the API Explorer to add hosts to the cluster. This can be the case, for example, for clusters with multiple VDS switches.

First we will need to obtain the host and cluster IDs before we can add the host to the specific cluster.
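As a side note, everything the API Explorer does can also be done with curl against the SDDC Manager API. A minimal sketch, assuming $TOKEN holds a valid access token (obtained via POST /v1/tokens) and sddc-manager.local is a placeholder FQDN:

					# list hosts that are unassigned and usable (same query as step 4 below)
					curl -sk -H "Authorization: Bearer $TOKEN" \
					  "https://sddc-manager.local/v1/hosts?status=UNASSIGNED_USABLE"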

1. Open VMware Cloud Foundation SDDC manager.

2. Under Developer Center go to “API Explorer”.

3. Open “APIs for managing Hosts” and open the “GET  /v1/hosts”.

4. In the “Status” parameter we need to add a value named “UNASSIGNED_USABLE”.
The “UNASSIGNED_USABLE” will only show the unassigned available hosts.

5. Click on “Execute”. This will run the API query.

6. Under “Response” you will find the output of the API query. Click on “PageOfHost”; here you will see the available/usable hosts. Click on the host.
You will see all the information about the host. The “ID” is the important part for us.
Save the ID after “ID of the host”; it will look something like “d6259566a-9826-8845-98b9-9a0b445b803c”.

7. Now we will obtain the cluster ID.
Under “APIs for managing Clusters” choose “GET  /v1/clusters”.

8. Click on “Execute”.

9. Under “Response” you will find the output of the API query. Click on “PageOfCluster”; here you will see all the clusters. Click on the cluster you want to add your host to.

You will see all the information about the cluster. The “ID” is the important part for us.
Save the ID after “ID of the cluster”; it will look something like “d6259566a-9826-8845-98b9-9a0b445b803c”.

10. Now that we have all the information, we can validate our configuration.
Under “APIs for managing Clusters” choose “POST  /v1/clusters/{id}/validations”.
For the “id” parameter fill in the cluster ID copied from the previous step.

11. Now for the “clusterUpdateSpec” we need to create a JSON payload.
Below you will find an example of the JSON:

					{
    "clusterExpansionSpec": {
      "hostSpecs": [
        {
          "id": "d6259566a-9826-8845-98b9-9a0b445b803c",
          "licensekey": "00000-00000-00000-00000-00000",
          "hostNetworkSpec": {
            "vmNics": [
              {
                "id": "vmnic0",
                "vdsName": "DASH-M-vds01",
                "moveToNvds": false
              },
              {
                "id": "vmnic1",
                "vdsName": "DASH-M-vds02",
                "moveToNvds": false
              },
              {
                "id": "vmnic2",
                "vdsName": "DASH-M-vds01",
                "moveToNvds": false
              },
              {
                "id": "vmnic3",
                "vdsName": "DASH-M-vds02",
                "moveToNvds": false
              }
            ]
          }
        }
      ],
      "interRackExpansion" : false
    }
}
				

12. Copy and paste the JSON into the “clusterUpdateSpec” parameter and click “Execute”.

13. You will get a message “Are you sure?”. Click on “Continue”.

14. Under “Response” you will see the “Validation”. If you open the “Validation” you will see the message “SUCCEEDED” under “ResultStatus”. This means our JSON is valid and the validation was successful.

15. Now we will execute this JSON to start adding the host to the cluster.
Under “APIs for managing Clusters” choose “PATCH  /v1/clusters/{id}”.
For the “id” and “clusterUpdateSpec” parameters fill in the exact same information as in the validation step and click on “Execute”.

16. You will again get the message “Are you sure?”. Click on “Continue”.

17. Under “Response” you will see the “Task”. If you open the “Task” you will see the status “IN_PROGRESS”. This means the task has started and the host will be added to the cluster.

18. Under “Tasks” you will see a new task running.

19. Once the task is completed you will see the task with the status “Successful”.
The host has successfully been added to the cluster.

Decommissioning a host

In this section we decommission a host. Host decommissioning can be done through the GUI.

Prerequisites

Before we can decommission the host we need to make sure the host is in maintenance mode.

We can perform this on the vCenter WEB GUI.

1. Login to the vCenter Server.

2. Put the host you want to decommission in maintenance mode and choose “Full data migration”.
This will make sure all the data is moved off the host and vSAN redundancy stays in place.

The “Full data migration” can take some time depending on your environment.

Remove a host from VMware Cloud Foundation cluster

1. Open VMware Cloud Foundation SDDC manager.

2. Under inventory go to “Workload Domains” and choose the domain you want to remove the host from.

3. Click on “Hosts” and select the host you want to delete. After selecting the host choose “Remove Selected Hosts”.

4. If necessary check the “Force Remove Host” option. After that click on “Remove”.

5. A task under “Tasks” will start.

6. Once the task is “Successful” the host is removed from the cluster.

7. The host is now in a “Usable” state. This means you can put the host in another VMware Cloud Foundation Cluster if you want.

Decommission a host from VMware Cloud Foundation

To completely remove the host from VMware Cloud Foundation, we need to decommission the host.

1. Open VMware Cloud Foundation SDDC manager.

2. Under inventory go to “Hosts” and choose “Unassigned Hosts”.

3. Select the host you want to decommission and choose “Decommission Selected Hosts”.

4. You will get a message. Select the “Skip failed hosts during decommissioning” and choose “Confirm”.

5. The host will start decommissioning and you will see a task starting under “Tasks”.

6. Once completed the task will show “Successful”.

7. You have successfully decommissioned a host from VMware Cloud Foundation.

Awid Dashtgoli

Upgrade VMware NSX Advanced Load balancer

Introduction

In this blog we will show how to upgrade VMware NSX Advanced Load Balancer.

Preparation

Before we can proceed with the upgrade we first need to prepare everything.

First we need to download the software from the VMware website. After downloading we can upload the software to VMware NSX Advanced Load Balancer.

Upload the Software

1. Login to the VMware NSX Advanced Load Balancer UI.

2. Access the NSX Advanced Load Balancer interface and go to Administration > Controller > Software. Select “Upload From Computer” to transfer the NSX Advanced Load Balancer software to the Controller.

3. After selecting the file, the software image upload begins. The progress of the upload is shown on the UI.

4. Once the image upload is finished, the software package will be available under the “Software” tab.

Upgrade the NSX Advanced Load Balancer System

1. Login to the VMware NSX Advanced Load Balancer UI.

2. Go to Administration > Controller > System Update, choose the image uploaded in the previous step, and click “UPGRADE” to initiate the upgrade process.

3. Check the “Upgrade All Service Engine Groups” box to update the SE groups along with the Controller upgrade. Click “Continue” to proceed with the software upgrades for the Controller and SE groups.

4. The next screen will appear for final checks before proceeding with the upgrade.

5. When prompted, confirm that a configuration backup has been completed.

6. The update process progress can be viewed on the UI in the “In Progress” section.

7. After the upgrade process is finished, the latest software version will be listed under Administration > Controller > System Update. The current tag will be shown next to the updated software version.

8. After the Controller upgrade is successful, the Service Engine group upgrade will begin, as it was selected in step 3. The following screenshot displays the message regarding the SE group update status.

9. After the SE group update is successful, the upgrade status changes to “successful,” as illustrated below.

Whenever an SE group is upgraded with “Action to take on SEG update failure” set to “Suspend” and an issue is encountered, the upgrade process for that SE group is suspended. After the issue is resolved through manual intervention, use the following command to resume the upgrade:

					resume segroup se_group_refs <se-group-name>
				

Replace <se-group-name> with your Service Engine Group that is suspended.
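You can also follow the upgrade from the Avi CLI; assuming you have a CLI session on the Controller, the following command shows the current upgrade state:

					show upgrade status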

Awid Dashtgoli