File locks on vSphere

In a project we were exporting a VM to an OVF file. We launched the export from the vSphere Web Client. During the export the file stream failed and we cancelled the export action. Because we were running into a time restriction within the allowed time frame, we wanted to boot the machine so it would be available to the end users again.

When we booted the machine we received the following error: "File system specific implementation of Ioctl[file] failed". As a result the machine would not boot.

Together with VMware Support we were able to resolve this issue by identifying which process was locking the VM's files and removing the lock.

[Screenshot: lsof output showing the vpxa process locking the VM files]

With the command lsof | grep <VM name> we get a list of the current locks on the files of this VM. In the above screenshot you can see that the vpxa process is still locking the files. The process that is locking the VM files (vpxa) is the vCenter Agent Service.

With the command kill -9 67866 we terminate the process that holds the lock (67866 is the process ID found in the previous step). The -9 switch sends the SIGKILL signal, which is handled directly by the kernel and cannot be caught or ignored by the process.

After the kill command we run the lsof command again to verify that the lock is gone. We were then able to boot the machine.
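
As a rough summary, the commands we used on the ESXi host look like this (a sketch; the process ID 67866 and the VM name are specific to our environment, and the optional vpxa restart is our own addition, not something prescribed by VMware Support):

# find the process holding locks on the VM files
lsof | grep <VM name>
# kill the locking vpxa process by its process ID (67866 in our case)
kill -9 67866
# verify that the lock is released
lsof | grep <VM name>
# optionally restart the vCenter agent so the host keeps communicating with vCenter
/etc/init.d/vpxa restart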

Reset VMware ESXi root password

The root account is the only built-in login account on vSphere ESXi; there is no additional account that can serve as a backdoor to log on to vSphere ESXi when the root password is lost. When a vSphere ESXi host is added to a vCenter instance, management of the host is primarily done via vCenter. Troubleshooting ESXi is done primarily on the command line via an SSH connection. By default the SSH service is stopped. To start the SSH service you access the host via vCenter under Host > Configure > System > Services. When you no longer have the root password for the vSphere ESXi host, you have to follow the procedure below.

This procedure uses the Host Profile functionality that is only available when you have an Enterprise license. If you have lost the root password but you don't have an Enterprise license, you have no other option than to reinstall the host.


VMworld 2021 Top 10 sessions to watch

It is that time of the year again to start looking forward to VMworld 2021. Due to the ongoing Covid-19 pandemic, VMworld 2021 will again be fully virtual.

The upside to a virtual event is that you don't need to walk across a big conference complex to get from one session to another. You can follow the conference from the luxury of your own chair and desk. Pour your own drink of choice, sit back and relax, and take in all the information on VMware's latest and greatest from your own home. Because VMworld 2021 will be fully virtual, like last year, it will be easier for people to attend since you don't need to arrange travel (flight/hotel) to attend VMworld.


VIBS Error vSphere ESXi upgrade

Recently I was upgrading a vSphere ESXi host from version 6.5.0 (7388607) to version 7.0.1. vCenter for this environment was already upgraded to version 7.0.2.0000. At first I tried to start the upgrade via VMware Lifecycle Manager, but that resulted in an error indicating that the vCenter/Lifecycle Manager and ESXi versions were not working well together. In order to make progress I accessed the server via its Integrated Lights Out (iLO) interface (HPE), mounted the HPE ESXi image through iLO and booted the server.

During the upgrade process the installer finds the drive where ESXi is installed. The next step is that the installer scans the current installation to see if an upgrade is possible. At this point the installer throws the following error.

[Screenshot: installer error listing the conflicting VIBs]

After some investigation of the list the installer shows here, it is clear that these VIBs are storage drivers that are no longer in use by ESXi.

The correct way to resolve these errors is to remove the unused storage drivers from ESXi. The next step is to reboot (F11) the server. When ESXi is completely loaded again, I connect via SSH (I use the MobaXterm client).


With the following command we retrieve the name of the package:

esxcli software vib list | grep 4.0.2.1

The output shows that the package is called net-mst.

With the following command we remove this VIB.

esxcli software vib remove -n net-mst
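
The same list-and-remove sequence can be repeated for every VIB in the installer error. A small sketch (the VIB name scsi-mpt3sas is only an example of such an unused storage driver, not necessarily one from this environment):

# list installed VIBs and filter on a name or version from the installer error
esxcli software vib list | grep -i scsi
# remove the offending VIB by name
esxcli software vib remove -n scsi-mpt3sas
# verify that the VIB is no longer present before restarting the upgrade
esxcli software vib list | grep -i scsi-mpt3sas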

After we have removed all the VIBs that are mentioned in the error, the VMware vSphere ESXi upgrade can be restarted.

Fixing an interrupted NSX-T Manager upgrade

The process of upgrading the NSX-T managers in an environment is an automated process that works through the three managers and finishes the moment all NSX-T managers are upgraded to the new desired version. Recently I was upgrading an NSX-T datacenter environment from version 3.1.0.0.017107177 to version 3.1.1.0.0.17483065 in my lab environment. The Edge nodes and Transport Nodes had already been upgraded successfully. While we were in the middle of upgrading, the NSX-T Manager upgrade got interrupted and the NSX-T managers rebooted before the upgrade was finished.

After all the nodes were back up again I was not able to log on to the management environment; the designated Virtual IP (VIP) appeared to be down. When I connected to the first NSX-T Manager machine I was presented with a message indicating that the upgrade had not fully completed. When I executed the command get upgrade progress-status at the prompt, I was presented with the following output:

[Screenshot: output of get upgrade progress-status on the first NSX-T Manager]

The output shows that all the upgrade steps were completed successfully. When I connected to the second NSX-T Manager machine I got the same output.

I then connected to the third NSX-T Manager. On this node the upgrade was not completed, which caused the other NSX-T managers to remain in the upgrading state and the management VIP to remain unavailable.

[Screenshot: upgrade status on the third NSX-T Manager, showing the upgrade not completed]

I first executed the following command to see the available upgrade packages on the NSX-T Manager machine:

get upgrade-bundle playbooks

To resume the NSX-T Manager upgrade I executed the following command:

resume upgrade-bundle VMware-NSX-appliance-3.1.1.0.0.17483186 playbook

The upgrade process resumed and completed successfully in a matter of minutes, after which the environment became functional and the management VIP became accessible again.
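
To double-check that everything was healthy after the resume, the upgrade and cluster state can be verified again from the NSX-T Manager CLI (the second command is a generic NSX-T CLI check, not something specific to this upgrade):

# confirm that all upgrade steps now report success
get upgrade progress-status
# confirm that the management cluster and its services are stable again
get cluster status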


Advanced Cross vCenter vMotion

VMware released vSphere version 7.0 U1c – 17327586 in December 2020. Next to the cool new features that are included in this version (this blog is all about one of those features), another very important reason to download and install this version of vSphere is that it closes a major security issue in previous versions. You can find more info on this here.

New features in this version of vSphere include the following:

  • Physical NIC statistics
  • Advanced Cross vCenter vMotion
  • Parallel remediation on hosts in clusters that you manage with vSphere Lifecycle Manager baselines
  • Third-party plug-ins to manage services on the vSAN Data Persistence platform

The VMware release notes have the following to say about this new feature:

With vCenter Server 7.0 Update 1c, in the vSphere Client, you can use the Advanced Cross vCenter vMotion feature to manage the bulk migration of workloads across vCenter Server systems in different vCenter Single Sign-On domains. Advanced Cross vCenter vMotion does not depend on vCenter Enhanced Linked Mode or Hybrid Linked Mode and works for both on-premise and cloud environments. Advanced Cross vCenter vMotion facilitates your migration from VMware Cloud Foundation 3 to VMware Cloud Foundation 4, which includes vSphere with Tanzu Kubernetes Grid, and delivers a unified platform for both VMs and containers, allowing operators to provision Kubernetes clusters from vCenter Server. The feature also allows smooth transition to the latest version of vCenter Server by simplifying workload migration from any vCenter Server instance of 6.x or later.

In this blog we will describe the process of importing VMs from a 6.7 vCenter to the updated 7.0.1 vCenter, making use of the cross vCenter technology. To prepare the environment for cross vCenter vMotion, the vMotion network has to be configured with a gateway.


On the receiving side we tried to vmkping the sending host over the vMotion VMkernel port. When this failed, we added a route to the remote vMotion network across the gateway. When we retried the vmkping it was successful.

On the sending side we also configured the vMotion network with a gateway entry.
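
A minimal sketch of the connectivity test and the static route on an ESXi host (the VMkernel interface vmk1, the IP addresses and the networks below are placeholders, not the values from this environment):

# ping the remote host over the local vMotion VMkernel interface
# (add -S vmotion when the dedicated vMotion TCP/IP stack is used)
vmkping -I vmk1 192.168.20.10
# add a static route to the remote vMotion network via the vMotion gateway
esxcli network ip route ipv4 add --network 192.168.20.0/24 --gateway 192.168.10.1
# verify the routing table
esxcli network ip route ipv4 list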


To start the process of performing a cross vCenter vMotion we right-click on the cluster or ESXi host.


Click on Import VMs


Select source vCenter


Select the VMs you want to move.


Select the host to transfer the compute to.


Select the destination storage.


Select networks.


Select vMotion priority.

Ready to complete, click Finish.

The 7.0.1 environment also makes use of NSX-T network virtualization. Why is this important to mention? If you want to perform a rollback, you can't move a VM that is connected to an NSX-T managed portgroup to a non-NSX-T managed portgroup. To remediate this issue you should create a non-NSX-T portgroup with the same VLAN and add the VM you want to roll back to that portgroup.
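
If the hosts still have a standard vSwitch available, such a temporary non-NSX-T portgroup could for example be created from the ESXi shell; a sketch with placeholder names (in many environments you would instead create a distributed portgroup via vCenter):

# create a temporary standard portgroup on an existing standard vSwitch
esxcli network vswitch standard portgroup add --portgroup-name Rollback-PG --vswitch-name vSwitch0
# tag it with the same VLAN ID as the NSX-T segment the VM is connected to
esxcli network vswitch standard portgroup set --portgroup-name Rollback-PG --vlan-id 100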

Upgrade NSX-T Edge Nodes

VMware NSX-T delivers virtual networking in a software-defined datacenter. In this article we are going to take a look at a VMware NSX-T environment that is ready for upgrading. In this blog we will upgrade the seven NSX-T Edge nodes. Let's first take a look at the function of Edge nodes within the NSX-T architecture. NSX Edge nodes are service appliances that run centralized network services that cannot be distributed to the hypervisors. An NSX Edge node can belong to one overlay transport zone and multiple VLAN transport zones.

Today we are performing an upgrade of the Edge nodes of an NSX-T environment. We are upgrading seven Edge nodes from version 3.1.0.0.017107177 to version 3.1.1.0.0.17483065. Before the upgrade we first perform a pre-check of the environment, to make sure it is ready for the upgrade.

[Screenshot: pre-check results showing six NSX-T Edge nodes with issues]

The above image shows that during the pre-check there were six NSX-T Edge nodes with issues that could prevent a successful upgrade. Before we go any further we are going to investigate what those issues are.

[Screenshot: overview of the affected NSX-T Edge nodes]

By clicking on one of the affected NSX-T Edge nodes we can see that this node had two issues.

[Screenshots: the two alarms reported for the Edge node]

When we click on the blue "2" with the exclamation mark next to it, we can drill down further to identify the current issues. The two alarms indicate that password expiration is approaching for both the admin and the root account.

[Screenshot: password expiration alarm details]

To remediate this issue we change the password for the admin and root accounts. To accomplish this we connect to the NSX-T Edge node as root via SSH and execute the following commands:

  • /etc/init.d/nsx-edge-api-server stop
  • passwd admin
  • passwd root
  • touch /var/vmware/nsx/reset_cluster_credentials
  • /etc/init.d/nsx-edge-api-server start
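
As an alternative (a sketch based on the generic NSX-T CLI, not part of the procedure we followed here), the expiration warning can also be handled from the NSX-T CLI as the admin user:

# show when the admin password expires
get user admin password-expiration
# extend the expiration period (in days), or remove it entirely
set user admin password-expiration 9999
clear user admin password-expiration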

[Screenshot: Edge-TN-07 now shows no issues]

Edge-TN-07 is now without errors; we proceed by checking the other NSX-T Edge nodes and perform the same actions on those nodes.

[Screenshot: the other NSX-T Edge nodes without errors]

The other NSX-T Edge nodes are now also without errors.


In the upgrade window we select the Edge Node cluster and we start the upgrade.

[Screenshot: Edge node upgrade in progress]

Grab a drink (coffee) and wait for the progress bar to reach 100%.

[Screenshot: upgrade overview showing the seven Edge nodes upgraded]

In the upgrade overview window we can see that all seven NSX-T Edge nodes are now upgraded.

Awarded vExpert 2021

VMware vExpert is an honorary title VMware grants to outstanding advocates of the company's products.

The vExpert title is held in high regard within the community due to the expertise of the selected vExperts. The vExpert honorees share their knowledge to enable and empower customers around the world in adopting VMware's software-defined hybrid cloud technology.

The vExpert award is for individuals, not for companies, and the title lasts for one year. Employees of both customers and partners can receive the vExpert award. VMware started the vExpert program in 2009.

I am honored, happy and very proud to be named vExpert 2021. I look forward to participating in the vExpert program and to continuing to share knowledge about VMware products and their different use cases.

vSAN Hybrid / All Flash

As a VMware partner we (my employer PQR) conduct VMware Health Checks. To perform a Health Check on a vSphere (or EUC, NSX-T) environment, VMware provides a tool that checks whether the environment matches the VMware best practices: the VMware Health Analyzer. The VMware Health Analyzer is a Photon OS appliance that you install in the client environment; there is also a Windows-installed version, but my preference is the appliance. I also have the appliance running in my own environment, so when I have collected data at a customer site I can load that information into my own appliance and don't need a connection to the customer to create my Health Check report. The current version of the VMware Health Analyzer is 5.5.2.0. Next to the VMware Health Analyzer, the consultant checking the VMware environment will also use his own knowledge to review the environment and to interpret the data presented by the tool.

VMware Health Analyzer
[Screenshot: the VMware Health Analyzer dashboard, taken from a lab environment]

Recently we did a Health Check on a vSphere 6.7 environment for a large company. The environment consists of six vSphere hosts with a single vSAN cluster. Before the Health Check the customer decided to expand the environment with four extra hosts. The original vSAN cluster consisting of those six vSphere servers is a hybrid vSAN, while the disk groups on the four new servers are all flash. This has resulted in a combined vSAN with both hybrid and all-flash disk groups, a setup that is not supported by VMware. When we investigated the servers of the hybrid vSAN, we noticed that the disks in those servers are actually also all flash, but marked as HDD.
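
Whether ESXi sees a device as flash can quickly be checked from the ESXi shell; a small sketch (the grep only filters the relevant fields from the output):

# list all storage devices and show whether ESXi has marked them as flash (SSD)
esxcli storage core device list | grep -E "Display Name|Is SSD"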

[Screenshot: disk group of the "Hybrid" servers]

[Screenshot: disk group of the All Flash servers]

For performance reasons we highly recommend using an All Flash vSAN instead of a Hybrid vSAN.

Advantages of an All Flash vSAN:

  1. Make use of space efficiency: Deduplication and compression;
  2. Provide organizations with the ability to run business-critical applications and OLTP databases on vSAN, enabled by fast, predictable throughput and lower latency;
  3. Give customers the ability to scale and support a significantly larger number of VMs and virtual desktops using the same compute and network resources;
  4. Increase business agility and productivity by enabling IT to provision services faster, increasing user satisfaction and executing on faster backup and disaster recovery for production deployments;
  5. Combine the benefits of vSAN and flash to deliver a lower TCO using less power, cooling, data center floor space and other resources per virtual machine, virtual desktop or transaction;
  6. When data is de-staged from the cache tier to the capacity tier, flushing happens far faster in an all-flash vSAN than in a hybrid (HDD + SSD) vSAN, helping to define better SLAs.

Converting the disk groups and the vSAN from hybrid to all flash has a large impact and must be well prepared before being executed.
We proposed the following method:

  1. Remove three "new" servers from the current vSAN cluster;
  2. Build a new All Flash vSAN cluster with these three servers;
  3. Add the new vSAN cluster to the VMware Horizon environment;
  4. Empty the remaining seven servers one by one and add them to the new All Flash vSAN (see the sketch after this list);
  5. When the old cluster is empty, delete it.
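
Emptying a host before moving it to the new cluster can be done by placing it in maintenance mode with full data evacuation. A rough sketch from the ESXi shell (in practice this is typically done from the vSphere Client, with vMotion/DRS moving the running VMs off the host first):

# enter maintenance mode and evacuate all vSAN data to the remaining hosts
esxcli system maintenanceMode set --enable true --vsanmode evacuateAllData
# check the maintenance mode status
esxcli system maintenanceMode get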

Thanks to Ronald de Jong

vSphere Cluster Services (vCLS)

In vSphere 7.0 Update 1 (released in October 2020) a new feature was introduced called vSphere Cluster Services (vCLS). The purpose of vCLS is to ensure that cluster services, such as vSphere DRS and vSphere HA, are available to maintain the resources and health of the workloads running in the cluster, independent of the vCenter Server availability.

vCLS uses agent virtual machines to maintain cluster services health. vCLS runs in every cluster, even when cluster services like vSphere DRS and vSphere HA are not enabled.

The architecture of the vCLS control plane consists of a maximum of three virtual machines, also called system or agent VMs. The vCLS machines are placed on separate hosts in a cluster. In smaller environments (fewer than three hosts) the number of vCLS VMs equals the number of hosts. SDDC (Software Defined Datacenter) admins do not need to maintain the lifecycle of these vCLS VMs.

The architecture for the vSphere Cluster Services is displayed in this image.

[Image: vCLS architecture diagram, from the VMware vSphere blog]

The vCLS VMs that form the cluster quorum are self-correcting: when vCLS VMs are not available, the vSphere Cluster Services will automatically try to create, update or power on the vCLS VMs.


There are three health states for the cluster services:

  • Healthy: The vSphere Cluster Services health is green when at least one vCLS VM is running in the cluster. To maintain vCLS VM availability, a cluster quorum of three vCLS VMs is deployed.

  • Degraded: This is a transient state when at least one of the vCLS VMs is not available but DRS maintains functionality. The cluster can also be in this state when the vCLS VMs are being re-deployed or powered on again after some impact to the running vCLS VMs.

  • Unhealthy: A vCLS unhealthy state occurs when DRS loses its functionality because the vCLS control plane is not available.

The vCLS VMs are automatically placed in their own folder within the cluster.

[Screenshot: the vCLS VMs in their own folder in the cluster]

The vCLS VMs are small, with minimal resources. If no shared storage is available, the vCLS VMs are created on local storage. If a cluster is created before shared storage (for instance vSAN) is configured on the ESXi hosts, it is strongly recommended to move the vCLS VMs to the shared storage once it is available.

The vCLS VMs are running a customized Photon OS. In the image below you see the resources of a vCLS VM.

[Screenshot: resources of a vCLS VM]

The 2 GB virtual disk is thin provisioned. The vCLS VM has no NIC; it does not need one because vCLS leverages a VMCI/vSocket interface to communicate with the hypervisor.

The health of the vCLS VMs, including their power state, is managed by the vSphere ESX Agent Manager (EAM). In case of a power-on failure of vCLS VMs, or if the first instance of DRS for a cluster is skipped due to lack of quorum of vCLS VMs, a banner appears on the cluster summary page along with a link to a Knowledge Base article to help troubleshoot the error state. Because vCLS VMs are treated as system VMs, you do not need to back up or snapshot these VMs. The health state of these VMs is managed by vCenter services.
