One of the clusters in our environment contains Dell PowerEdge R730 servers. We recently noticed that a few hosts were failing and throwing an error during boot. One thing these hosts had in common was that ESXi was installed on the internal SD modules. Rebooting the server results in an error message, and reinstalling the OS on the SD card still gives the same error.
This is a known issue with ESXi 7.0 installations on SD cards. One Reddit user gave the explanation that “The new partition layout in esxi 7.0 has more writes and the writes aren’t throttled anymore like in earlier releases, therefore VMware also recommends not to use sd cards anymore.”
The official information from VMware on this issue is located in this KB. The issue is triggered by corruption of the VMFS-L locker partition on SD cards in ESXi 7.0.
As of 7.0 Update 1, the format of the ESX-OSData boot data partition has changed. Instead of FAT, it uses a new format called VMFS-L, which allows much more, and much faster, I/O to the partition. The ESX-OSData partition is where frequently written data lands; it consolidates the product locker and the scratch/log partitions that were separate in previous versions of ESXi. This partition is more commonly seen as the /scratch partition.
VMware lists one of the main reasons the /scratch partition fails on ESXi 7.0 as: "The level of read and write traffic is overwhelming and corrupting many less capable SD cards."
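You can see the new layout for yourself from the ESXi shell on a 7.0 U1+ host. A quick check (volume names will differ on your hosts; the commands below are a sketch, not from the KB):

```shell
# List mounted filesystems; on 7.0 U1 and later the OSData volume
# appears with a VMFS-L type alongside the usual VFAT boot banks.
esxcli storage filesystem list

# /scratch is a symlink that resolves into the OSDATA volume,
# which is why scratch I/O now lands on the boot device.
ls -l /scratch
```

If the boot device is an SD card, both the OSData volume and the /scratch link point at it, concentrating all of that write traffic on the card.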
Finally, as a workaround, the KB says:
The version 7.0 Update 2 VMware ESXi Installation and Setup Guide, page 12, specifically says that the ESX-OSData partition “must be created on high-endurance storage devices”.
Another workaround, and the only one that actually worked for us, is to replace the SD card with a new, higher-endurance boot drive.
Once the new drive is installed and ESXi has been reinstalled, you can immediately move the /scratch partition to a location off the boot drive, per the directions in the VMware KB article "System logs are stored on non-persistent storage".
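Relocating /scratch can also be done from the ESXi shell instead of the vSphere Client. A minimal sketch using esxcli, assuming a datastore named datastore1 and a per-host directory name of your choosing (both are placeholders, adjust for your environment):

```shell
# Create a dedicated scratch directory on a persistent datastore
# (one directory per host so hosts never share a scratch location)
mkdir -p /vmfs/volumes/datastore1/.locker-esx01

# Point the ScratchConfig.ConfiguredScratchLocation advanced
# setting at the new directory
esxcli system settings advanced set \
  -o /ScratchConfig/ConfiguredScratchLocation \
  -s /vmfs/volumes/datastore1/.locker-esx01

# The new scratch location takes effect after a reboot
reboot
```

After the reboot, `ls -l /scratch` should resolve to the datastore directory rather than the boot device, keeping log and scratch writes off the SD card entirely.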