VMware vCenter Server Heartbeat – Restore on a second node… a journey…

Warning: The following post has absolutely nothing to do with automation or Orchestration! Be aware of this while reading the post 😉

Over the last few days I had a job for a customer who wanted to implement vCenter Server Heartbeat. For those of you who are not familiar with vCSHB, here is the description from the VMware website:

“VMware® vCenter™ Server Heartbeat™ protects VMware® vCenter Server™ virtual infrastructure from problems related to applications, configurations, operating systems, networks and hardware.”

In my own words: vCSHB is a cluster service for various VMware products such as vCenter Server (including the SQL Server if it is installed on the same server), VMware Composer and a lot more.

As a picture, vCSHB looks like this:

vCSHB is based on Neverfail, tuned for VMware. With vCSHB you can build different “models” to protect your vCenter Server. This could be a physical-to-virtual model or a physical-to-physical model, and there are many more options for building your application HA with vCSHB. Most implementations I have done so far were physical-to-virtual. This model is relatively simple to implement, and you get vCSHB up and running easily.

Now let’s come back to the customer project. The customer wanted a physical-to-physical implementation. The servers were located at a remote site, which meant we had no chance to get our hands on them. The only ways to work with the servers were to log in via RDP to the Windows system installed on them, or to use the remote connection board.

For the customer we started with the primary server. The first thing you have to do, besides installing Windows, is to install VMware vCenter Server, a SQL Server and the other VMware products you want to protect with vCSHB. Once that is finished, you can start the installation of vCSHB. We did it that way and everything went fine. During the vCSHB installation, a backup of the primary system (the system where you install vCSHB first) is taken. After the first node is installed, vCSHB has to be installed on the second node. The only “prerequisite” on the second node is an installed Windows with Windows Backup installed. The word “installation” is probably wrong in this context, because you start the installation and it then uses the backup of the first node.

This is where the journey with the installation began…

On our first try, the installation failed with an error. The log provided this information:

Log of files for which recovery failed:

C:\Windows\Logs\WindowsServerBackup\FileRestore_Error-08-11-2013_13-16-29.log

wbadmin 1.0 – Backup command-line tool

(C) Copyright 2012 Microsoft Corporation. All rights reserved.

Starting a system state recovery operation [08.11.2013 14:17].

Processing files for recovery. This might take a few minutes…

Processed (176) files.

Processed (1657) files.

Processed (19373) files.

Processed (27841) files.

Processed (47849) files.

Processed (53873) files.

Processed (74813) files.

Processed (97831) files.

Processed (120041) files.

Processed (120041) files.

Processed (120041) files.

Summary of the recovery operation:

——————–

The recovery of the system state failed [29.11.2013 14:19].

Log of files successfully recovered:

C:\Windows\Logs\WindowsServerBackup\SystemStateRestore-29-11-2013_13-17-40.log

Log of files for which recovery failed: C:\Windows\Logs\WindowsServerBackup\SystemStateRestore_Error-29-11-2013_13-17-40.log

Access is denied.

An exception occurred:

Message: Execution failed with return code -3. Restore was aborted.

See vCSHB-Ref-2273 for resolution procedure.

At:

NTBackupRestoreThread::Run()

wbadmin start systemstaterecovery -backupTarget:"\\Server\share" -version:29/08/2013-11:51 -quiet

The installation cannot continue.
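When a restore fails like this, the useful lines are buried between the progress counters. As a minimal sketch (this is not part of vCSHB or wbadmin; the function name `summarize_wbadmin_output` is hypothetical, and it assumes the output looks like the log above), a few lines of Python can pull out the last file count, the failure status and the referenced log paths:

```python
import re

def summarize_wbadmin_output(text):
    """Summarize wbadmin recovery output (format assumed from the log above)."""
    # Progress lines look like "Processed (120041) files."
    counts = [int(n) for n in re.findall(r"Processed \((\d+)\) files\.", text)]
    # Paths of the Windows Server Backup logs mentioned in the output
    log_paths = re.findall(
        r"C:\\Windows\\Logs\\WindowsServerBackup\\\S+\.log", text
    )
    return {
        "files_processed": counts[-1] if counts else 0,
        "failed": "The recovery of the system state failed" in text,
        "access_denied": "Access is denied." in text,
        "log_paths": log_paths,
    }

# Shortened excerpt of the log shown above
sample = r"""Processed (97831) files.
Processed (120041) files.
The recovery of the system state failed [29.11.2013 14:19].
Log of files for which recovery failed: C:\Windows\Logs\WindowsServerBackup\SystemStateRestore_Error-29-11-2013_13-17-40.log
Access is denied.
"""

summary = summarize_wbadmin_output(sample)
print(summary["files_processed"], summary["failed"], summary["access_denied"])
for path in summary["log_paths"]:
    print(path)
```

In our case the interesting part would have been the combination of “failed” and “Access is denied.”, plus the path of the error log to look at next.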

The most interesting part of the log was the pointer to vCSHB reference 2273 for the resolution procedure. It links to a VMware KB article, which can be found here:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2004359

Based on the information in the KB article and the batch file, we tried a second restore. This second restore also ended with the same error message. After a lot of searching we had not found any suitable answer to our problem, so we had to keep trying to solve it ourselves.

The next thing we tried was to reinstall the second node and run the batch file before attempting another restore. Again, it did not end in a success story.

So we decided to open a VMware SR. We explained our problem and the things we had tried before. Sadly, VMware Support could not help us much, because their only possible solution was the KB article. This was the answer from support:

“If the steps in the KB do not work then there is no other workaround that we have the problem here is heartbeat using the underlining MS technology to do the backup and clone so if this is failing there is nothing I can do from the VMware side”

Although he couldn’t help us directly, we could use the hint about the “underlining MS technology”.

So we started searching again, but this time we focused on the Microsoft topics for backup and restore. In the Microsoft forums we found reports from other users who had problems restoring, on a physical node, a backup that was made with the MS backup technology. They could restore their nodes if they:

– Deactivated the public (primary) Network Connection

– Didn’t use RDP to connect to the server

– Used the private (second) network connection to access the backup file
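The tips above can be turned into a small pre-flight check before starting the restore. This is only a sketch under assumptions: the helper names are hypothetical, it relies on Windows setting the `SESSIONNAME` environment variable to something like `RDP-Tcp#0` in a Remote Desktop session, and it does not check the state of the network connections. The wbadmin arguments are the ones from the log above.

```python
import os

def is_rdp_session(env=None):
    """Return True if the environment looks like a Remote Desktop session.

    Assumption: SESSIONNAME is e.g. "RDP-Tcp#0" in an RDP session and
    "Console" on the local console.
    """
    env = os.environ if env is None else env
    return env.get("SESSIONNAME", "").upper().startswith("RDP-")

def build_restore_command(backup_target, version):
    """Build the wbadmin call seen in the log above as an argument list."""
    return [
        "wbadmin", "start", "systemstaterecovery",
        f"-backupTarget:{backup_target}",
        f"-version:{version}",
        "-quiet",
    ]

def preflight(env=None):
    """Return a list of warnings based on the tips above (empty means OK)."""
    warnings = []
    if is_rdp_session(env):
        warnings.append("Connected via RDP: use the remote connection board instead.")
    return warnings

# Values taken from the log above; the share name is a placeholder there too
cmd = build_restore_command(r"\\Server\share", "29/08/2013-11:51")
print(" ".join(cmd))
print(preflight({"SESSIONNAME": "RDP-Tcp#0"}))
```

The command is only built and printed here, not executed, so the check can be tried safely before touching the second node.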

We gave these tips a chance and were able to restore the backup on the second node… yes 🙂 Once the restore was successful, the rest was easy going and we could implement, configure and test the Heartbeat installation.

From a discussion with another vExpert, Mike Schubert, I know that it is also possible to restore vCSHB with a “RAID” copy (take a disk from the first node and place it in the second node…) with some manual work. I hope you can use this information if you get into trouble during your Heartbeat installation. So have fun and Orchestrate the World 😉

———————————————————————-

Update 09.02.2014

For German speakers, Mike Schubert describes another way to restore the Heartbeat installation on the second node. Mike’s article can be found here: http://www.die-schubis.de/doku.php/vmware:heartbeat#physical_to_physical_-_ein_anderer_weg

Mike’s way is to “copy” the installation via a RAID copy, with some additional manual tasks.