The Dream of a self-healing Datacenter: Integrate vCenter Operations (vCOPS) with Orchestrator

Workflows in vCenter Orchestrator allow you to automate tasks in vCenter. That’s “Kindergarten“.

Workflows also allow you to orchestrate IT services all over the infrastructure, leveraging all those generic or specific plugins (Wanna see the list? Go to VMware’s Solution Exchange: https://solutionexchange.vmware.com/store/category_groups/15/categories/21).
That’s the “Advanced to Master“-Level.

Kicking off “healing” workflows fully automated, based on “unhealthy” conditions in your datacenter, using vCO as a headless orchestration platform? That sounds like a job for Wizards!
Well, let’s see……

Orchestrator Wizard (bearded)

The Basics

What is VMware vCenter Operations?
Quoting http://www.vmware.com/products/datacenter-virtualization/vcenter-operations-management/overview.html:
“… Automate performance, capacity, and configuration management with patented analytics and an integrated approach to management. Eliminate the finger pointing, improve team collaboration and reduce manual problem solving efforts by as much as 40% with automated root cause analysis… blah blah Marketing blah 🙄 ”

In short: It’s great! 

VMware vCenter Operations (vCOPS)


What do we need in vCenter Orchestrator?
Just some basic facts: Orchestrator workflows are mostly triggered in one of these three ways:

  • Manually: By an administrator (using the vCO-Client), a helpdesk agent (using the weboperator Webview, or the Perspectives-Plugin) or a customer/end-user (via a self-service portal (built in Wavemaker of course ;-))
  • As a Scheduled Task in vCO: The workflow is started by the vCO server on a regular basis, e.g. for reporting, checking for stale snapshots, … (It’s even possible to schedule a workflow programmatically, see a nice example here: http://communities.vmware.com/thread/318791)
  • By an external system: Another component of your infrastructure kicks off the workflow via the API. This could be a high-level business-process engine (like VMware Service Manager), a system to manage classroom environments, a release management system, an I/S/P/X/Y/Z-aaS manager, whatever…
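To make the third way a bit more concrete, here is a minimal sketch of what "kicking off a workflow via the API" could look like from the calling side. It assumes the REST API that newer vCO versions expose; the URL path, port 8281 and the parameter layout are assumptions to check against the API documentation of your version (older versions offer a SOAP API instead). The host name, workflow ID and input name are made up, and the request is only built, not sent, so authentication and certificate handling are left out.

```python
import json
import urllib.request


def build_start_request(vco_host, workflow_id, inputs, port=8281):
    """Build (but do not send) a request that starts one workflow run."""
    # Assumed endpoint layout: one "executions" collection per workflow.
    url = f"https://{vco_host}:{port}/vco/api/workflows/{workflow_id}/executions"
    # Assumed payload layout: every input wrapped as a typed string parameter.
    parameters = [
        {"name": name, "type": "string", "value": {"string": {"value": value}}}
        for name, value in inputs.items()
    ]
    body = json.dumps({"parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_start_request(
    "vco.example.local",                     # hypothetical vCO server
    "00000000-0000-0000-0000-000000000000",  # placeholder workflow ID
    {"datastoreName": "datastore-01"},       # hypothetical workflow input
)
print(request.get_method(), request.full_url)
```

An external system would send this request with its own credentials, for example through `urllib.request.urlopen`.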

And there is a fourth, not so well known way to start a workflow:

Policies in Orchestrator to start Workflows on signals

  • Event based: The workflow is started when a specific event occurs. You can either use a “Waiting event” element in the workflow, or you can create a Policy in vCO. For both, vCO plugins can provide triggers.
    Examples: The AMQP-Plugin (which can be triggered e.g. by a Blocking Task in vCloud Director), or the SNMP-Plugin (listens to SNMP traps, sent e.g. from vCenter or other systems…)
    Waiting Event in a workflow

Subtotal:

vCenter Operations (vCOPS) can send SNMP-Traps when unwanted situations occur.

vCO can start workflows when an SNMP-Trap is received.

So: Integrate (again)!

Step 1:  Install the SNMP-Plugin on your vCO-Server. RTFM on https://www.vmware.com/support/pubs/vco_plugins_pubs.html

Step 2: Run the workflows “Set the SNMP trap port” and “Start the trap host” (both under Library / Trap Host Management) to make vCO (to be exact: the SNMP-Plugin) listen for SNMP traps.

Step 3: Run the workflow Library / Device Management / Register an SNMP device to make vCO listen for SNMP traps coming specifically from vCOPS. Make sure you use the hostname / IP of the vCOPS Analytics VM, not the UI VM!

Run Workflows to setup the environment
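If you wonder what Steps 2 and 3 boil down to, here is a rough sketch of the idea: open a UDP port for traps, and only accept datagrams from senders you registered. This is not how the SNMP-Plugin is implemented, and it does not decode SNMP at all; it only illustrates "trap port" plus "registered device". The port number and all function names are hypothetical.

```python
import socket

TRAP_PORT = 4000  # hypothetical; stands for the port you set in Step 2


def open_trap_socket(bind_address="0.0.0.0", port=TRAP_PORT):
    """Open the UDP socket that plays the role of the trap host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_address, port))
    return sock


def is_registered(sender_ip, registered_devices):
    """Return True if the datagram came from a registered device (Step 3)."""
    return sender_ip in registered_devices


def wait_for_trap(sock, registered_devices):
    """Block until a datagram from a registered device arrives; return its raw bytes."""
    while True:
        payload, (sender_ip, _sender_port) = sock.recvfrom(65535)
        if is_registered(sender_ip, registered_devices):
            return payload
        # Datagrams from unknown senders are dropped and the loop keeps waiting.
```

This also shows why the Analytics VM matters in Step 3: if the traps leave from a different address than the one you registered, they are dropped.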

Step 4: Configure vCOPS to send SNMP-Traps to the Orchestrator server. Open the vCOPS Manager Administration, and define your vCO-Server as (receiving) Host.

Enable SNMP notifications in vCenter Operations

Step 5: Configure vCOPS to “activate” alerting: Open the menu “Notifications” in the regular vCOPS web interface (not the Administration anymore), create a “New Rule” and enable all the conditions you want to be notified about.
(I’m not sure if this step is really necessary. I didn’t find any information on whether these rules are for email only, or if they are also used for SNMP notifications 😕 )

Create Rules to enable alerting

Step 6: For a first test: Run the workflow Library / SNMP / Wait for a trap on an SNMP device. Select the vCOPS Analytics VM as the SNMP device (you registered it in Step 3).
The workflow should pause at the “Waiting Element” until an SNMP trap from vCOPS arrives. Now break something in your datacenter 😈 , so that vCOPS sends a trap…

Waiting for something bad...

Once the trap is received by vCO, the workflow should continue and finish successfully. In the Logs-Tab of the Workflow-Token you can find some details about the trap.

Step 7: To be the wizard: Create a new Policy based on the Policy Template which comes with the SNMP Plugin.
Apply a new  Orchestrator Policy

Then edit the created Policy to select the Workflow to be started when a trap is received:

Step 8: Start the Policy! (That’s something you will forget only once, after hours of senseless troubleshooting…)

DON'T FORGET TO START THE POLICY!
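Why does an unstarted policy cost you hours? Because nothing fails: the traps arrive, and simply nothing happens. A toy model of that behaviour, with a made-up class that has nothing to do with the real vCO objects:

```python
class TrapPolicy:
    """Toy stand-in for a policy: runs a handler for every trap, but only while started."""

    def __init__(self, handler):
        self.handler = handler
        self.started = False
        self.ignored = 0

    def start(self):
        self.started = True

    def on_trap(self, trap):
        if not self.started:
            self.ignored += 1  # the trap arrived, but no workflow is kicked off
            return None
        return self.handler(trap)


handled = []
policy = TrapPolicy(handled.append)
policy.on_trap("alert-1")  # policy not started yet: silently ignored
policy.start()
policy.on_trap("alert-2")
policy.on_trap("alert-3")
print(handled, policy.ignored)
```

Unlike the waiting workflow from Step 6, which handles exactly one trap and then finishes, a started policy keeps firing for every trap that comes in.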

Finito!

Now every condition in your datacenter that raises an alert in vCenter Operations will trigger a workflow in Orchestrator, which can deal with the problem. And the best part: All that without writing a single line of code!

From there I leave it to you…

<side note>
Integrating a systems monitoring tool with vCenter Orchestrator is not new:
up.time software has provided a vCO plugin for years:
http://support.uptimesoftware.com/orchestrator.php
http://www.uptimesoftware.com/uptimeblog/uncategorized/cloud-computing-and-popular-culture/
</side note>

The Dream of a self-healing Datacenter

Now that we have proven it’s possible, consider what you can do (or better: what you can have done fully automated) with an integration of vCenter Operations and vCenter Orchestrator:

  • Run a workflow that automatically creates a Ticket or updates the CMDB on vCOPS alerts
  • A datastore runs out of capacity? A workflow will fix it automatically by growing the LUN on the storage system and then the VMFS partition. Or it just provisions a new datastore full-stack and adds it to the storage cluster.
  • YourMightyStorageVendor(tm) provides an additional workflow: If now your storage system runs out of capacity, new disks will automatically be ordered by the workflow 🙂
  • Response time for Outlook users too high? A workflow will deploy new Exchange CAS instances to scale out (and leverage the Powershell-Plugin to adjust their settings)
  • Your web-app is getting “Slashdotted”? See how Radware already leverages vCO to scale it fully-automated: http://www.youtube.com/watch?v=4rkV3ebQens
  • “No worries, Captain! I already put 80% energy to the front shields to prepare for the Klingon attack!”
  • With the brand-new vCenter Operations for VMware View and a View API/Plugin for vCO you could… … …
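Once more than one healing workflow exists, the workflow started by the policy needs to decide which one fits the alert. A minimal sketch of such a dispatch table follows; the alert types and workflow names are invented for illustration and do not come from vCOPS or from any vCO library.

```python
# Hypothetical mapping from alert type to the healing workflow to start.
HEALING_WORKFLOWS = {
    "datastore_capacity": "Extend datastore",
    "response_time": "Scale out Exchange CAS",
}

# Anything without a dedicated fix at least gets a ticket.
FALLBACK_WORKFLOW = "Create ticket"


def choose_workflow(alert_type):
    """Pick the healing workflow for an alert, or fall back to ticket creation."""
    return HEALING_WORKFLOWS.get(alert_type, FALLBACK_WORKFLOW)


print(choose_workflow("datastore_capacity"))
print(choose_workflow("something_unexpected"))
```

Keeping the mapping in one place means a new healing workflow is one more entry, not one more policy.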

***RING – RING – YOUR WAKEUP CALL – RING – RING ***

SIGH! What a nice dream…