vCO, SNMP Traps and vCOPS

Today I had the request, to generate a vCOPS Demo with an vCO integration. For that, I had a first look on Jörg’s post about is “Self-Healing datacenter” which can be found here: http://www.vcoportal.de/2012/05/integrate-vcops-and-vco/

For me, that was a good starting point but I want to integrate the SNMP Trap with an existing workflow. The creation of a Workflow to deal with snmp traps is not a big problem. The bigger problem is the “automatic” processing via Policy’s because there is no documentation (or I didn’t find it) . So I started to talk with Jörg and made some research in the web.  There I found this excellent post from William Lam about “Automatically Securing Virtual Machines Using vCenter Orchestrator”  http://blogs.vmware.com/vsphere/2012/07/automatically-securing-virtual-machines-using-vcenter-orchestrator.html

This post from William (and his package) has everything inside we need.

First we have to create the preconditions like the SNMP installation in the vCO and the configuration in the vCOPS and the vCO. Jörg has this already documented here: http://www.vcoportal.de/2012/05/integrate-vcops-and-vco/ so I will not repeat this steps here.

Let’s start with the Workflow development. First, we have to create a new workflow. I will call it “Execute Host Maintenance after trap” and I insert the following description “Waits to receive an SNMP trap from a vCenter Server instance, then set the host in maintenance mode or exist the maintenance mode “

After we created the Workflow, we go to the “Schema” tab

Here we insert some Elements. We need:

  • One Scriptable Task
  • One Decsion
  • The Workflow Enter Maintenance Mode
  • The Workflow Exit Maintenance Mode
  • And a Workflow End.

When you are ready your order should look like mine.

Now, let’s start with the Scriptable Task. I name it “Retrieve Host”. We want to extract the information out of the SNMP Message, for that we need some variables and SNMP Trap information. Let’s start with the SNMP Trap Information. The SNMP Messages comes with OID Numbers. Each OID Number has a different meaning and stands for a other component or message. The values of this OID Number can be found in the SNMP MIB Files which can be downloaded from the VMware Website. Every trap data has more than one OID Number

=============

oid: 1.3.6.1.2.1.1.3.0

type: Number

snmp type: Timeticks

value: 1743301911

Element 2:

=============

oid: 1.3.6.1.6.3.1.1.4.1.0

type: String

snmp type: OID

value: 1.3.6.1.4.1.19004.0.25

Element 3:

=============

oid: 1.3.6.1.4.1.19004.2.1

type: String

snmp type: Octet String

value: localhost

Element 4:

=============

oid: 1.3.6.1.4.1.19004.2.2

type: String

snmp type: Octet String

value: 10.10.120.183

Element 5:

=============

oid: 1.3.6.1.4.1.19004.2.3

type: String

snmp type: Octet String

value: Resource

Element 6:

=============

oid: 1.3.6.1.4.1.19004.2.4

type: String

snmp type: Octet String

value: 1346068003943

Element 7:

=============

oid: 1.3.6.1.4.1.19004.2.5

type: String

snmp type: Octet String

value: Critical

Element 8:

=============

oid: 1.3.6.1.4.1.19004.2.6

type: String

snmp type: Octet String

value: New alert by id 36 is generated at Mon Aug 27 11:46:43 UTC 2012; Root Cause : (2 SYMPTOMS)

1. MESSAGE EVENT    (1 OF 3)

33% - FAULT -

33% - CHANGE EVENT -

Element 9:

=============

oid: 1.3.6.1.4.1.19004.2.7

type: String

snmp type: Octet String

value: https://10.10.120.166/vcops-vsphere/?alert=36

Element 10:

=============

oid: 1.3.6.1.4.1.19004.2.8

type: Number

snmp type: Integer

value: 36

Element 11:

=============

oid: 1.3.6.1.4.1.19004.2.9

type: String

snmp type: Octet String

value: Verbindungsstatus der physischen Netzwerkkarte vmnic1 ist nicht bereit.

Element 12:

=============

oid: 1.3.6.1.4.1.19004.2.10

type: String

snmp type: Octet String

value: Health

Element 13:

=============

oid: 1.3.6.1.4.1.19004.2.11

type: String

snmp type: Octet String

value: Faults

For us, we want to search for the OID Number of the hostname and the error Message.

With this background information, we can create our needed In- and Outputs for our Workflow.

Local Parameter Variable Name Module Direction Type Value
trapData trapData Retrieve Host in Array/Properties
HostOID HostOID Retrieve Host in String 1.3.6.1.4.1.19004.2.2
MessageOID MessageOID Retrieve Host in String 1.3.6.1.4.1.19004.2.6
Host Host Retrieve Host out VC:HostSystem
SNMPMessage SNMPMessage Retrieve Host out String
NewAlert NewAlert Retrieve Host out Boolean

One important thing here: The trapData must be defined as Input Parameter not as attribute!

After we have created our variables, we can start with our scripting. I made a lot comments in the script, so everybody should understand what there happens….


// We extract the host name out of the SNMP TrapData
var HostName;
for (var x = 0; x < trapData.length; x++) {
var prop = trapData[x];
if (prop.get("oid") == HostOID) {
HostName = prop.get("value");
break;
}
}

// We compare the extracted Hostname with the Hosts registred in the vCenter Server.
// when we hava a match, we have the managed object ID of the Host
var hosts = VcPlugin.getAllHostSystems();
for (var i = 0; i < trapData.length; i++) {
var tempHost = hosts[i];
if (tempHost.name == HostName) {
Host = tempHost;
break;
}
}

// We search for the OID in the Message to get the Field with the recieved error Message
var SNMPTrap;
for (var y = 0; y < trapData.length; y++) {
var prop = trapData[y];
if (prop.get("oid") == MessageOID) {
SNMPTrap = prop.get("value");
break;
}
}

// Our search values in the Value Field of the Message
// When the Field contains the search string we beome the position. Otherwise be become a -1 value back.
var SNMPAlert = SNMPTrap.indexOf("New alert");
var SNMPCancel = SNMPTrap.indexOf("is cancelled");

// We check if the SNMP Trap has "New Alert" in the value field. Is so, we have a new alert.
if ( SNMPAlert != -1) {
NewAlert = true}
else {
NewAlert = false
};

Then we go to the Decsion. I name it “New Alert”. As Input we only need “NewAlert”. In the Decision field itself we set “NewAlert is true”.

After that, we go to the Enter Maintenance Mode. Here we only need to input variables:

Local Parameter Variable Name Module Direction Type Value
Host Host Enter Maintenance in VC:HostSystem
timeout timeout Enter Maintenance in number 0

Our Visual Binding has to look like this:

The same Variables and the same Visual Binding is required for Exit Maintenance Mode Workflow.

Local Parameter Variable Name Module Direction Type Value
Host Host Exit Maintenance in VC:HostSystem
timeout timeout Exit Maintenance in number 0

At last, we have to connect our Elements.  For the connections we Connect the start to “Retrieve Host” from there we connect the “New Alert”. From the “New Alert” we go with the Green line to “Enter Maintenance Mode” and with the red line to “Exit Maintenance Mode”.

Both Elements were connected to the End.

At last, we validate our Workflow and save our work.

After we have finished our Workflows, we create a Policy Template. For that, go to “policy Template” and create a folder with a name of your choice. I have a folder with the Name “vcoportal.de”.

With a “right click” on the folder you can open a context menu and choose “ Add policy template..:”

First we have to insert a name and a description. I choose “SNMP Trap with data” as name.

Then we go to the Scripting tab

There we have to insert the Device from with we want to catch our data. You can insert the device by clicking on the first button.

In the Dialog we choose SNMP:SnmpDevice

Then we check the SNMP Device and then click on the Second button.

As trigger we choose “onTrap”

At last, we have to insert this Scripting in “Script” field from the “OnTrap” Trigger.


// Author Christian Strijbos (cs@vcoportal.de)
// based on Examples and a Blog Post of  William Lam
// SNMP Policy to Check for Host Maintenance in case of an host error

// Execute Host Maintenace after trap ID
// To catch the Workflow ID, just go to the Workflow, type CTRL-C, open Notepad and Type CTRL-V
// You will get the ID of the Workflow in the id. Field.
var wfId = "83808080808080808080808080808080AA8A808001345464207298aebf2a6a5a5";

// process SNMP trap
var key = event.getValue("key");
var snmpResult = SnmpService.retrievePolicyData(key);
var trapData = System.getModule("com.vmware.library.snmp").processSnmpResult(snmpResult);
runWF(wfId,trapData);

// function to launch WF
function runWF(wfId,trapData) {
var workflowToLaunch = Server.getWorkflowWithId(wfId);
if (workflowToLaunch == null) {
throw "Workflow not found";
}
var workflowParameters = new Properties();
workflowParameters.put("trapData",trapData);
System.log("Launching Execute Host Maintenace after trap WF: " + wfId);
var wfToken = workflowToLaunch.execute(workflowParameters);
}

Some special notes here: In the Policy tab, variables are not highlighted. So if you write scripts on your own, make sure you write your variables well. Also there is no validation available, so there are a lot of possibility’s to make things go wrong 🙁

Save and close the template.

Next we want to apply out policy. Make a “right click” on the Policy Template and choose “Apply Policy..” in the Context menü.

In the menu you have to choose our SNMP Device

You can find your Policy under the “policy” Tab.

Right Click on it and choose “Edit”.

There you can choose our Startup Policy.  I choose “on Server Startup, start the Policy”. Save and  close the Policy.

At last, “right click” on the policy and start it.

After that, your policy is active and wait’s for SNMP traps from the vCOPS.

My setup follows the same principle like the integration with the SNMP Plugin with the vCenter Server like described in this Link:

http://blogs.vmware.com/orchestrator/2011/09/snmp-plug-in-integration-with-vcenter.html

vCO_Package_SNMP_vCOPS
vCO_Package_SNMP_vCOPS
de.vcoportal.snmp.vcops.package
35.8 KiB
Details...