Understanding Monitoring Collectors
Intended for use with Cassatt Active Response V5.0.
One of the key features in Cassatt Active Response is automatic detection and replacement when application nodes fail, and when applications stop providing services. Cassatt Active Response software plug-ins that trigger automatic node replacement are called monitoring collectors.
This article describes how monitoring collectors work, and how to select and configure both application and OS monitoring collectors.
How monitoring collectors are used in node replacement
Cassatt Active Response automatically replaces application nodes based on node status: a critical status triggers a node replacement. The Controller detects and displays about 20 different node statuses. Most node statuses are based on user actions, such as shutting down or disabling a node, and represent a static state that you have specified for the node. Several are interim statuses that apply while the node is in transition—such as during inventory—and will eventually resolve to another status.
The most interesting node statuses—normal and critical—are derived from information provided by Cassatt Active Response monitoring collectors. Monitoring collectors are Cassatt Active Response software plug-ins that run on the control node and connect to application nodes to determine whether the nodes and their applications are running.
Monitoring collectors do not provide critical and normal statuses directly; instead, critical and normal statuses represent a roll up of results from one or more monitoring collectors. The collectors attempt to contact application nodes and retrieve information. If contact is successful, Cassatt Active Response assigns a normal status. If any collector fails to connect, Cassatt Active Response assigns a critical status and—most importantly—automatically replaces the node if a suitable replacement is available.
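The roll-up rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this article, not Cassatt Active Response code: the names `NodeStatus` and `roll_up` are hypothetical, and each collector result is reduced to a simple success or failure flag.

```python
from enum import Enum


class NodeStatus(Enum):
    NORMAL = "normal"
    CRITICAL = "critical"


def roll_up(collector_results):
    """Combine per-collector results into a single node status.

    collector_results maps a collector name to True (contact succeeded)
    or False (contact failed). A single failure makes the node critical.
    """
    if not collector_results:
        # The article requires at least one collector per image.
        raise ValueError("at least one collector must be configured")
    if all(collector_results.values()):
        return NodeStatus.NORMAL
    return NodeStatus.CRITICAL
```

For example, `roll_up({"ping": True, "http": False})` yields `NodeStatus.CRITICAL`, because one failed collector is enough to mark the node for replacement.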
Cassatt Active Response provides a variety of monitoring collectors that use industry-standard protocols:
- SNMP
- ICMP (ping)
- HTTP
- JMX
- SSH-executed scripts
Cassatt Active Response requires that you configure at least one collector for an image. For example, you can use the ICMP collector, which pings the operating system on the application node. A failed ping constitutes a failure, which sets the node to critical and triggers replacement. This minimum collector requirement forms the baseline for node critical or normal status.
Each additional collector you specify provides Cassatt Active Response another view into whether a node is providing service, and another data point to use in detecting failure. The more ways you specify to monitor node activity, the more opportunities you give Cassatt Active Response to detect service level breaches, and the more likely it is that Cassatt Active Response will replace a problem application node—ensuring maximum application availability.
Exporting monitoring data for use with other system monitors
Cassatt Active Response normalizes and stores all of the monitoring data it collects in an internal database. Cassatt Active Response then exports the data for use with an SNMP- or JMX-compliant system monitor, such as HP’s OpenView Performance monitor or IBM’s Tivoli Monitoring.
The same rules apply to viewing data in an external tool as to detecting node failure in Cassatt Active Response: Cassatt Active Response collects data only when you explicitly configure collectors for an image.
For information on connecting a system monitor to Cassatt Active Response, see Accessing Cassatt Active Response Data for Use By Third-Party Monitoring Software.
You select and configure monitoring collectors separately for the operating system and for each application within a software image. Then you assign each software image—with its associated collectors—to a tier.
To decide which collector, or collectors, to use for each application in an image, you need to understand the applications themselves. Many applications are built to be monitored in a specific way. For example, several operating systems and many applications provide an SNMP interface. For those operating systems and applications, you might want to use the SNMP collector.
Other applications, such as WebLogic and most other J2EE applications, contain a JMX implementation, so you might select the JMX collector to monitor those applications.
Application monitoring is most effective when you take full advantage of the monitoring services built into your applications, as shown in the next illustration.

To use a monitoring service that is built into an application, you have to set up the service’s operation as part of installing and configuring the software, in addition to configuring the associated collector in Cassatt Active Response.
You can configure collectors during image capture (for new images) or in the Controller (for existing images and duplicated images). Let's look at capturing images first.
When you specify values during image capture (using the cccapture command), they are stored in a temporary, internal file that contains the initial parameters for an image, including the monitoring collector specifications.
The cccapture command presents a series of questions about the application. Here is an excerpt from the cccapture interview relating to configuring OS collectors:
OS Monitoring Options
Specify the operating system monitoring that will be used for this image (at least one monitoring option must be configured):
Monitor via SNMP? [n]
SNMP Port: [161]
Read Community: [public]
SNMP Timeout (seconds): [30]
SNMP Collection Interval (seconds): [60]
SNMP Retry Count: [3]
Monitor via ping? [n]
Ping Timeout (seconds): [30]
Ping Collection Interval in seconds (1-300): [30]
Ping Retry Count: [3]
Monitor via script? [n]
Script path/name: []
Script timeout (seconds): [60]
Script Collection Interval (seconds): [60]
Script Retry Count: [3]
The cccapture command has similar questions for monitoring applications.
You can change the monitors for an existing image in the Controller by going to:
- Images > image name > Properties (for OS monitoring)
- Images > image name > Applications (for application monitoring)
Whenever you add monitors, remove monitors, or change monitor properties for an image, the changes take effect immediately in any tiers that are running the image.

Supported collectors
Each collector has a set of parameters that control how the collector connects to and, in some cases, retrieves information from a node.
Collectors are pluggable, meaning that new collectors can easily be added to a Cassatt Active Response environment—watch this space for new additions. Is your application or hardware proprietary? Contact support@cassatt.com for assistance with your own custom collector plug-in.
ICMP (ping)
Ping can be used only to monitor an image’s operating system, not an application.
If you will be configuring Cassatt Active Response (during cccapture) to boot images from local disk (rather than the default NFS), you cannot use ping for monitoring. Why? Ping can show false results; for example, it can report a node as online too early (while it is still booting from local disk), or as down when it is not (when the node transitions from booting locally to coming online).
Property | Description | Valid Values | Default Value
timeout | Time between retries after a connection failure | > 0 | 30 seconds
interval | Interval between connections | > 0 | 30 seconds
retries | Number of attempts to reconnect after a connection failure | > 0 | 3 attempts
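One way to read the timeout and retries properties together is sketched below in Python. It is an interpretation of the property descriptions, not the collector's actual implementation: `probe_with_retries` and its `probe` argument are hypothetical names, and the assumption is that a failed contact is retried up to `retries` times with `timeout` seconds between attempts before the failure is reported.

```python
import time


def probe_with_retries(probe, retries=3, timeout=30, sleep=time.sleep):
    """Return True if the node answers, False once all retries are used up.

    probe   -- hypothetical callable that returns True on successful contact
    retries -- reconnection attempts after the first failure (default 3)
    timeout -- seconds to wait between retries (default 30)
    sleep   -- injectable so the logic can be exercised without real delays
    """
    if probe():
        return True
    for _ in range(retries):
        sleep(timeout)
        if probe():
            return True
    return False
```

Under this reading, a collector would call such a function once per collection interval, and only a `False` result would count as a failure toward a critical status.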
HTTP
The HTTP collector is most commonly used to monitor web applications.
Property | Description | Valid Values | Default Value
secure | If true then HTTPS is used, otherwise HTTP is used | true, false | false
port | Port | valid HTTP/HTTPS port | 80
path | Path of the page to monitor | string | /
timeout | Time between retries after a connection failure | > 0 | 30 seconds
interval | Interval between connections | > 0 | 60 seconds
retries | Number of attempts to reconnect after a connection failure | > 0 | 3 attempts
For best performance, select a small page that responds quickly. Monitoring a large page increases CPU utilization.
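To make the secure, port, and path properties concrete, here is a minimal Python sketch of the kind of request an HTTP check might issue. It is not the collector's code: `build_url` and `http_check` are hypothetical names, the host name is supplied by the caller, and using the timeout value as a socket timeout is a simplification made for this example.

```python
import http.client
import urllib.request


def build_url(host, secure=False, port=80, path="/"):
    """Assemble the monitored URL from the collector-style properties."""
    scheme = "https" if secure else "http"
    if not path.startswith("/"):
        path = "/" + path
    return f"{scheme}://{host}:{port}{path}"


def http_check(host, secure=False, port=80, path="/", timeout=30):
    """Return True if the page answers with a non-error status."""
    url = build_url(host, secure, port, path)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are subclasses of OSError, so refused
        # connections, timeouts, and 4xx/5xx responses all land here.
        return False
```

With the default property values, `build_url("node1")` produces `http://node1:80/`. Pointing `path` at a small status page keeps each check cheap.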
SNMP
You can use SNMP to monitor any application that supports the protocol, including an operating system.
Property | Description | Valid Values | Default Value
version | SNMP version | v1, v2c | v2c
port | Port of SNMP agent | port number | 161
readCommunity | SNMP v1 and v2c read-only community | - | public
timeout | Time between retries after a connection failure | > 0 | 30 seconds
interval | Interval between connections | > 0 | 60 seconds
retries | Number of attempts to reconnect after a connection failure | > 0 | 3 attempts
SSH-executed scripts
The script collector can be used to monitor either the operating system or application services. Scripts can be long running or can run and exit.
For long-running scripts, the node or service is considered to be up as long as the script is running.
For scripts that run and exit, Cassatt Active Response uses the exit value to determine whether the node or service is up and running. An exit value of 0 indicates successful execution of the script. Scripts that exit are called periodically to ensure that the node or service remains up and running.
Cassatt does not provide scripts, as they must be specific to your software. You must provide the script and install it along with your applications when you create the image.
Property | Description | Valid Values | Default Value
script | Path to script to run | string | -
timeout | Timeout for script | > 0 | 60 seconds
interval | Interval between connections | > 0 | 60 seconds
retries | Number of attempts to reconnect after a connection failure | > 0 | 3 attempts
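The exit-value convention for run-and-exit scripts can be illustrated with a short Python sketch. It runs the command locally rather than over SSH, and it is not how Cassatt Active Response invokes scripts: `script_reports_up` is a hypothetical name, and treating a script that exceeds its timeout as a failure is an assumption made for this example.

```python
import subprocess


def script_reports_up(command, timeout=60):
    """Run a run-and-exit check and map its exit value to up or down.

    command -- argument list for the check script (hypothetical)
    timeout -- seconds to wait for the script to finish (default 60)
    """
    try:
        completed = subprocess.run(command, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # Assumption: a hung or unlaunchable script counts as a failure.
        return False
    # Exit value 0 means the node or service is up and running.
    return completed.returncode == 0
```

A script you write for this collector only needs to follow the same contract: exit with 0 when the service is healthy and with any nonzero value when it is not.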
JMX (WebLogic 8.1 only)
If you are running WebLogic 8.1 in your Cassatt Active Response environment, you can monitor WebLogic services using the JMX collector via WebLogic's JMX interface. Cassatt Active Response currently does not support JMX monitoring for any other applications. For more information on WebLogic’s JMX implementation, go to http://e-docs.bea.com/wls/docs81/jmx/index.html.
Property | Description | Valid Values | Default Value
userName | Name of WebLogic admin user | string | weblogic
password | Password of WebLogic admin user | string | weblogic
listenPort | ListenPort of the WebLogic server | positive integer | 7001
classpath | Path of weblogic.jar within image | string | /opt/bea/weblogic81/server/lib/weblogic.jar
timeout | Time between retries after a connection failure | > 0 | 30 seconds
interval | Interval between connections | > 0 | 60 seconds
retries | Number of attempts to reconnect after a connection failure | > 0 | 3 attempts
Multiple WebLogic servers can run on a node, each of which is monitored separately. It is assumed that every node in a tier is running servers on the same port, with the exception of the admin server. To monitor WebLogic on more than one port in order to accommodate the admin server’s distinct port, configure the JMX collector for WebLogic twice when capturing the application image with the cccapture command. To see a sample configuration for JMX, see Sample configuration: JMX collector for WebLogic.
JMX (WebLogic 9.2 only)
If you are running WebLogic 9.2 in your Cassatt Active Response environment, you can monitor WebLogic services using the JMX collector via the JSR 160 standard API. Cassatt Active Response currently does not support JMX monitoring for any other applications.
Property | Description | Valid Values | Default Value
userName | Name of WebLogic admin user | string | weblogic
password | Password of WebLogic admin user | string | weblogic
listenPort | ListenPort of the WebLogic server | positive integer | 7001
timeout | Time between retries after a connection failure | > 0 | 30 seconds
interval | Interval between connections | > 0 | 60 seconds
retries | Number of attempts to reconnect after a connection failure | > 0 | 3 attempts
Multiple WebLogic servers can run on a node, each of which is monitored separately. It is assumed that every node in a tier is running servers on the same port, with the exception of the admin server. To monitor WebLogic on more than one port in order to accommodate the admin server’s distinct port, configure the JMX collector for WebLogic twice when capturing the application image with the cccapture command. To see a sample configuration for JMX, see Sample configuration: JMX collector for WebLogic.
Personalization and monitoring: caution
Some tier node applications must be personalized before they are available for monitoring (for example, clustered applications such as Oracle 9i Real Application Clusters (Oracle RAC) and WebLogic). If a tier node application requires personalization before monitoring, follow these steps:
- Bypass configuring monitoring during the cccapture interview.
- On tier nodes, personalize image instances.
- Configure monitoring in the Controller (Images > image link > Applications).
Verifying monitoring collectors
During image capture, you can check that configured monitors are working. If they are not, you should fix them before completing image capture, because fixing them is more time-consuming after tiers are created.
Activating and managing collectors
As soon as an application node comes online, all configured collectors are active.
You can add or delete collectors for an image at any time; changes to image monitors take effect immediately on any tiers that run the image.
Before you add a collector to an image, make sure the operating system or application is configured for the collector. If you did not configure the software, first create a new version of the image, configure the software, and update the tier. Finally, when you know the image is properly configured to work with the collector, add it to the image from the Controller image properties page.