TOC

How monitoring collectors are used in node replacement
Cassatt Active Response collectors and requirements
Exporting monitoring data for use with other system monitors
Selecting monitoring collectors
Configuring monitoring collectors
Supported collectors
Personalization and monitoring: caution
Verifying monitoring collectors
Activating and managing collectors
   
 

Sidebars

Sample configuration: JMX collector for WebLogic
   

Understanding Monitoring Collectors

Intended for use with Cassatt Active Response V5.0.

One of the key features of Cassatt Active Response is automatic detection and replacement of application nodes when the nodes fail or when their applications stop providing services. The Cassatt Active Response software plug-ins that trigger automatic node replacement are called monitoring collectors.

This article describes how monitoring collectors work, and how to select and configure both application and OS monitoring collectors.

How monitoring collectors are used in node replacement

Cassatt Active Response automatically replaces application nodes based on node status: a critical status triggers a node replacement. The Controller detects and displays about 20 different node statuses. Most node statuses are based on user actions, such as shutting down or disabling a node, and represent a static state that you have specified for the node. Several are interim statuses that apply while the node is in transition—such as during inventory—and will eventually resolve to another status.

The most interesting node statuses—normal and critical—are derived from information provided by Cassatt Active Response monitoring collectors. Monitoring collectors are Cassatt Active Response software plug-ins that run on the control node and connect to application nodes to determine whether the nodes and their applications are running.

Monitoring collectors do not provide critical and normal statuses directly; instead, critical and normal statuses represent a roll-up of results from one or more monitoring collectors. The collectors attempt to contact application nodes and retrieve information. If every collector makes contact successfully, Cassatt Active Response assigns a normal status. If any collector fails to connect, Cassatt Active Response assigns a critical status and, most importantly, automatically replaces the node if a suitable replacement is available.
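The roll-up rule can be sketched in a few lines of Python. This is a minimal illustration, assuming each collector reports a single pass/fail result for its latest check; the names `NodeStatus`, `roll_up`, and `should_replace` are hypothetical and are not part of any Cassatt Active Response API.

```python
from enum import Enum
from typing import Mapping


class NodeStatus(Enum):
    NORMAL = "normal"
    CRITICAL = "critical"


def roll_up(results: Mapping[str, bool]) -> NodeStatus:
    """Combine per-collector results into one node status.

    `results` maps a collector name (for example "icmp" or "http") to True
    if that collector reached the node, or False if it failed to connect.
    """
    if not results:
        # Mirrors the rule that every image needs at least one collector.
        raise ValueError("at least one collector must be configured")
    # Normal only if every collector succeeded; one failure means critical.
    return NodeStatus.NORMAL if all(results.values()) else NodeStatus.CRITICAL


def should_replace(status: NodeStatus, replacement_available: bool) -> bool:
    """A critical node is replaced only when a suitable replacement exists."""
    return status is NodeStatus.CRITICAL and replacement_available
```

For example, `roll_up({"icmp": True, "http": False})` returns `NodeStatus.CRITICAL`: the node still answers pings, but the failed HTTP check alone is enough to mark it critical.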

Cassatt Active Response collectors and requirements

Cassatt Active Response provides a variety of monitoring collectors that use industry-standard protocols:

  • SNMP
  • ICMP (ping)
  • HTTP
  • JMX
  • SSH-executed scripts

Cassatt Active Response requires that you configure at least one collector for each image. For example, you can use the ICMP collector, which pings the operating system on the application node. A failed ping counts as a failure, which sets the node to critical and triggers replacement. This minimum collector requirement forms the baseline for determining whether a node is critical or normal.

Each additional collector you specify gives Cassatt Active Response another view into whether a node is providing service, and another data point to use in detecting failure. The more ways you monitor node activity, the more opportunities Cassatt Active Response has to detect service level breaches, and the more likely it is to replace a problem application node, which helps maximize application availability.

Exporting monitoring data for use with other system monitors

Cassatt Active Response normalizes and stores all of the monitoring data it collects in an internal database. Cassatt Active Response then exports the data for use with an SNMP- or JMX-compliant system monitor, such as HP’s OpenView Performance monitor or IBM’s Tivoli Monitoring.

The same rule applies to viewing data in an external tool as to detecting node failure: Cassatt Active Response collects data only from the collectors you explicitly configure for an image.

For information on connecting a system monitor to Cassatt Active Response, see Accessing Cassatt Active Response Data for Use By Third-Party Monitoring Software.

Selecting monitoring collectors

You select and configure monitoring collectors separately for the operating system and for each application within a software image. Then you assign each software image—with its associated collectors—to a tier.

To decide which collector, or collectors, to use for each application in an image, you need to understand the applications themselves. Many applications are built to be monitored in a specific way. For example, several operating systems and many applications provide an SNMP interface. For those operating systems and applications, you might want to use the SNMP collector.

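Other applications, such as WebLogic and most other J2EE application servers, contain a JMX implementation, so you might select the JMX collector to monitor those applications. Note that Cassatt Active Response currently supports JMX monitoring only for WebLogic 8.1 and 9.2.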

Application monitoring is most effective when you take full advantage of the monitoring services built into your applications, as shown in the next illustration.

To use a monitoring service that is built into an application, you must set up and enable the service when you install and configure the software, in addition to configuring the associated collector in Cassatt Active Response.

Configuring OS and application monitoring collectors in Cassatt Active Response

You can configure collectors during image capture (for new images) or in the Controller (for existing images and duplicated images). Let's look at capturing images first.

When you specify values during image capture (using the cccapture command), they are stored in a temporary, internal file that contains the initial parameters for an image, including the monitoring collector specifications.

The cccapture command presents a series of questions about the image and its applications. Here is an excerpt from the cccapture interview that configures OS collectors:

OS Monitoring Options
Specify the operating system monitoring that will be used
for this image (at least one monitoring option must be
configured):
Monitor via SNMP? [n]
SNMP Port: [161]
Read Community: [public]
SNMP Timeout (seconds): [30]
SNMP Collection Interval (seconds): [60]
SNMP Retry Count: [3]
Monitor via ping? [n]
Ping Timeout (seconds): [30]
Ping Collection Interval in seconds (1-300): [30]
Ping Retry Count: [3]
Monitor via script? [n]
Script path/name: []
Script timeout (seconds): [60]
Script Collection Interval (seconds): [60]
Script Retry Count: [3]

The cccapture command has similar questions for monitoring applications.
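To make the interview's rules concrete, here is a hypothetical Python model of the OS monitoring answers, using the defaults shown in the excerpt. The class and field names are invented for illustration and do not describe the format of the file cccapture writes; the checks encode only the two rules the excerpt states: at least one option must be configured, and the ping collection interval must be 1 to 300 seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnmpOptions:
    port: int = 161
    read_community: str = "public"
    timeout: int = 30      # seconds
    interval: int = 60     # seconds
    retries: int = 3


@dataclass
class PingOptions:
    timeout: int = 30      # seconds
    interval: int = 30     # seconds; the interview accepts 1-300
    retries: int = 3

    def __post_init__(self) -> None:
        if not 1 <= self.interval <= 300:
            raise ValueError("ping collection interval must be 1-300 seconds")


@dataclass
class ScriptOptions:
    path: str = ""         # "Script path/name" has no default
    timeout: int = 60      # seconds
    interval: int = 60     # seconds
    retries: int = 3


@dataclass
class OsMonitoring:
    # None corresponds to answering "n" to "Monitor via ...?"
    snmp: Optional[SnmpOptions] = None
    ping: Optional[PingOptions] = None
    script: Optional[ScriptOptions] = None

    def validate(self) -> None:
        if self.snmp is None and self.ping is None and self.script is None:
            raise ValueError("at least one monitoring option must be configured")
```

`OsMonitoring(ping=PingOptions()).validate()` passes, which corresponds to the minimal ICMP-only setup; `OsMonitoring().validate()` raises because no option is configured.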

You can change the monitors for an existing image in the Controller by going to:

  • Images > image name > Properties (for OS monitoring)
  • Images > image name > Applications (for application monitoring)

Whenever you add monitors, remove monitors, or change monitor properties for an image, the changes take effect immediately in any tiers that are running the image.

Supported collectors

Each collector has a set of parameters that control how the collector connects to and, in some cases, retrieves information from a node.

Collectors are pluggable, meaning that new collectors can easily be added to a Cassatt Active Response environment; watch this space for new additions. Is your application or hardware proprietary? Contact support@cassatt.com for assistance with your own custom collector plug-in.

ICMP (ping)

Ping can be used only to monitor an image’s operating system, not an application.

If you configure Cassatt Active Response (during cccapture) to boot images from local disk rather than from the default NFS, you cannot use ping for monitoring. Why? In that configuration, ping can give false results: it can report a node as online too early (while the node is still booting from local disk), or as down when it is not (while the node transitions from booting locally to coming online).

Property  Description                                                 Valid Values  Default Value
timeout   Time between retries after a connection failure             > 0           30 seconds
interval  Interval between connections                                > 0           30 seconds
retries   Number of attempts to reconnect after a connection failure  > 0           3 attempts
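The timeout, interval, and retries properties recur in every collector. The sketch below reads the table descriptions literally: each collection cycle makes one attempt plus up to `retries` reconnection attempts, waits `timeout` seconds between a failed attempt and the next retry, and starts a new cycle every `interval` seconds. It is an assumption-laden model for reasoning about settings, not Cassatt's scheduler; `collect_once` and `monitor` are hypothetical, the probe is any function that returns True when it reaches the node, and `sleep` is injectable so the logic can be exercised without real waiting.

```python
import time
from typing import Callable, List

Probe = Callable[[], bool]
Sleep = Callable[[float], None]


def collect_once(probe: Probe, timeout: float, retries: int,
                 sleep: Sleep = time.sleep) -> bool:
    """One collection cycle: the first attempt plus up to `retries` retries.

    Following the property table literally, `timeout` is the wait between
    a failed attempt and the next retry.
    """
    for attempt in range(1 + retries):
        if probe():
            return True
        if attempt < retries:      # no wait after the final attempt
            sleep(timeout)
    return False                   # every attempt failed: a collector failure


def monitor(probe: Probe, timeout: float, interval: float, retries: int,
            cycles: int, sleep: Sleep = time.sleep) -> List[bool]:
    """Run `cycles` collection cycles, `interval` seconds apart."""
    results = []
    for cycle in range(cycles):
        results.append(collect_once(probe, timeout, retries, sleep))
        if cycle < cycles - 1:
            sleep(interval)
    return results
```

Under this reading, the ping defaults (timeout 30, retries 3) mean a node that stops answering fails a cycle after four attempts separated by three 30-second waits.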

HTTP

The HTTP collector is most commonly used to monitor web applications.

Property  Description                                                 Valid Values           Default Value
secure    If true then HTTPS is used, otherwise HTTP is used          true, false            false
port      Port                                                        valid HTTP/HTTPS port  80
path      Path of the page to monitor                                 string                 /
timeout   Time between retries after a connection failure             > 0                    30 seconds
interval  Interval between connections                                > 0                    60 seconds
retries   Number of attempts to reconnect after a connection failure  > 0                    3 attempts

For best performance, select a small page that responds quickly. Monitoring a large page increases CPU utilization.
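An HTTP check of this kind comes down to building a URL from the secure, port, and path properties and requesting the page. A minimal Python sketch using only the standard library follows; `monitor_url` and `page_is_up` are hypothetical helpers, and treating any 2xx or 3xx response as success is an assumption, since this article does not say which responses the HTTP collector accepts.

```python
import http.client
import urllib.request


def monitor_url(host: str, secure: bool = False, port: int = 80,
                path: str = "/") -> str:
    """Build the URL to poll from the secure, port, and path properties."""
    scheme = "https" if secure else "http"
    if not path.startswith("/"):
        path = "/" + path
    return f"{scheme}://{host}:{port}{path}"


def page_is_up(url: str, timeout: float = 30.0) -> bool:
    """Return True if the page answers with a 2xx or 3xx status within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx), refused connections, and socket
        # timeouts are all OSError subclasses.
        return False
```

Because the table's default port is 80 even when secure is true, set the port explicitly for HTTPS, for example `monitor_url("app01", secure=True, port=443, path="/status.html")`, where the host and page are hypothetical.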

SNMP

You can use SNMP to monitor any application that supports the protocol, including an operating system.

Property       Description                                                 Valid Values  Default Value
version        SNMP version                                                v1, v2c       v2c
port           Port of SNMP agent                                          port number   161
readCommunity  SNMP v1 and v2c read-only community                         -             public
timeout        Time between retries after a connection failure             > 0           30 seconds
interval       Interval between connections                                > 0           60 seconds
retries        Number of attempts to reconnect after a connection failure  > 0           3 attempts

SSH-executed scripts

The script collector can monitor either the operating system or application services. Scripts can be long-running, or they can run and exit.

For long-running scripts, the node or service is considered to be up as long as the script is running.

For scripts that run and exit, Cassatt Active Response uses the exit value to determine whether the node or service is up and running. An exit value of 0 indicates successful execution of the script. Scripts that exit are called periodically to ensure that the node or service remains up and running.

Cassatt does not provide scripts, as they must be specific to your software. You must provide the script and install it along with your applications when you create the image.
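As an illustration of the run-and-exit style, here is a hypothetical Python check script that exits with 0 when a service on the node accepts TCP connections and with 1 otherwise. The option names, the default port of 8080, and the choice of a TCP connect as the health test are all assumptions; a real script should test whatever providing service means for your application.

```python
import argparse
import socket
from typing import Optional, Sequence


def port_is_open(host: str, port: int, timeout: float) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:            # refused, unreachable, or timed out
        return False


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Return the exit status for the collector: 0 = up, 1 = down."""
    parser = argparse.ArgumentParser(
        description="Exit 0 if the service accepts TCP connections.")
    # Hypothetical defaults; edit them if the collector cannot pass arguments.
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8080)
    parser.add_argument("--timeout", type=float, default=5.0)
    args = parser.parse_args(argv)
    return 0 if port_is_open(args.host, args.port, args.timeout) else 1
```

Installed in the image, the file would end with the usual `if __name__ == "__main__": raise SystemExit(main())` entry point so that its exit status reaches the collector. Keeping the script's own timeout (5 seconds here) well below the collector's script timeout (60 seconds by default) lets the script always exit on its own.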

Property  Description                                                 Valid Values  Default Value
script    Path to script to run                                       string        -
timeout   Timeout for script                                          > 0           60 seconds
interval  Interval between connections                                > 0           60 seconds
retries   Number of attempts to reconnect after a connection failure  > 0           3 attempts
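On the collector side, interpreting a run-and-exit script comes down to its exit status. The Python sketch below runs a command once, treats exit status 0 as up, and treats any other status, a missing script, or a run longer than the script timeout as a failed attempt. `script_reports_up` is a hypothetical helper; how Cassatt Active Response actually invokes scripts over SSH, and how it handles long-running scripts, is not modeled here.

```python
import subprocess
from typing import Sequence


def script_reports_up(command: Sequence[str], timeout: float) -> bool:
    """Run a run-and-exit check script once and report whether it exited with 0."""
    try:
        completed = subprocess.run(
            list(command),
            timeout=timeout,            # the child is killed if it overruns
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False                    # overran the script timeout
    except OSError:
        return False                    # script missing or not executable
    return completed.returncode == 0
```

For example, `script_reports_up(["ssh", "app-node-01", "/opt/myapp/bin/check_service.py"], timeout=60)` would run a script installed in the image on a node over SSH; the host name and path are hypothetical.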

JMX (WebLogic 8.1 only)

If you are running WebLogic 8.1 in your Cassatt Active Response environment, you can monitor WebLogic services using the JMX collector via WebLogic's JMX interface. Cassatt Active Response currently does not support JMX monitoring for any other applications. For more information on WebLogic’s JMX implementation, go to http://e-docs.bea.com/wls/docs81/jmx/index.html.

Property    Description                                                 Valid Values      Default Value
userName    Name of WebLogic admin user                                 string            weblogic
password    Password of WebLogic admin user                             string            weblogic
listenPort  ListenPort of the WebLogic server                           positive integer  7001
classpath   Path of weblogic.jar within image                           string            /opt/bea/weblogic81/server/lib/weblogic.jar
timeout     Time between retries after a connection failure             > 0               30 seconds
interval    Interval between connections                                > 0               60 seconds
retries     Number of attempts to reconnect after a connection failure  > 0               3 attempts

Multiple WebLogic servers can run on a node, and each server is monitored separately. Cassatt Active Response assumes that every node in a tier runs its servers on the same port, except for the admin server. Because the admin server listens on a different port, configure the JMX collector for WebLogic twice when you capture the application image with the cccapture command: once for the admin server's port and once for the port shared by the other servers. For a sample configuration, see Sample configuration: JMX collector for WebLogic.

JMX (WebLogic 9.2 only)

If you are running WebLogic 9.2 in your Cassatt Active Response environment, you can monitor WebLogic services using the JMX collector via the JSR 160 standard API. Cassatt Active Response currently does not support JMX monitoring for any other applications.

Property    Description                                                 Valid Values      Default Value
userName    Name of WebLogic admin user                                 string            weblogic
password    Password of WebLogic admin user                             string            weblogic
listenPort  ListenPort of the WebLogic server                           positive integer  7001
timeout     Time between retries after a connection failure             > 0               30 seconds
interval    Interval between connections                                > 0               60 seconds
retries     Number of attempts to reconnect after a connection failure  > 0               3 attempts

Multiple WebLogic servers can run on a node, and each server is monitored separately. Cassatt Active Response assumes that every node in a tier runs its servers on the same port, except for the admin server. Because the admin server listens on a different port, configure the JMX collector for WebLogic twice when you capture the application image with the cccapture command: once for the admin server's port and once for the port shared by the other servers. For a sample configuration, see Sample configuration: JMX collector for WebLogic.

Personalization and monitoring: caution

Some tier node applications must be personalized before they are available for monitoring; examples include clustered applications such as Oracle 9i Real Application Clusters (Oracle RAC) and WebLogic. If a tier node application requires personalization before monitoring, follow these steps:

  1. Skip configuring application monitoring during the cccapture interview.
  2. On the tier nodes, personalize the image instances.
  3. Configure monitoring in the Controller (Images > image name > Applications).


Verifying monitoring collectors

During image capture, you can check that the configured monitors are working. If they are not, fix them before you complete image capture; fixing them is more time-consuming after tiers are created.

Activating and managing collectors

As soon as an application node comes online, all configured collectors become active.

You can add or delete collectors for an image at any time; changes to image monitors take effect immediately on any tiers that run the image.

Before you add a collector to an image, make sure the operating system or application is configured for the collector. If you have not configured the software, first create a new version of the image, configure the software, and update the tier to use the new version. Then, when you know the image is properly configured to work with the collector, add the collector to the image from the Controller image properties page.
