Overview
With RightLink 10.1.2 and newer, monitoring and alerts are handled by RightScale Time Series Storage (TSS), the back-end system that aggregates and displays monitoring data and acts on it through alerts. TSS is built to work with collectd and HTTPS. This document describes how to set up monitoring and alerts with RightLink 10.
Requirements
- RightLink 10.1.2 or newer
- TSS enabled for your account. Contact RightScale Support if not enabled.
- Collectd 4.8 or newer (Linux)
TSS vs Sketchy
RightLink versions prior to 10.1.2 used a different back-end system (Sketchy servers) with a number of limitations. The Sketchy system only worked with collectd 4 and only accepted UDP traffic, which does not work well with proxies and firewalls and is not easy to send securely. TSS still supports UDP for backwards compatibility, but HTTPS is strongly preferred. With TSS, the RightLink process runs a proxy for collectd data on the instance: collectd first sends its data to the RightLink process over HTTP, and RightLink then adds authentication headers and forwards the monitoring data on to the TSS back-end over HTTPS.
Linux Setup Procedure
On Linux, monitoring is built on top of collectd, which sends data to the local RightLink process. For reference, see the RL10 Linux Enable Monitoring script, which ships as part of the RightLink Linux 10.X.X Base ServerTemplate (ST). The script performs the following steps:
- Install collectd. Collectd 4 and 5 are both supported, though there are differences between them (see Caveats). For Red Hat based systems (Amazon Linux, CentOS, Fedora, Oracle Linux, etc.), the collectd packages can be found in the Fedora Extra Packages for Enterprise Linux (EPEL) repository.
- Configure any collectd plugins needed. The Base ST configures syslog, interface, cpu, df, disk, memory, load, processes, users and swap plugins by default.
- Configure the write_http plugin to post data to the RightLink process. The RightLink process runs an HTTP server on a random high port, recorded as RS_RLL_PORT in /var/run/rightlink/secret. Sample plugin configuration, assuming collectd 5 is running and RS_RLL_PORT is 54312 (if collectd 4 is running, change collectdv5 to collectdv4):
LoadPlugin "write_http"
<Plugin "write_http">
  URL "http://127.0.0.1:54312/rll/tss/collectdv5"
</Plugin>
- Configure collectd. The Hostname value in the collectd config must be set to the ENV:RS_INSTANCE_UUID variable passed in through the ServerTemplate config, formatted like this: 01-3IPDVL6CR0FSK. This value can also be found on the Info tab of your Server. FQDNLookup must be false, because this is a UUID-type value and not an FQDN. Use the default 20-second interval.
- Add the rs_monitoring:state=auth tag to the Server. Note that this tag has changed slightly: for pre-TSS accounts, the tag was rs_monitoring:state=active.
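The configuration steps above can be sketched in Bash. This is a minimal sketch, not the RL10 Linux Enable Monitoring script itself. It assumes that /var/run/rightlink/secret contains shell-style KEY=value lines and that collectd -h prints a banner line such as "collectd 5.4.0, http://collectd.org/"; the function names are hypothetical. It prints the computed settings instead of editing collectd's configuration files, whose location varies by distribution.

```bash
#!/bin/bash
# Sketch: compute the collectd settings RightLink 10 monitoring needs.
# Assumptions: /var/run/rightlink/secret holds shell-style KEY=value lines,
# and "collectd -h" prints a line like "collectd 5.4.0, http://collectd.org/".

# Map a collectd version string to the RightLink TSS endpoint suffix.
endpoint_for_version() {
  case "$1" in
    4.*) echo "collectdv4" ;;
    5.*) echo "collectdv5" ;;
    *) echo "unsupported collectd version: '$1'" >&2; return 1 ;;
  esac
}

# Read RS_RLL_PORT from the secret file without sourcing (executing) it.
rll_port_from_secret() {
  sed -n 's/^RS_RLL_PORT=//p' "$1" | head -n 1
}

# URL that the write_http plugin should post to.
rightlink_collectd_url() {
  printf 'http://127.0.0.1:%s/rll/tss/%s\n' "$1" "$2"
}

# Global settings plus LoadPlugin lines for the plugins the Base ST enables.
render_globals() {
  printf 'Hostname "%s"\nFQDNLookup false\nInterval 20\n' "$1"
  local p
  for p in syslog interface cpu df disk memory load processes users swap; do
    printf 'LoadPlugin %s\n' "$p"
  done
}

# Only act on an instance where RightLink and collectd are present,
# and print the values for review rather than writing config files.
secret_file=/var/run/rightlink/secret
if [[ -r "$secret_file" && -n "${RS_INSTANCE_UUID:-}" ]] && command -v collectd >/dev/null; then
  version=$(collectd -h 2>&1 | sed -n 's/^collectd \([0-9][0-9.]*\),.*/\1/p' | head -n 1)
  endpoint=$(endpoint_for_version "$version") || exit 1
  render_globals "$RS_INSTANCE_UUID"
  echo "write_http URL: $(rightlink_collectd_url "$(rll_port_from_secret "$secret_file")" "$endpoint")"
fi
```

Reading the port with sed instead of sourcing the secret file avoids executing anything else it may contain. A real setup script would place the Hostname, FQDNLookup, and Interval lines in the main collectd config and the printed URL in the write_http plugin block.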
Alternative Linux Monitoring Setup
Some Linux distributions, such as CoreOS, do not have a standard install for collectd. By default, running the RL10 Linux Enable Monitoring script on CoreOS installs native RightLink monitoring. With this method, RightLink obtains the same metrics from the operating system as the default collectd 5 install on Ubuntu 14.04 and passes them to TSS. The caveat is that collectd plugins cannot be used, except for those supported by Custom Monitoring Plugins with Built In Monitoring.
Windows Setup Procedure
On Windows, monitoring is built into RightLink. It currently monitors CPU usage, memory usage, disk usage, and network traffic. The monitoring metric names are similar to those of collectd 5, and some plugin support is available through Custom Monitoring Plugins with Built In Monitoring. For reference, see the RL10 Windows Enable Monitoring script, which ships as part of the RightLink Windows 10.X.X Base ServerTemplate (ST). The script performs the following step:
- Enable monitoring. This can be done with rsc, the Go-based RightScale API client that ships with RightLink:
rsc rl10 update /rll/tss/control enable_monitoring=all
This request tells RightLink to start sending monitoring data to TSS and also adds the rs_monitoring:state=auth tag to the Server.
Viewing/Accessing Data
Monitoring graphs should now appear on the Monitoring tab of your Server in the Cloud Management dashboard. Alerts can be set up on the Alerts tab of your Server. Metrics, alerts, and alert actions can also be viewed or set via API 1.5: MonitoringMetric, Alerts, AlertSpecs.
Custom Monitoring Plugins with Built In Monitoring
RightLink's built-in monitoring supports running custom monitoring plugins that are compatible with collectd's Exec plugin. On Linux these can be scripts in any installed scripting language; on Windows they can be PowerShell scripts. Just like collectd, RightLink passes in the COLLECTD_HOSTNAME (the RightScale instance UUID) and COLLECTD_INTERVAL (the monitoring interval in seconds; the default is 20) environment variables. RightLink expects the scripts to run continuously; if a script exits, RightLink restarts it at the next monitoring interval.
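The PUTVAL lines printed by the scripts below follow collectd's Exec-plugin text protocol, in which the value identifier has the form host/plugin[-plugin_instance]/type[-type_instance]. In the examples, exec-magic is the plugin "exec" with instance "magic", and gauge-magic_level is the type "gauge" with instance "magic_level". A minimal Bash sketch of assembling such a line (the helper names are hypothetical):

```bash
#!/bin/bash
# Sketch of the collectd Exec-plugin "PUTVAL" line used by the scripts below.
# Identifier layout: host/plugin[-plugin_instance]/type[-type_instance].

# Join a name and an optional instance with "-"; omit the "-" when empty.
with_instance() {
  if [[ -n "$2" ]]; then printf '%s-%s' "$1" "$2"; else printf '%s' "$1"; fi
}

# collectd_identifier HOST PLUGIN PLUGIN_INSTANCE TYPE TYPE_INSTANCE
collectd_identifier() {
  printf '%s/%s/%s' "$1" "$(with_instance "$2" "$3")" "$(with_instance "$4" "$5")"
}

# putval_line IDENTIFIER INTERVAL EPOCH_SECONDS VALUE
putval_line() {
  printf 'PUTVAL %s interval=%s %d:%g\n' "$1" "$2" "$3" "$4"
}

# Example: one sample for the "magic_level" gauge, using the variables
# RightLink sets, with placeholder fallbacks when run outside RightLink.
id=$(collectd_identifier "${COLLECTD_HOSTNAME:-localhost}" exec magic gauge magic_level)
putval_line "$id" "${COLLECTD_INTERVAL:-20}" "$(date +%s)" 42
```

The plugin and type names chosen in the identifier are what determine where the graph appears on the Monitoring tab.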
Here is an example of a Bash script which would be used on Linux:
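#!/bin/bash
# do_magic is a placeholder for your own command that prints one number.
while true; do
  NOW=$(date +%s)
  VALUE=$(do_magic)
  # interval uses %s because COLLECTD_INTERVAL may be passed with a fractional part
  printf "PUTVAL %s/exec-magic/gauge-magic_level interval=%s %d:%g\n" "$COLLECTD_HOSTNAME" \
    "$COLLECTD_INTERVAL" "$NOW" "$VALUE"
  sleep "$COLLECTD_INTERVAL"
done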
And, here is an example of a PowerShell script which would be used on Windows:
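# Do-Magic is a placeholder for your own function that returns one number.
$format = 'PUTVAL {0}/exec-magic/gauge-magic_level interval={1} {2}:{3:g}'
$culture = [Globalization.CultureInfo]::InvariantCulture
while ($true) {
    # Seconds since the Unix epoch, computed in UTC.
    $now = [int64][Math]::Floor(((Get-Date).ToUniversalTime() - [datetime]'1970-01-01').TotalSeconds)
    $value = Do-Magic
    # Invariant culture keeps "." as the decimal separator on any system locale.
    Write-Output ([string]::Format($culture, $format, $env:COLLECTD_HOSTNAME, $env:COLLECTD_INTERVAL, $now, $value))
    Start-Sleep -Seconds $env:COLLECTD_INTERVAL
}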
These custom monitoring scripts can be configured to run under RightLink through its HTTP interface. The HTTP interface has actions for adding/updating, listing, and removing scripts from the configuration. RightLink stores this configuration, so the scripts are started again if the RightLink service is restarted, but a boot RightScript on a ServerTemplate should be used to ensure the scripts are configured on instance reboot or relaunch. The rsc utility that comes with RightLink is the recommended tool for working with the HTTP interface.
Here is an example RightScript that would install our example monitoring script from an attachment and configure it to run under RightLink on Linux:
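#!/bin/bash
set -e
magic_dir='/opt/magic_monitoring'
magic_script='magic.sh'
magic_path="$magic_dir/$magic_script"
# Copy the attached script into place; install -D creates the directory and sets mode 0755.
sudo install -D "$RS_ATTACH_DIR/$magic_script" "$magic_path"
# Register the script with RightLink's built-in monitoring under the name "magic".
rsc rl10 create /rll/tss/exec/magic executable="$magic_path"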
And, here is an example RightScript that would install our example monitoring script from an attachment and configure it to run under RightLink on Windows:
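$ErrorActionPreference = 'Stop'
$magicDir = 'C:\Program Files\Magic Monitoring'
$magicScript = 'magic.ps1'
$magicPath = "$magicDir\$magicScript"
# -Force lets this succeed when the directory already exists (e.g. on reboot).
New-Item $magicDir -ItemType Directory -Force | Out-Null
Copy-Item "${env:RS_ATTACH_DIR}\$magicScript" $magicPath
# Register the script with RightLink's built-in monitoring under the name "magic".
rsc rl10 create /rll/tss/exec/magic "executable=$magicPath"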
If the do_magic or Do-Magic functions existed, running one of these RightScripts on an instance with RightLink 10 monitoring enabled would start showing a gauge-magic_level graph under exec-magic on the Monitoring tab for the server in the RightScale dashboard.
Some example RightScripts with accompanying monitoring scripts for Windows are available in the rightscale/rightlink_scripts repository:
- SYS IIS monitoring install installs and configures iis-monitor.ps1, which monitors Microsoft IIS.
- DB SQLS Install monitors installs and configures mssql-monitor.ps1, which monitors Microsoft SQL Server.
Troubleshooting
If collectd graphs are not populating, try the following steps:
- Check the RightLink main process audit log (RightLink 10.X.X log pid NNNN). It should show whether data was posted.
- Run collectd -T to check the collectd configuration. It reads the config, runs each plugin's read callback once, and exits: it either prints an error or prints nothing and exits with a success status.
- Check for the rs_monitoring:state=auth tag.
- Check the collectd config: the write_http plugin configuration should post to the port specified in /var/run/rightlink/secret.
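The last check can be scripted. This is a minimal sketch that assumes the secret file contains an RS_RLL_PORT=NNNN line and that the write_http URL is written literally as http://127.0.0.1:PORT/rll/tss/...; the collectd config paths in the final block are examples that differ between distributions, and the function names are hypothetical.

```bash
#!/bin/bash
# Sketch: check that collectd posts to the port RightLink is listening on.
# Assumptions: the secret file has an RS_RLL_PORT=NNNN line; config paths
# used at the bottom are examples and vary by distribution.

# Print RS_RLL_PORT from the RightLink secret file.
secret_port() { sed -n 's/^RS_RLL_PORT=//p' "$1" | head -n 1; }

# Print each distinct port used in 127.0.0.1:PORT/rll/tss/ URLs in the given files.
config_ports() {
  grep -ho '127\.0\.0\.1:[0-9]*/rll/tss/' "$@" 2>/dev/null \
    | sed 's/^127\.0\.0\.1:\([0-9]*\)\/.*/\1/' | sort -u
}

# check_port SECRET_FILE CONFIG_FILE...
# Returns 0 only when every RightLink URL in the configs uses the secret port.
check_port() {
  local secret="$1"; shift
  local want have
  want=$(secret_port "$secret")
  have=$(config_ports "$@")
  if [[ -z "$want" ]]; then echo "no RS_RLL_PORT in $secret" >&2; return 1; fi
  if [[ -z "$have" ]]; then echo "no /rll/tss/ URL found in: $*" >&2; return 1; fi
  if [[ "$have" != "$want" ]]; then
    echo "collectd posts to port(s) ${have//$'\n'/ } but RightLink listens on $want" >&2
    return 1
  fi
  echo "OK: collectd posts to RightLink port $want"
}

# Run only on an instance where the secret file is readable.
if [[ -r /var/run/rightlink/secret ]]; then
  check_port /var/run/rightlink/secret /etc/collectd.conf /etc/collectd.d/*.conf \
    /etc/collectd/collectd.conf /etc/collectd/collectd.conf.d/*.conf
fi
```

A mismatch usually means the port changed after collectd was configured, so the write_http URL needs to be regenerated from the current secret file.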
Caveats
TSS supports both collectd 4 and 5, but there are differences between the versions. See the Collectd v4 to v5 migration guide for differences in plugin config and data formats. Some metric names have changed, so they display differently in the dashboard and are different when selected for Alerts or polled via the API. For example, 'interface/if_octets-eth0' was changed to 'interface-eth0/if_octets'.
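When updating alert definitions or API queries written against collectd 4 names, the interface example above can be translated mechanically. This sketch handles only that interface plugin pattern (interface/TYPE-IFACE becomes interface-IFACE/TYPE); other plugins changed in different ways, so check the migration guide for them. The function name is hypothetical.

```bash
#!/bin/bash
# Sketch: translate collectd 4 interface metric names to their collectd 5 form,
# e.g. interface/if_octets-eth0 -> interface-eth0/if_octets.
# Only the interface plugin pattern is handled; other names pass through unchanged.
v4_to_v5_interface() {
  local name="$1"
  if [[ "$name" =~ ^interface/(if_[a-z]+)-(.+)$ ]]; then
    printf '%s\n' "interface-${BASH_REMATCH[2]}/${BASH_REMATCH[1]}"
  else
    printf '%s\n' "$name"
  fi
}

# Example: rewrite a list of metric names, one per line, read from a file.
# while read -r m; do v4_to_v5_interface "$m"; done < metrics.txt
```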