RightScale has long used a monitoring system based on a combination of open source and proprietary components. The monitoring system consists of three major parts: the monitoring agent, which runs on customer servers and sends data to the RightScale monitoring system; the back-end storage system, which RightScale operates to store the monitoring data; and the UI, which queries and visualizes the monitoring data and generates alerts. The legacy monitoring system uses collectd as the agent, RRDtool plus custom software for storage, and a custom UI generated by Rails with graphs drawn by RRDtool.
Recently, a new monitoring system has replaced the back-end storage component (rrdcached) with TSS (Time Series Storage), keeping the monitoring agent and the UI the same. The following table outlines the configurations of the legacy and TSS-based monitoring systems.
| Component | Legacy Monitoring System (sketchy) | New Monitoring System (TSS) |
|---|---|---|
| Monitoring agent | collectd v4 | collectd v4 & v5 |
| Back-end storage system | RRDtool (rrdcached) | TSS |
| User interface | RRDtool (rrd graph) + Rails | unchanged |
The monitoring features are available only in our paid editions. If you have a Developer account, you need to upgrade to unlock this feature. Please contact us at email@example.com.
How Does the Monitoring System Work?
The overall process of RightScale's monitoring system is as follows:
- The Collectd system statistics daemon is installed on an instance at boot time using the SYS Monitoring install RightScript.
- The website passes the hostname of the monitoring server to the instance using the EC2 launch data (for example, RS_TSS=tss3-1.rightscale.com).
- Collectd auto-detects data sources (disks, processes, etc.), starts collecting data, and sends the data every 20 seconds via UDP to the specified monitoring server.
- The monitoring server stores the data in an RRD database.
- When you view a Monitoring tab in the Dashboard for an individual server or deployment, the request is proxied through to the monitoring server.
- The monitoring server produces graphs using the data in the RRD database.
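The second step above, where the instance learns its monitoring server from the launch data, can be sketched as follows. This is a minimal illustration, not RightScale's implementation: only the `RS_TSS=tss3-1.rightscale.com` example comes from this page, while the assumption that launch data is a set of `KEY=VALUE` pairs separated by `&` or newlines, and the function names themselves, are hypothetical.

```python
from typing import Dict, Optional


def parse_launch_data(launch_data: str) -> Dict[str, str]:
    """Split launch data into KEY=VALUE pairs.

    Assumption: pairs are separated by '&' or newlines. Only the
    RS_TSS=... example is documented on this page.
    """
    pairs: Dict[str, str] = {}
    for item in launch_data.replace("&", "\n").splitlines():
        item = item.strip()
        if "=" not in item:
            continue  # skip blank or malformed entries
        key, _, value = item.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def monitoring_server(launch_data: str) -> Optional[str]:
    """Return the monitoring server hostname, or None if RS_TSS is absent."""
    return parse_launch_data(launch_data).get("RS_TSS")


if __name__ == "__main__":
    print(monitoring_server("RS_TSS=tss3-1.rightscale.com"))
```

A caller would hand the returned hostname to the collectd configuration step; when the key is missing, the function returns `None` rather than guessing a default server.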
If you have constructed your own monitoring URLs, you may need to update them. Previously, you could submit data points through collectd by using the hostname sketchyX-Y.rightscale.com (e.g., sketchy1-1.rightscale.com). With the new TSS-based monitoring system, new instances are assigned to tssX-Y instead of sketchyX-Y, and you must submit your data points to tss-collectdX-Y.rightscale.com. This change does not affect anyone using our RightScript/Recipe without modification, nor does it affect instances that are still assigned to the legacy sketchyX-Y. Also, see the API 1.5 documentation for information on using the monitoring_collector_udp attributes of the Instance MediaType to help you manage your monitoring configuration depending on whether you are using HTTP or the collectd UDP protocol.
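If you maintain your own scripts, the hostname rule described above could be captured in a small helper like the one below. It is a sketch under stated assumptions: the function name and the regular expression are hypothetical, and it encodes only the two naming patterns mentioned on this page (sketchyX-Y and tssX-Y).

```python
import re

# Matches the two server-name patterns described in the text above.
_SERVER_RE = re.compile(r"^(sketchy|tss)(\d+)-(\d+)\.rightscale\.com$")


def collectd_submission_host(assigned_server: str) -> str:
    """Return the hostname to which collectd data points should be sent.

    tssX-Y servers receive collectd data at tss-collectdX-Y;
    legacy sketchyX-Y servers keep their original hostname.
    """
    match = _SERVER_RE.match(assigned_server)
    if match is None:
        raise ValueError(f"unrecognized monitoring server: {assigned_server!r}")
    family, x, y = match.groups()
    if family == "tss":
        return f"tss-collectd{x}-{y}.rightscale.com"
    return assigned_server


if __name__ == "__main__":
    print(collectd_submission_host("tss3-1.rightscale.com"))
    print(collectd_submission_host("sketchy1-1.rightscale.com"))
```

Raising an error for unknown names is a deliberate choice here: silently sending data to a guessed hostname would be harder to diagnose than a failed lookup.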
Can Websites that Scale Way Up View Monitoring Information Too?
Customers whose application scales into the tens, hundreds or even thousands of instances via a scalable server array may wonder whether they can still view monitoring data in the RightScale Dashboard, and whether they can view aggregate data from all of the servers. The answer is yes. However, we had to implement a policy to maintain performance while users view their data.
Remember that each server instance registers with the load balancers (e.g. HAProxy) as the site scales so that the workload can be evenly balanced between application servers in the array. As an example, you could have 1000 instances registered with two front-end servers running load balancing software. If each of those 1000 registered instances had its own individual set of graphs and also contributed to an aggregate graph, the amount of data would be overwhelming and would take too long to retrieve and graph.
For this reason, servers are classified as active or inactive with respect to monitoring. A server is considered inactive if it has not sent data for one day or more. The most important points with respect to viewing monitoring data of sites with many servers are:
- Cumulative graphs show the total activity of all servers but provide detailed information only for the active servers (inactive server information is not available)
- Active server graphing data is always available (either in thumbnail graphs or by providing a link to produce the graph on demand)
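The active/inactive rule can be illustrated with a short sketch. Only the one-day threshold comes from the text above; the data structure (a mapping of server name to the time its last data point arrived) and the function name are hypothetical and say nothing about how the monitoring servers track this internally.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Threshold taken from the text: no data for one day or more means inactive.
INACTIVE_AFTER = timedelta(days=1)


def split_active_inactive(
    last_seen: Dict[str, datetime], now: datetime
) -> Tuple[List[str], List[str]]:
    """Partition servers into (active, inactive) lists by last data time."""
    active: List[str] = []
    inactive: List[str] = []
    for server, seen in last_seen.items():
        if now - seen >= INACTIVE_AFTER:
            inactive.append(server)
        else:
            active.append(server)
    return sorted(active), sorted(inactive)


if __name__ == "__main__":
    now = datetime(2020, 1, 2, 12, 0, 0)
    last_seen = {
        "app-1": now - timedelta(seconds=20),
        "app-2": now - timedelta(days=1),
        "app-3": now - timedelta(days=3),
    }
    active, inactive = split_active_inactive(last_seen, now)
    print("active:", active)
    print("inactive:", inactive)
```

Under this rule, a cumulative graph would draw detailed series only for the servers in the first list.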
Monitoring System Topics
- Cluster Monitoring
- Collectd Plugin Apache Log Monitor
- List of Monitored Metrics
- Monitoring Limitations
- Monitoring Error Messages
- Monitoring User Defined Processes
- Setting up collectd
- Supported Graph Types
- Viewing Graphical Data
- Custom collectd plug-ins