I recently set up a cluster of Linux compute servers and found myself looking for instrumentation to monitor and control them. The following open-source packages give excellent insight into, and access to, the state of the cluster:
- Ganglia – server stats and historical data (see Wikipedia’s cluster for an example)
- Monit – process monitoring/control
- MonALISA – dynamic distributed service architecture