We monitor your server at both the server and domain level. All metrics are measured at one-minute intervals and retained for 3 months; metrics at one-hour intervals are retained for up to a year. When most outages are triggered, we store the history of several metrics leading up to the outage, as well as helpful diagnostic information such as system logins, running applications, and network activity. In certain cases, automation is in place to quickly resolve the outage. In addition to responding to monitoring alerts, our engineers proactively review metrics to help improve server performance.
Monitoring at the Server Level
"Bandwidth usage" is the amount of data (per second) transferred between your server and an external resource. We collect and monitor both incoming and outgoing bandwidth for your server. Bandwidth is tracked by the number of individual packets and the size of those packets. The CloudTech team is alerted when your server exceeds 15K packets or 25 megabytes per second.
If you wish to view your bandwidth usage compared to what’s allocated, feel free to check out this article!
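As a rough illustration of how a check against the bandwidth limits quoted above could work, the sketch below derives per-second rates from two cumulative counter samples. The `Sample` format, the function name, and the reading of "25 megabytes" as 25,000,000 bytes are assumptions made for this example; this is not CloudTech's actual implementation.

```python
from typing import NamedTuple

# Limits quoted in the article; the exact byte interpretation is an assumption.
PACKETS_PER_SEC_LIMIT = 15_000
BYTES_PER_SEC_LIMIT = 25 * 1_000_000


class Sample(NamedTuple):
    """Cumulative traffic counters at one point in time (hypothetical format)."""
    timestamp: float   # seconds
    packet_count: int  # total packets seen so far, in and out
    byte_count: int    # total bytes seen so far, in and out


def bandwidth_alert(prev: Sample, curr: Sample) -> bool:
    """Return True if the packet rate or the byte rate exceeds its limit."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    packet_rate = (curr.packet_count - prev.packet_count) / elapsed
    byte_rate = (curr.byte_count - prev.byte_count) / elapsed
    return packet_rate > PACKETS_PER_SEC_LIMIT or byte_rate > BYTES_PER_SEC_LIMIT
```

With one-minute samples, a server that moved 16,000 packets per second over the interval would trip the check even if its byte rate stayed low.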
The "load average" measures the computational work the system is performing. An idle server has a load of 0. Every running or waiting process increments your load average. The alerting threshold for this metric may be adjusted based on your server’s configuration.
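Because the load average counts running and waiting processes, the same value means different things on servers with different numbers of CPU cores. The sketch below shows one common way to normalize it; the threshold of 1.0 per core is a placeholder chosen for illustration, not the value CloudTech uses.

```python
import os


def load_per_cpu(load: float, cpu_count: int) -> float:
    """Normalize a load average by the number of CPU cores."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load / cpu_count


def load_alert(load: float, cpu_count: int, per_cpu_threshold: float = 1.0) -> bool:
    """Return True if the normalized load exceeds the (assumed) threshold."""
    return load_per_cpu(load, cpu_count) > per_cpu_threshold


def current_load_alert(per_cpu_threshold: float = 1.0) -> bool:
    """Check the one-minute load average of this machine (Unix only)."""
    one_minute, _five_minute, _fifteen_minute = os.getloadavg()
    return load_alert(one_minute, os.cpu_count() or 1, per_cpu_threshold)
```

A load of 4.0 on a four-core server is fully busy but not overloaded under this rule, while the same load on a single-core server would be well over the threshold.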
"CPU usage percentage" is the amount of CPU being utilized in relation to the amount of CPU you have available.
"I/O Wait" is the amount of time that a task must wait to access disk resources.
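On Linux, both CPU usage and I/O wait can be derived from the tick counters on the aggregate `cpu` line of `/proc/stat`. The sketch below computes the two percentages between two samples; treating I/O wait as non-busy time is a convention assumed here, and the function names are illustrative only.

```python
def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def cpu_percentages(prev: list[int], curr: list[int]) -> tuple[float, float]:
    """Return (busy %, iowait %) for the interval between two samples.

    Counter order: user, nice, system, idle, iowait, irq, softirq, steal.
    Guest counters are skipped because they are already counted in user/nice.
    """
    deltas = [c - p for p, c in zip(prev, curr)]
    total = sum(deltas[:8])
    if total <= 0:
        raise ValueError("no CPU time elapsed between samples")
    idle, iowait = deltas[3], deltas[4]
    busy = total - idle - iowait
    return 100.0 * busy / total, 100.0 * iowait / total
```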
"Disk space available" is the amount of disk storage (measured in kilobytes) that is currently available. The CloudTech team is automatically notified when your server has less than 2GB available.
An inode (or index node) describes an object, such as a file or a directory, on your server. Each server has a maximum number of usable inodes. We therefore track your inode usage as a percentage, and CloudTech is alerted when you exceed a threshold of 95%.
When these thresholds are met, a disk audit is automatically performed and reviewed. In most cases, this will require action on your part, and we will provide you with information and recommendations to help you reduce your disk usage.
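If you would like to check the same two conditions yourself, the sketch below applies the disk space and inode thresholds described above. It assumes "2GB" means 2 GiB, and the function and alert names are hypothetical; the filesystem figures come from Python's standard `os.statvfs` call, which is available on Unix systems.

```python
import os

MIN_FREE_BYTES = 2 * 1024**3   # "2GB" read as 2 GiB; an assumption
MAX_INODE_PERCENT = 95.0       # inode threshold from the article


def inode_percent_used(total_inodes: int, free_inodes: int) -> float:
    """Percentage of inodes in use; 0.0 if the filesystem reports no inodes."""
    if total_inodes <= 0:
        return 0.0
    return 100.0 * (total_inodes - free_inodes) / total_inodes


def disk_alerts(free_bytes: int, total_inodes: int, free_inodes: int) -> list[str]:
    """Return the names of the thresholds that have been crossed."""
    alerts = []
    if free_bytes < MIN_FREE_BYTES:
        alerts.append("low_disk_space")
    if inode_percent_used(total_inodes, free_inodes) > MAX_INODE_PERCENT:
        alerts.append("high_inode_usage")
    return alerts


def check_path(path: str = "/") -> list[str]:
    """Run both checks against the filesystem that contains `path`."""
    stats = os.statvfs(path)
    free_bytes = stats.f_bavail * stats.f_frsize  # space available to non-root users
    return disk_alerts(free_bytes, stats.f_files, stats.f_ffree)
```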
Need to add temporary disk space to your server? Feel free to check out the article below!
Input / Output (I/O)
"Read/Write requests per second" is the speed of data transfer between the hard disk drive and memory. "Read/Write requests queued per second" reflects the time spent waiting on I/O.
Memory, or Random Access Memory (RAM), is used to temporarily store information for several applications on your server. We monitor the percentage of memory used, cached memory, and the percentage of swap memory used. Swap memory is extra memory on your server that is stored on disk. Cached memory does count toward your memory usage; however, the server will free it if necessary.
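To make the three memory figures concrete, the sketch below computes them from text in the format of Linux's `/proc/meminfo`. Counting cached memory as "used" follows the description above; the function names and the choice of fields are assumptions for illustration.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value}."""
    values = {}
    for line in text.splitlines():
        key, separator, rest = line.partition(":")
        parts = rest.split()
        if separator and parts:
            values[key.strip()] = int(parts[0])  # values are usually in kB
    return values


def memory_summary(info: dict[str, int]) -> dict[str, float]:
    """Return used, cached, and swap-used percentages."""
    total = info["MemTotal"]
    used = total - info["MemFree"]  # includes cached memory
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    return {
        "used_percent": 100.0 * used / total,
        "cached_percent": 100.0 * info.get("Cached", 0) / total,
        "swap_used_percent": 100.0 * swap_used / swap_total if swap_total else 0.0,
    }
```

On a live Linux server, the input text could be read with `open("/proc/meminfo").read()`.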
Some key applications that are installed on your server are monitored at the process level. We monitor CPU, Memory, and State for the following processes. If one of these processes is not running when it should be, our team is alerted and automation restarts it:
The CloudTech team is currently in the process of launching MySQL monitoring for both new and existing customers.
A MySQL user is generated on each server to monitor the number of active MySQL connections, queries per second, and slow queries per second. Additionally, "InnoDB lock current waits" and "lock time average" are monitored if your database uses InnoDB.
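MySQL exposes cumulative counters through `SHOW GLOBAL STATUS`, so per-second figures like the ones above are typically derived from the difference between two snapshots. The sketch below assumes each snapshot has already been loaded into a dictionary keyed by the standard status variable names `Queries`, `Slow_queries`, and `Threads_connected`; how CloudTech actually collects these values is not stated here.

```python
def mysql_rates(prev: dict[str, int], curr: dict[str, int],
                elapsed_seconds: float) -> dict[str, float]:
    """Derive connection count and per-second rates from two status snapshots."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {
        # Threads_connected is a gauge, so the latest value is used as-is.
        "active_connections": float(curr["Threads_connected"]),
        # Queries and Slow_queries are counters, so the delta is divided by time.
        "queries_per_second": (curr["Queries"] - prev["Queries"]) / elapsed_seconds,
        "slow_queries_per_second":
            (curr["Slow_queries"] - prev["Slow_queries"]) / elapsed_seconds,
    }
```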
Monitoring at the Domain Level
Domains added to monitoring via your (mt) Account Center (up to 5 domains by default) are monitored from several different geographical locations, external to your server. If your website is unavailable, or takes an extended period of time to load, from at least 5 different locations, the CloudTech team will be alerted. Our team will investigate the matter and attempt to mitigate the outage.
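The multi-location rule helps avoid alerts caused by a problem at a single probe site. The sketch below shows the idea under stated assumptions: the `ProbeResult` format is hypothetical, and the 10-second cutoff for a "slow" load is a placeholder, since no exact figure is given above.

```python
from typing import NamedTuple, Optional

MIN_FAILING_LOCATIONS = 5   # from the article
SLOW_LOAD_SECONDS = 10.0    # hypothetical cutoff for "an extended period of time"


class ProbeResult(NamedTuple):
    """Outcome of one check from one location (hypothetical format)."""
    location: str
    load_seconds: Optional[float]  # None means the site could not be reached


def probe_failed(result: ProbeResult) -> bool:
    """A probe fails if the site was unreachable or too slow to load."""
    return result.load_seconds is None or result.load_seconds > SLOW_LOAD_SECONDS


def domain_alert(results: list[ProbeResult]) -> bool:
    """Alert only when enough distinct locations report a failure."""
    failing_locations = {r.location for r in results if probe_failed(r)}
    return len(failing_locations) >= MIN_FAILING_LOCATIONS
```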