Overview
This article explains how to determine whether you are exceeding resource limits on your DV server. It also covers basic troubleshooting procedures and helps you determine the appropriate course of action if you frequently exceed resource limits. This article assumes that you are familiar with the basics of viewing resource usage on your server, as well as the resource parameters. For more information, please review How can I see my server's resource usage statistics?
If you're having trouble with the steps in this article, additional assistance is available via Advanced Support, our premium services division. For more information on what Advanced Support can do for you, please click here.
READ ME FIRST
Please keep in mind that the DV product is a self-administered hosting solution. This article is provided as a courtesy, and the material covered is outside the scope of support provided by Media Temple. Please take a moment to review the Statement of Support.
VSwap
On the DV, the User Beancounters resource allocation system has been replaced with VSwap, which simplifies memory management and troubleshooting. VSwap provides virtualized swap memory to your server if it exceeds its guaranteed allocation of RAM. Please note that this is different from standard swap memory: it is not actually written to disk, but it is considered equivalent to traditional swap in terms of performance. VSwap is also considered more accurate and reliable than the 'burstable' RAM provided through the Beancounters management system.
Plesk Resource Alerts
If you suspect your server may be overusing its resources, check your Resource Alerts. You may also want to install the Server Health Monitor utility provided by Plesk. This provides an interface for easy monitoring of several important aspects of your server.
- Sign in to the Plesk Power User Panel as "root". You can do this directly at https://example.com:4643, or:
- Log into the Server Administration Panel.
- Click on Tools & Utilities in the left sidebar.
- Click on Manage Your Container.
Your Power User Panel will open in a new tab or window.
- Click on Resource Alerts on the left.
- If you have encountered any resource alerts recently, you will see them color-coded as follows:
- Yellow - You are approaching your purchased resource limit.
- Red - You are approaching the current physical machine resource limit.
- Black - You have gone over the current system limit. Indicates a crash.
- Green - Back to normal, safe operating limits.
Check the System resource limits section below for details on what each type of alert means.
cPanel Service and Resource Alerts
1. To quickly check the status of monitored services, CPU, and disk usage, log into WHM and click on Service Status.
2. All currently monitored services will be listed. Scroll to the bottom for a report on current CPU usage. You may also use this page to add services to monitoring via the link at the top of the page.
Beancounters
The /proc/user_beancounters file reports detailed statistics on your server's resource usage.
- Log into your server as a root or sudo user via SSH.
- Run this command:
cat /proc/user_beancounters
- You should see output like the following:
Version: 2.5
uid resource held maxheld barrier limit failcnt
30173: kmemsize 4905005 4923306 24577665 27035431 62
lockedpages 0 0 1200 1200 0
privvmpages 51381 51386 451859 471859 3
shmpages 6024 6024 34475 34475 0
dummy 0 0 0 0 0
numproc 42 42 600 600 0
physpages 23564 23566 0 2147483647 0
vmguarpages 0 0 262144 2147483647 0
oomguarpages 25270 25272 262144 2147483647 0
numtcpsock 15 15 600 600 0
numflock 11 11 960 1056 0
numpty 1 1 60 60 0
numsiginfo 0 0 1024 1024 0
tcpsndbuf 225836 225836 5734955 8192555 507
tcprcvbuf 245760 245760 5734955 8192555 0
othersockbuf 12972 12972 2867477 5325077 0
dgramrcvbuf 0 0 2867477 2867477 0
numothersock 16 16 600 600 0
dcachesize 531278 534223 5368543 5529600 0
numfile 2324 2333 20000 20000 0
dummy 0 0 0 0 0
dummy 0 0 0 0 0
dummy 0 0 0 0 0
numiptent 14 14 200 200 0
What the columns mean:
- uid - The user ID number.
- resource - The type of limit in question. See the next section for details.
- held - The amount of this resource being used now.
- maxheld - The maximum amount of this resource used in the last 15 seconds.
- barrier - The dedicated amount of this resource you have purchased.
- limit - The total available amount of this resource currently free on the physical machine. (i.e. you may get lucky with extra resources.)
- failcnt - The number of times you have exceeded the limit for this resource.
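Numbers that indicate a memory size are in bytes. The exception is parameters whose names end in "pages" (such as privvmpages and shmpages): these are counted in memory pages, which are 4 KB each on x86 servers.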
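In practice, the rows that matter are the ones with a non-zero failcnt. The following is a minimal sketch of a filter that picks them out. It assumes the layout shown above: failcnt is always the last column, and the first data row is prefixed with the container ID. The function name bc_failures is hypothetical and is not a system command.

```shell
#!/bin/sh
# bc_failures [FILE]: print "resource failcnt" for every beancounter whose
# failcnt (the last column) is greater than zero. Reads
# /proc/user_beancounters by default; pass a saved copy as FILE instead.
bc_failures() {
  awk '
    # Skip the "Version:" line and the column-header line.
    /^[ \t]*Version:/ || /^[ \t]*uid[ \t]/ { next }
    NF >= 6 {
      # The first data row starts with the container ID, e.g. "30173:".
      res = ($1 ~ /:$/) ? $2 : $1
      if (res != "dummy" && $NF + 0 > 0) print res, $NF
    }
  ' "${1:-/proc/user_beancounters}"
}
```

Run as root, bc_failures with no arguments checks the live file. Against the sample output above, it would report kmemsize 62, privvmpages 3 and tcpsndbuf 507. failcnt accumulates until the container restarts, so a non-zero value may reflect an old event rather than a current problem.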
System resource limits
The most common system resource limits you may reach are described briefly below, along with the most common causes of such overages.
Plesk users: For a detailed technical explanation of each limit, and for parameters not shown below, please review View Resources From The Parallels Power Panel.
kmemsize
kmemsize is the amount of kernel memory allocated on behalf of your server's processes; this memory cannot be swapped out. This limit is closely tied to your CPU use, and is smaller than your RAM. You can reach your kmemsize limit by:
- Running too many processes on your server at the same time.
- Running just a few CPU-intensive processes.
privvmpages
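privvmpages is the amount of memory allocated by applications on your server (counted in 4 KB pages), which corresponds roughly to your RAM. You can reach your privvmpages limit by: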
- Running processes that require intensive server memory.
Tracking down memory-intense processes via SSH
Viewing Current Memory Usage
The most accurate way to view current memory utilization is via SSH, using the 'free' command:
CT-101765-bash-4.1# free -m
total used free shared buffers cached
Mem: 1024 1015 8 0 0 0
-/+ buffers/cache: 1015 8
Swap: 1024 462 561
The '-m' flag makes the 'free' command show its output in MB. The 'Mem' line shows how much guaranteed memory is currently being used. In this case, we are using almost all of it (1015 out of 1024 MB). The 'Swap' line shows how much VSwap memory is being utilized. Since we have exhausted our guaranteed memory, we are also using some of our VSwap memory (462 out of 1024 MB).
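If you check memory often, a small filter can turn free -m output into percentages. This is a sketch that assumes the second and third columns of the Mem: and Swap: lines are total and used, as in the output above. Newer versions of free add an "available" column but keep those two columns in place. The function name mem_summary is hypothetical.

```shell
#!/bin/sh
# mem_summary: read `free -m` output on stdin and print used/total and the
# percentage used for guaranteed memory (Mem:) and VSwap (Swap:).
mem_summary() {
  awk '
    $1 == "Mem:" || $1 == "Swap:" {
      if ($2 > 0)
        printf "%s %d/%d MB (%.1f%%)\n", $1, $3, $2, $3 * 100 / $2
      else
        printf "%s %d/%d MB (n/a)\n", $1, $3, $2   # e.g. no swap configured
    }
  '
}
```

Usage: free -m | mem_summary. For the example above, this prints "Mem: 1015/1024 MB (99.1%)" and "Swap: 462/1024 MB (45.1%)".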
Finding The Processes
Now you know that you have one or more processes running on your server that are using too much of your memory. There are multiple tools that can be utilized via SSH to track down these troublesome processes.
TOP
- Run:
top M
- You should see dynamic output like this:
top - 13:19:52 up 8 days, 21:34, 0 users, load average: 0.00, 0.00, 0.00
Tasks: 62 total, 1 running, 59 sleeping, 2 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1024.000M total, 424.734M used, 599.266M free, 0.000k buffers
Swap: 1024.000M total, 433.828M used, 590.172M free, 3416.000k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1084 mysql 20 0 2036m 13m 3404 S 0.3 1.3 14:22.98 mysqld
31647 root 20 0 386m 7756 7708 S 0.0 0.7 0:01.02 httpd
1919 root 20 0 234m 3676 2216 S 0.0 0.4 0:06.04 spamd
30592 named 20 0 765m 2856 2544 S 0.0 0.3 0:00.21 named
31745 nginx 20 0 63368 2068 1684 S 0.0 0.2 0:01.17 nginx
32332 postfix 20 0 51916 2048 1932 S 0.0 0.2 0:03.05 qmgr
32663 postfix 20 0 51716 1976 1896 S 0.0 0.2 0:00.00 tlsmgr
32320 root 20 0 51656 1944 1848 S 0.0 0.2 0:03.91 master
2290 postfix 20 0 51716 1832 1772 S 0.0 0.2 0:00.01 pickup
2296 root 20 0 105m 1756 1536 S 0.0 0.2 0:00.02 bash
32642 sw-cp-se 20 0 60680 1684 1680 S 0.0 0.2 0:00.02 sw-cp-serverd
32138 postfix 20 0 394m 1272 1204 S 0.0 0.1 0:00.67 psa-pc-remote
2378 root 20 0 15000 1268 1012 R 0.0 0.1 0:00.03 top
29446 root 20 0 22096 748 744 S 0.0 0.1 0:00.00 xinetd
475 root 20 0 179m 728 448 S 0.0 0.1 0:04.68 rsyslogd
top - 15:43:35 up 443 days, 3:53, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 34 total, 1 running, 33 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 1807436k total, 205592k used, 1601844k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 0k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 16 0 1980 644 556 S 0 0.0 0:33.59 init
24158 root 15 0 1644 564 472 S 0 0.0 1:53.25 syslogd
24181 root 16 0 6912 1048 672 S 0 0.1 1:59.54 sshd
24196 root 16 0 2640 896 724 S 0 0.0 0:00.64 xinetd
24216 root 15 0 5768 820 560 S 0 0.0 0:00.08 couriertcpd
24218 root 16 0 4608 1064 836 S 0 0.1 0:00.02 courierlogger
24366 root 17 0 2368 1132 976 S 0 0.1 0:00.00 mysqld_safe
24426 mysql 16 0 119m 23m 5348 S 0 1.3 8:51.08 mysqld
24476 root 16 0 32396 29m 2340 S 0 1.7 3:08.59 spamd
24478 popuser 16 0 32396 27m 968 S 0 1.6 0:06.28 spamd
24501 root 16 0 43648 6804 4192 S 0 0.4 0:00.77 httpsd
24556 root 18 0 5396 688 424 S 0 0.0 0:00.00 saslauthd
24557 root 18 0 5396 432 168 S 0 0.0 0:00.00 saslauthd
21912 qmails 16 0 1636 488 396 S 0 0.0 0:00.03 qmail-send
21914 qmaill 16 0 1580 464 396 S 0 0.0 0:00.00 splogger
21916 root 17 0 1616 372 280 S 0 0.0 0:00.00 qmail-lspawn
21917 qmailr 16 0 1604 376 288 S 0 0.0 0:00.00 qmail-rspawn
21918 qmailq 16 0 1572 344 280 S 0 0.0 0:00.00 qmail-clean
10049 psaadm 16 0 47232 20m 14m S 0 1.2 0:02.95 httpsd
11518 psaadm 16 0 48164 21m 14m S 0 1.2 0:02.21 httpsd
25766 root 16 0 35824 14m 8388 S 0 0.8 0:00.15 httpd
Important columns:
- PID - Shows process ID number.
- USER - Shows the owner of the process. An unexpected owner can be a sign that your server has been compromised.
- S - Shows the process state. Watch out for zombie processes, marked with a Z: these processes have exited, but the parent process that started them has not yet collected their exit status.
- %CPU - Shows the percentage of server CPU used.
- %MEM - Shows the percentage of server RAM used.
- TIME+ - Shows the total CPU time the process has used since it started (not how long it has been running).
- COMMAND - Shows the name of the command that started the process. Useful for identifying which system service is hogging resources.
Tasks are sorted by CPU use percentage by default. Type SHIFT-M to sort by memory, and SHIFT-P to switch back to sort by CPU percentage. Memory is helpful if you are getting privvmpages errors, and CPU is helpful if you are getting kmemsize errors.
When you are done with top, type q (or CTRL-C) to exit. See the top manual (man top) for more top commands.
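top can also run non-interactively. top -b -n 1 prints a single snapshot to standard output, which is useful from cron or when you want to keep a record of a spike. The sketch below reads such a snapshot and lists the processes using the most memory. It finds the PID, %MEM and COMMAND columns by their header names, because the column layout can differ between top versions. top_by_mem is a hypothetical helper, not part of top.

```shell
#!/bin/sh
# top_by_mem [N]: read one `top -b -n 1` snapshot on stdin and print the N
# processes (default 5) with the highest %MEM as "PID %MEM COMMAND".
top_by_mem() {
  awk '
    # Find the process-table header and remember where each column is.
    $1 == "PID" {
      for (i = 1; i <= NF; i++) {
        if ($i == "PID") p = i
        if ($i == "%MEM") m = i
        if ($i == "COMMAND") c = i
      }
      table = 1
      next
    }
    # Every full line after the header describes one process.
    table && NF >= c { print $p, $m, $c }
  ' | LC_ALL=C sort -k2,2nr | head -n "${1:-5}"
}
```

Usage: top -b -n 1 | top_by_mem 10. Redirect the raw top -b -n 1 output to a file if you also want to keep the full snapshot.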
PS
In certain situations, it is advantageous to use the 'ps' command as opposed to 'top.' The 'ps' command has numerous options, the entirety of which are covered in the man page (type 'man ps' on the command line). For our purposes, we will use the 'fauxx' flags, which will show all running processes in a process tree:
ps fauxx
You should see output like the following:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 10372 752 ? Ss Feb06 0:15 init [3]
root 1412 0.0 0.0 5924 624 ? Ss Feb06 0:02 syslogd -m 0
dbus 1421 0.0 0.0 21276 1064 ? Ss Feb06 0:00 dbus-daemon --system
root 1930 0.0 0.0 20888 1184 ? Ss Feb06 0:04 crond
root 22304 0.0 0.0 12800 788 ? S<s Feb06 0:00 /sbin/udevd -d
root 22224 0.0 0.0 10788 1344 ? S Feb14 0:00 /bin/sh /usr/bin/mysqld_safe
mysql 22421 0.0 3.7 522976 70492 ? Sl Feb14 8:42 \_ /usr/libexec/mysqld
root 23576 0.0 0.0 21668 976 ? Ss Feb14 0:01 xinetd -stayalive -pidfile /var/run/xinetd.pid
qmails 28232 0.0 0.0 3868 472 ? S Feb14 0:00 qmail-send
qmaill 28234 0.0 0.0 3820 560 ? S Feb14 0:00 \_ splogger qmail
root 28235 0.0 0.0 3860 440 ? S Feb14 0:00 \_ qmail-lspawn | /usr/bin/deliverquota ./Maildir
qmailr 28236 0.0 0.0 3860 444 ? S Feb14 0:00 \_ qmail-rspawn
qmailq 28237 0.0 0.0 3816 412 ? S Feb14 0:00 \_ qmail-clean
500 28301 0.0 0.1 60908 3508 ? S Feb14 0:08 /usr/sbin/sw-cp-serverd -f /etc/sw-cp-server/config
root 5782 0.0 0.0 10440 372 ? Ss Feb28 0:00 vzctl: pts/1
root 5783 0.0 0.0 12088 1720 pts/1 Ss+ Feb28 0:00 \_ -bash
root 8102 0.0 1.2 349928 22724 ? Ss Feb28 0:00 /usr/sbin/httpd
apache 8104 0.0 0.4 195296 8296 ? S Feb28 0:00 \_ /usr/sbin/httpd
apache 8106 0.0 0.8 350952 15344 ? S Feb28 0:00 \_ /usr/sbin/httpd
apache 8176 0.0 0.8 350952 15248 ? S Feb28 0:00 \_ /usr/sbin/httpd
apache 9237 0.0 0.7 350952 14952 ? S 00:38 0:00 \_ /usr/sbin/httpd
apache 9241 0.0 0.7 350952 15040 ? S 00:38 0:00 \_ /usr/sbin/httpd
root 9255 0.0 0.0 10440 368 ? Ss 00:51 0:00 vzctl: pts/0
root 9256 0.0 0.0 12084 1688 pts/0 Ss+ 00:51 0:00 \_ -bash
root 9293 0.0 0.1 35788 2188 ? Ss 00:51 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 9294 0.0 0.1 35792 2744 ? S 00:51 0:00 \_ nginx: worker process
nginx 9295 0.0 0.1 35792 2564 ? S 00:51 0:00 \_ nginx: cache manager process
Pressing 'c' in TOP gives similar output: it toggles the display of each process's full command line, including the full path to the command that started it. PS is helpful because it simply prints the information we require once, so that we can quickly save, search, or reuse it.
Handy command to view the top 10 memory-hogging processes using PS (column 4 is %MEM):
ps aux | sort -nrk 4 | head
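Apache and PHP often run as many small processes, like the row of httpd children in the ps output above. Because of this, it can help to total memory per program instead of per process. This sketch assumes procps ps, where ps -eo rss=,comm= prints each process's resident memory in KB and its command name without a header. mem_by_command is a hypothetical helper. Shared libraries and shared memory are counted once per process, so the totals overstate real usage, but they are good enough for ranking services.

```shell
#!/bin/sh
# mem_by_command: read "RSS COMMAND" lines on stdin (RSS in KB, as printed by
# `ps -eo rss=,comm=`) and print the total RSS per command, largest first.
mem_by_command() {
  awk '
    NF >= 2 {
      cmd = $2
      for (i = 3; i <= NF; i++) cmd = cmd " " $i   # keep names with spaces
      rss[cmd] += $1
    }
    END { for (c in rss) printf "%d KB %s\n", rss[c], c }
  ' | LC_ALL=C sort -nr
}
```

Usage: ps -eo rss=,comm= | mem_by_command | head. For the httpd children shown above, the six httpd entries add up to 91604 KB.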
Examining The Processes
- Using the information obtained by TOP or PS, identify which system service(s) are using a high percentage of CPU-time or memory. For example, httpd is Apache, and mysqld is MySQL. This may be enough to identify the exact cause of your problem. For example, if MySQL is the culprit, you can now check your running MySQL queries to see if any of them are extremely inefficient.
Handy MySQL command to view live queries on a Plesk server (Plesk stores the MySQL admin password in /etc/psa/.psa.shadow):
watch "mysqladmin -u admin -p'`cat /etc/psa/.psa.shadow`' processlist"
If you received generic results from this step, such as the result that Apache is the resource hog, you can use a few more commands to drill deeper into the processes. Continue with the following steps as appropriate.
- If you suspect your server has been compromised, type c in TOP, or run 'ps aux' to view the command that started each current process. If you notice something suspicious, such as a process with USER apache that was not started by COMMAND /usr/sbin/httpd, investigate the script listed in the COMMAND column.
- Note the PID number for a process that is using a high percentage of your resources, for example, 24158 for the syslogd process shown in the TOP output above. Exit top with CTRL-C, or use PS to obtain the PID. Now execute this command, replacing 24158 with your own PID:
lsof -p 24158
You should see output like this:
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
syslogd 24158 root cwd DIR 0,201 4096 8857888 /
syslogd 24158 root rtd DIR 0,201 4096 8857888 /
syslogd 24158 root txt REG 0,201 35832 8858116 /sbin/syslogd
syslogd 24158 root mem REG 0,201 46680 16525660 /lib/libnss_files-2.5.so
syslogd 24158 root mem REG 0,201 1594552 16525628 /lib/libc-2.5.so
syslogd 24158 root mem REG 0,201 124432 16525614 /lib/ld-2.5.so
syslogd 24158 root 0u unix 0x26484b80 358292464 /dev/log
syslogd 24158 root 2w REG 0,201 39610 14951126 /var/log/messages
syslogd 24158 root 3w REG 0,201 3009832 14951128 /var/log/secure
syslogd 24158 root 4w REG 0,201 44906 49676428 /usr/local/psa/var/log/maillog
syslogd 24158 root 5w REG 0,201 53710 14951614 /var/log/cron
syslogd 24158 root 6w REG 0,201 0 14951132 /var/log/spooler
syslogd 24158 root 7w REG 0,201 0 14951612 /var/log/boot.log
This shows all of the files currently opened by this process. This kind of output is particularly helpful if you are trying to track down web scripts associated with resource-heavy Apache processes.
Run lsof -p as quickly as possible. Many processes last for only a few seconds, and if you don't catch one while it's running, you'll only see the log files as output.
- Alternately, you can make a note of the PID as described in the step above, then execute the following commands to view more information about the process:
- Exit top with CTRL-C.
- List the contents of the process directory, replacing 24158 with your own PID:
ls -la /proc/24158
- You should see output similar to the following:
total 0
dr-xr-xr-x 3 root root 0 Aug 19 14:10 .
dr-xr-xr-x 2485 root root 0 Jun 2 2009 ..
-r-------- 1 root root 0 Aug 19 16:36 auxv
-r--r--r-- 1 root root 0 Aug 19 16:00 cmdline
lrwxrwxrwx 1 root root 0 Aug 19 16:19 cwd -> /
-r-------- 1 root root 0 Aug 19 16:36 environ
lrwxrwxrwx 1 root root 0 Aug 19 16:19 exe -> /sbin/syslogd
dr-x------ 2 root root 0 Aug 19 16:19 fd
-r--r--r-- 1 root root 0 Aug 19 16:19 maps
-rw------- 1 root root 0 Aug 19 16:36 mem
-r--r--r-- 1 root root 0 Aug 19 16:36 mounts
-r-------- 1 root root 0 Aug 19 16:36 mountstats
lrwxrwxrwx 1 root root 0 Aug 19 16:19 root -> /
-r-------- 1 root root 0 Aug 19 16:36 smaps
-r--r--r-- 1 root root 0 Aug 19 16:00 stat
-r--r--r-- 1 root root 0 Aug 19 16:39 statm
-r--r--r-- 1 root root 0 Aug 19 16:36 status
dr-xr-xr-x 3 root root 0 Aug 19 16:36 task
-r--r--r-- 1 root root 0 Aug 19 16:36 wchan
Again, doing this as quickly as possible will give you the best results. Two entries are particularly useful: cwd, which shows the directory the process is working in, and exe, which points to the program that is running.
- If these steps do not yield useful information, you will need to work with your system administrator to identify the cause of your issue in more detail.
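You can wrap the /proc lookups from the steps above in a small function, so you can run them the moment you spot a PID. This is a sketch for Linux that reads /proc/PID/cwd, /proc/PID/exe and /proc/PID/cmdline. proc_info is a hypothetical name, and you need to be root to inspect processes owned by other users.

```shell
#!/bin/sh
# proc_info PID: print the working directory, executable and full command
# line of a running process, read directly from /proc/PID.
proc_info() {
  local pid="$1"
  if [ ! -d "/proc/$pid" ]; then
    echo "no such process: $pid" >&2
    return 1
  fi
  echo "cwd: $(readlink "/proc/$pid/cwd")"
  echo "exe: $(readlink "/proc/$pid/exe")"
  # Arguments in cmdline are separated by NUL bytes; show them as spaces.
  echo "cmd: $(tr '\0' ' ' < "/proc/$pid/cmdline")"
}
```

Usage: proc_info 24158. The function returns an error if the process has already exited, which tells you to catch it sooner.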
Tracking down processes hogging other resource types
tcpsndbuf
tcpsndbuf is the total size, in bytes, of the buffers used to send data over TCP connections, so it correlates strongly with the amount of data being served by Apache. You can reach your tcpsndbuf limit by:
- Using Apache to serve too much content to too many people at the same time.
The best way to determine why you are reaching this particular resource limit is to look at the traffic on all of your sites, especially which pages and files are being requested the most. For example, if hundreds of people are trying to view the same streaming video within a minute of each other, it's probably this item that is causing you to reach the limit. Of course, it's possible that the requests are spread out over many different items as well, which will make this more difficult to analyze. You can check your traffic and web requests by viewing your web statistics. Depending on the issue at hand, optimizing Apache, or utilizing a CDN could help resolve overages of this parameter.
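To see which pages and files account for the most traffic, you can summarize an Apache access log directly. This is a minimal sketch that assumes the common or combined log format, where field 7 is the request path and field 10 is the response size in bytes. top_requests is a hypothetical helper, and the log path in the usage line is a placeholder that varies by control panel and domain.

```shell
#!/bin/sh
# top_requests LOG [N]: print "BYTES HITS PATH" for the N paths (default 10)
# that sent the most data, according to an Apache access log.
top_requests() {
  awk '
    # Field 7: request path. Field 10: response size ("-" if none sent).
    { hits[$7]++; if ($10 ~ /^[0-9]+$/) bytes[$7] += $10 }
    END { for (p in hits) printf "%.0f %d %s\n", bytes[p], hits[p], p }
  ' "$1" | LC_ALL=C sort -nr | head -n "${2:-10}"
}
```

Usage: top_requests /path/to/access_log 20. A few large files dominating the byte totals points toward a CDN. Many small, frequently hit pages points more toward Apache tuning.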
othersockbuf and numothersock
othersockbuf and numothersock refer to your server's internal "sockets." Sockets are what processes on the server use to talk to one another. For example, MySQL and PHP talk to each other through a socket. "othersockbuf" also includes some traffic that goes over the network: it is the total size of UNIX-domain socket buffers, UDP, and other datagram protocol send buffers. You can reach your othersockbuf or numothersock limit by:
- Running too many internal processes that have to connect to each other.
Run TOP or PS to view your current running processes.
numfile
numfile is the number of files open on your server concurrently. You can reach your numfile limit by:
- Opening too many files at once.
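Run this command in SSH to view the approximate number of open files. The count runs high because lsof also lists entries such as memory-mapped libraries and working directories: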
lsof | wc -l
You can also limit the output to a single process using the '-p' flag. Get the PID using TOP or PS, and then run the following command, replacing 'pid' with the PID you obtained:
lsof -p pid
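For a per-process view, you can count the entries in /proc/PID/fd, which correspond one-to-one to a process's open file descriptors. This is a sketch that assumes a Linux /proc filesystem and procps ps. fd_count and top_fd_users are hypothetical helper names. Run it as root so every process's fd directory is readable.

```shell
#!/bin/sh
# fd_count PID: number of file descriptors the process currently has open.
fd_count() {
  ls -1 "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# top_fd_users [N]: the N processes (default 10) holding the most open
# descriptors, printed as "COUNT PID COMMAND".
top_fd_users() {
  for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    printf '%s %s %s\n' "$(fd_count "$pid")" "$pid" \
      "$(ps -o comm= -p "$pid" 2>/dev/null)"
  done | LC_ALL=C sort -nr | head -n "${1:-10}"
}
```

Usage: top_fd_users. Then use lsof -p on the PIDs at the top of the list to see which files they hold.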
Common culprits
High traffic
Your server may be optimized perfectly, but it may still be receiving more traffic than it can handle. This is especially likely if you don't notice any single process using a lot of your resources, but you see dozens of processes open at once. Check your website statistics to see if you're getting unusually high traffic, then consider upgrading your server (see the Upgrade sub-section of Solutions below).
Poorly-written software
If you notice any of the following:
- Zombie processes when you run top - check for the letter Z in the S column.
- A sudden performance drop after installing a new software package or upgrading an existing one.
- MySQL queries that run for longer than 2 seconds.
- A cluttered list of themes and plugins for your content management system, such as WordPress.
You may be suffering from poorly-written software or a bad combination of software components. Contact your software developer for further assistance with custom-written software, or check your software's help forums for help with third-party components. Note that poor software may still be the cause of your issue even if you have none of the above symptoms.
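To check for the long-running MySQL queries mentioned above without reading the whole process list, you can filter the table that mysqladmin processlist prints. This is a minimal sketch that assumes the standard pipe-delimited layout (Id, User, Host, db, Command, Time, State, Info). slow_queries is a hypothetical helper, and its 2-second default simply mirrors the threshold above. A query whose text itself contains a | character will have its Info text cut short.

```shell
#!/bin/sh
# slow_queries [SECONDS]: read `mysqladmin processlist` output on stdin and
# print "Id Time Info" for non-sleeping threads running longer than SECONDS
# (default 2).
slow_queries() {
  awk -F'|' -v limit="${1:-2}" '
    function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
    {
      id = trim($2)
      # Only data rows have a numeric Id; borders and the header are skipped.
      if (id !~ /^[0-9]+$/) next
      cmd = trim($6); secs = trim($7)
      if (cmd != "Sleep" && secs + 0 > limit + 0) print id, secs "s", trim($9)
    }
  '
}
```

On a Plesk server, for example: mysqladmin -u admin -p"$(cat /etc/psa/.psa.shadow)" processlist | slow_queries 2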
Hacks
Your server may be compromised. The high memory use could be due to sending out lots of spam, forwarding large amounts of traffic, or running rogue processes. See this collection of articles for detailed information on investigating and resolving hacked server scenarios:
Runaway processes
If this is an out-of-the-blue occurrence for your server, you may simply need to kill off a runaway process. Occasionally even well-written code can spawn a process that gets stuck for some reason. See the Quick fix - reboot section above for instructions on rebooting your server or restarting a process.
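When a single process is stuck, it is usually better to ask it to exit cleanly before forcing it. The sketch below sends SIGTERM, waits up to a timeout, and only then sends SIGKILL, which cannot be caught and gives the program no chance to clean up. stop_process is a hypothetical name. For services such as Apache or MySQL, restarting the service itself is generally safer than killing individual processes.

```shell
#!/bin/sh
# stop_process PID [TIMEOUT]: send SIGTERM, wait up to TIMEOUT seconds
# (default 10) for the process to exit, then send SIGKILL if it is still alive.
stop_process() {
  local pid="$1" timeout="${2:-10}" waited=0
  if ! kill -TERM "$pid" 2>/dev/null; then
    echo "cannot signal $pid (not running, or permission denied)" >&2
    return 1
  fi
  while [ "$waited" -lt "$timeout" ]; do
    kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
    sleep 1
    waited=$((waited + 1))
  done
  kill -KILL "$pid" 2>/dev/null              # still running: force it
}
```

Usage: stop_process 24158 15 gives the process 15 seconds to shut down on its own.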
Solutions
Eliminate resource intense processes where possible
If you have identified a specific script, piece of software, or even a single MySQL query that is causing your memory overuse, you should remove or optimize the offending code.
You may need to work with a professional software developer or system administrator to do this effectively.
Check For Errors
When things are not functioning as anticipated, it is always a good idea to check both your general error logs and your domain-specific error logs. See System Paths and Checking error logs for their locations.
Optimize
Here is our general article on optimizing your server. It contains links to articles that will walk you through the process of analyzing and tuning Apache and MySQL:
Eliminate unneeded services
If you aren't using a particular service, shut it down.
- For example, if you use Google Apps for email, you can turn off Qmail, SpamAssassin, and the IMAP/POP services. Named, which is used to operate private nameservers, is often not required, and can be shut down. For more information regarding how to start/stop services, please review the following KnowledgeBase article: Restarting Services In Plesk
Upgrade
If you're happy with how everything on your server runs - you just need more resources - you can upgrade your server plan. Options include the following:
- Purchase a second DV server. Many customers run MySQL from one server and Apache from another server.
Monitor and maintain
Keep an eye on your server so you can quickly respond to small resource issues before they escalate.
- Install monitoring software on your server, such as Monit. This can alert you of resource overages as soon as they occur. Here are some other monitoring options:
- VPS Info - http://www.labradordata.ca/home/13
- Status2k - http://status2k.com/
- LoadAVG - http://www.silversoft.com/loadavg
- Memory Utilization Script - http://wiki.vpslink.com/index.php?title=Memory_Utilization_Script
- Note that Watchdog tends to be overzealous in reporting problems. You may want to use one of the other suggested programs.
- Become more familiar with SSH monitoring commands. Learn commands like top, free, cat /proc/user_beancounters, and ps auxx to view server resource usage in realtime.
- top - See the section on top above.
- free - Shows memory and swap space breakdown. http://www.linfo.org/free.html
- cat /proc/user_beancounters - See Troubleshooting DV Memory Issues.
- ps auxww - View all processes on your system.
- Check your system logs. See System Paths and Checking error logs for their locations.
- Check out one (mt) Media Temple customer's monitoring solution: http://davidseah.com/blog/comments/monitoring-my-media-temple-dv-base-memory-usage/
- Track server uptime externally using Pingdom or Down for everyone or just me?
- Maintain up-to-date backups of your server so you can revert your server to a working state in the case of a hack or irreversible configuration issue.
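Because failcnt only grows until the container restarts, a simple monitor can save copies of /proc/user_beancounters over time. It can then report any resource whose failcnt has increased since the last check. The sketch below does that comparison. bc_new_failures is a hypothetical helper. How you schedule it (for example from a root cron job) and where you keep the snapshots are up to you.

```shell
#!/bin/sh
# bc_new_failures OLD NEW: compare two saved copies of /proc/user_beancounters
# and print "resource old -> new" for every resource whose failcnt grew.
bc_new_failures() {
  awk '
    /^[ \t]*Version:/ || /^[ \t]*uid[ \t]/ { next }
    NF >= 6 {
      # The first data row starts with the container ID, e.g. "30173:".
      res = ($1 ~ /:$/) ? $2 : $1
      if (res == "dummy") next
      if (FILENAME == ARGV[1]) { old[res] = $NF; next }   # earlier snapshot
      if ($NF + 0 > old[res] + 0) print res, old[res] + 0, "->", $NF
    }
  ' "$1" "$2"
}
```

For example, copy /proc/user_beancounters to a new snapshot file and run bc_new_failures on the previous and new snapshots. Then replace the previous snapshot with the new one. The snapshot file names are your choice.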