Thursday, January 21, 2010

Apache, by default, is set up so that each worker process exits after handling a limited number of requests (the MaxRequestsPerChild directive controls this limit). The point is to keep memory leaks from getting out of hand.

Usually, the main Apache process will reap any of its workers that exit. However, on a heavily loaded server like the one you describe, the main process may not get enough time to do this reaping promptly. Unreaped (“zombie”) processes show up as <defunct> in top. In this case, the processes are normal and are no cause for concern, unless a very large number of them pile up.
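To see how many unreaped workers you actually have, and which parent they belong to, something like the following may help. It is only a sketch: the function names find_zombies and list_zombies are made up for this post, and it assumes a ps that accepts `-eo pid,ppid,stat,comm`, as the common Linux one does.

```python
import subprocess


def find_zombies(ps_output):
    """Parse `ps -eo pid,ppid,stat,comm` output and return a list of
    (pid, ppid, command) tuples for processes whose state starts with Z."""
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.strip().split(None, 3)
        if len(fields) < 3:
            continue  # blank or malformed line
        pid, ppid, stat = fields[:3]
        command = fields[3] if len(fields) > 3 else ""
        if stat.startswith("Z"):
            zombies.append((int(pid), int(ppid), command))
    return zombies


def list_zombies():
    """Run ps and return the zombies it reports (assumes ps is on PATH)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return find_zombies(result.stdout)
```

Calling list_zombies() on the server returns one tuple per zombie. If most of them share the same parent PID and that PID is the main Apache process, you are looking at the situation described above.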

Another possibility is that your worker processes are dying abnormally. This can happen if your web application (or its engine) has a bug that causes the worker process to crash. Look in your Apache error log for serious error messages, especially ones reporting that a child process exited on a signal.
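If the error log is large, a small script can pull out the lines that matter. The sketch below assumes the "child pid N exit signal NAME (NUM)" wording that Apache 2.x uses when a worker is killed by a signal; if your version words it differently, adjust the pattern. The name find_worker_crashes is made up for this post.

```python
import re

# Assumed message shape, e.g.:
#   [notice] child pid 4242 exit signal Segmentation fault (11)
CRASH_RE = re.compile(r"child pid (\d+) exit signal (.+?) \((\d+)\)")


def find_worker_crashes(lines):
    """Return (pid, signal_name, signal_number) for every log line that
    reports a worker killed by a signal."""
    crashes = []
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            crashes.append(
                (int(match.group(1)), match.group(2), int(match.group(3)))
            )
    return crashes
```

To use it, open whichever file your ErrorLog directive points to and pass the file object to find_worker_crashes. An empty result suggests the workers are exiting normally rather than crashing.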

The rest of this post explains what <defunct> means. If you aren't a Unix system administrator, it may not be of interest to you.

When a process exits (whether normally or abnormally), it enters a state known as “zombie” (which in top appears as Z). Its process ID stays in the process table until its parent waits on (or “reaps”) it. Under normal circumstances, a parent process that fully expects its child processes to exit sets up a signal handler for SIGCHLD. The kernel sends that signal whenever a child exits, and the parent can then reap the child at its convenience.
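The SIGCHLD pattern described above can be sketched in a few lines. This is an illustration only, not how Apache itself is written; reap_children and spawn_and_reap are names invented for this post, and the code assumes a Unix system where os.fork is available.

```python
import os
import signal
import time

reaped = []  # (pid, raw wait status) pairs collected by the handler


def reap_children(signum, frame):
    """SIGCHLD handler: reap every child that has already exited.

    Several exits can be folded into one signal, so keep calling
    waitpid until nothing more is ready."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))


def spawn_and_reap(n_children=3, timeout=5.0):
    """Fork children that exit at once and let the handler reap them.

    Returns (sorted pids spawned, sorted pids reaped)."""
    reaped.clear()
    signal.signal(signal.SIGCHLD, reap_children)
    spawned = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately, becoming a zombie
        spawned.append(pid)
    deadline = time.monotonic() + timeout
    while len(reaped) < n_children and time.monotonic() < deadline:
        time.sleep(0.01)  # the parent is free to do other work here
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    return sorted(spawned), sorted(pid for pid, _ in reaped)
```

Each child is a zombie only for the short interval between its exit and the handler's waitpid call. If the parent never got around to running the handler, the zombies would stay in the process table.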

If the parent process has hung for some reason (it is suspended, too busy, or deadlocked, for example), then child processes that exit will not be reaped until the parent resumes. This can cause serious problems if there are many such children, because each one occupies a slot in the process table that is not freed.

In that case, one solution (if the parent process is unrecoverable, say) is to kill the parent process. The child processes are then reparented to the init process (process ID 1), which will reap them. (If the init process is stalled, then you have much, much bigger problems than child processes not being reaped. In fact, a crashed init process will usually cause a kernel panic.)
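The reparenting step can be observed directly. In the sketch below (again assuming a Unix system with os.fork; the function name is invented for this post), an intermediate process forks a grandchild and then exits without waiting for it. The grandchild reports the parent PID it sees afterwards. On a traditional system that is 1, but inside a container or under a process that has registered itself as a subreaper it may be some other PID, so the code only checks that the parent changed.

```python
import os
import time


def ppid_after_parent_exit(timeout=5.0):
    """Return (intermediate_pid, parent pid the grandchild sees once the
    intermediate process is gone)."""
    read_end, write_end = os.pipe()
    intermediate = os.fork()
    if intermediate == 0:
        # Intermediate process: fork a grandchild, then exit without waiting.
        os.close(read_end)
        my_pid = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait until it has been handed to a new parent.
            deadline = time.monotonic() + timeout
            while os.getppid() == my_pid and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(write_end, str(os.getppid()).encode())
            os._exit(0)
        os._exit(0)
    # Original process: reap the intermediate, then read the report.
    os.close(write_end)
    os.waitpid(intermediate, 0)
    data = os.read(read_end, 32)
    os.close(read_end)
    return intermediate, int(data)
```

Whichever process adopts the grandchild is also responsible for reaping it when it exits, which is why killing a stuck parent clears out its zombies.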
