URGENT: Experienced System Administrator needed

IT & Networking, Network & System Administration
Posted 2 years ago


Hours to be determined
Less than 1 month

Expert Level

Start Date: September 24, 2013


We are urgently hiring an experienced System Administrator to help us troubleshoot a serious issue with one of our main web servers, which has recently been failing due to heavy load.

This is a cloud server hosted at DigitalOcean with 8 CPUs, a 160GB SSD and 16GB of RAM, running nginx 1.4.2, PHP 5.5.3 (served via php-fpm) and MySQL 5.6.

Although traffic has increased drastically in the past week, the CPU is still over 95% idle and more than 1GB of RAM is free at any given time. However, the php-fpm FastCGI server stops responding at random intervals during peak hours, and a manual 'service php-fpm restart' is necessary to get it back online.

We are having a hard time identifying whether the bottleneck is MySQL, php-fpm or nginx.
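One way to narrow this down is to watch the php-fpm pool itself while the problem develops. The sketch below parses the plain-text output of the php-fpm status page and flags a saturated pool. It assumes the status page has been enabled with pm.status_path, which this posting does not confirm; the function names and the sample text are illustrative and not taken from the affected server.

```python
def parse_fpm_status(text):
    """Parse the plain-text php-fpm status page ("key: value" lines) into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        value = value.strip()
        # Counters become ints; everything else (pool name, dates) stays a string.
        status[key.strip()] = int(value) if value.isdigit() else value
    return status


def pool_saturated(status):
    """True when no worker is idle or requests are waiting in the listen queue."""
    return status.get("idle processes", 0) == 0 or status.get("listen queue", 0) > 0


# Illustrative sample only (assumption): not real output from this server.
SAMPLE = """pool:                 www
process manager:      dynamic
accepted conn:        120394
listen queue:         12
idle processes:       0
active processes:     50
total processes:      50
max children reached: 3
"""

status = parse_fpm_status(SAMPLE)
print(status["active processes"], pool_saturated(status))
```

Polling this every second during peak hours would show whether workers pile up before or after MySQL slows down, which helps separate a php-fpm limit from a database stall.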

When the issue occurs, the following errors are recorded in the error logs:

WARNING: [pool www] seems busy

PHP Warning:  Error while sending QUERY packet. PID=22771 in ********


[error] 9240#0: *581395 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: *******, server: *******, request: "GET /l.php?id=124 HTTP/1.1", upstream: "fastcgi://", host: "******", referrer: "****"


[error] 9242#0: *582048 connect() failed (111: Connection refused) while connecting to upstream, client: *******, server: ******, request: "GET /l.php?id=50 HTTP/1.1", upstream: "fastcgi://", host: "*******", referrer: "*******"
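The two nginx errors above carry different error numbers: on Linux, 104 is ECONNRESET (the upstream dropped an established connection) and 111 is ECONNREFUSED (nothing accepted the connection). Counting them separately over time can show in which order the failures appear. The following is a minimal sketch; the function name is hypothetical and the sample lines are abbreviated copies of the masked entries above.

```python
import re
from collections import Counter

# Matches the errno and message nginx prints in parentheses,
# e.g. "(111: Connection refused)". The empty "()" in "recv()" does not match.
ERRNO_RE = re.compile(r"\((\d+): ([^)]+)\)")


def count_upstream_errors(lines):
    """Count nginx upstream errors, keyed by (errno, message)."""
    counts = Counter()
    for line in lines:
        if "upstream" not in line:
            continue  # ignore entries unrelated to the FastCGI upstream
        match = ERRNO_RE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


# Abbreviated from the masked log entries quoted in this posting.
SAMPLE_LINES = [
    "WARNING: [pool www] seems busy",
    '[error] 9240#0: *581395 recv() failed (104: Connection reset by peer) '
    'while reading response header from upstream, client: *******',
    '[error] 9242#0: *582048 connect() failed (111: Connection refused) '
    'while connecting to upstream, client: *******',
]

counts = count_upstream_errors(SAMPLE_LINES)
for (errno_code, message), n in sorted(counts.items()):
    print(errno_code, message, n)
```

Run against the real error log and grouped by timestamp, the same counts would show whether resets precede refusals during an outage.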


Our observations show that at some point MySQL stops responding to the requests sent to it by PHP-FPM. From then on, more and more PHP-FPM children are spawned until the pm.max_children limit is reached and PHP-FPM stops responding. This happens within seconds, and the entire system goes down.
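A rough calculation shows why this cascade takes only seconds. If MySQL stalls and no request completes, every new request ties up one more worker until the pool is full. The sketch below estimates that time. All inputs are assumptions for illustration: the posting does not state the pm.max_children value (200 here is hypothetical), "a couple of million requests per day" is taken as 2,000,000, and the peak factor of 3 is a guess. It also ignores the rate at which php-fpm can spawn new children.

```python
def seconds_to_exhaust(max_children, busy_children, requests_per_second):
    """Time until every worker is blocked, assuming no request completes."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    free_workers = max(max_children - busy_children, 0)
    return free_workers / requests_per_second


# Assumed figures, not measurements from the affected server.
avg_rps = 2_000_000 / 86_400   # about 23.1 requests per second on average
peak_rps = 3 * avg_rps         # hypothetical peak-hour multiplier

print(round(avg_rps, 1))
print(round(seconds_to_exhaust(200, 20, peak_rps), 1))
```

Under these assumptions the pool fills in under three seconds, so raising pm.max_children alone would only delay the failure slightly if MySQL is the component that stalls.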

We are looking for a very experienced person who can start working on this issue ASAP. If you don't have experience troubleshooting servers that process a couple of million requests per day, please do not apply.

The selected candidate will receive full cooperation from our technical rep, as well as a walkthrough of how the system is currently set up and what troubleshooting attempts have been made so far.

We are looking forward to hearing from you.

Activity on this Job

Last Viewed by Client: 2 years ago

Hired: 1

About the Client

(5.00) 26 reviews

Sofia 08:00 AM

35 Jobs Posted
80% Hire Rate, 1 Open Job

Over $50,000 Total Spent
42 Hires, 10 Active

$24.58/hr Avg Hourly Rate Paid
7,246 Hours

Member Since Dec 25, 2012