Summary: We need to find out why our NFS server randomly becomes slow and disconnects NFS and SSH clients.
Our NFS server has run into some strange problems recently:
- Clients often get a "Stale File Handle" error message
- SSH connections are sometimes slow and will drop for no apparent reason
Infrastructure:
- 4 AWS EC2 instances (all Ubuntu)
- 1 x Varnish server
- 2 x Apache servers
- 1 x NFS server
- Webservers serve content from NFS share
- Varnish server writes backups to NFS share
Steps taken thus far:
- Apache serves "403 Forbidden" message
- "ls -lah" on NFS mount displays "Stale File Handle"
- Restart NFS service on NFS server
- Reboot Apache servers
- Apache -> NFS connection works again ... for a while.
- Used netstat -tuc to check for a DDoS attack that might explain the slowness. The only connections were the current SSH session and the NFS share.
- Problems started occurring recently after 3 months of working with no problems.
- Due to this issue, NFS server has been removed from infrastructure. Currently, only one Apache server connects to it. Mount has not been reestablished on Varnish server.
- We do not have expertise to know what to look for in log files to explain this behavior.
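One way to catch the stale-handle condition as it happens, rather than discovering it through Apache's "403 Forbidden", is a small probe run periodically on each client. The following is a minimal sketch, not a tested fix: it assumes Python 3 is available on the client, and /mnt/nfs is a placeholder for the real mount point. The function name check_mount is hypothetical.

```python
import errno
import os
import sys


def check_mount(path):
    """Return 'ok', 'stale', or 'error: <reason>' for the given path."""
    try:
        os.stat(path)      # raises OSError with ESTALE on a stale handle
        os.listdir(path)   # roughly what "ls" does on the mount
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return "stale"
        return "error: " + str(exc)
    return "ok"


if __name__ == "__main__":
    # /mnt/nfs is a placeholder; pass the real mount point as an argument.
    mount_point = sys.argv[1] if len(sys.argv) > 1 else "/mnt/nfs"
    print(mount_point, check_mount(mount_point))
```

Logging this output with a timestamp (for example from cron) would show when the handle goes stale, which can then be compared against the NFS server's logs. Note that on a hard-mounted share the probe may block instead of returning if the server is unresponsive.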
A successful engagement has the contractor finding and explaining the issues listed above. If the fix for these issues is outside our area of expertise, the contractor will implement that fix.
Skills: nfs, apache, varnish