Senior AI Engineer for Critical Production Memory Leak Resolution
Worldwide
We are looking for an experienced AI Engineer to join our team immediately and lead the investigation and resolution of a critical production issue: a long-running, persistent memory leak affecting our AI platform. This is a high-impact role for someone with deep experience debugging complex AI systems in production environments. You will work directly with our engineering team to identify the root cause, implement a robust solution, validate the fix under production workloads, and ensure long-term platform stability.

## Responsibilities

* Investigate and resolve a persistent memory leak in a production AI system.
* Perform deep root cause analysis across application code, AI frameworks, runtime environments, and infrastructure.
* Profile CPU and memory usage using advanced debugging and performance analysis tools.
* Identify memory retention issues, object lifecycle problems, resource leaks, and concurrency-related bottlenecks.
* Optimize long-running AI services for reliability, performance, and efficient resource utilization.
* Validate fixes through stress testing and production-level workload simulations.
* Collaborate closely with backend, infrastructure, and platform engineers.
* Document findings, recommendations, and preventive measures to improve long-term system reliability.

## Required Experience

* Extensive experience building and operating AI systems in production.
* Strong expertise with Python and asynchronous programming.
* Deep understanding of memory management, garbage collection, object lifecycle, and profiling techniques.
* Experience debugging memory leaks in long-running services.
* Strong knowledge of AI frameworks such as PyTorch, TensorFlow, Hugging Face Transformers, LangChain, or similar technologies.
* Experience with containerized environments including Docker and Kubernetes.
* Familiarity with Linux performance analysis and production debugging tools.
* Experience working with distributed systems, background workers, APIs, and high-availability services.
* Ability to quickly isolate complex production issues and deliver reliable long-term solutions.

## Preferred Qualifications

* Experience debugging GPU memory issues and CUDA memory management.
* Experience with vector databases, inference servers, and large language model deployments.
* Familiarity with observability platforms including Prometheus, Grafana, OpenTelemetry, or similar monitoring solutions.
* Experience improving production reliability for enterprise-scale AI platforms.

## What Success Looks Like

The successful candidate will identify the root cause of the production memory leak, implement a verified long-term fix, improve overall system stability and performance, and help establish engineering practices that prevent similar issues in the future. This is a mission-critical engagement requiring exceptional debugging skills, production engineering experience, and a disciplined approach to solving complex AI infrastructure problems.
- Hourly: Less than 30 hrs/week
- Duration: < 1 month
- Experience Level: Expert
- Hourly rate: $25.00 - $50.00
- Remote Job
- Project Type: One-time project
Activity on this job
- Proposals: Less than 5
- Last viewed by client: 2 hours ago
- Hires: 1
- Interviewing: 0
- Invites sent: 0
- Unanswered invites: 0
About the client
- Karachi, Pakistan (3:48 PM local time)