Search Freelance Talent on Upwork

Munish G.

Ex-Google Cloud/Workspace | 10x GCP & Multi-Cloud Expert

United States

$195/hr

100% Job Success

$100K+ earned

Offers consultations

Google-Grade Reliability for Your Cloud Infrastructure.
GCP 10x Certified, AWS 1x Certified, and Databricks 1x Certified Senior Infrastructure Architect, former Google SRE (11 years) and Oracle Engineer, with over 25 years of experience managing global-scale production environments.
If you are looking for a "pair of hands" to write scripts, there are many options. If you are looking for an Owner-Architect to design, secure, and scale your business-critical infrastructure with zero-risk execution, let’s talk. With 10x GCP Professional Certifications, I specialize in taking complex, high-traffic ecosystems and making them resilient, cost-effective, and simple to manage.
Expertise Across the Stack:
Cloud Architecture: Expert-level design and migration across GCP, AWS, and Hybrid Cloud.
DevOps & SRE: Implementing "True SRE" principles: SLOs, error budgets, and automated toil reduction.
Infrastructure as Code (IaC): Building reproducible, version-controlled environments using Terraform, Helm, Ansible, and Packer.
Kubernetes & Orchestration: Deep experience with GKE, EKS, and self-managed clusters (Rancher).
Security & Compliance: Hardening infrastructure via VPC Service Controls, Zero-Trust IAM, and automated security pipelines.
CI/CD Optimization: Streamlining delivery via Jenkins, GitHub Actions, and Google Cloud Build.
Why Work With Me?
The "Google" Standard: I apply the same rigor and reliability standards I used at Google to your infrastructure, ensuring 99.9% availability.
Cost Efficiency: I specialize in cloud cost optimization, often saving my clients more in monthly cloud spend than the cost of my engagement.
Simplicity at Scale: I believe the best architecture is the simplest one that solves the problem. I build systems that your team can actually maintain.
Technical Proficiencies:
Linux: 20+ years of deep systems-level expertise (RHEL, Debian, Ubuntu).
Automation: Puppet, Ansible, Terraform.
CI/CD: Jenkins, Artifactory, GitHub, Cloud Build.
Containers: Docker, Kubernetes, Container Registry.

Alex L.

DevOps and SRE/Support Lead - Platform Engineering - AWS, Azure, GCP

Hungary

$35/hr

100% Job Success

$1M+ earned

📌 About Us
Led by a DevOps and SRE veteran with 20+ years of industry experience, we are a powerhouse team of 100+ top-tier DevOps, SecOps, Cloud, and Site Reliability Engineers. We specialize in designing, building, and managing infrastructures that empower innovation while guaranteeing stability, security, and cost-efficiency. With over 1,000 successful projects delivered for startups, enterprises, and global brands across fintech, e-commerce, health tech, media, and AI/ML, we take end-to-end ownership, from initial architecture design to 24x7x365 L1/L2/L3 support.
Core Capabilities & Business Impact
1. Platform Engineering, DevOps & CloudOps
We build multicloud architectures and scalable systems that enable daily releases through automated "golden paths" and predictable environments.
Infrastructure as Code (IaC) & Automation: Terraform, Pulumi, CloudFormation, Ansible, Chef, Puppet.
Containerization & Orchestration: Kubernetes, Docker, EKS, Rancher.
CI/CD & GitOps: GitHub Actions, GitLab CI, Jenkins, CircleCI, Argo CD, Flux.
Web & Database Tuning: LAMP and LEMP stack setup, database tuning, and high-availability website speed optimization.
2. 24/7 Site Reliability Engineering (SRE) & IT Support
We provide true 24x7x365 L1, L2, and L3 support for applications, servers, and end-users, ensuring SLA/SLO-driven operations.
Reliability & Incident Management: On-call rotations, escalation trees, postmortems, root cause analysis, and error budgets.
Observability: Transparent SLI/SLO dashboards using Prometheus, Grafana, OpenTelemetry, ELK, Datadog, and New Relic.
IT Consultancy & Workstation Support: End-user maintenance (B2B/B2C), patching, updates, backup management, and policy enforcement via NinjaOne.
Email Deliverability: DNS, DKIM, SPF, DMARC, and MX configuration.
3. FinOps & Cloud Cost Optimization
We turn infrastructure into a strategic advantage by reducing cloud costs by 20–60%.
Cost Control: Budgeting, forecasting, rightsizing, cost governance, and waste reduction on AWS, Azure, and GCP.
FinOps Tooling: CloudHealth, AWS Cost Explorer, GCP Billing.
4. Security, Networking & Compliance
We ensure robust security postures and continuous compliance readiness.
Security & Identity: Entra ID, HashiCorp Vault, SOPS, RBAC audits, WAF/Shield.
Policy & Compliance: Policy-as-Code (OPA/Gatekeeper), CIS Benchmarks, SOC 2, and ISO 27001 readiness.
Network Hardware & Integrations: Cisco remote networks, MikroTik remote devices, FortiGate, and QNAP configuration.
Server Hardening: Deep optimization and security hardening for Linux, UNIX, and Windows Server systems.
5. MLOps & AI Infrastructure
We partner with ML engineers to build reliable, reproducible production workflows.
ML Pipelines: MLflow, Kubeflow, SageMaker.
Infrastructure: GPU-enabled infrastructure setups, data versioning, and model tracking.
Comprehensive Technology Stack
- Cloud Providers: AWS (Certified Partner), Microsoft Azure (Certified Partner), GCP, OCI.
- Operating Systems: Linux (20+ years expertise), UNIX, Windows Server.
- Containers & Orchestration: EKS, Kubernetes, Docker, Rancher.
- IaC & Configuration: Terraform, Pulumi, CloudFormation, Ansible, Chef, Puppet.
- CI/CD & GitOps: Argo CD, Flux, GitHub Actions, GitLab CI, Jenkins, CircleCI.
- Observability & Monitoring: Prometheus, Grafana, OpenTelemetry, ELK Stack, Datadog, New Relic.
- Security & Policy: OPA, Gatekeeper, Vault, SOPS, Entra ID, WAF/Shield.
- FinOps & MLOps: CloudHealth, AWS Cost Explorer, GCP Billing, MLflow, Kubeflow, SageMaker.
- Networking & IT Management: FortiGate, Cisco, MikroTik, QNAP, NinjaOne.
- Web & Email: LAMP, LEMP, DNS, DKIM, SPF, DMARC, MX.
Who We Work With
Startups: Building scalable, secure infrastructures from day one.
Enterprises: Driving modernization, automation, and DevOps transformations.
Finance & SaaS: Implementing rigorous security, compliance, and FinOps practices.
ML/AI Teams: Deploying robust frameworks for model scaling.
Ready to transform your infrastructure, optimize costs, and secure 24/7 reliability? Let’s talk.

Associated with

NIX

$20M+

earned

Vadym A.

Senior DevOps / SRE Engineer - AWS, Azure, GCP | Cloud & Hybrid

Spain

$70/hr

100% Job Success

$200K+ earned

I’m a Senior DevOps and Site Reliability Engineer with over 14 years of experience designing, automating, and maintaining scalable, secure, and highly available cloud infrastructures across AWS, Azure, and GCP, as well as hybrid on-prem environments. I help companies in e-commerce, fintech, and marketing technology migrate to the cloud, build robust CI/CD pipelines, and implement SRE-driven monitoring and automation. My focus is delivering measurable impact: improving reliability, accelerating delivery, and driving efficiency through automation, infrastructure-as-code, and cost optimization.
Core Expertise
• Cloud Platforms: AWS, Azure, GCP
• Infrastructure as Code: Terraform, Ansible, Helm, CloudFormation
• Containers & Orchestration: Docker, Kubernetes (EKS / AKS / GKE), Kustomize
• CI/CD Automation: Jenkins, GitLab CI, GitHub Actions, Bitbucket Pipelines, Argo CD
• Monitoring & Observability: Prometheus, Grafana, ELK Stack, Datadog, New Relic
• Security & Compliance: IAM, encryption, cost and access governance across multi-cloud and hybrid systems
Certified AWS Solutions Architect and Certified Kubernetes Administrator.
Let’s connect to discuss how I can help you build reliable infrastructure, speed up delivery, and reduce operational costs.

Kashif N.

Automation Agent Developer, DevOps Engineer | Lindy, Agentic, K8s

Pakistan

$30/hr

93% Job Success

Available now

Offers consultations

I build production AI agents that actually do the work: voice agents that book appointments, RAG agents that read your docs, browser agents that fill forms, and multi-agent systems that coordinate across tools. Then I deploy them on infrastructure that holds up under real load.
Most "AI integrations" look great in a demo and fall apart in production. Hardcoded timestamps, brittle prompts, and missing edge cases ship, then quietly fail at 3 AM. Mine don't. I catch those problems before launch, not after a client complaint.
8 years in software, 29 jobs delivered on Upwork, 93% Job Success, $20K+ earned. Recent example: a Vapi appointment-booking agent for a wellness clinic where I caught and patched production bugs (hardcoded timestamps, test placeholders) before go-live.
🎯 What I build
▸ AI Voice Agents (Vapi, Retell, Bland): Inbound and outbound calling for appointment booking, sales qualification, and support. Custom tools for slot availability, date validation, CRM writebacks, and graceful handoff to humans.
▸ Agentic RAG & LLM Integrations: Production RAG pipelines over PDFs, product catalogs, and internal docs. Tool use, memory, and routing across OpenAI, Claude, Gemini, and LangChain. Local models (Ollama) for data-residency requirements, plus MCP server integrations.
▸ Workflow Automation (n8n, Make, Zapier): Lead routing, document generation, system-to-system sync, and multi-step approval chains. If your team copies data between tools or regenerates the same documents repeatedly, that workflow is replaceable: I scope, build, and document it in days.
▸ Multi-Agent Systems (LangGraph, CrewAI, AutoGen): Orchestrated workflows where specialized agents handle research, writing, validation, and escalation. State machines, shared memory, retry logic, and human-in-the-loop checkpoints.
▸ Browser & Computer-Use Agents: Agents that navigate web apps, fill forms, and complete multi-step workflows. Built for tools that don't have good APIs.
▸ DevOps & Deployment (AWS, Docker, CI/CD): Building the agent is half the work. I deploy on AWS (EC2, ECS, Lambda), containerize with Docker, set up CI/CD, and add observability (logging, prompt tracing, cost monitoring) so your AI doesn't ship and silently fail.
📊 How I work
✔ Scoped proposal within 24 hours of a discovery call
✔ Weekly demo cycles, not month-long silence
✔ Production-grade code: logging, error handling, edge-case coverage
✔ Fixed-scope quotes, no surprise hourly creep
✔ Documentation your in-house team can maintain
⭐ Recent client feedback
"Excellent experience on a complex Agentic RAG system. Strong expertise in building production-grade AI solutions integrating product catalogs, PDFs, and backend systems."
📞 Next step
Send me the problem you're trying to solve, not a feature spec. I'll come back with a scoped approach, fixed price, and timeline. If we're a fit, we start. If not, you walk away with a clearer plan than you came in with.

Aamir S.

Cloud Architect | DevOps | Platform Engineering | Full Stack Engineer

Pakistan

$35/hr

100% Job Success

Available now

I'm a results-driven Full-Stack Developer and DevOps Engineer with over 8 years of experience in architecting, developing, and deploying high-performance web and mobile applications, and 6+ years of deep hands-on expertise in DevOps, cloud infrastructure, and automation. Whether you're a startup, mid-sized business, or enterprise, I deliver end-to-end solutions that are scalable, secure, and optimized for performance. From cloud-native development and infrastructure automation to AI-powered applications and seamless CI/CD pipelines, I transform complex technical requirements into robust, business-ready systems.
💻 Full-Stack Development
Front-End: React.js, Next.js, Vue.js, Angular, Redux, RxJS, Electron
Back-End: Node.js, Express.js, Python (Django, FastAPI), Ruby on Rails
Databases: PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch, Oracle
API & Integration: RESTful & GraphQL APIs, SOAP, Webhooks, Data Warehousing
DevOps | Platform Engineering | Cloud Architect
Cloud & Infrastructure: AWS, Azure, GCP • Terraform, Terragrunt, AWS CDK, Pulumi, CloudFormation, Crossplane • Infrastructure as Code (IaC)
Platform Engineering & Kubernetes: Kubernetes (EKS, AKS, OpenShift), Docker, ECS • GitOps (ArgoCD, FluxCD) • Internal Developer Platforms (IDP) • Multi-Cluster Operations
CI/CD & Automation: GitHub Actions, GitLab CI, Jenkins, CircleCI, AWS CodePipeline • Release Automation • Deployment Strategies • DevSecOps
Site Reliability Engineering (SRE): High Availability • Incident Management • Reliability Engineering • Capacity Planning • Performance Optimization
Observability & Monitoring: Prometheus, Grafana, OpenTelemetry, Datadog, New Relic, ELK/EFK • Distributed Tracing • Centralized Logging
Security & Configuration Management: HashiCorp Vault, AWS Parameter Store, Secrets Management • Ansible, Chef • IAM & Cloud Security Best Practices
AI Infrastructure & MLOps: LLM Infrastructure • Generative AI • Model Serving • Kubeflow • MLflow • KServe • Vector Databases • AI/ML Platform Engineering
Distributed Systems & FinOps: Scalable Cloud Architectures • Cost Optimization • FinOps • Multi-Region Deployments • Disaster Recovery
Certifications
• AWS Certified Solutions Architect
• HashiCorp Certified Terraform Associate
• Terraform Certified – AWS Cloud Economics
• AWS Technical Professional
• AWS Business Professional
Why Work With Me?
✔ End-to-End Ownership: Architecture, Automation, Security, and Operations
✔ Cloud-Native Expertise: Building scalable and resilient platforms on AWS, Azure, and GCP
✔ Platform & SRE Focus: Improving developer productivity, reliability, and operational excellence
✔ AI & MLOps Experience: Deploying and managing modern AI/ML workloads at scale
✔ Clear Communication: Reliable delivery with transparent collaboration
📩 Let’s collaborate and bring your vision to life with innovation, reliability, and technical excellence.

Sungyoon K.

SRE / DevOps Engineer / Grafana OSS Contributor / AWS Kubernetes

South Korea

$25/hr

69% Job Success

$300+ earned

I help companies run stable production systems on AWS. You will get production-ready infrastructure, reliable backend systems, and operational environments your team can run and maintain independently — not just working code. I’m an AWS Certified Solutions Architect with hands-on experience operating backend and infrastructure systems serving over 4 million daily active users in production. I also contribute to Grafana OSS through production-driven monitoring and observability improvements. My background in backend development and Site Reliability Engineering allows me to design systems that are scalable, observable, and easy to operate. I focus on building infrastructure and backend systems that handle real production traffic — with strong reliability, automation, and operational simplicity. ⸻ What You Will Get ✔ Production-ready AWS infrastructure (EC2, EKS, VPC, IAM, RDS, S3) ✔ Scalable Kubernetes deployment and operations setup ✔ Reproducible infrastructure using Terraform / CDK ✔ Automated CI/CD pipelines for safe deployments ✔ Monitoring, alerting, and observability platforms (Grafana, Prometheus, Loki) ✔ Centralized logging and system visibility ✔ Performance optimization and production troubleshooting ✔ Cloud cost optimization and infrastructure review ✔ Secure backend API systems (Node.js / NestJS) ✔ Optimized database performance (PostgreSQL, Redis) ✔ Architecture documentation and operational guidelines ✔ Deployment and operational runbooks for your team All deliverables include documentation, infrastructure code, and operational guidance so your team can operate the system independently. ⸻ Production & Reliability Focus • Production incident response and system stabilization • Monitoring and alerting strategy design • Performance bottleneck analysis • Infrastructure automation and standardization • Systems designed for reproducibility and operational simplicity I build systems that are reliable in production — not just working in development. 
⸻ Technology Stack AWS, Kubernetes, Terraform, Docker, CI/CD, Infrastructure, Monitoring, Observability, Grafana, Prometheus, Loki, Node.js, PostgreSQL, Redis

Vikas Y.

DevOps Engineer + Python Expert | Kubernetes | Software Architect

India

$60/hr

100% Job Success

$200K+ earned

Offers consultations

Helping startups and enterprises build rock-solid cloud infrastructure. | 🚀 DevOps | Python | 8+ years of Experience | Ex-LinkedIn | Kubernetes, Docker, AWS and GCP. Clients hire me when they need someone who can think like an engineer, build like a developer, and scale like a DevOps pro. I'm not just a DevOps Engineer — I'm also a full-fledged Software Engineer. 🔹 8+ years of experience — LinkedIn, startups, and consulting. 🔹 Kubernetes, Docker, Terraform, Ansible, AWS, GCP 🔹 Python: backend development, automation, API integrations, data pipelines 🔹 CI/CD pipelines, Infrastructure as Code, Monitoring (Prometheus, Grafana) ✅ Contributor to open-source projects ✅ Blogger (kubeblogs.com) Let's work together to make your systems faster, smarter, and more reliable. 🤝

Timothy K.

AWS DevOps | SRE | Serverless

United States

$50/hr

86% Job Success

$20K+ earned

Hello, my name is Timothy. I'm an AWS DevOps Engineer. I specialize in Serverless applications on Amazon Web Services (AWS). My expertise is centered around core AWS services such as Lambda, Amazon ECS (on Fargate or EC2), DynamoDB, API Gateway, CloudFront, Amazon Route 53, Fastly CDN, and the many other services that complement them to enable a production-ready environment.
Recent Projects:
⦁ Three-Tier Network VPC on AWS
⦁ Synapse Matrix server deployment with Docker and EC2 (Video & Audio calls enabled)
⦁ Migration from Vercel to Amazon ECS (on EC2)
⦁ Fastly and AWS log & metric aggregation with Datadog
⦁ Datadog SIEM
⦁ Custom AWS Lambda Authorizer
⦁ Fastly CDN integration
⦁ Custom Fastly log aggregation and parsing with AWS (Lambda, DynamoDB, Firehose, etc.)
⦁ Application deployment on Amazon ECS (on Fargate)
⦁ CI/CD from GitHub to Amazon ECS with semantic versioning
⦁ AWS Secrets Manager and Parameter Store for Lambda and Amazon ECS parameters
⦁ RSS Aggregator

Ali B.

Observability & SRE Datadog | Dynatrace | Grafana | New Relic

Spain

$65/hr

83% Job Success

$70K+ earned

Offers consultations

🥇 Top Rated Contractor
🥇 95% Job Score
🥇 Ex-employee of top companies: Gartner & Dynatrace.
🥇 Certified in top tools such as Dynatrace & Datadog.
🥇 Top Skills: Cloud Monitoring, Dynatrace, Datadog, New Relic, Grafana, Prometheus, OpenTelemetry, Performance Monitoring, Site Reliability Engineering, SLOs, SLAs, APM, Splunk, Observability, Jaeger, SigNoz, OpenAPM, inspectIT.
📞 Invite me to your job and we can book a complimentary 30-minute consultation that’s genuinely helpful. 📞
As an observability expert with a wealth of experience from my tenure at Dynatrace, I bring a comprehensive understanding of performance monitoring and troubleshooting. With a proven track record of helping over 30 customers resolve intricate performance issues, I excel at providing tailored solutions to meet diverse business needs.
Key Skills and Expertise:
1. Performance Optimization: Leveraging a deep-rooted understanding of performance metrics and indicators, I specialize in enhancing system efficiency and reliability. Through meticulous analysis and optimization techniques, I ensure that systems operate at peak performance levels.
2. Observability Platforms: Proficient in leading observability platforms such as Dynatrace, New Relic, Datadog, and Prometheus, I offer expertise in setting up, configuring, and extracting actionable insights from these tools.
3. Dashboard Design and Alerts Configuration: Crafting intuitive dashboards and configuring proactive alerts is at the core of my services. I ensure that stakeholders have real-time visibility into critical metrics, enabling swift responses to potential issues.
4. Tools Consolidation: Recognizing the importance of streamlined operations, I specialize in consolidating monitoring tools to optimize costs without compromising on functionality. By assessing your specific requirements, I devise tailored strategies for tool consolidation at the best pricing.
5. Splunk Integration: Proficient in Splunk integration, I enable seamless data aggregation and analysis, empowering organizations to derive meaningful insights from vast datasets.
6. Site Reliability Engineering (SRE): Embracing SRE principles, I focus on building resilient systems that prioritize reliability and scalability. Through proactive monitoring and automation, I mitigate risks and ensure uninterrupted service delivery.
7. Other: Distributed tracing, Monitoring, ELK APM.
Why Choose Me:
- Proven Track Record: With a history of successfully resolving performance issues for a diverse clientele, I bring a wealth of practical experience to every project.
- Expertise in Leading Tools: Whether it's Grafana, Dynatrace, or Datadog, I possess comprehensive expertise in utilizing leading observability platforms to drive actionable insights and optimizations.
- Cost-Effective Solutions: I understand the importance of cost optimization without compromising on functionality. My approach to tools consolidation ensures maximum ROI for your monitoring investments.
- Commitment to Excellence: I am dedicated to delivering exceptional results, collaborating closely with clients to understand their unique challenges and requirements.
If you're seeking a seasoned observability expert with a proven track record of optimizing performance and driving efficiency, I'm here to help. Let's work together to elevate your systems to new heights of reliability and performance. Let's chat!

Sanket S.

DevOps Architect | AWS, Azure, Kubernetes, Terraform, CI/CD Automation

India

$30/hr

90% Job Success

Available now

Offers consultations

I help startups and SaaS teams build secure, scalable cloud infrastructure on AWS and Azure. With 10+ years of experience in DevOps, DevSecOps, SRE, and cloud automation, I help teams improve deployment speed, strengthen security, and keep production systems reliable.
I specialize in:
AWS and Azure infrastructure.
Kubernetes and container platforms.
Terraform and Infrastructure as Code.
CI/CD automation.
Cloud security hardening.
Monitoring, observability, and alerting.
Production migrations and cloud modernization.
Linux administration and automation.
High availability and disaster recovery.
Cloud cost optimization.
I’ve worked on:
AI/ML and data-intensive workloads.
Healthcare and compliance-aware environments.
Enterprise Kubernetes platforms.
Multi-cloud infrastructure.
Secure production systems.
Automated deployment ecosystems.
My approach is practical, automation-first, and reliability-focused. I work well in existing production environments, solve infrastructure bottlenecks quickly, and build maintainable cloud platforms that can scale with business needs.
Certifications:
AWS Certified Solutions Architect – Associate
AWS Certified SysOps Administrator – Associate
Certified Kubernetes Administrator (CKA)
If you need someone who can own cloud infrastructure, DevOps automation, security hardening, and production reliability end to end, I’d be glad to help.