$20/hr
100%
Job Success
$20K+ earned
Offers consultations
I'm a Senior DevOps & Cloud Infrastructure Engineer with 6+ years of hands-on
experience building, securing, and automating cloud infrastructure on AWS, GCP,
and Azure.
I specialize in:
✅ AWS infrastructure (ECS Fargate, EKS, RDS, S3, Lambda, Bedrock, Cognito,
API Gateway, EventBridge, SQS, CloudTrail, CloudWatch, IAM, Secrets Manager,
CodeDeploy, Control Tower)
✅ DevSecOps & Security (AWS WAF, Network Firewall, VPC architecture, SIEM/Wazuh,
CJIS compliance, Site-to-Site VPN)
✅ Kubernetes & Containers (EKS, ECS, Helm, Docker, Argonaut)
✅ Infrastructure as Code (Terraform, AWS CDK)
✅ CI/CD Pipelines (GitHub Actions, CodePipeline, Bitbucket Pipelines)
✅ Monitoring & Observability (Prometheus, Grafana, Alertmanager, CloudWatch)
✅ Self-hosted PaaS (Coolify, Dokploy, Caprover)
✅ Shopify App Deployment (OAuth, CSP headers, webhooks, reverse proxy)
✅ GenAI Infrastructure (AWS Bedrock, RAG pipelines, Terraform modules for
Bedrock Agents)
Recent highlights:
🔹 Redesigned AWS VPC architecture with centralized egress/ingress via Network
Firewall — resolved critical production security audit findings
🔹 Built production-ready Terraform module for AWS Bedrock GenAI infrastructure
(RAG stack) used across Dev/Test/Prod environments
🔹 Designed HA Kubernetes clusters on Scaleway with Cloudflare failover,
WireGuard VPN DR, and automated Helm/Terraform CI/CD
🔹 Deployed complete monitoring stack (Prometheus + Grafana + Alertmanager)
with smart alerting for production workloads
I work with startups and enterprises across SaaS, healthcare, fintech, and
e-commerce. Whether it's a greenfield cloud setup, a security hardening project,
or a CI/CD pipeline overhaul — I deliver clean, documented, production-ready work.
Top Rated Plus on Upwork | 96% JSS | $21K+ earned
Associated with
Delivora
$40/hr
$40K+ earned
SRE with a decade of experience building reliable, scalable, and
secure infrastructure on the cloud.
$25/hr
85%
Job Success
$900+ earned
Hello world!
I'm a dedicated IT professional with 8 years of experience. I specialize in DevOps, Site Reliability, and Automation Engineering, with a versatile skill set that spans both Linux and Windows environments.
My journey has included diverse roles, from corporate positions to freelancing, which has enriched my expertise and broadened my perspective on various IT challenges.
As a DevOps, SRE, and automation enthusiast, I continuously explore new technologies and methodologies to drive innovation and optimize performance.
Let's connect and explore how we can leverage technology to achieve exceptional results!
Malintha S.
has worked
.
$100/hr
100%
Job Success
Available now
US-Based. Ex-Google Senior Systems Engineer. I have over 25 years of experience running Linux infrastructure, websites, system administration, and technical services.
I can help with:
* Linux server troubleshooting, setup & security (CentOS, Ubuntu, Alma, Rocky, etc.)
* Apache & nginx web servers
* MySQL & MariaDB SQL servers
* Email services (dovecot, exim, sendmail)
* PHP, Golang, Python programming, Bash scripting
* SSL certificate issues
* DNS issues
* WordPress issues
* Server & website migrations
* VoIP/PBX: Asterisk, Kamailio, Twilio, etc.
* Compiling software
* Web hosting control panels (including cPanel/WHM, etc.)
* GCP & AWS management
* Networking: VPNs, VLANs, routing, switching, DHCP, PXE, BGP, etc.
* SRE, DevOps, monitoring & logging, etc.
After running a web hosting company for 17 years and then working as a systems engineer at Google for 7 years, I am now ready to help you with your internet & website infrastructure.
I live in the US, work in the Eastern Time Zone and I am available via email, WhatsApp, phone or video calls. If you need a trusted partner to work with your website or infrastructure, then look no further :)
Avi B.
has worked
.
$50/hr
Available now
I help teams run production Kubernetes on AWS, GCP, and multi-cloud environments reliably, observably, and without runaway cloud bills. As an SRE and Cloud Platform Engineer at a Silicon Valley DaaS company, I operate cloud-native platforms that provision more than 1000 cloud desktop containers in under 60 seconds and helped reduce cloud costs by 33% through capacity planning and Kubernetes optimization.
My core work includes Kubernetes on AWS EKS and GCP GKE, Terraform and Terragrunt, Docker, CI/CD, GitOps with GitHub Actions and ArgoCD, and observability with Prometheus, Grafana, Loki, and related tooling. I improve MTTR through alerting, incident response, and post-mortems. I also build Linux desktop container workloads for VDI and DaaS use cases.
Selected outcomes:
Gomydesk: Production Kubernetes on EKS and GKE, AWS and GCP infrastructure, GitOps CI/CD, observability, incident management, more than 1000 cloud desktops in under 60 seconds, and 33% cloud cost reduction.
Serveden: Kubernetes and GKE for AI and LLM workloads, Prometheus, Grafana, Loki, Tempo, cluster automation, stability improvements, and cost optimization.
Takiacademy: Backend and platform work for Messaggera platform with 50K+ users and TakiAcademy with 400K+ users, real-time systems with Socket IO, CI/CD, and production reliability.
I am CKAD certified, a Kubeflow contributor, and have trained 80+ engineers on Docker, Kubernetes, ArgoCD, and Bash.
Ideal projects: Kubernetes on EKS or GKE, cloud infrastructure audits, Terraform IaC, GitOps, observability stacks, cost optimization, incident response, and SRE retainers.
Message me your cloud provider, cluster size, and top infrastructure pain point. I will reply with a concrete first step.
$95/hr
72%
Job Success
$1K+ earned
Available now
I help startups, SaaS teams, and modern businesses build reliable cloud systems, automate operations, and deploy AI infrastructure that actually works in production, not just in demos.
I am a Senior Site Reliability and DevOps Engineer with 10+ years of experience across AWS, Azure, and Kubernetes environments. As the founder of BotBrewLabs, a Fort Collins-based cloud and AI consulting firm, I specialize in bridging the gap between traditional cloud operations and the practical realities of running AI workloads at scale.
What I deliver for clients:
• Cloud Infrastructure & DevOps: AWS and Azure architecture, Terraform IaC, Kubernetes (EKS/AKS), and CI/CD pipelines (GitHub Actions, Jenkins) that reduce release cycles and improve deployment reliability.
• AI Infrastructure & MLOps: Production-ready infrastructure for LLM workloads (Llama, Gemini, Claude, Cohere), model serving pipelines, RAG systems, and AIOps-driven observability using Prometheus, Grafana, and OpenSearch.
• Site Reliability Engineering: SLO/SLI definition, error budget management, incident response, runbook development, and on-call optimization to maintain high availability at scale.
• Security, Compliance & FinOps: FedRAMP, SOC 2, and NIST 800-53 compliance readiness, IAM hardening, cloud cost optimization, and security posture reviews that reduce risk and waste.
Proven results:
• Reduced release times by 45% through automated Terraform + Helm deployments
• Cut MTTR by 35% with integrated Azure Monitor and Prometheus/Grafana observability
• Reduced operational cloud spend by 20% through right-sizing and auto-scaling strategies
• Achieved FedRAMP-ready compliance posture for federal and healthcare SaaS platforms
My background spans highly regulated environments, from healthcare SaaS (Keet Health/WebPT) and federal agencies (USDA via NetImpact Strategies) to financial institutions (DTCC), where reliability, security, and compliance are non-negotiable.
Through BotBrewLabs, I also offer structured AI Readiness Assessments for organizations that want to adopt AI responsibly: reviewing leadership alignment, governance, infrastructure readiness, and risk exposure before scaling AI use across teams.
If you need a senior technical partner who delivers measurable outcomes, not just hours, let's connect.
$40/hr
100%
Job Success
$10K+ earned
CKS | CKA | Terraform | AWS | Azure | GCP | IaC | DevOps Engineer | Cloud Engineer | Network Engineer
Are your cloud costs spiraling? Is your team struggling to maintain application uptime and deployment velocity? I build and manage scalable, resilient, and cost-effective cloud infrastructure that empowers your business to innovate faster and more reliably.
As a senior DevOps and Site Reliability Engineer (SRE) with deep expertise in AWS and Kubernetes, I architect solutions that are not just powerful, but also secure and maintainable. My focus is on eliminating infrastructure bottlenecks, automating complex processes, and ensuring your systems achieve maximum performance and availability.
My Core Services:
Kubernetes & Container Orchestration: I design, deploy, and manage production-grade Kubernetes clusters (including Amazon EKS) from the ground up. My expertise includes cluster security, auto-scaling, service mesh implementation (Istio), and creating streamlined developer workflows with Docker and Helm.
AWS Cloud Architecture & Automation: I build secure, scalable, and fault-tolerant infrastructure on AWS using Infrastructure as Code (IaC). I leverage tools like Terraform and CloudFormation to create repeatable and auditable environments, ensuring consistency across your entire cloud footprint.
CI/CD Pipeline Development: I create robust, end-to-end CI/CD pipelines using tools like Jenkins, GitLab CI, and GitHub Actions. This enables your development teams to release code faster and with greater confidence, reducing manual errors and accelerating time-to-market.
Site Reliability & Observability: I implement comprehensive monitoring, logging, and alerting solutions using Prometheus, Grafana, and CloudWatch. By establishing clear Service Level Objectives (SLOs), I proactively identify and resolve issues before they impact your users, ensuring rock-solid site reliability.
My Technical Skillset Includes:
Cloud Platforms: Amazon Web Services (AWS) - EC2, EKS, S3, RDS, VPC, IAM, Lambda, Route 53
Containerization & Orchestration: Kubernetes (K8s), Docker, Helm, Amazon EKS, ECS
Infrastructure as Code (IaC): Terraform, AWS CloudFormation, Ansible
CI/CD & Automation: Jenkins, GitLab CI, GitHub Actions, Bash, Python
Monitoring & Observability: Prometheus, Grafana, ELK Stack, Datadog, AWS CloudWatch
Whether you are migrating legacy systems to the cloud, optimizing an existing Kubernetes deployment, or looking to instill a true DevOps culture, I have the experience to guide you.
Ready to build infrastructure that works for you, not against you?
Click the "Invite" button to schedule a call. Let's discuss your infrastructure challenges and build a roadmap for success.
$50/hr
100%
Job Success
$700K+ earned
Senior Site Reliability Engineer with strong experience operating and scaling large-scale, production systems. Deep expertise in infrastructure automation, CI/CD, and cloud-native platforms, with hands-on work across Ansible, GCP, and Kubernetes.
Proven track record in incident management, including on-call ownership, incident command, root cause analysis, and driving corrective actions to reduce MTTR and improve long-term reliability.
Experienced in defining and enforcing SLIs, SLOs, and error budgets, and using monitoring and alerting to balance reliability with delivery velocity. Strong background in GNU/Linux administration, security hardening, and NodeBalancer performance optimization. Focused on building resilient systems through automation, observability, and disciplined operational practices across cloud and on-prem environments.
Associated with
Maborak Technologies, Inc.
$300K+
earned
$25/hr
100%
Job Success
$10K+ earned
Available now
Are you looking for an AI-powered DevOps / SRE engineer who doesn't just maintain infrastructure - but makes it smarter, self-healing, and cost-optimized? You're in the right place.
With 10+ years of hands-on experience in Site Reliability Engineering and DevOps, I have evolved my practice by integrating AI and automation at every layer of the stack. From building MCP (Model Context Protocol) servers that let AI agents manage AWS & Azure resources, to deploying intelligent log monitoring and cost analysis pipelines - I bring a modern, AI-first approach to infrastructure engineering.
What Sets Me Apart: AI-Enhanced DevOps
I don't just use traditional DevOps tools - I augment them with AI to deliver faster results and deeper insights:
MCP Servers for Cloud Automation: I build and deploy Model Context Protocol (MCP) servers that allow AI agents to directly interact with AWS & Azure APIs - automating provisioning, auditing, incident response, and resource management through natural language workflows.
Fine-Grained Cloud Auditing & Deep Cost Analysis: AI-driven scripts that continuously scan your cloud accounts, flag unused resources, detect anomalous spending patterns, and generate actionable cost-optimization reports - saving clients 20-40% on monthly cloud bills.
AI-Powered Python Automation: Custom Python scripts powered by LLMs that replace tedious manual processes - from auto-generating Terraform modules to intelligent config drift detection and remediation.
AI-Powered Log Monitoring & Anomaly Detection: Intelligent observability pipelines that go beyond static thresholds - using AI to detect patterns, predict outages, auto-classify errors, and surface root-cause insights from logs (Loki, ELK, CloudWatch).
Helm & Kubernetes with AI Agents: AI-assisted cluster management using agents that monitor pod health, auto-scale workloads, optimize Helm chart configurations, and provide natural-language troubleshooting for K8s issues.
Core Technical Skills:
1. Cloud Platforms: AWS (Primary), Azure, Google Cloud, Digital Ocean
2. Databases: RDS, DynamoDB, MongoDB, PostgreSQL
3. IaC & CI/CD: Terraform, Ansible, CloudFormation, GitHub Actions, GitLab CI, Azure DevOps, Jenkins
4. Containers: Docker, ECS Fargate, EKS, Kubernetes, Helm
5. Observability: Grafana, Prometheus, OpenTelemetry, CloudWatch, X-Ray, Loki, ELK, New Relic
6. AI Tooling: MCP Servers, AI-powered Python scripts, LLM-integrated workflows, AI agents for K8s
7. Web/Infra: Nginx, Apache, ALB/NLB, Proxmox, KVM
Certifications:
AWS Certified Solutions Architect – Associate
AWS Certified SysOps Administrator – Associate
When you work with me, you get:
Fast responses - I value your time
Clear communication - structured milestones
AI-first mindset - faster, smarter, more cost-effective solutions
If any of the above sounds like what you need, let's connect!
Associated with
VT Netzwelt Private Limited
$1M+
earned
Bosnia and Herzegovina
$50/hr
100%
Job Success
$100K+ earned
Senior DevOps & Site Reliability Engineer with 7+ years building resilient, scalable infrastructure for software and AI/ML workloads.
I specialize in:
- CI/CD pipeline design and automation (Jenkins, GitHub Actions, GitLab CI)
- Kubernetes orchestration and containerized deployments (Docker, EKS, OpenShift)
- Cloud infrastructure on AWS, GCP, and Azure
- AI/ML infrastructure: GPU provisioning, model serving pipelines, MLOps tooling
- SRE practices: monitoring with Prometheus/Grafana, incident response, SLO/SLI management
- Infrastructure as Code (Terraform, Ansible, CloudFormation)
With $100K+ earned and 100% Job Success on Upwork, I help teams ship faster, reduce downtime, and build the infrastructure foundation that AI-driven products need. Whether you're deploying ML models to production or scaling your platform, I'll automate and harden the path from code to customer.