Vietnam
$12/hr
100% Job Success
$500+ earned
I'm an SRE at Asia Commercial Bank Vietnam. I maintain 24/7 service and 99.9% server uptime, and handle backups and CVE patching for all servers running managed applications.
In addition, I handle:
- Deploying applications to pre-production and production environments, and building a monitoring program.
- Monitoring the roll-out of new software applications to ensure there are no problems.
- Troubleshooting and resolving any problems with application software.
$65/hr
$500+ earned
Azure | AWS | GCP | Terraform | CI/CD Architecture | SRE | DevOps
I am a Principal-level DevOps and Site Reliability Engineer with nearly two decades of experience designing, operating, and scaling mission-critical cloud platforms. I specialize in multi-region Azure architecture, distributed systems, CI/CD modernization, performance engineering, and reliability automation.
Core Competencies
Multi-Region Cloud Architecture (Azure, AWS, GCP)
Design and operation of highly available, compliant platforms (PROD, DEV, QA1, QA2, Sandbox) supporting Web Apps, Function Apps, and data pipelines.
Site Reliability Engineering (SRE)
Incident reduction, automated failover and recovery, health modeling, and elimination of recurring failures.
Performance Engineering
Diagnosis of CPU/memory saturation, cold starts, auto-heal loops, and unstable restarts, plus latency improvements via App Service Plan tuning.
Observability & Monitoring
Azure Monitor, Application Insights, KQL dashboards, Logstash, Grafana — complete visibility for applications and infrastructure.
Security & Secrets Automation
Secret rotation with Key Vault + Python + Terraform, RBAC hardening, identity flow governance, and compliance across environments.
CI/CD & Infrastructure as Code
Terraform (AzureRM), GitHub Actions pipelines, automated DNS/SSL workflows, and secure reproducible IaC for enterprise workloads.
Why Clients Work With Me
- Architect-level decision-making with deep implementation skills
- Proven ability to stabilize and optimize high-scale systems
- Strong diagnostic capabilities for critical production issues
- Modernization of CI/CD, observability, security, and IaC at scale
- Clear communication and high-quality documentation
- Fluent in English, Spanish, and Portuguese
Selected Experience
Senior DevOps / SRE Consultant — Freelance
Architect for multi-region Azure platforms; implemented SRE practices, automated failover, observability stacks, secret automation, RBAC hardening, and Terraform IaC.
DevOps Engineer — Globant (BBVA Global Team)
CI/CD modernization for 40+ enterprise apps; SonarQube rollout to ~600 developers; compliance improvements; EKS modernization; Jenkins Groovy pipelines.
Cloud DevOps Engineer — Santander (OnLab)
Azure VNets, VPNs, Load Balancers, Bastion, AD/AAD, B2C identity, AKS autoscaling, GitHub Actions/Azure DevOps pipelines, ArgoCD GitOps, observability, Terraform automation.
QA DevOps Engineer — Web.com
Kubernetes operations, CI/CD pipelines, test automation, and observability with Prometheus/Grafana.
Unix/Linux Engineering — IBM, Cetelem, RSA, DC Solutions
Solaris/AIX/HP-UX, clustering, virtualization, DTrace debugging, SAN/NAS, CIS hardening, Oracle/JBoss hosting, production troubleshooting.
$20/hr
80% Job Success
$9K+ earned
6+ Years Engineering High-Availability Cloud Infrastructure, Proactive Observability, and Bulletproof CI/CD Pipelines.
In production, downtime isn’t just an inconvenience, it’s a revenue killer. As a Senior SRE and DevOps Engineer, I bridge the gap between development and operations to ensure your applications are highly scalable, resilient, and transparent. My core philosophy relies on proactive reliability: designing system observability, synthetic monitoring, and robust SLO/SLI frameworks to catch and resolve anomalies before they ever impact your end users.
Whether you need to containerize legacy workloads, automate infrastructure from scratch with Infrastructure as Code (IaC), or build comprehensive, multi-layered dashboards for enterprise-level visibility, I engineer stable environments that empower development teams to deploy with confidence.
Core Areas of Expertise:
Observability & Monitoring: Deep experience designing end-to-end telemetry across enterprise stacks. Proficient in Datadog, Dynatrace, ELK, Grafana, Prometheus, AppDynamics, Nagios, and Kibana. Specialized in alert tuning, synthetic monitoring, and SLO/SLI mapping.
Cloud & Container Orchestration: Advanced provisioning and management across AWS and Azure. Expert-level containerization with Docker and Kubernetes (EKS/AKS cluster management, scaling, and security).
Infrastructure as Code (IaC) & Configuration: Building modular, reusable infrastructure using Terraform, alongside automated configuration management with Ansible and Chef.
CI/CD & Automation: Architecting reliable continuous integration and continuous deployment pipelines that accelerate release cycles while maintaining strict quality gates.
Operations & Support: Extensive background in L3 Application Support, web development, hands-on incident response, and executing robust disaster recovery strategies.
Why Work With Me?
Enterprise-Ready Communication: I translate complex technical infrastructure into high-impact, accessible insights for stakeholders and business owners.
Self-Driven Execution: Comfortable operating autonomously, managing workflows, writing rigorous technical documentation/runbooks, and collaborating across timezone-agnostic teams.
Culture of Collaboration: I do not just fix servers; I foster a true DevOps culture of shared responsibility, automation, and continuous improvement.
Let’s ensure your infrastructure is built to scale. Click "Invite to Job" or "Message" to discuss how we can optimize your environment today.
$35/hr
100% Job Success
$2K+ earned
Offers consultations
Results-driven DevOps/SRE Engineer with a passion for cloud technologies, automation, and building high-availability systems. Proven expertise in optimizing infrastructures and driving operational excellence.
$40/hr
$0 earned
Hi! I’m Anatoliy — Senior certified DevOps / SRE Engineer with 8+ years of hands-on experience building and operating production-grade cloud infrastructure, observability platforms, and CI/CD systems.
I specialize in designing reliable, scalable, and fully automated DevOps ecosystems in AWS/EKS environments, with a strong focus on observability and incident resilience.
What I do best
Production-grade AWS & Kubernetes (EKS)
Design and operate highly available systems in AWS / GCP (EC2, EKS, ASG, Route53, ACM, S3)
Kubernetes: multi-environment setups, ingress, canary & blue/green deployments
Cluster architecture, networking, and failure recovery scenarios
Observability & Monitoring (from scratch)
Build end-to-end monitoring systems: metrics, logs, tracing
Stack: Prometheus, Grafana, Mimir / Thanos / VictoriaMetrics, ELK
Alerting strategies: Alertmanager routing, SLOs, error budgets
Log-to-metric pipelines and multi-source data integration
CI/CD & Automation
Design fully automated pipelines (Jenkins, GitLab CI, GitHub Actions)
Docker build pipelines, release strategies (canary, rollback, risk-based deploys)
Infrastructure as Code: Terraform / Ansible
Strong scripting: Bash, Python, PowerShell, Groovy
SRE & Incident Management
Root cause analysis of complex production issues
System reliability improvements and performance tuning
Experience with real-world failures, not just greenfield setups
AI-assisted DevOps
Daily use of tools such as ChatGPT and Claude
Automating configs, pipelines, and debugging workflows with AI
Key strengths
Building observability platforms from zero to production
Designing resilient architectures under failure conditions
Deep hands-on experience with the Prometheus ecosystem and time-series databases
Strong focus on automation, reliability, and cost efficiency (FinOps mindset)
$50/hr
$0 earned
🚀 DevOps & Site Reliability Engineer | Automating Cloud Infrastructure & Scaling Solutions
Hi, I'm Mahesh — a passionate DevOps & SRE Engineer with hands-on experience delivering end-to-end infrastructure automation, CI/CD pipelines, cloud-native deployments, and application monitoring. With a strong foundation in AWS and Kubernetes, I help companies achieve faster, more secure, and reliable deployments.
✅ What I Can Help You With:
- Cloud Infrastructure: AWS, GCP, Azure – scalable & secure architecture design
- IaC (Infrastructure as Code): Terraform, CloudFormation, Ansible
- Containerization & Orchestration: Docker, Kubernetes (EKS, GKE, AKS), Helm
- CI/CD Pipelines: Jenkins, GitHub Actions, GitLab CI/CD, AWS CodePipeline
- Monitoring & Observability: Prometheus, Grafana, ELK Stack, AWS CloudWatch
- Security & Scanning: Trivy, SonarQube, OWASP Dependency-Check
- GitOps & Automation: ArgoCD, Flux, AWS Lambda, Bash/Python scripting
- System Hardening & Performance Optimization
Technologies and Tools I Have Worked With:
🔹 AWS | GCP | Azure
🔹 Docker | Kubernetes | Helm
🔹 Terraform | Ansible | Packer
🔹 Jenkins | GitHub Actions | GitLab CI
🔹 ArgoCD | FluxCD
🔹 Prometheus | Grafana | Loki | ELK Stack
🔹 Trivy | SonarQube | OWASP ZAP
🔹 Bash | Python | Shell scripting
🔹 Linux | NGINX | Apache | HAProxy
$20/hr
$15 earned
I'm Seun — a Senior DevOps Engineer with 6+ years of experience building secure, scalable infrastructure for fast-growing teams and regulated industries.
I specialize in:
Kubernetes, Terraform, and cloud platforms (Azure, AWS)
CI/CD pipeline automation using GitHub Actions, Azure DevOps, and CircleCI
Secure deployments with PCI DSS-compliant architecture
Monitoring & alerting using Prometheus, Grafana, Datadog
Seamless collaboration with developers and product teams
I've helped banks and startups build and modernize platforms — from legacy system migrations to cloud-native microservices — while maintaining speed, security, and reliability.
If you're looking for an engineer who can automate your delivery process, harden your infrastructure, and improve system uptime, let's talk.
$60/hr
100% Job Success
$400K+ earned
Available now
Offers consultations
Every hour of downtime costs your enterprise an average of $540,000. That's not a DevOps metric - that's a board-level problem.
I design and operate infrastructure that stays up. My specialty is the full observability and reliability stack: detecting failures before users do, recovering in minutes, not hours, and building the self-healing systems that let your engineering team sleep at night.
With $400K+ earned on Upwork, a 100% Job Success Score, and 10+ years in DevOps and Site Reliability Engineering, I bring the broadest monitoring stack you'll find in one engineer: Prometheus · Grafana · Datadog · ELK Stack · Zabbix · Sumo Logic - combined with deep Kubernetes and cloud expertise.
Typical Problems I Solve
→ Production incidents are frequent, and MTTR is too high
→ No observability - issues discovered by users, not monitoring
→ Need a DevOps engineer for high-availability infrastructure setup
→ Kubernetes clusters experiencing performance or reliability issues
→ Disaster recovery plan missing or untested
→ AI workloads need reliable, observable infrastructure
Proven Client Results
▪️ iLost Platform (Netherlands)
→ Challenge: EKS performance degrading, CI/CD unstable across Dev/Staging/Production, and OpenSearch causing timeouts.
→ Solution: EKS rightsizing, CI/CD pipeline optimization, IAM hardening, OpenSearch performance tuning.
→ Business Impact: Reduced bottlenecks, improved release predictability, and optimized infrastructure costs.
▪️ Qopla Infrastructure
→ Challenge: Long-term SRE engagement - ongoing reliability, monitoring, and performance needs.
→ Solution: Continuous SRE operations, incident management, and infrastructure optimization.
→ Business Impact: 657-hour engagement demonstrating sustained impact. Rate progressed $40/hr → $52/hr based on results.
▪️ GetChecked (Blockchain / Multi-cloud)
→ Challenge: Multi-cloud infrastructure (AWS + DigitalOcean + bare-metal) with Hyperledger Fabric in a regulated environment.
→ Solution: Upgraded Hyperledger Fabric, implemented monitoring, backup, DR/BCP, optimized CI/CD pipelines.
→ Business Impact: Increased system reliability and security in a strict regulatory context.
Core SRE Capabilities
→ DORA metrics: Deployment Frequency, Lead Time, MTTR, Change Failure Rate - measured and improved
→ Incident Management & On-Call: Runbooks, escalation paths, post-mortems
→ Disaster Recovery & Business Continuity planning and testing
→ Zero Trust Security: IAM, RBAC, SELinux, network hardening
→ High Availability: Multi-AZ, failover, load balancing, autoscaling
→ AI workload reliability: observability for ML pipelines and LLM infrastructure
Stack: AWS · GCP · Kubernetes · Terraform · Ansible · Prometheus · Grafana · ELK Stack · Datadog · Zabbix · Docker · Linux · GitHub Actions · GitLab CI · Incident Response Plan · Disaster Recovery
Available during US business hours (EST/PST overlap).
Downtime is a solved problem - if you have the right observability. Send an invite and let's audit your reliability posture.
Dmitriy M.
Associated with Spaceport ($1M+ earned)
$80/hr
$200K+ earned
Offers consultations
Highly skilled Software Engineer with 15 years of experience. Over those years, my skill set has helped companies reach their goals, and I work toward that professionally every day.
Yasmany C.
$9/hr
$0 earned
DevOps Engineer and Software Engineer specialized in CI/CD pipelines and Kubernetes automation. Experienced in container orchestration, observability, and infrastructure reliability using Prometheus, Grafana, Jaeger, and Elasticsearch. Skilled in designing and maintaining scalable backend systems, implementing distributed tracing, and automating deployments across environments.
As an aspiring Site Reliability Engineer (SRE), I focus on creating robust, monitored, and self-healing infrastructures that ensure consistent uptime and smooth delivery workflows. I am passionate about automation, system performance, and building production-grade environments that balance speed and reliability.