Operating @ Accenture
> devops_engineer

Reliable systems. At scale. On purpose.

I've spent five years keeping production systems calm — across cloud, containers, and observability. My focus is the unglamorous work that makes everything else possible: automating the toil, owning the incident, and shipping the runbook.

> about

What I optimize for.

Four pillars I keep returning to — the lens I use to evaluate a system, a process, or my own work.

01

Reliability at scale

10,000+ servers and 400,000+ cloud assets across AWS, Azure, GCP and Alibaba — kept observable, kept honest.

02

Automation & efficiency

Terraform, GitLab CI, Helm, Chef. If a human is doing it twice, I'd rather write the script before the third time.

03

Observability & response

Operated five enterprise platforms (Datadog, Dynatrace, Zabbix, Site24x7, PagerDuty). Escalation point for critical incidents, with RCA ownership.

04

Operational excellence

I work the seam between engineering and ops, translating customer pain into platform improvements and process changes that actually stick.

> stack

Tools I reach for.

The platforms, languages and frameworks I've shipped with in production.

Cloud

4
  • AWS
  • Azure
  • GCP
  • Alibaba Cloud

IaC & Config

3
  • Terraform
  • Chef
  • Ansible

Containers

4
  • Docker
  • Kubernetes
  • Helm
  • Helmfile

CI/CD

2
  • GitLab CI
  • Jenkins

AWS Serverless

7
  • Lambda
  • API Gateway
  • S3
  • SQS
  • SNS
  • EventBridge
  • CloudWatch

Messaging & Cache

2
  • RabbitMQ
  • Redis

Observability

10
  • Datadog
  • Dynatrace
  • Zabbix
  • Site24x7
  • PagerDuty
  • New Relic
  • Prometheus
  • Grafana
  • Kibana
  • Graylog

Languages

3
  • Python
  • Bash
  • PowerShell

Data & Identity

4
  • PostgreSQL
  • Okta
  • Keycloak
  • Active Directory

ITSM

1
  • ServiceNow
> experience

Where I've operated.

From Tier 1 NOC support to designing AWS infrastructure for global enterprise platforms — the path so far.

Nov 2021 — Present (current)

Accenture

DevOps Engineer

  • Cloud Infra: Designed and managed AWS infrastructure with Terraform, fully integrated with GitLab pipelines.
  • Serverless: Built solutions on S3, Lambda, API Gateway, SQS, SNS, CloudWatch and EventBridge.
  • Event-driven: Operated containerized FastAPI services with RabbitMQ messaging and Redis caching.
  • Orchestration: Deployed and managed apps via Helm on Kubernetes clusters.
  • CI/CD: Designed and optimized GitLab pipelines spanning AWS, Kubernetes and on-premises.
  • Config Mgmt: Developed and maintained Chef cookbooks for automated configuration.
  • Scripting: Built Python, Bash and PowerShell automation for operational tasks.
  • Incidents: Escalation point for critical incidents; drove resolution within SLA and owned RCA.
  • Observability: Administered Datadog, Zabbix, Dynatrace, Site24x7 and PagerDuty at enterprise scale.
  • Database: Investigated and resolved PostgreSQL performance and availability issues.
  • Scale: Deployed monitoring across 10,000+ servers and 400,000+ cloud assets (AWS, Azure, GCP, Alibaba).
  • Workflows: Automated ServiceNow ticket handling, reducing manual effort and improving response time.
  • Mentoring: Trained new team members and promoted DevOps best practices across teams.
Jul 2020 — Nov 2021

FPT Software

DevOps Engineer

  • Cloud Infra: Deployed Azure VMs with Terraform.
  • Deploy & Config: Worked with Ansible, Kubernetes, Helm, Helmfile, Jenkins and Docker.
  • Identity: Operated VMware Horizon, Okta, Keycloak and Active Directory.
  • Policy: Managed and modified GPOs to meet customer requirements.
  • Observability: Monitored apps with New Relic APM and Synthetics for incident detection.
  • RCA: Troubleshot with stakeholders and authored post-incident root-cause analyses.
  • Customer: Daily issue reports and requirements gathering with customers.
  • Collaboration: Partnered with R&D, MML and CloudOps teams on dashboards and infra issues.
Nov 2019 — May 2020

CXA Group

NOC Engineer

  • Tier 1: First-line application and infrastructure support across the enterprise.
  • Triage: Monitored alerts and dashboards; escalated incidents to the right stakeholders.
  • Tooling: Used Prometheus, Grafana, Dynatrace, Azure App Insights, K8s dashboards and Kibana to detect issues.
  • Improvements: Partnered with DevOps teams to evolve monitoring dashboards.
Oct 2017 — Nov 2019

IPTP Networks

Network / System Engineer

  • Tier 1: Identified, diagnosed and troubleshot issues for international customers.
  • On-call: Escalation and 24×7 on-call rotations across the global network.
  • Monitoring: Operated Cacti, SmokePing, NfSen and Graylog to monitor global infrastructure.
  • Hands-on: Network equipment configuration; remote-hands coordination with international datacenters.
  • Inventory: Hardware inventory and technical documentation handling.
  • Mentoring: Supervised the NOC/NCC team in Vietnam.
> education

Foundations.

Posts and Telecommunications Institute of Technology

Ho Chi Minh City

Degree: Engineer of Telecommunications
> contact

Let's talk.

Hiring, collaborating, mentoring, or just curious about an observability stack — I read everything.

Or reach me directly: nguyenthanhtai2702@gmail.com