Thoughtworks

Lead AI Infrastructure Engineer

Posted 4 Days Ago
In-Office
Singapore
Senior level
The Lead AI Infrastructure Engineer designs and maintains high-performance, scalable infrastructure for AI workloads, enabling advanced inference systems across cloud and on-premises environments.
The summary above was generated by AI

Thoughtworks Singapore will shortlist only applicants who currently have the right to work in Singapore, i.e. Singapore Citizens and Singapore Permanent Residents.

At Thoughtworks, Lead AI Infrastructure Engineers design and maintain high-performance, scalable, and resilient infrastructure for modern AI workloads. You’ll focus on enabling advanced inference systems, including LLMs, VLMs, and SLMs, across on-premises GPU clusters and cloud environments. This role is critical to ensuring our clients’ AI systems achieve demanding requirements for throughput, latency, availability, and compliance.

As a senior technical leader, you will partner with ML engineers, platform engineers, AI researchers, and client stakeholders to deliver optimized infrastructure that is both robust and future-proof. You will combine deep expertise in GPU-based inference infrastructure with a broader understanding of DevOps, agile delivery, and platform engineering to drive impactful AI solutions at enterprise scale.

Job responsibilities
  • Design and operate GPU-based infrastructure (e.g., NVIDIA GB200, H100) across cloud and self-hosted environments.
  • Architect scalable inference platforms that support real-time and batch serving with high availability, load balancing, and fault tolerance.
  • Integrate inference workloads with orchestration frameworks such as Kubernetes, Slurm, and Ray, as well as observability stacks like Prometheus, Grafana, and OpenTelemetry.
  • Automate infrastructure provisioning and deployment using Terraform, Helm, and CI/CD pipelines.
  • Collaborate with ML engineers to co-design systems optimized for low-latency serving, continuous batching, and advanced inference optimization techniques (quantization, distillation, pruning, KV caching).
  • Lead client engagements by shaping technical roadmaps that align AI infrastructure with business objectives, ensuring compliance, scalability, and performance.
  • Champion DevOps and agile practices to accelerate delivery while maintaining reliability, quality, and resilience.
  • Mentor and guide teams in best practices for AI infrastructure engineering, fostering a culture of technical excellence and innovation.
Job qualifications
Technical Skills
  • Expertise in GPU-based infrastructure for AI (H100, GB200, or similar), including scaling across clusters.
  • Strong knowledge of orchestration frameworks: Kubernetes, Ray, Slurm.
  • Experience with inference-serving frameworks (vLLM, NVIDIA Triton, DeepSpeed).
  • Proficiency in infrastructure automation (Terraform, Helm, CI/CD pipelines).
  • Experience building resilient, high-throughput, low-latency systems for AI inference.
  • Strong background in observability and monitoring: Prometheus, Grafana, OpenTelemetry.
  • Familiarity with security, compliance, and governance concerns in AI infrastructure (data sovereignty, air-gapped deployments, audit logging).
  • Solid understanding of DevOps, cloud-native architectures, and Infrastructure as Code.
  • Exposure to multi-cloud and hybrid deployments (AWS, GCP, Azure, sovereign/private cloud).
  • Experience with benchmarking and cost/performance tuning for AI systems.
  • Background in MLOps or collaboration with ML teams on large-scale AI production systems.
Professional Skills
  • Proven ability to partner with senior client stakeholders (CTO, CIO, COO) and translate technical strategy into business outcomes.
  • Skilled at leading multi-disciplinary teams and building trust across diverse technical and business functions.
  • Strong communication skills, with the ability to explain complex AI infrastructure concepts to both technical and non-technical audiences.
  • Comfortable navigating uncertainty, making pragmatic decisions, and adapting quickly to evolving technologies.
  • Passionate about creating scalable, sustainable, and high-impact solutions that help transform industries with AI.
Other things to know
Learning & Development

There is no one-size-fits-all career path at Thoughtworks: however you want to develop your career is entirely up to you. But we also balance autonomy with the strength of our cultivation culture. This means your career is supported by interactive tools, numerous development programs and teammates who want to help you grow. We see value in helping each other be our best and that extends to empowering our employees in their career journeys.

About Thoughtworks

Thoughtworks is a dynamic and inclusive community of bright and supportive colleagues who are revolutionizing tech. As a leading technology consultancy, we’re pushing boundaries through our purposeful and impactful work. For 30+ years, we’ve delivered extraordinary impact together with our clients by helping them solve complex business problems with technology as the differentiator. Bring your brilliant expertise and commitment to continuous learning to Thoughtworks. Together, let’s be extraordinary.

See our AI policy.

Top Skills

AWS
Azure
DeepSpeed
GCP
Grafana
Helm
Kubernetes
NVIDIA GB200
NVIDIA H100
NVIDIA Triton
OpenTelemetry
Prometheus
Ray
Slurm
Terraform
vLLM
