Thoughtworks

Senior MLOps Engineer

Reposted Yesterday
In-Office
Singapore
Senior level
As a Senior MLOps Engineer, you will ensure the reliability of machine learning systems, design monitoring strategies, resolve production issues, and manage the lifecycle of ML models while promoting a collaborative team culture.
The summary above was generated by AI

Due to project requirements, candidates must be Singapore citizens or already hold Singapore Permanent Resident (PR) status at the time of application.

As an MLOps Engineer in the DAMO service line, you will be responsible for ensuring the reliability, safety, performance and continuous improvement of large-scale machine learning and AI systems in production, including both generative AI and traditional ML systems like computer vision and recommendation models. You will work across the full software delivery lifecycle, contributing to design, implementation, deployment and ongoing operational excellence.

You will champion engineering best practices, including clean and maintainable code, test-driven development, continuous delivery, strong observability and collaborative development through pairing and code reviews. You will stay hands-on, actively contributing to codebases and applying modern practices from the Thoughtworks Technology Radar.

You will design pragmatic solutions that balance technical constraints, cost efficiency, performance and system safety. Working closely with developers, data scientists, platform engineers and product teams, you will help deliver production-ready AI capabilities that meet business needs and uphold a high bar for quality.

You will also play an active role in fostering a collaborative, inclusive team culture, encouraging feedback and supporting the growth of team members.

Job responsibilities
  • You will design, implement and maintain monitoring and alerting for ML and AI operational signals, including model performance degradation (for all model types, e.g., computer vision, recommendation, GenAI), data drift, latency issues, and anomalies. This includes specific monitoring for GenAI aspects like prompt failures, hallucination trends, guardrail violations, and overall agent workflow health.
  • You will build and operate robust evaluation and testing pipelines for all ML and AI systems, including automated regression tests for models (e.g., accuracy, precision, recall for traditional ML), prompts, workflows, tools and model versions, ensuring new releases meet or exceed established baselines.
  • You will investigate and resolve production issues related to model behaviour, including troubleshooting ML models (e.g., deep learning models for computer vision, collaborative filtering for recommendation), tool-calling errors, vector search/RAG retrieval failures (for GenAI), data quality issues, and integration points across the system.
  • You will collaborate with infrastructure and platform teams to ensure stable, performant and cost-efficient AI inference, including optimisation of deployment strategies, resource usage and runtime configurations.
  • You will manage the lifecycle of ML models, prompts, embeddings, vector indices and associated components, including controlled rollouts, versioning strategies, and automated evaluation gates.
  • You will design and operate effective feedback loops that incorporate real user interactions, evaluation metrics, UAT findings and domain expert reviews, enabling continuous improvement of all ML/AI systems, including agentic systems.
  • You will uphold governance, safety and compliance standards, ensuring observability, auditability, privacy protection and adherence to organisational guidelines for all ML/AI systems and data handling.
  • You will maintain clear, comprehensive documentation covering operational procedures, system behaviours, incident findings, performance benchmarks and deployment practices.
  • You will communicate system health, risks, upcoming changes and operational insights clearly to technical and non-technical audiences.
  • You will support the growth and development of junior team members through guidance, knowledge sharing and constructive feedback.
Job qualifications
Technical Skills
  • High proficiency in Python (Pandas, NumPy, Scikit-learn) for scripting, analysis, and maintaining production models.
  • Strong SQL skills for querying, data manipulation, and operational data checks.
  • Experience building or maintaining GenAI / agentic solutions (e.g., RAG, LlamaIndex, CrewAI, or similar orchestration/RAG tooling).
  • Solid understanding of classical ML algorithms, model evaluation, and challenges like drift and bias.
  • Hands-on experience with model monitoring (data quality, prediction quality, latency) using Prometheus, Grafana, or cloud-native tools.
  • Experience with Azure (Databricks, Azure Machine Learning, etc.) for deployment and resource management; familiarity with GCP/AWS is a plus.
  • Familiarity with Agile methodologies (Scrum/Kanban).
  • Nice to have: Experience with big data frameworks (Spark, Dask) for large-scale processing.
  • Nice to have: Understanding of containerization/orchestration such as Docker and basic Kubernetes.
  • Nice to have: Exposure to workflow/pipeline or IaC tooling (Airflow, Kubeflow, MLflow, Terraform).
Professional Skills
  • Strong influence and advocacy for technical excellence while adapting to change when necessary.
  • Strong analytical and troubleshooting ability.
  • Excellent communication and articulation skills.
  • Ability to navigate ambiguity and tackle challenges from multiple perspectives.
  • Experience mentoring junior consultants.
  • Willingness to be part of a 24x7 on-call rotation, as needed.
Other things to know
Learning & Development

There is no one-size-fits-all career path at Thoughtworks: however you want to develop your career is entirely up to you. But we also balance autonomy with the strength of our cultivation culture. This means your career is supported by interactive tools, numerous development programs and teammates who want to help you grow. We see value in helping each other be our best and that extends to empowering our employees in their career journeys.

About Thoughtworks

Thoughtworks is a dynamic and inclusive community of bright and supportive colleagues who are revolutionizing tech. As a leading technology consultancy, we’re pushing boundaries through our purposeful and impactful work. For 30+ years, we’ve delivered extraordinary impact together with our clients by helping them solve complex business problems with technology as the differentiator. Bring your brilliant expertise and commitment to continuous learning to Thoughtworks. Together, let’s be extraordinary.

Top Skills

Airflow
AWS
Azure
Azure Machine Learning
Dask
Databricks
Docker
GCP
Grafana
Kubeflow
Kubernetes
MLflow
NumPy
Pandas
Prometheus
Python
Scikit-learn
Spark
SQL
Terraform
