AI Singapore (AISG) is a national AI programme launched by the National Research Foundation (NRF), Singapore, to build and anchor deep national capabilities in AI. AISG is supported through a government-wide partnership including the NRF, Ministry of Digital Development and Information (MDDI), Infocomm Media Development Authority (IMDA), Economic Development Board (EDB) and Enterprise Singapore (ESG). We bring together research institutions and the vibrant ecosystem of AI start-ups and companies to support impactful research, develop talent, and power Singapore's AI efforts.
We're looking for an AI Engineer, DevOps to join the Data Platform team within AI Products at AISG. In this role, you'll keep the Data Platform, and the central data management infrastructure beneath it, running reliably, securely, and efficiently. You'll own the day-to-day operations of cloud infrastructure across GCP and AWS, the CI/CD and observability stack that the engineering team relies on, and the data pipelines, storage layers, and access controls that move data through the platform. You'll also bring strong AI fluency to the role: we expect you to use AI tools deliberately for ops work such as incident triage, runbook generation, log analysis, IaC drafting, and automation, and to help the team raise its operational bar by codifying those practices.
This position will be hosted at Nanyang Technological University (NTU), under the office of the Vice President (Artificial Intelligence & Digital Economy), and we welcome you to join our community.
Duties & Responsibilities:
Platform operations and reliability
Own day-to-day operations of the Data Platform across GCP and AWS, including environment health, capacity, performance, cost, and security posture; define and uphold SLOs, alerting policies, and on-call practices.
Lead incident response: triage, mitigate, run blameless post-mortems, and drive preventive actions through to completion.
Central data management and data engineering ops
Operate the central data management infrastructure, including data pipelines, storage layers, catalogues, access control, and lineage tooling, and partner with the engineering team on reliability, throughput, cost, and data quality.
Implement and maintain backup, retention, disaster recovery, and data residency controls aligned to programme and funder requirements.
Infrastructure, CI/CD, and automation
Manage infrastructure-as-code (e.g. Terraform) across GCP and AWS, and maintain CI/CD pipelines, container build/registry workflows, and deployment automation so teams can ship safely and frequently.
Strengthen observability across the stack (logs, metrics, traces, and dashboards), and reduce toil by automating repetitive operational tasks.
AI-assisted ops and continuous improvement
Use AI tools (e.g. Claude, Copilot, Cursor) deliberately in your ops workflow for incident triage, log and metric analysis, runbook drafting, IaC generation and review, and on-call support; codify good patterns for the team.
Build lightweight internal tooling that leverages AI to reduce operational load (e.g. summarising incidents, suggesting remediations, generating change reports), and contribute to security, compliance, and access reviews (secrets management, IAM hygiene, vulnerability patching, audit-readiness).
Requirements:
You should be a hands-on DevOps engineer who is comfortable operating cloud infrastructure end-to-end, who understands data engineering well enough to keep central data systems healthy, and who actively uses AI tools to make ops work faster and more reliable.
The ideal candidate would have:
A degree in Computer Science, Information Technology, or equivalent.
3–5+ years of DevOps, SRE, or platform engineering experience, with a track record of operating production systems at non-trivial scale.
Hands-on experience operating workloads on both GCP and AWS, including IaC (e.g. Terraform), containers and orchestration (e.g. Docker, Kubernetes), and managed services for compute, storage, and networking.
Strong cybersecurity knowledge and hands-on practice, including IAM and least-privilege design, secrets management, network security, vulnerability and patch management, audit logging, and security incident response. Familiarity with relevant standards and frameworks (e.g. ISO 27001, NIST, OWASP) and with data protection requirements (e.g. Singapore PDPA, GDPR where applicable) is expected.
Working knowledge of data engineering and central data management, including pipelines, storage formats, catalogues, access controls, and data quality/observability concepts.
Strong fundamentals in CI/CD, observability (logs/metrics/traces), and incident response.
Demonstrated use of AI tools (e.g. Claude, Copilot, Cursor) in your day-to-day engineering for code generation, review, debugging, and documentation, with a clear sense of where they help and where they don't.
Solid scripting/programming skills (e.g. Python, Bash, Go) and comfort reading other people's code across the stack.
Strong communication skills and the ability to work with both technical and non-technical stakeholders across cultures and time zones.
Working knowledge of AI/ML evaluation, benchmarking, and/or data annotation workflows, and the tooling ecosystem around them (e.g. evaluation harnesses, annotation platforms).
Bonus: experience with multimodal data systems (text, audio, image, video); contributions to open-source AI/data tooling; experience operating within Singapore public sector or research programmes.
We regret that only shortlisted candidates will be notified.
Hiring Institution: NTU

