Design, deploy, and operate multi-tenant Kubernetes platforms and supporting infrastructure. Automate cluster lifecycle, implement observability, collaborate on security and compliance, onboard application teams, respond to incidents, and drive reliability engineering practices.
As a Containers Platform Engineer at CSIT, you will design, deploy, and support container orchestration platform services that power mission-critical workloads across on-premises networks. You will join the Runtime Platforms team, which is responsible for delivering secure, reliable, and scalable container services that enable other CSIT engineers to deploy application workloads efficiently while maintaining strong security and compliance standards. Our team operates a multi-tenant Kubernetes provider platform that includes many enhancements aimed at improving developer productivity, ranging from third-party technologies to bespoke workflow enhancements.
Responsibilities
- Design and enhance Kubernetes provider platforms and supporting infrastructure to improve scalability, reliability, and developer experience.
- Automate and simplify Kubernetes cluster lifecycle management, upgrades, and observability workflows.
- Implement monitoring and alerting systems using tools such as Prometheus, Grafana, or Elastic Observability to meet service-level objectives (SLOs).
- Collaborate with security teams to integrate and enforce security controls and compliance requirements within the container platform.
- Work with application teams to improve platform usability, streamline onboarding, and reduce operational toil.
- Respond to incidents and perform post-incident reviews, driving continuous improvement and operational excellence.
- Contribute to the reliability engineering culture, fostering shared responsibility for system availability and performance.
Requirements (Minimum Qualifications)
- Background in Computer Engineering, Computer Science, or a related field.
- Strong programming or scripting experience (e.g. Git, Terraform, JavaScript, Python, or Bash).
- Good understanding of Linux systems, containers, and networking fundamentals.
- 1 to 3 years of hands-on experience operating or managing Kubernetes clusters in production environments.
- Familiarity with CI/CD pipelines, infrastructure-as-code, and configuration management (e.g. Terraform, Ansible, Helm).
- Experience implementing observability and monitoring in large-scale systems.
Good-to-haves
- Knowledge of Kubernetes security concepts such as RBAC, admission controllers, and policy enforcement.
- Experience with GitOps workflows and deployment tools (e.g. Argo CD, GitLab Runner).
- Understanding of service mesh technologies (e.g. Istio).
- Exposure to reliability engineering practices, including SLOs, error budgets, and capacity planning.
- Familiarity with cloud platforms (AWS, GCP, or Azure) and hybrid infrastructure architectures.
- Knowledge of networking protocols (HTTP, TCP, DNS) and troubleshooting tools.
- Passion for open-source technologies.
Why join us?
At CSIT, you will:
- Build and operate infrastructure that supports Singapore's national security missions.
- Work with talented engineers who take pride in operational excellence, collaboration, and innovation.
- Be empowered to experiment, improve, and scale modern technologies securely.
- Have opportunities to deepen your expertise in Kubernetes, SRE practices, and secure platform engineering at scale.
As CSIT is an agency under the Ministry of Defence (Singapore), only Singapore Citizens will be considered.
Centre for Strategic Infocomm Technologies Singapore Office