Centre for Strategic Infocomm Technologies

System Reliability Engineer (Data Centre)

Posted 3 Hours Ago
In-Office
Singapore, SGP
Junior
Ensure reliability, availability, and performance of data centre IT operations. Manage day-to-day monitoring, incident lifecycle, capacity planning, documentation, observability and network monitoring, remote management tools, and define SRE metrics (SLO/SLI/error budgets). Collaborate with facilities on power, cooling, and physical infrastructure.
The summary above was generated by AI
You will be part of a dynamic team responsible for ensuring the reliability, availability, and performance of our data centre's IT operations. As a System Reliability Engineer (Data Centre), you will oversee the day-to-day IT operations within the data centre, working closely with various teams to ensure seamless IT service delivery. While knowledge of data centre power and cooling infrastructure is beneficial, the primary focus of this role is on IT operations. You will collaborate with Data Centre Facilities teams on matters related to power, cooling, and physical infrastructure as needed. You must have a good understanding of cloud infrastructure technologies, architecture, and site reliability engineering (SRE) principles. 

Responsibilities

  • Oversee and manage IT operations within the data centre, including day-to-day monitoring, incident management, and problem management
  • Lead the end-to-end incident management lifecycle, encompassing immediate troubleshooting, root cause identification, and resolution implementation to restore services, followed by comprehensive post-incident analysis
  • Develop and maintain documentation on IT infrastructure, operations, and procedures within the data centre
  • Perform capacity planning to ensure IT infrastructure is scalable for future demands
  • Collaborate and coordinate with Data Centre Facilities teams on matters related to power, cooling, and physical infrastructure
  • Design and implement a robust observability platform alongside network monitoring tools for performance monitoring and real-time alerting of IT devices and networks
  • Implement and manage remote management tools for out-of-band access and control of IT devices and servers
  • Define, implement, and track SRE metrics, including service level objectives (SLOs), service level indicators (SLIs), and error budgets, to improve data centre IT reliability

Requirements (Minimum Qualifications)

  • Background in Computer Science, Computer or Electrical Engineering, Information Technology or a related field
  • Good technical knowledge of IT infrastructure, including servers, storage, networking, and cloud technologies
  • Proficient in IT management software and tools
  • 2 years of working experience in IT operations is preferred
  • Fresh graduates are welcome to apply

As CSIT is an agency under the Ministry of Defence (Singapore), only Singapore Citizens will be considered.
