SMC Cloud

Data Centre Engineer, Field Operations

Reposted 9 Days Ago
In-Office
Singapore
Senior level
The Data Centre Engineer will support the HPC infrastructure, perform maintenance and troubleshooting, and communicate with various teams to ensure operational stability.
The summary above was generated by AI

ROLES AND RESPONSIBILITIES

Firmus Technologies is seeking a skilled Data Centre Engineer to join our Operations team, supporting the daily operations and maintenance of our AI-accelerated high-performance computing (HPC) infrastructure. This role will work closely with Field Service Engineers, HPC and Network Engineering teams, and assist the Global Operations Centre (GOC). This is a unique opportunity to contribute directly to the stability and growth of cutting-edge AI infrastructure.

KEY RESPONSIBILITIES

  • Support the deployment, configuration, and maintenance of high-end GPU servers, storage servers, networking equipment, and software components in highly secure environments.
  • Perform hardware diagnostics, system functionality checks, and firmware updates as required.
  • Collaborate with engineering teams to help deploy tailored customer environments (e.g., bare-metal systems, HPC clusters, Kubernetes, Slurm).
  • Serve as the first line of engineering support for onsite operational issues, including troubleshooting hardware, network, and software problems.
  • Troubleshoot incidents, escalate critical issues and provide feedback to appropriate teams for improvements.
  • Participate in an on-call rotation to ensure 24/7 availability and responsiveness to critical issues.
  • Provide technical support to the GOC Support Specialist team in troubleshooting HPC-related problems.
  • Document incident details, resolutions, and lessons learned to enhance future problem-solving.
  • Maintain clear, accurate, and up-to-date documentation to promote effective knowledge sharing across the team.
  • Communicate effectively with GOC, HPC Engineers, internal teams, stakeholders, and end-users to ensure alignment on issue resolution.
  • Take part in team meetings and knowledge-sharing sessions to foster collaboration and continuous learning.

SKILLS AND EXPERIENCE

  • Bachelor’s degree in computer engineering, computer science, or a related technical field.
  • 5+ years of experience in technical field service roles.
  • Strong understanding of server hardware technology, Linux environments and troubleshooting hardware problems, with adherence to physical and system-level security standards.
  • Experience with scripting languages (e.g., Bash, Python).
  • Familiarity with workload managers and cluster software (e.g., Slurm, Kubernetes, NVIDIA BCM) and observability tools (e.g., Prometheus, Grafana, ELK).
  • Excellent problem-solving and analytical skills.
  • Ability to work independently and as part of a team.
  • Strong communication skills, both written and verbal.

LOCATION
Singapore

EMPLOYMENT BASIS
Full Time

At Firmus, we are committed to building a diverse and inclusive workplace. We encourage applications from candidates of all backgrounds who are passionate about creating a more sustainable future through innovative engineering solutions.

Join us in our mission to revolutionize the AI industry through sustainable practices and cutting-edge engineering. Apply now to be part of shaping the future of sustainable AI infrastructure.

Top Skills

Bash
ELK
Grafana
Kubernetes
Linux
NVIDIA BCM
Prometheus
Python
Slurm
HQ

SMC Cloud Singapore Office

Singapore
