The Site Reliability Engineer at Neo Group will enhance monitoring, optimize alerting systems, automate processes, and support cloud and on-premise environments while mentoring staff and driving proactive improvements.
Come on board with Neo Group! Here's your chance to shake up the gaming industry with us. We're not just expanding; we're revolutionising the entire game, mastering profitability with every new venture. But you know what truly fuels our drive? It's people like you. Join us as we embark on a journey to redefine gaming on a global scale.
Neo Group is on the lookout for a Site Reliability Engineer to join our PMO Department.
Key Responsibilities:
- Maintain and enhance monitoring and logging infrastructure.
- Improve observability processes and implement predictive failure analysis.
- Optimize alerting systems: reduce noise, fine-tune critical metrics.
- Define key monitoring parameters and enhance visibility.
- Support and improve both cloud-based and on-premise environments.
- Automate processes and configuration management using Infrastructure as Code (IaC) principles.
- Train and mentor 24/7 App Support staff.
- Develop Runbooks, documentation, and troubleshooting guides.
- Analyze incidents, identify patterns, and drive proactive monitoring improvements.
- Establish and support the Monitoring & Diagnostics group within App Support.
- Develop intelligent troubleshooting instructions for faster incident resolution.
- Optimize existing monitoring by reducing unnecessary alerts and adding meaningful metrics.
- Enhance reliability through structured incident management and post-mortem analysis.
- Implement GitOps best practices for managing infrastructure and configuration.
Requirements:
- Advanced Linux user with strong command-line and diagnostic skills.
- 4+ years of experience as an SRE/Monitoring Engineer.
- Strong understanding of monitoring, logging, and observability in production environments.
- Experience optimizing alerting systems and implementing predictive analytics.
- Hands-on experience managing both cloud and on-premise solutions.
- Automation skills using Python or Go.
- Proficiency with configuration management tools (Ansible, Terraform).
- Solid grasp of networking principles and protocols.
- Understanding of information security principles.
- Experience with CI/CD pipelines (GitLab, Jenkins).
- Familiarity with orchestrators (Kubernetes, Rancher).
- Experience documenting workflows and training support teams.
- Ability to create intelligent troubleshooting instructions.
- Skills in incident analysis and pattern recognition.
Nice to Have:
- Experience working with high-load systems.
- Deep understanding of APM tools (New Relic, Datadog, etc.).
- Database and message queue performance tuning.
- Advanced knowledge of ML-driven monitoring and predictive analysis.
- Experience with automated incident response (self-healing systems).
Soft Skills:
- Responsibility, initiative, and strong analytical thinking.
- Ability to collaborate effectively within a team.
- Focus on automation and process improvement.
- Strong documentation and knowledge-sharing skills.
- Ability to diagnose complex incidents and provide actionable insights.
What We Offer:
- Enjoy 5 paid health days per year for those unforeseen sick days or medical appointments.
- Recharge your batteries with 25 paid calendar vacation days annually to explore, relax, and rejuvenate.
- Rest easy with comprehensive medical insurance coverage for employees.
- Stay active and healthy with a monthly sports allowance of $30 net to support your fitness goals.
- Enhance your language skills with English lessons facilitated by our two experienced tutors.
- Stay ahead in your field with access to conferences and professional literature to fuel your growth.
- Boost your energy and morale with complimentary snacks available in the office.
- Foster camaraderie and celebrate achievements by taking part in corporate events throughout the year.
Top Skills
Ansible
Datadog
Gitlab
Go
Jenkins
Kubernetes
Linux
New Relic
Python
Rancher
Terraform