OKX

Staff Engineer - Infrastructure/SRE

In-Office
Singapore
Senior level
Manage infrastructure stability and optimize performance of cloud services, middleware, big data platforms, and contribute to chaos engineering initiatives.
The summary above was generated by AI
OKX will be prioritising applicants who have a current right to work in Singapore and do not require OKX's sponsorship of a visa.
 
Who We Are
At OKX, we believe that the future will be reshaped by crypto, and that crypto will ultimately contribute to every individual's freedom. OKX is a leading crypto exchange and the developer of OKX Wallet, giving millions access to crypto trading and decentralized crypto applications (dApps). OKX is also trusted by hundreds of large institutions seeking access to crypto markets. We are safe and reliable, backed by our Proof of Reserves. Across our multiple offices globally, we are united by our core principles: We Before Me, Do the Right Thing, and Get Things Done. These shared values drive our culture, shape our processes, and foster a friendly, rewarding, and diverse environment for every OK-er. OKX is part of OKG, a group that brings the value of blockchain to users around the world through our leading products OKX, OKX Wallet, OKLink and more.
About the Team
The Service Reliability Engineering team aims to make service stability one of the company's core competitive advantages. By building end-to-end, chain-level risk management capabilities, we aim to identify and analyze stability risks in a sustainable, automated way, moving from "reactive governance" to "proactive governance". This approach lets us address more stability issues before they affect users, improving the user experience.
 
What You’ll Be Doing
  • Effectively optimize existing runtime environments (KVM, Docker, K8S, JVM, etc.) to ensure efficient resource utilization and stable service operation.
  • Deeply understand the architecture and principles of middleware (Kafka, Spring Cloud, Nacos, Apollo, Kong Gateway, etc.), ensuring high performance and availability.
  • Ensure the stability of, and optimize, big data platforms (Alibaba Cloud DataWorks, AWS EMR, AWS Databricks, Spark, Flink) and data warehouses (MaxCompute, Hologres, Hive, ClickHouse, StarRocks, etc.).
  • Understand network architecture and security, and use that understanding to guide infrastructure stability work across the network and security layers, ensuring secure, stable, and efficient network communications.
  • Lead chaos engineering exercises, coordinating with business units to validate system robustness and recovery capabilities through simulated failure scenarios.
  • Participate in rapid response and troubleshooting of system failures, continuously optimize monitoring strategies to reduce system downtime and ensure service continuity and stability.
  • Drive infrastructure automation and intelligence to improve SRE work efficiency and quality.
  • Collaborate closely with development teams, providing technical support and advice on infrastructure to jointly promote continuous product improvement and innovation.
What We Look For In You 
  • Bachelor's degree or above in Computer Science or related field, with 8+ years of experience in large-scale internet or cloud computing platform development/SRE/operations.
  • In-depth understanding of big data platforms, data warehouses, middleware, runtime environments, and network technology principles and architectures, with rich practical experience and troubleshooting skills.
  • Proficient in Linux system management and optimization, familiar with scripting languages such as Shell/Python, able to write automation tools and scripts.
  • Familiar with container and cloud-native technologies like KVM, Docker, and K8S, including their architectures and principles, with extensive experience in handling common issues and failures.
  • Familiar with network protocols such as TCP/UDP/QUIC, proficient in using network commands like tcpdump, traceroute, and netstat, and tools like Wireshark, with rich practical experience in troubleshooting common network issues.
  • Rich experience with Alibaba Cloud and AWS cloud products, from architecture to usage, with extensive practice in dealing with common issues and failures.
  • Experience in building service governance systems, architecture optimization, stability assurance, capacity management, activity support, and chaos engineering is preferred.
  • Proficiency in both Mandarin and English is preferred for communication with local and global stakeholders.
Perks & Benefits
  • Competitive total compensation package
  • L&D programs and Education subsidy for employees' growth and development
  • Various team building programs and company events
  • Wellness and meal allowances
  • Comprehensive healthcare schemes for employees and dependants
  • More that we would love to tell you about along the way!

Top Skills

Alibaba Cloud DataWorks
Apollo
AWS Databricks
AWS EMR
ClickHouse
Docker
Flink
Hive
Hologres
JVM
Kafka
Kong Gateway
Kubernetes
KVM
Linux
MaxCompute
Nacos
netstat
Python
Shell
Spark
Spring Cloud
StarRocks
tcpdump
traceroute
Wireshark
