The Bullish Group has built an ecosystem focused on developing financial services for the digital assets sector through technology and investment businesses. These include:
- Bullish Exchange: digital asset trading services that use central limit order book matching and proprietary market-making technology to deliver deep liquidity and tight spreads within a compliant framework. The business is licensed by the Hong Kong Securities and Futures Commission, the German Federal Financial Supervisory Authority, and the Gibraltar Financial Services Commission. Since its launch in November 2021, Bullish Exchange has surpassed US$1.3 trillion in total trading volume, with 2H 2024 average daily volume exceeding US$2 billion.
- Bullish Capital: an investment company that offers strategic capital, industry expertise, and an extensive network of resources to support initiatives that connect conventional finance with the revolutionary possibilities of the digital economy.
- CoinDesk: an award-winning media, events, indices, and data business serving the global crypto economy.
Reports to: Manager, Site Reliability Engineering
We are seeking a skilled and proactive Site Reliability Engineer to join our team. As an SRE, you will be responsible for maintaining and improving the reliability, scalability, and efficiency of our services. You will work closely with operations and development teams to ensure our systems are robust and performant.
Role & Responsibilities:
- System Reliability: Ensure the reliability and availability of our services by implementing best practices and monitoring solutions. Validate reliability through failure injection and failover testing.
- Automation: Automate operational tasks and processes to improve efficiency and reduce manual intervention. Embrace the principle of treating "operations as a software problem".
- Monitoring & Performance: Develop and maintain monitoring solutions to track system health and performance. Analyze metrics and logs to identify and resolve issues proactively.
- Incident Management: Participate in on-call rotations and respond to incidents, ensuring timely resolution and conducting post-mortem analyses to prevent recurrence.
- Collaboration: Work with development and operations teams to integrate reliability into the software lifecycle and provide guidance on best practices.
- Capacity Planning: Monitor system capacity and performance data to ensure our infrastructure can scale to meet future demand.
- Continuous Improvement: Identify areas for improvement in our systems and processes, and implement solutions that enhance reliability and performance.
Experience & Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- 5+ years of experience in a site reliability engineering or operations role.
- Proficiency in scripting languages (e.g., Python, Bash) for automation tasks.
- Experience with cloud platforms (e.g., AWS, Google Cloud, Azure) and containerization technologies (e.g., Docker, Kubernetes).
- Strong understanding of Linux/Unix systems and networking.
- Experience with CI/CD pipelines and version control systems (e.g., Git).
- Solid experience with monitoring and logging tools (e.g., Datadog, Prometheus, Grafana, ELK stack, OpenTelemetry).
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration skills.
- Ability to work in a fast-paced, dynamic environment.
- Strong debugging and root cause analysis skills.
Preferred Qualifications:
- Experience with infrastructure as code tools (e.g., Terraform, Ansible).
- Knowledge of database systems and data management.
- Experience with microservices architecture and distributed systems.
Bullish is proud to be an equal opportunity employer. We are fast evolving and striving to become a globally diverse community. With integrity at our core, our success is driven by a talented team of individuals and the different perspectives they are encouraged to bring to work every day.