Foundant Technologies

Site Reliability Engineer

Posted 10 Hours Ago
Remote
Hiring Remotely in Canada
Mid level
The Site Reliability Engineer will ensure the reliability, availability, and performance of SaaS products, automate processes, manage incidents, and monitor systems. They will also work on capacity planning, security compliance, and documentation to foster teamwork and operational efficiency.
The summary above was generated by AI

About SmartSimple & Foundant 

At SmartSimple and Foundant Technologies, we empower mission-driven organizations to manage their data, workflows, and impact with our comprehensive software solutions. From grant management and community foundations to process automation and data collaboration, our combined expertise supports a diverse range of organizations - from nonprofits and charitable entities to corporations and governments. 


At SmartSimple and Foundant Technologies, we’ve created a powerhouse of solutions designed to meet the unique needs of organizations striving to make a difference. Together, we’re setting new standards in innovation, flexibility, and impact management by helping organizations achieve their missions more efficiently and effectively.  


Where You’ll Work: 

  • As a remote-first workplace, we believe in offering flexibility and the freedom to work where it suits you best, while staying connected through technology. Our global network of talent is supported by physical office hubs and virtual collaboration, fostering a dynamic environment where innovation and growth thrive. 
  • With headquarters in Bozeman, Montana (Foundant), Toronto, Canada (SmartSimple), and our EMEA office in Dublin, Ireland, you’ll be part of a globally connected team. Whether you’re working remotely or from one of our office locations, you’ll be contributing to a vibrant, collaborative culture focused on driving meaningful impact across the world.  


What You’ll Do:

The Site Reliability Engineer (SRE) will play a critical role in maintaining and improving the reliability, scalability, and performance of our SaaS infrastructure and products. You will work closely with software engineers, product teams, and other stakeholders to design, build, and maintain systems that can handle the demands of a growing customer base. You will focus on automating processes and continuously improving the availability and performance of our services.


Key Responsibilities:

  • Reliability & Availability: Ensure the high availability, reliability, and performance of one or more SaaS products across production and staging environments. Monitor system health, track key performance indicators, and respond to incidents quickly to minimize downtime.
  • Incident Management: Perform incident response, troubleshooting, and post-mortem analysis for production incidents. Work to minimize the impact of incidents and drive improvements based on findings.
  • Automation & Efficiency: Implement automation for routine tasks like deployments, scaling, and maintenance. Develop tools and scripts that improve the operational and cost efficiency of the infrastructure.
  • Change Management: Work closely with engineering, product, and operations teams to design, deploy, and maintain cloud-based infrastructure and applications. Ensure that new releases and updates are deployed smoothly with minimal disruption.
  • Monitoring & Alerting: Build and maintain robust monitoring, alerting, and logging systems to provide real-time visibility into the health of our services. Analyze and act upon monitoring data, including availability, performance, and error logs, to proactively detect and resolve issues.
  • Capacity Planning & Scalability: Monitor system capacity, forecast growth, and ensure that our SaaS platforms scale appropriately to handle increased traffic and load. Design and implement strategies for capacity management.
  • Security & Compliance: Ensure that security best practices are followed for all infrastructure components. Collaborate with security teams to implement security controls, auditing, and compliance measures.
  • Performance Optimization: Continuously optimize the performance of our systems and applications by identifying and addressing bottlenecks and improving overall system throughput.
  • Documentation & Knowledge Sharing: Document systems, processes, and procedures. Foster a culture of knowledge sharing and collaboration across teams to improve operational understanding and best practices.


What You’ll Need:


Requirements:

  • 3+ years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role within a SaaS company or cloud environment.
  • Strong experience with Microsoft Azure, including core services (e.g. Azure App Services, Azure SQL, Azure Functions, Azure Virtual Network, Azure Monitor, Azure DevOps, etc.).
  • Experience with Amazon Web Services (AWS) is a plus (e.g. EC2, ECS, RDS, Lambda, S3, VPC, CloudWatch, CodeBuild, etc.).
  • Strong experience with cloud platforms and infrastructure-as-code tools like Terraform, CloudFormation, or similar.
  • Experience with containerization technologies (Docker, Kubernetes) and orchestration platforms.
  • Experience with application performance monitoring (APM) and log analytics tools (e.g. Azure Monitor, Application Insights, ELK, Datadog, etc.).
  • Proficiency in programming/scripting languages (PowerShell, Python, Bash, etc.).
  • Familiarity with CI/CD pipelines and automation tools.
  • Understanding of web application deployment and hosting fundamentals.
  • Understanding of database management and performance tuning.
  • Knowledge of networking fundamentals and web services (HTTP, DNS, load balancing, web application firewall, etc.).
  • Bachelor's degree in Computer Science, Engineering or a related field, or equivalent experience.
  • Strong analytical and troubleshooting skills with the ability to identify and resolve complex technical issues in distributed systems.
  • Excellent communication skills, with the ability to explain complex technical concepts to both technical and non-technical stakeholders.
  • Must be legally eligible to work in Canada.

Nice to have:

  • Azure Solutions Architect Expert or related Azure certification.
  • AWS Certified Solutions Architect or similar professional certification.
  • Experience with managing and maintaining large-scale distributed systems.
  • Experience with security best practices in cloud environments and SaaS platforms.


What You’ll Bring to our Team Dynamics:

  • Adaptive Achievement: You continuously learn from your experiences and adjust strategies to meet the evolving needs of the team and the business.
  • Productive Collaboration: You are comfortable working across functional teams—whether it's with engineers, product managers, or leadership. You communicate complex technical concepts in a clear and actionable manner, ensuring everyone is aligned to achieve shared goals.
  • Service Orientation: You are keen on understanding user needs and translating them into technical solutions that drive organizational success.
  • Active Learning: You are always looking for ways to improve processes, systems, and team workflows, ensuring that the work environment evolves as quickly as the technology we employ.


Why You’ll Love Working at SmartSimple + Foundant:

  • At the heart of everything we do is a commitment to innovation and making a positive impact. Whether you’re working on projects that empower not-for-profits, community foundations, or corporations, your contributions will help drive real-world change. 
  • We offer a competitive salary and benefits, including tuition and lifestyle reimbursements, as well as bespoke mindfulness and fitness initiatives.
  • With our Flexible PTO policy, you’ll have the freedom to manage your time in a way that supports your personal well-being and professional success. 
  • We’re committed to your professional and personal development. With our merger, you'll have the chance to collaborate across teams at both SmartSimple and Foundant, giving you exposure to diverse ideas, expertise, and projects that span multiple industries.
  • As part of a larger organization, you’ll have even more opportunities to grow your career. Whether it’s exploring new roles, leadership opportunities, or shifting to a different department, we support internal mobility to help you achieve your career goals.
  • You’ll enjoy autonomy and responsibility, empowering you to approach your work creatively and independently, fostering innovation and independent thought. 
  • Employee recognition is a core part of our culture. When you do a great job, we make sure everyone knows about it!  

 

SmartSimple and Foundant are equal opportunity employers, committed to building a diverse workforce that represents the communities we serve. We welcome and encourage applications from all qualified candidates, and will consider all applicants without regard to race, color, citizenship, religion, sex, marital/family status, sexual orientation, gender identity, Indigenous status, age, disability, or need for accommodation.


In accordance with the Ontario Human Rights Code, the Accessibility for Ontarians with Disabilities Act (AODA), and other applicable legislation, SmartSimple and Foundant are also committed to providing accommodations throughout the interview and employment process. Accommodations are available upon request for candidates participating in all aspects of the selection process. If you have accessibility requirements during the recruitment process and require accommodation, please contact [email protected]

Top Skills

Amazon Web Services
Application Insights
Azure Monitor
Bash
CI/CD
CloudFormation
Datadog
Docker
ELK
Kubernetes
Azure
PowerShell
Python
Terraform
