Snowplow Analytics · Jun 1st 2020
Site Reliability Engineer (AWS) Remote, located in the UTC +/- 2 region
At Snowplow, we are on a mission to empower people to differentiate with data. We provide the technology to enable our customers to take control of their data and empower them to do amazing things with it.
Tens of thousands of deployments of our open source pipeline are running worldwide, collecting data emitted from over half a million sites. Running on AWS and GCP data technologies, it is ideal for data teams who want to manage their data in real time and in their own cloud. We also collect, validate, enrich and load in the region of 5 billion events for our customers each day and help them on their Snowplow journey through our management console.
To support our ongoing growth, we are now looking for an experienced Site Reliability Engineer (SRE) to join our Tech Ops Team. You’ll be taking the lead on all things AWS, including developing and improving the current stack and rolling out new features, all whilst keeping these environments running smoothly. We would love to hear from you if the idea of programmatically controlling thousands of remote production environments excites you!
Our Private SaaS offering has grown significantly over the past year and we now orchestrate and monitor Snowplow event pipelines across hundreds of customer-owned AWS & GCP sub-accounts. Each account has its own individualised and optimised stack and all are capable of processing many billions of events per month.
We are looking for another SRE to help us grow to managing 1,000 and then 10,000 AWS, GCP and (in the future) Azure accounts. You will be pioneering solutions for managing estates of this size through cutting-edge monitoring and automation. You’ll work closely with our Tech Ops Lead on all aspects of our proprietary deployment, orchestration and monitoring stacks.
Tech Ops has two areas of responsibility: the centralised services we provide to customers, and their pipeline infrastructure hosted in their own AWS or GCP accounts. Within both domains we are striving to increase service reliability, fulfil customer requests in a timely fashion, and automate recurring tasks. Task automation is essential as our customer base grows because, unlike in most software businesses, our infrastructure estate scales linearly with our customer numbers.
Automating the maintenance and deployment of thousands of individualised stacks is an enormously ambitious undertaking, and a hugely exciting infrastructure automation challenge you’re unlikely to find anywhere else!
The environment you’ll be working in:
Our company values are Transparency, Honesty, Ownership, Inclusivity, Empowerment, Customer-centricity, Growth and Technical Excellence. These aren’t just words we plucked out of thin air: we came up with them together as a company and are continually looking for new ways to weave them into our day-to-day operations. From flexible hours and working locations to the way we give feedback, we’re passionate about building a company that supports both company and individual development.
What you’ll be doing:
What you bring to the team:
What you’ll get in return: