Quix · Feb 2nd 2021
Role
As a Site Reliability Engineer you will help deliver and scale a platform that developers love to use. You will:
• Maintain existing services to guarantee uptime
• Build and implement disaster recovery for when services are down, and make improvements so that downtime stays rare
• Keep services running, and get them back up and running quickly when a failure occurs
• Ensure that we ship software that meets security requirements
• Automate work, including infrastructure needs, failover solutions and failure mitigation
• Improve monitoring and alerting solutions
• Maintain documentation for recurring issues and prepare incident reports for production issues
• Migrate the platform to AWS, GCP and additional cloud platforms as required
• Design and implement on-prem and hybrid-cloud solutions
Required skills and knowledge
• Professional communication skills, both verbal and written
• Experience operating large-scale production systems, with a keen understanding of design principles and implementation best practices
• Knowledge of:
  o Networking (DNS, load balancers, etc.)
  o Unix / Linux shell
  o Encryption for data in flight and at rest
• Experience in the following technologies:
  o Kafka
  o Kubernetes
  o Docker
  o Azure Cloud Service
  o Ansible, Terraform or alternatives
  o Source control (git)
  o Helm
Nice to have
• Experience in the following technologies:
  o AWS
  o Google Cloud
  o Chaos Engineering
Benefits
• Work from home anywhere in the UK and EU (may be required to travel occasionally)
• 2 annual team meet-ups in EU destinations (normally beaches and mountains)
• Generous stock options commensurate with the opportunity
• 37 days holiday (including all public holidays in your region)
• 2 additional paid days off a year for volunteering work
• Budget to choose your own hardware and office set-up
• Training and personal development budget
• Regular socials with paid food/drink/games allowance