GitLab · Aug 3rd 2018
Site Reliability Engineers are responsible for keeping GitLab.com and many other GitLab production systems running smoothly 24/7/365. They're developers specialising in systems, whether that means networking, the Linux kernel, or a specific interest in scaling, algorithms, or distributed systems. GitLab.com is a unique site and it brings unique challenges: it’s the biggest GitLab instance in existence; in fact, it’s one of the largest single-tenancy open-source SaaS sites on the internet. The experience of our production engineers feeds back into other engineering groups within the company, as well as to GitLab customers running on-premises installations.
Responsibilities:
Be on a PagerDuty rotation to respond to GitLab.com availability incidents and provide support for service engineers with customer incidents.
Use your on-call shift to prevent incidents from ever happening.
Manage our infrastructure with Chef, Terraform and Kubernetes.
Make monitoring and alerting trigger on symptoms rather than on outages.
Document every action so your learnings turn into repeatable actions and then into automation.
Improve the deployment process to make it as boring as possible.
Design, build and maintain core infrastructure pieces that allow GitLab to scale to support hundreds of thousands of concurrent users.
Debug production issues across services and levels of the stack.
Plan the growth of GitLab's infrastructure.
Requirements:
Think about systems: edge cases, failure modes, behaviors, specific implementations.
Know your way around Linux and the Unix Shell.
Understand the purpose of configuration management systems like Chef (the one we use).
Have strong programming skills in Ruby and/or Go.
Have an urge to collaborate and communicate asynchronously.
Have an urge to document all the things so you don't need to learn the same thing twice.
Have a proactive, go-for-it attitude. When you see something broken, you can't help but fix it.
Have an urge for delivering quickly and iterating fast.
Share our values, and work in accordance with those values.
Projects you might work on:
Coding infrastructure automation with Chef.
Improving our Prometheus monitoring or building new metrics.
Helping release managers deploy and troubleshoot new versions of GitLab-EE.
Migrating GitLab.com from its current home on Azure Cloud to Google Cloud Platform.
Migrating GitLab.com to Kubernetes.