Senior Site Reliability Engineer / Devops Engineer

WizeNoze · Oct 7th 2020

Apply on StackOverflow Careers

Are you looking to work on challenging projects, with a diverse and motivated team, while building technology that can change the world? Do you want to join an award-winning startup in its scale-up phase? Do you want to give students all over the world access to a prosperous future? Do you want to work with large distributed systems, machine learning, web crawling, and other interesting technology?

We’re looking for a Senior Site Reliability Engineer / Devops Engineer with 8+ years of experience in REST APIs, large-scale distributed systems, and the AWS stack, based in Amsterdam or remote within the GMT ±2 time zones. You need to be curious and passionate, with a drive to continually improve yourself, your craft, your code, and your colleagues. You need to hold yourself and your peers to the highest standards to deliver the best quality products possible! You must not be complacent, and must keep working to improve your craft and skills. You must be able to step outside your role to think holistically about our systems, processes, and people, always with a view to improving efficiency and long-term maintainability, reliability, performance, and quality.

Your responsibilities:

  • keeping our distributed systems running smoothly on AWS: focusing on performance, monitoring, alerting, cost optimisation, etc.

  • infrastructure automation via infrastructure as code principles using Terraform, Ansible, etc.

  • configuration management and tracking

  • optimising alerting and reporting to improve signal vs noise

  • code instrumentation to tie all systems together, using libraries such as Spring Micrometer to feed metrics into Prometheus

  • instrumenting AWS services like RDS, SQS, Elasticsearch Service, Beanstalk, etc. to feed metrics into Prometheus

  • building useful dashboards in Grafana to help monitor system performance and reliability over time

  • audit and monitor security across AWS

  • optimise network layouts, VPCs, VPNs, and security across AWS

  • process documentation

  • load testing (with infrastructure automation so we can load test routinely by standing up a clone of production)

  • disaster recovery planning and preparation, including routine backup testing

  • self-hosting Elasticsearch clusters with all necessary monitoring, alerting, disaster recovery, and self-healing in place

  • application performance monitoring implementation and assisting developers in profiling and optimising code

  • planning support rotas, standby schedules, etc.

  • AWS cost optimisation

  • automating onboarding and offboarding of employees across GSuite, AWS, and other SaaS and PaaS platforms in use

Your abilities and skills:

  • Fluent in English

  • 8+ years of experience: a mix of development and ops, with at least 5 years of dedicated SRE/devops experience

  • SRE/devops experience must be in a cloud environment with large distributed systems

  • At least 2 years of AWS experience

  • At least 2 years of Java development experience. We need someone who is more devops engineer than sysadmin, with the ability to deeply understand and solve JVM performance and memory issues with the aid of the developers

  • Preferably worked in startups

  • Be able to communicate effectively with team members, but also be able to solve difficult problems on your own with perseverance and tenacity

  • Prove that you’re continually working to improve your skills and knowledge, by raising the standards for your own designs and code constantly, and by reading and learning from books, courses, and peers

  • Friendly and helpful to tech and non-tech team members

  • Linux administration

  • Shell scripting

  • Python scripting

  • Experience in Java, Elasticsearch, Spring, and any of the other tools under responsibilities is a plus

  • Comfortable working in a remote environment

Your traits:

  • Curious. Able to learn and apply new concepts and tools rapidly

  • Pragmatic. Choose the practical path to delivering a system without getting lost in fancy new technologies or techniques

  • Attention to detail. Understand everything you do and why you’re doing it; no copy/pasting and hoping for the best

  • Perseverance. Able to solve difficult problems without giving up and expecting someone else to take over

  • Take responsibility for your work. Have an owner’s mindset

  • High degree of personal responsibility over designated duties. Step outside your role if needed to get the job done well

  • Consistent and organised

  • Timely and eloquent communicator

  • Focused on helping the team win, before personal gain

  • Open to receiving objective criticism and acting on it to improve

  • Like to work in a startup environment
