Site Reliability Engineer

Numbrs · Mar 12th 2018

Apply on StackOverflow Careers

Responsibilities include, but are not limited to, deploying, supporting, monitoring and troubleshooting large-scale, microservice-based distributed systems with high transaction volumes, and documenting the IT infrastructure, policies, and procedures. Applicants are also expected to participate in after-hours work and an on-call rotation.

All candidates will have

  • a Bachelor's or higher degree in a technical field of study

  • a minimum of two years' experience deploying, monitoring and troubleshooting large-scale distributed systems

  • a good understanding of network and routing protocols (TCP/IP, DNS and others)

  • excellent knowledge of at least one modern programming language, such as Go, Java, C++, Python or Scala

  • experience with systems for automating deployment, scaling, and management of containerised applications, such as Kubernetes or Mesos

  • excellent troubleshooting and creative problem-solving abilities

  • excellent written and oral communication and interpersonal skills

Ideally, candidates will also have

  • experience deploying and supporting big data technologies, such as Kafka, Spark, Storm, Flink and Cassandra

  • experience implementing, operating, and supporting open-source tools for network and security monitoring and management on Linux/Unix platforms

  • experience with encryption and cryptography standards

Location: Zurich, Switzerland
