Site Reliability Engineer managing high-availability SaaS applications

DrFirst, Inc. · Jul 7th 2017

Apply on StackOverflow Careers

Overview

The SRE team is the in-house expert on building reliable and maintainable systems. It plans infrastructure capacity to meet the high-availability and uptime goals for all DrFirst products. The DevOps/Site Reliability team eliminates the inefficiencies and incompatibilities that jeopardize service availability, so that DrFirst's clients receive a reliable and scalable software service. Key aspects of this role include automation, configuration management, and tools development, while collaborating with the engineering team on projects and products as an expert on reliability, performance, and efficiency.

As part of the Systems team, you will:
• Periodically assess all monitoring requirements and implement the enhancements needed to meet changing and growing business needs
• Enhance current automation processes for managing capacity, safely deploying software, and mitigating failures
• Tune and troubleshoot full-stack software applications using object-oriented programming (OOP), Java, web services, Oracle DB, MongoDB, networking concepts, and virtualization techniques
• Proactively review, recommend, and implement changes to the live infrastructure after ensuring the right validation has been carried out
• Assist in the rollout and deployment of new product features and installations to facilitate rapid iteration
• Confidently make informed, data-driven decisions in a fast-paced environment with competing priorities
• Create and maintain Chef recipes for instance configuration management
• Participate in a 24/7 on-call rotation and after-hours deployments