Operations Engineer, DevOps - Remote

Peak Games · Dec 5th 2017

Apply on GitHub Jobs

Peak Games is at a turning point. It’s a fascinating time to be joining our team as we have become a global consumer product company with multiple products. We are on a journey of reaching hundreds of millions of people and making our products part of their daily lives. We believe the only way to achieve this is to maintain our culture of continuously learning, evolving and striving for the best as a team.

Every day, millions of our users play our games, generating billions of requests and a massive amount of data. Our Cloud Platform Team is responsible for automating the entire stack of high-performance, large-scale, geographically dispersed resources throughout its lifecycle to accommodate our fast-growing user base. Join our Cloud Platform Team, the backbone of our growth, and help make sure that our services are up all the time.

Peak Games' Cloud Platform Team is responsible for building, orchestrating, scaling and maintaining highly available systems. As an Operations Engineer, you will support ongoing operations to improve the overall scalability and maintainability of the infrastructure. You’ll have limitless exposure to all the technologies used in the infrastructure of some of the largest casual games in the world, as well as in Big Data, including numerous bleeding-edge technologies. Operations Engineers at Peak Games own and solve every DevOps & Systems problem in detail: even when a problem seems small, we believe it is part of a bigger one. You’ll be helping to automate everything you can think of while handling a broad scope of operational responsibilities, from updating backup processes to completing small security or network-related tasks on the cloud. You will work side by side with our DevOps Engineers to ensure we provide a scalable and robust cloud delivery infrastructure.

Major Responsibilities/Activities

  • Master the Peak Games platform so that you can provide full-stack diagnostics, when necessary, to help determine the root cause of internal problems and service issues.
  • Keep up with metrics and use monitoring systems to provide the best performance, scalability and stability.
  • Shape and perform all systems management-related tasks such as server bootstrapping, site build-outs, software rollouts and security patches, as well as monitoring and alerting.
  • Document, test and improve procedures for systems and services while troubleshooting possible issues.
  • Together with your team, you’ll participate in the on-call rotation. While we help each other out, you may find yourself dealing with an emerging issue at any given time.

What You Need for this Position

  • BA/BS in Computer Engineering or related discipline, or equivalent work experience
  • 5+ years of operational experience with highly scalable, distributed systems on Linux (CentOS)
  • Experience working with large-scale deployments
  • Ideal candidates will have experience with AWS and/or other cloud technologies
  • Experience in DevOps technologies, cloud-based provisioning, monitoring, and troubleshooting
  • Solid understanding of application data flow and how it meets system infrastructure
  • Scripting and/or development skills to automate everything

Main technologies used by our Cloud Platform Team:

AWS, CentOS, SaltStack, Terraform, Ansible, Prometheus, Alertmanager, Grafana, Jenkins, Consul
