Site Reliability Engineer

MapD · Jun 14th 2018

Apply on StackOverflow Careers

MapD is seeking a Site Reliability Engineer to join its Cloud Operations and Security team. As a Site Reliability Engineer, you will work closely with other SREs to maintain, optimize, develop, and secure the delivery of the world’s first GPU-accelerated analytics SaaS platform. You should bring solid automation, security, and DevOps skills; experience working in environments with compliance requirements (e.g., SOC 2, FedRAMP, PCI) would be a major plus. You must have proven experience working on a public cloud platform (e.g., AWS, GCP, or Azure). Key to this role is being a self-motivated self-starter with a strong passion for system improvement and optimization.

We’re big fans of hiring people who are not just great at what they do but also how they do it. Critical to our culture is building and maintaining a team that works well together and knows how to communicate effectively - not just within their own team, but also across peripheral teams.

We don’t believe in divas or rock stars. We are looking for someone who embodies the best parts of open-source culture: Humility, open-mindedness, positivity, and respect for others. A team member who doesn’t try to single-handedly save the day but embraces input and collaboration as a means to find the best solution.

Your success in this role will be predicated on your ability to prioritize your work, to be a self-motivated self-starter, to speak up early and often, and to work well with others. You should be passionate about building highly available, scalable, and automated “hands-off” systems for customers, and about being one of the “go-to” people that team members can trust to get things done and keep things running smoothly with a minimum of fuss. We’re great at encouraging our people to learn different technologies, continue their professional growth, and try out new ways of doing things. We’re in it for the long haul, and you should be too.

Our office is located in downtown San Francisco, and this position will initially report to the Director of Cloud Operations and Information Security. This is an individual contributor role and will not manage other people. While our preference is to hire local employees, we will also consider exceptional candidates for remote work.

Responsibilities:

  • Build highly available, automated, scalable microservices with a high quality of service for customers

  • Write and maintain customized toolsets for leading-edge GPU technologies

  • Implement and maintain service monitoring and reporting

  • Respond to and investigate incidents, and recommend and implement changes to resolve problems permanently

  • Keep all cloud services running reliably 24x7

Qualifications:

  • BS or higher degree in Computer Science, or equivalent industry or work experience.

  • 2+ years previous SRE/DevOps experience; previous experience in an enterprise SaaS software environment strongly preferred.

  • Strong desire to continue to learn and improve systems, yourself, and others.

  • Previous experience with a public cloud provider required (AWS, GCP, Azure, etc.)

  • Previous experience working on Linux OS (as well as Bash/Python programming skills) is required.

  • Previous experience with clustered Docker environments.

  • Previous experience with cloud secrets management, service discovery and scheduling.

  • Previous experience with CI (e.g., Jenkins) and CM (e.g., Ansible, SaltStack) automation tools.

  • A passion for automation and security.

Bonuses:

  • Experience designing and implementing security controls, and an understanding of good security practices

  • Experience working in and designing systems for environments with compliance requirements (e.g., SOC 2, FedRAMP)
