SecurityScorecard is hiring a DevOps Engineer who is motivated to help us continue automating and scaling our infrastructure, and who will bridge the gap between our global development and operations teams. The DevOps Engineer will be responsible for setting up and managing project development and test environments, as well as the software configuration management processes for the entire application development lifecycle. Your role would be to ensure the optimal availability, latency, scalability, and performance of our product platforms. You would also be responsible for automating production operations, promptly notifying backend engineers of platform issues, and tracking long-term quality metrics.
Our infrastructure is based on AWS with a mix of managed services like RDS, ElastiCache, and SQS, as well as hundreds of EC2 instances managed with Ansible and Terraform. We are actively using three AWS regions and have equipment in several data centers across the world.
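As a hedged illustration of the kind of multi-region automation this role involves (the region list and helper names below are assumptions for the sketch, not SecurityScorecard's actual tooling), a small boto3 script to inventory running EC2 instances across active regions might look like:

```python
def count_running(page: dict) -> int:
    """Count 'running' instances in one describe_instances response page."""
    total = 0
    for reservation in page.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            if inst.get("State", {}).get("Name") == "running":
                total += 1
    return total


def regional_inventory(regions):
    """Tally running EC2 instances per region. Requires AWS credentials."""
    import boto3  # imported here so the pure helper above works offline
    counts = {}
    for region in regions:
        ec2 = boto3.client("ec2", region_name=region)
        paginator = ec2.get_paginator("describe_instances")
        counts[region] = sum(count_running(p) for p in paginator.paginate())
    return counts
```

A script like this would typically run against the three active regions mentioned above and feed a dashboard or alerting check.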
Regions: North America, Mountain time (GMT-7:00) to Atlantic time (GMT-4:00)
Responsibilities
Training, mentoring, and lending expertise to coworkers with regard to operational and security best practices.
Reviewing and providing feedback on GitHub pull requests from team members and development teams; a significant percentage of our software engineers have written Terraform.
Identifying opportunities for technical and process improvement and owning the implementation.
Championing the concepts of immutable containers, Infrastructure as Code, stateless applications, and software observability throughout the organization.
Systems performance tuning with a focus on high availability and scalability.
Building tools to ease the usability and automation of processes.
Keeping products up and operating at full capacity.
Assisting with migration processes as well as backup and replication mechanisms.
Working in a large-scale distributed environment with a focus on scalability, reliability, and performance.
Ensuring proper monitoring and alerting are configured.
Investigating incidents and performance lapses.
Come help us with projects such as…
Extending our compute clusters to support low-latency, on-demand job execution
Turning pets into cattle
Cross-region replication of systems and corresponding data to support low-latency access
Rolling out application performance monitoring to existing services, extending integrations where required
Migration from a self-hosted ELK stack to a SaaS stack
Continuous improvement of CI/CD processes, making builds and deployments faster, safer, and more consistent
Extending a global VPN WAN to a data center with IPsec+BGP
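To make the CI/CD improvement work above concrete, here is a hedged sketch of a deployment safety gate of the kind such a pipeline might use; the function name, metrics, and the 1% tolerance are illustrative assumptions, not an actual SecurityScorecard process:

```python
def safe_to_promote(canary_error_rate: float,
                    baseline_error_rate: float,
                    tolerance: float = 0.01) -> bool:
    """Gate a rollout: promote the new build only if the canary's error
    rate stays within `tolerance` of the baseline's. Inputs are fractions
    in [0, 1], e.g. 0.015 for a 1.5% error rate."""
    return canary_error_rate <= baseline_error_rate + tolerance


# A canary at 1.5% errors vs. a 1% baseline is within a 1% tolerance.
print(safe_to_promote(0.015, 0.010))  # True
```

A gate like this is what makes deployments "safer and more consistent": promotion becomes a mechanical check rather than a judgment call.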
Requirements
3+ years of DevOps and/or operations experience in a Linux-based environment
1+ years of production environment experience with Amazon Web Services (AWS)
1+ years using SQL databases (MySQL, Oracle, Postgres)
Strong scripting abilities (bash/python)
Strong experience with CI/CD processes (Jenkins, Ansible) and automated configuration management tools (Puppet/Chef/Ansible)
Experience with container orchestration (AWS ECS, Kubernetes, Marathon/Mesos)
Ability to work as part of a highly collaborative team
Understanding of monitoring tools such as Datadog
Strong written and verbal communication skills
Nice to Have
You know exactly what is meant by "Turning pets into cattle"
Experience working with Kubernetes on bare metal and/or AWS Elastic Kubernetes Service (EKS).
Experience with RabbitMQ, MongoDB, or Apache Kafka.
Experience with Presto or Apache Spark.
Familiarity with computation orchestration tools such as HTCondor, Apache Airflow, or Argo.
Understanding of network concepts: OSI layers, firewalls, DNS, split-horizon DNS, VPNs, routing, BGP, etc.
A deep understanding of AWS IAM and how it interacts with S3 buckets.
Experience with SAFe.
Strong programming skills in 2+ languages.
Tooling We Use
Definitions - Protobuf v3
Normalize from - JSON / XML / CSV
Normalize to - Protobuf / ORC
Interfaces - REST APIs and object store buckets
Cloud Services - Amazon Web Services
Databases - PostgreSQL, PrestoDB
Cache - Redis, Varnish
Job Orchestration - HTCondor / Apache Airflow / Rundeck
Analytics - Spark
Storage - NFS/EFS, AWS S3, HDFS
Computation - Docker containers / VMs / bare metal / EMR
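As a hedged sketch of the "normalize from" step in the tooling above (the field names and snake_case key convention are illustrative assumptions; the real pipeline targets Protobuf/ORC rather than plain dicts), a minimal CSV normalizer in Python might look like:

```python
import csv
import io
import json


def normalize_csv(text: str) -> list:
    """Normalize CSV text into a list of dicts with snake_case keys and
    stripped values - a stand-in for the Protobuf/ORC normalization step."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        records.append({
            key.strip().lower().replace(" ", "_"): value.strip()
            for key, value in row.items()
        })
    return records


# Hypothetical input; the column names are made up for the example.
sample = "Company Name, Score\nExample Corp, 92\n"
print(json.dumps(normalize_csv(sample)))
# prints: [{"company_name": "Example Corp", "score": "92"}]
```

In practice the normalized records would be serialized into the Protobuf v3 definitions listed above instead of JSON.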