Senior Data Engineer

Security Scorecard - We are revolutionizing the cybersecurity industry · Oct 1st 2019

Apply on StackOverflow Careers

This is a remote, home-office-based role in Argentina.

Opportunity

Are you an intellectually curious, principled and passionate systems software engineer who wants to be a key member of a world-class data pipeline team? If elegant design, copy-by-value semantics and generic programming operating at scale appeal to you, then we should talk.

Our engineering team works on a greenfield data processing pipeline that leverages modular, composable, idiomatic C++1x and Python to keep things simple and efficient. We haven't ruled out more functional solutions such as Haskell yet; we are just missing someone like you to convince us.

SecurityScorecard’s engineering team collects and analyzes data on the evolving security posture of the Internet. The problems we tackle are global in scale and must be handled efficiently and in a timely manner. A key challenge in our environment is keeping solutions simple and composable so that we can reason about them at scale, which is the hallmark of functional programming, high-performance computing and large-scale distributed systems design.

Roles & Responsibilities

- Analysis, design and troubleshooting of our data distribution and processing architecture

- Key developer of Bluepipe, our native processing pipeline

- Primary advocate for tool introduction and removal

- Key party in system selection and configuration

- Clear, well-articulated communication (drawings, presentations, write-ups) with engineering, operations and product management staff

- Mentorship of junior and mid-career engineers

Tools We Use

  • Data definition, format and interfaces

  • Definitions - Protobuf V3

  • Normalize from - Avro / JSON / XML / CSV

  • Normalize to - Protobuf / ORC

  • Interfaces - REST APIs, gRPC and object store buckets

  • Databases - Postgres / Presto

  • Languages - Python / C++14

  • Job Orchestration - HTCondor / Apache Airflow

  • Analytics - Spark / Databricks / Bluepipe (native)

  • Storage - Gluster / NFS / Object Stores

  • Computation - Containers / VMs / Metal

  • Source control - Git

  • Workflow - GitHub PRs

  • Testing - Jenkins

Informed Point of View

A successful candidate will have an informed point of view on analysis, design and troubleshooting of large scale distributed systems. This starts with developing a deep understanding of the problem. Desired qualities include:

  • Preference for minimal, simple and highly composable designs

  • Strategies to avoid cloud service provider lock-in

  • Reasoned approach to black-box analysis and troubleshooting
