Data Engineer

Job description

As a data engineer on our team, you’ll help build our data acquisition and ingestion system, efficiently aggregating billions of data points across thousands of diverse sources. Our underlying data platform is the foundation for our customer-facing research and risk products, making public records from thousands of different sources readily available.

In this role, you will:

  • Write, maintain, and improve Python code for collecting and processing data from thousands of different third-party sources
  • Analyze new data sources to understand how to best acquire and model/catalogue the data
  • Develop a strong understanding of the ways our users use our products to help inform how we can best present records and data to meet their needs
  • Work with many different kinds of data, both public and proprietary, in many different structured and unstructured formats, ingested from sources such as APIs, databases, websites, files, cloud storage, etc.
  • Curate and monitor existing source integrations to ensure data in our platform is accurate, consistent, and available
  • Analyze our data to deliver insights that improve our platform and power new features and products

Job requirements

About you:

  • You LOVE (or are at least intensely interested in) data and the idea of collecting, organizing, and making it more accessible and usable
  • You have worked programmatically with data in a role such as a software engineer, data analyst, digital archivist, scientist, researcher, or data programmer
  • You can quickly profile and understand a data set and implement an appropriate process in code for working with it
  • You’re experienced and proficient with Python, and have a strong working knowledge of web technologies such as HTTP, HTML, and JSON
  • You have experience ingesting data from varied and complex real-world sources including websites, files in multiple formats, databases, and APIs
  • You have a strong understanding of data types, schemas, and normalization and how to work with “dirty” data
  • You are a fast, motivated learner and are willing to pick up new tools and technologies on the fly to solve a problem
  • You’re excited by open-ended problems and are comfortable owning and delivering a solution from start to finish
  • You have VERY strong attention to detail and a commitment to documentation
  • You enjoy working collaboratively and really care about the work you do, the people you do it with, and the customers who ultimately use the product

Some of the technologies you’ll work with:

  • Python, Elasticsearch, Postgres, Redis, Linux, Celery, Docker, GCP