Type: Full-time
Team size: 2–3 (expanding gradually)
Core Stack: ClickHouse, dbt, Airflow, Superset, Python
Optional Extras (later phases): Airbyte/PeerDB, Kafka, AI copilots
About the Role
We’re building a governed, self-service data platform that empowers every team to create and document their own analytics while keeping a single source of truth for company-wide metrics.
You’ll be the core engineer responsible for ingestion, transformations, testing, documentation automation, and enabling domain teams to extend the platform safely.
This is not a “tickets in, dashboards out” role — you’ll own architecture, guardrails, and delivery speed. We’re a lean team, so you’ll automate and standardize rather than manually build for every request.
What You’ll Do
- Own data ingestion pipelines from operational DBs and APIs into ClickHouse (raw → staging → core).
- Implement and operate Airflow from day one to orchestrate ingestion, dbt runs, tests, and docs publishing.
- Build and maintain dbt models (staging and core) with tests, docs, and ownership metadata.
- Implement version control & CI/CD for transformations (GitLab CI + dbt).
- Enable self-service modeling: create team schemas, templates, onboarding guides, and governance rules.
- Maintain data quality and observability with dbt tests, freshness checks, and row count monitoring.
- Automate documentation generation and publishing after every change.
- Work directly with product, ops, and analytics teams to align on KPI definitions and SLAs.
- (Future) Integrate Airbyte/PeerDB or Kafka as ingestion and streaming needs scale.
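To make the day-to-day concrete, here is a minimal sketch of the run order an Airflow DAG for this platform would wire together (ingest, then dbt run, test, and docs). The script name `ingest.py` and its flags are hypothetical placeholders, not an existing tool:

```python
import shlex

# Hypothetical pipeline: each entry becomes one sequential Airflow task
# (ingest >> dbt_run >> dbt_test >> publish_docs). The dbt commands are
# standard dbt CLI; `ingest.py` stands in for whatever ingestion entry
# point the platform ends up with.
PIPELINE = [
    ("ingest_raw", "python ingest.py --source app_db --target raw"),
    ("dbt_run", "dbt run --select staging core"),
    ("dbt_test", "dbt test --select staging core"),
    ("publish_docs", "dbt docs generate"),
]

def task_commands() -> dict[str, list[str]]:
    """Return each task's argv, keyed by task name, in run order."""
    return {name: shlex.split(cmd) for name, cmd in PIPELINE}

order = [name for name, _ in PIPELINE]
```

The point is the shape, not the specifics: every change to models triggers the same run → test → docs sequence, so documentation publishing is automated rather than a manual afterthought.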
You’re a Great Fit If You…
- Have 3+ years building and running data platforms, not just writing SQL in BI tools.
- Write clean, efficient SQL & Python; treat version control as non-negotiable.
- Understand columnar OLAP engines (ClickHouse, BigQuery, Snowflake, etc.).
- Have run dbt in production with tests and docs.
- Have deployed and managed Airflow DAGs for ingestion and transformation workflows.
- Design for idempotency, partition swaps, and incremental models.
- Care about data governance and documentation as much as query performance.
- Can work independently in a lean, high-autonomy environment.
- Think like a product engineer — balancing speed with maintainability.
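"Idempotency and partition swaps" in practice means a daily load you can rerun without creating duplicates. A minimal sketch of that pattern for ClickHouse, using its real `ALTER TABLE … REPLACE PARTITION` statement (the table names and the `toYYYYMMDD` partitioning scheme here are illustrative assumptions, not a real schema):

```python
from datetime import date

def idempotent_load_statements(target: str, scratch: str, day: date) -> list[str]:
    """Build the SQL for an idempotent daily (re)load into ClickHouse.

    The day's rows are loaded into a scratch table first, then that
    partition is atomically swapped into the target, so a rerun
    replaces the partition instead of appending duplicate rows.
    """
    part = day.strftime("%Y%m%d")  # assumes PARTITION BY toYYYYMMDD(event_date)
    return [
        f"TRUNCATE TABLE {scratch}",
        (
            f"INSERT INTO {scratch} SELECT * FROM raw.events "
            f"WHERE event_date = '{day.isoformat()}'"
        ),
        f"ALTER TABLE {target} REPLACE PARTITION {part} FROM {scratch}",
    ]

stmts = idempotent_load_statements(
    "staging.events", "staging.events_scratch", date(2024, 5, 1)
)
```

The same replace-the-partition idea carries over to dbt incremental models: a rerun for the same window should converge on the same result.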
Bonus Points If You…
- Have onboarded teams to self-service analytics in dbt.
- Have ClickHouse production experience (merges, partitioning, deduplication).
- Have deployed ingestion pipelines from APIs or Debezium/CDC tools.
- Have built CI/CD pipelines for dbt with automated docs/tests.
- Have introduced AI copilots (ChatGPT, Copilot) to accelerate data workflows.
This Is Not for You If You…
- Only want to write ad-hoc queries and dashboards.
- Avoid infrastructure or schema design.
- Expect a big team to shield you from cross-functional work.
- Don’t want ownership of both technical and process guardrails.
Why This Role Is Different
- You own the platform — from ingestion to governance to enablement.
- Direct stakeholder access — no layers between you and the people using your work.
- High leverage, low bureaucracy — build once, enable many.
- AI tools as force multipliers, not just novelties.