Data Engineer
Neurons Lab
- Remote work
About the project
Join Neurons Lab as a Data Engineer on a new engagement with a regulated UK & Ireland credit and lending company. The client has brought data from multiple business entities together in a newly centralized, anonymized data lake. However, it lacks the data-engineering depth to make that data trustworthy and analytics-ready: the current pipelines were assembled quickly (partly with AI assistance), and the descriptive statistics cannot yet be validated or reproduced. Your job is to put that foundation on solid ground so the Data Science Lead can model on it with confidence. That means validating and re-engineering the pipelines, building the harmonization / semantic layer across entities, enforcing data quality and lineage, and preparing clean, feature-ready datasets. This is a foundational data-engineering role on a regulated data estate; data protection and reproducibility are the primary constraints on every decision. A full-time engagement is preferred.
What you'll actually do
- **Reproduce a descriptive-statistics report end-to-end** so that every figure traces back to the raw source data. This closes the gap the client has acknowledged: numbers it currently cannot defend.
- Profile and **reconcile source schemas across acquired entities**: map the different field names, types, encodings and business definitions used for the same concept into one conformed model.
- Build **dbt staging → intermediate → mart models** with tests; codify the harmonized definitions the Data Science Lead specifies.
- Write **Great Expectations suites** (null / range / uniqueness / referential checks) and wire them into the pipeline so bad data fails loudly rather than silently corrupting analysis.
- Implement **entity / identity resolution** (deterministic + fuzzy matching) where there is no clean shared key for the same customer or account across sources.
- Implement and **verify anonymization / pseudonymization** (hashing / tokenization / k-anonymity), and provide evidence to the client's IT / compliance team that re-identification risk is controlled.
- **Optimize Spark / Glue jobs over tens of millions of rows** — partitioning, file formats (Parquet), incremental loads, cost control.
- Orchestrate with **Airflow / Step Functions**; build repeatable, scheduled pipelines rather than one-off scripts.
- Prepare **clean, documented, feature-ready datasets** for the probability-of-default (PD) / delinquency models.
- Write **runbooks** so the offshore team can operate the pipelines and handover takes days, not weeks. Help scope the onboarding of the remaining sources (Ireland and others).
Skills
- Strong **SQL** and **Python** for large-scale data processing
- **AWS data stack**: S3, Glue, Lake Formation, Athena / Redshift, EMR / Spark, Step Functions / Airflow
- **Data modeling & semantic layer** (dbt or equivalent); dimensional modeling
- **Entity resolution / record linkage** across heterogeneous sources
- **Data-quality & testing** frameworks (Great Expectations, dbt tests) and data lineage
- **Anonymization / pseudonymization** techniques and their analytical trade-offs
- Big-data processing (Spark) with performance and cost optimization at scale
- Clear written and verbal English; able to document work for handover and to collaborate effectively with a distributed team
Knowledge
- **GDPR** fundamentals as applied to anonymized / pseudonymized financial data and UK / EU data residency
- **AWS Well-Architected** (Analytics, Security) for BFSI
- Awareness of credit / risk data structures and what downstream modeling consumers need — a plus
Experience
- **4+ years** in data engineering, with strong **AWS + Spark / SQL at scale**
- Demonstrated experience **harmonizing / integrating data across multiple source systems**
- Experience building **validated, reproducible pipelines in a regulated environment** (BFSI, healthcare, government) — strong plus
- Comfortable stepping into a **messy, partly-built data estate** and bringing it up to standard
- Comfortable as the sole or lead data engineer on a small (3–4 person) delivery pod