Senior AI Data Engineer (x/f/m)

About

Since 2013, Doctolib has pursued a single goal: building the healthcare system we all dream of, alongside practitioners and patients. Doctolib supports 500,000 practitioners with innovative technologies that give them a calmer professional life. The company also helps 90 million people across Europe take care of their health through a simpler, smoother care journey, delivered securely and confidentially. With more than 3,000 employees, Doctolib has offices in more than 30 cities across France, Germany, Italy, and the Netherlands.

Job Description

What you’ll do

At Doctolib, we're on a mission to transform healthcare through the power of AI. As a Senior AI Data Engineer, you'll play a key role in building and optimizing the data foundations within the AI Team to deliver safe, scalable, and impactful models.

You will join a dedicated team working on data infrastructure for LLM-, VLM-, and RAG-based systems, powering our new AI Medical Companion.

Your work will ensure that our engineers and data scientists can train, evaluate, and deploy AI models efficiently on high-quality, well-structured, and compliant data.

Your responsibilities include but are not limited to:

  • Ensure high standards of data quality for AI model inputs.
  • Design, build, and maintain scalable data pipelines on Google Cloud Platform (GCP) for AI and machine learning use cases.
  • Implement data ingestion and transformation frameworks that power Retrieval systems and training datasets for LLMs and multimodal models.
  • Architect and manage NoSQL and Vector Databases to store and retrieve embeddings, documents, and model inputs efficiently.
  • Collaborate with ML and platform teams to define data schemas, partitioning strategies, and governance rules that ensure privacy, scalability, and reliability.
  • Integrate unstructured and structured data sources (text, speech, image, documents, metadata) into unified data models ready for AI consumption.
  • Optimize performance and cost of data pipelines using GCP native services (BigQuery, Dataflow, Pub/Sub, Cloud Storage, Vertex AI).
  • Contribute to data quality and lineage frameworks, ensuring AI models are trained on validated, auditable, and compliant datasets.
  • Continuously evaluate and improve our data stack to accelerate AI experimentation and deployment.

Who you are

You could be our next teammate if you have:

  • Master’s or Ph.D. degree in Computer Science, Data Engineering, or a related field.
  • 5+ years of experience in Data Engineering, ideally supporting AI or ML workloads.
  • Strong experience with the GCP data ecosystem.
  • Proficiency in Python and SQL, with experience in data pipeline orchestration (e.g., Airflow, Dagster, Cloud Composer).
  • Deep understanding of NoSQL systems (e.g., MongoDB) and vector databases (e.g., FAISS, Vector Search).
  • Experience designing data architectures for RAG, embeddings, or model training pipelines.
  • Knowledge of data governance, security, and compliance for sensitive or regulated data.
  • Familiarity with Weights & Biases (W&B), MLflow, Braintrust, or DVC for experiment tracking and dataset versioning (dataset snapshots, change tracking, reproducibility).
  • Familiarity with containerized environments (Docker, Kubernetes) and CI/CD for data workflows.
  • A collaborative mindset and passion for building the data foundations of next-generation AI systems.

What we offer

  • Free health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including remote policy adaptations, extra days off, and psychological support
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Work Council subsidy to refund part of your sport club membership or creative class
  • Up to 14 days of RTT
  • Lunch voucher with Swile card

The interview process

  1. HR Screen
  2. Technical Deep Dive
  3. System Design
  4. Behavioral Interview
  5. Reference check and criminal records check
  6. Offer!

Additional Information

  • Contract Type: Full-Time
  • Location: Paris
  • Possible partial remote