Overview

We are seeking a Senior Big Data Engineer with a strong background in managing structured and unstructured data pipelines, who thrives in a fast-paced AI-focused environment. You will be instrumental in building and scaling our data lake architecture, supporting a system designed to fuel intelligent AI agents for data collection, labeling, and analytical reasoning. This includes integrating vector databases and optimizing for retrieval-augmented generation (RAG) workflows deployed on AWS Bedrock and other AI stacks.

Key responsibilities

  • Design and implement scalable ingestion pipelines for structured/unstructured data using AWS and Databricks Unity Catalog.
  • Build and maintain high-throughput ETL/ELT pipelines with Apache Airflow and Databricks.
  • Architect and manage data modeling, storage, and indexing strategies in PostgreSQL and RDS, ensuring compatibility with AI retrieval systems.
  • Integrate and manage vector databases to support fast semantic and embedding-based search in RAG pipelines.
  • Implement robust data validation, lineage, and governance systems using Unity Catalog.
  • Optimize performance across distributed compute environments (Databricks, EC2).
  • Deploy and maintain Lambda-based microservices for scalable, real-time data ingestion and enrichment.

Required experience

  • 5+ years working with big data systems in production environments.
  • Proven expertise with Databricks, Unity Catalog, and Apache Spark.
  • Proficiency in Airflow, AWS stack (Lambda, EC2, RDS), and cloud-based data lake architectures.
  • Strong SQL and database design skills (PostgreSQL preferred).
  • Working knowledge of vector databases (Chroma, Pinecone, FAISS).
  • Solid understanding of data lifecycle management in ML/AI contexts.
  • Bonus: Familiarity with LangGraph, LangSmith, LangChain, or similar agent orchestration tools.

Bonus points

  • Experience with AI agent pipelines or large-scale ML model support.
  • A strong focus on data observability, security, and lineage tracking.
  • Hands-on experience with RAG architecture, including vector storage and semantic retrieval.
  • Exposure to AWS Bedrock and model deployment orchestration.

To apply

Send your CV, a snappy cover letter that highlights your expertise, skills, and experience, and any relevant links or attachments to your work.
