AI Data Engineer


Moonvalley

About Moonvalley

Moonvalley is developing cutting-edge generative AI models designed to power Super Bowl-worthy commercials and award-winning cinematic experiences. Our inaugural HD model, Marey, is built exclusively on licensed and owned data for professional use in Hollywood and enterprise applications.

Our team brings together elite talent from DeepMind, Microsoft, Snap, and Meta, with decades of collective experience in machine learning and computational creativity. We have established the first AI-enabled movie studio in Hollywood, collaborating with top producers, actors, filmmakers, and global brands. To date, we've raised over $70M from world-class investors including General Catalyst, Bessemer, Khosla Ventures, and Y Combinator, and we're just getting started.

Role Summary

We're looking for a Data Engineer to build the data pipelines driving our next-generation generative video models. This role is central to our mission of training models exclusively on clean, high-quality data.

You will collaborate with the Data Engineering Lead to develop data ingestion pipelines, captioning systems, and high-throughput, distributed architectures for large-scale data processing and curation.

What You'll Do

  • Build scalable, high-throughput data pipelines optimized for multi-modal video model training.
  • Develop systems for data ingestion, deduplication, quality assessment, validation, filtering, and labeling.
  • Optimize workloads on distributed data processing and orchestration frameworks such as Apache Spark, Ray, and Airflow.
  • Collaborate with infrastructure teams to scale pipelines across thousands of GPUs.
  • Implement strong observability and telemetry for all aspects of the data lifecycle.

What We're Looking For

  • Deep experience building and scaling data infrastructure for large-scale ML systems, ideally for video or multi-modal models.
  • Solid background in ML engineering with hands-on experience in training and optimizing classifiers.
  • Experience managing large-scale datasets and pipelines in production.
  • Expertise in Python and in Spark, Airflow, or similar data frameworks.
  • Understanding of modern infrastructure including Kubernetes, Terraform, object stores (e.g., S3, GCS), and distributed computing environments.
  • Ability to balance rapid, iterative delivery with a focus on long-term technical vision, creating pragmatic and architecturally elegant solutions.

Nice to Haves

  • Experience working on foundational model training pipelines (image, video, or language).
  • Experience with video-specific data challenges such as frame sampling, codec variability, temporal alignment, and perceptual quality scoring.

Work Environment

At Moonvalley, we approach our work with a dedication similar to that of Olympic athletes. Expect occasional late nights and weekend work in service of our mission. We recognize this level of commitment may not suit everyone, and we communicate this expectation openly.

Business roles at Moonvalley are hybrid by default, with some fully remote options depending on job scope. We meet a few times each year, typically in London, LA, or Toronto.

Our Commitment

Moonvalley AI is proud to be an equal opportunity employer. We are committed to providing accommodations and will work with you to meet your needs.

Any information you share will be treated with the utmost care, used solely for recruitment purposes, and never sold to other companies for marketing. For further details, please review our privacy policy and job applicant privacy policy.

If you're motivated by deeply technical challenges, the ambition to build generational technology, and want to shape the future of media and entertainment, we encourage you to apply.

Location

    UK

Job type

  • Full-time

Role

Engineering

Keywords

  • Remote
  • Engineering
  • Full-time