Member of Technical Staff - AI Infrastructure Engineer

GenPeach AI

Software Engineering, Other Engineering, IT, Data Science
Posted on Mar 3, 2026

About GenPeach AI

GenPeach AI is a product-driven research lab aiming to redefine how people create and interact through multimodal, emotionally resonant AI.

We are building vertical foundation models specializing in generating hyper-realistic humans in image & video. Our stack involves working with PB-scale proprietary datasets, designing novel model architectures, training them efficiently on large GPU clusters, and integrating them into end-user products.

We train and deploy our own large-scale models and ship them into real products. Our team operates at the intersection of research-grade AI and production-grade systems engineering.

About the Role

We are looking for a Member of Technical Staff (MTS) to own and evolve the Large Scale Data Infrastructure layer that powers both research and production at GenPeach AI.

This is a data-infrastructure-first role: your primary mission is to design and operate the systems that store, process, and serve petabytes of image and video data, and to run the large-scale GPU inference pipelines that annotate, transform, and enrich that data at scale. The datasets you build and maintain will directly fuel the training of our foundation models.

This is a high-ownership, high-impact role at the core of what makes GenPeach's research and products possible.

In this role, you will

  • Build and maintain systems that ingest raw image, video, and multimodal data from many sources and turn it into high-quality, training-ready datasets at petabyte scale.

  • Design and operate large-scale data pipelines for processing, annotation, and dataset generation.

  • Build and maintain infrastructure for large-scale dataset analytics to support dataset inspection, curation, and quality analysis.

  • Optimize distributed inference workloads used for data annotation across many GPUs, with a strong focus on throughput, efficiency, and reliability (a minimal worker sketch follows this list).

  • Build efficient storage, retrieval, and migration systems for petabyte-scale datasets across object storage, clouds, and infrastructure providers.

  • Ensure training datasets can be consumed efficiently in any cloud or provider environment, so that data loading and storage throughput never become bottlenecks during distributed training.

  • Improve training infrastructure reliability, monitoring, debuggability, and utilization on large GPU clusters.

  • Build and maintain infrastructure-as-code, observability, and operational tooling across data and training workloads.

  • Help build scalable, provider-agnostic infrastructure for production GPU inference services powering our creative platform.

  • Work closely with ML, research, backend, and data teams, and contribute to long-term infrastructure decisions.
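
To make the annotation-pipeline work above concrete, here is a minimal sketch of a queue-driven, batched inference worker in the shape this role describes. It is illustrative only: annotate_batch stands in for a real GPU model call, asyncio.Queue stands in for a production broker such as SQS or Kafka, and all names are hypothetical.

    import asyncio
    from dataclasses import dataclass

    @dataclass
    class Job:
        object_key: str  # e.g. an S3 key for one video shard

    async def annotate_batch(batch: list[Job]) -> list[dict]:
        # Placeholder for a real GPU inference call (hypothetical model).
        await asyncio.sleep(0.01)  # simulate model latency
        return [{"key": j.object_key, "caption": "..."} for j in batch]

    async def worker(queue: asyncio.Queue, batch_size: int = 8) -> None:
        # Drain the queue in batches: batching keeps GPU utilization high,
        # which is the main throughput lever in annotation pipelines.
        while True:
            batch = [await queue.get()]
            while len(batch) < batch_size and not queue.empty():
                batch.append(queue.get_nowait())
            for result in await annotate_batch(batch):
                print(result)  # in production: write back to object storage
            for _ in batch:
                queue.task_done()

    async def main() -> None:
        queue: asyncio.Queue[Job] = asyncio.Queue()
        for i in range(32):
            queue.put_nowait(Job(object_key=f"videos/shard-{i:05d}.mp4"))
        # One worker per GPU; a real fleet runs many such processes.
        workers = [asyncio.create_task(worker(queue)) for _ in range(4)]
        await queue.join()  # block until every job is marked done
        for w in workers:
            w.cancel()

    if __name__ == "__main__":
        asyncio.run(main())

The same batching-plus-backpressure pattern carries over directly once the in-process queue is swapped for SQS or Kafka consumers.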

Minimum Qualifications

  • 5+ years of professional software engineering experience (Python)

  • Hands-on experience managing large-scale datasets (hundreds of TBs or more) on object storage (S3 or equivalent); a short manifest-sharding sketch follows this list

  • Experience building and operating large-scale GPU inference workloads: job queues (e.g. SQS, Kafka, RabbitMQ) dispatching AI model inference jobs across large fleets of GPUs

  • Kubernetes at scale: orchestrating large numbers of pods for data-processing or inference workloads

  • Strong Python proficiency, including async programming, concurrency/multiprocessing, and performance optimization

  • Experience operating production systems in Linux environments

  • Hands-on experience with Docker and Infrastructure-as-Code (Terraform or similar)
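
As a small illustration of the object-storage experience called out above, here is a sketch of building a dataset manifest from an S3 prefix and sharding it deterministically across workers. The bucket and prefix names are made up; the boto3 paginator is used because a single list_objects_v2 call returns at most 1,000 keys.

    import boto3

    def build_manifest(bucket: str, prefix: str) -> list[str]:
        # Walk the prefix page by page; each page holds at most 1,000 keys.
        s3 = boto3.client("s3")
        keys: list[str] = []
        for page in s3.get_paginator("list_objects_v2").paginate(
            Bucket=bucket, Prefix=prefix
        ):
            keys.extend(obj["Key"] for obj in page.get("Contents", []))
        return keys

    def shard(keys: list[str], num_workers: int, worker_id: int) -> list[str]:
        # Deterministic round-robin sharding: each worker can recompute
        # its own slice of the manifest without any coordination.
        return keys[worker_id::num_workers]

    if __name__ == "__main__":
        # Hypothetical bucket/prefix; at PB scale the manifest itself
        # would be written back to object storage, not held in memory.
        manifest = build_manifest("my-dataset-bucket", "raw/videos/")
        print(len(shard(manifest, num_workers=64, worker_id=0)), "keys for worker 0")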

Preferred Qualifications

  • Experience specifically with large-scale image or video datasets (PB-scale)

  • Experience with workflow orchestrators: Ray, Dagster, Airflow, Slurm, or similar (a minimal Ray sketch follows this list)

  • Familiarity with model serving architectures and inference optimization

  • Exposure to observability stacks (metrics, logs, tracing) for ML systems

  • Experience with distributed model training (multi-GPU / multi-node) — purely a bonus, not a requirement

  • Strong fundamentals in data structures and algorithms
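
For the orchestrators mentioned above, here is a minimal Ray sketch of the fan-out pattern behind GPU annotation: each remote task reserves one GPU and Ray schedules it onto a node with capacity. The annotate body is a stand-in for real inference, and the shard list is made up.

    import ray

    @ray.remote(num_gpus=1)
    def annotate(shard_keys: list[str]) -> int:
        # Stand-in for real GPU inference over one shard of object keys.
        # num_gpus=1 makes Ray place this task on a node with a free GPU
        # (on a machine with no GPUs these tasks would never schedule).
        return len(shard_keys)

    if __name__ == "__main__":
        ray.init()  # in production: ray.init(address="auto") to join a cluster
        shards = [[f"videos/shard-{i:05d}.mp4"] for i in range(16)]
        # Fan out one task per shard; ray.get blocks until all finish.
        counts = ray.get([annotate.remote(s) for s in shards])
        print(f"annotated {sum(counts)} objects")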

What Makes This Role Unique

  • You own the data layer that directly determines what our foundation models can learn — model quality starts with you

  • You'll run GPU inference at a scale most engineers never touch: hundreds of GPUs processing petabytes of image and video

  • Research and infra work side by side — your decisions quickly show up in model results

  • Small senior team, high trust, no bureaucracy

Our Culture

  • High ownership and accountability

  • Strong technical standards

  • Direct, low-ego communication

  • Bias toward shipping, measuring, and iterating fast

Logistics

  • Location: Zurich or Warsaw (onsite or hybrid). If you’re elsewhere, we’re open to remote (team/timezone fit considered).

  • Competitive salary + meaningful equity (depending on role and level)

  • Interview process: quick screen → technical (practical + systems) → team fit/values