Founding Engineer, ML Performance & Systems
About Us
We're an early-stage stealth startup building a new kind of platform for generative media. Our mission is to enable the future of real-time generative applications: we're building the foundational tools and infrastructure that make entirely new categories of generative experiences and applications finally possible.
We're a small, focused team of ex-YC and unicorn founders and senior engineers with deep experience across 3D, generative video, developer platforms, and creative tools. We're backed by top-tier investors and angels, and we're building a new technical foundation purpose-built for the next era of generative media.
We're operating at the edge of what's technically possible: high-performance inference and real-time orchestration of multimodal models. As one of our founding engineers, you'll play a key role in architecting the core platform, shaping system design decisions, and owning critical infrastructure from day one.
If you're excited about architecting and building high-performance infrastructure that empowers the next generation of developers and unlocks entirely new product categories, we'd love to talk.
About the Role
We're looking for a Founding Engineer, ML Performance & Systems with deep expertise in high-performance ML infrastructure. This is a highly technical, high-impact role focused on squeezing every drop of performance from real-time generative media models.
You'll work across the model-serving stack, designing novel architectures, optimizing inference performance, and shaping Reactor's competitive edge in ultra-low-latency, high-throughput environments.
What You'll Do
Drive our frontier position on real-time model performance for diffusion models
Design and implement a high-performance in-house inference engine
Focus on maximizing throughput and minimizing latency and resource usage
Develop performance monitoring and profiling tools to identify bottlenecks and optimization opportunities
Requirements
About You
Strong foundation in systems programming, with a track record of identifying and resolving performance bottlenecks
Deep expertise in the ML infrastructure stack:
PyTorch, TensorRT, TransformerEngine, Nsight
Model compilation, quantization, and advanced serving architectures
Working knowledge of GPU hardware (NVIDIA) and the ability to dive deep into the stack as needed (e.g., writing custom GEMM kernels with CUTLASS)
Proficient in Triton, or willing to learn it given comparable experience in low-level accelerator programming
Excited by the frontier of multi-dimensional model parallelism (e.g., combining tensor, context, and sequence parallelism)
Familiarity with internals of cutting-edge techniques such as Ring Attention, FA3, and FusedMLP implementations
Minimum Qualifications
Expertise in systems programming (C++, CUDA)
Experience optimizing ML inference on GPUs
Proficient with PyTorch and tools like TensorRT
Deep understanding of NVIDIA GPU architecture
Familiar with model serving, compilation, and quantization
Benefits
Competitive SF salary and foundational team equity
Location: San Francisco, CA, United States
Category: Computer and Mathematical Occupations