Founding Engineer, ML Performance & Systems
About Us
We're an early-stage stealth startup building a new kind of platform for generative media. Our mission is to enable the future of real-time generative applications: we're building the foundational tools and infrastructure that make entirely new categories of generative experiences and applications finally possible.
We're a small, focused team of ex-YC and unicorn founders and senior engineers with deep experience across 3D, generative video, developer platforms, and creative tools. We're backed by top-tier investors and angels, and we're building a new technical foundation purpose-built for the next era of generative media.
We're operating at the edge of what's technically possible: high-performance inference and real-time orchestration of multimodal models. As one of our founding engineers, you'll play a key role in architecting the core platform, shaping system design decisions, and owning critical infrastructure from day one.
If you're excited about architecting and building high-performance infrastructure that empowers the next generation of developers and unlocks entirely new product categories, we'd love to talk.
About the Role
We're looking for a Founding Engineer, ML Performance & Systems with deep expertise in high-performance ML infrastructure. This is a highly technical, high-impact role focused on squeezing every drop of performance from real-time generative media models.
You'll work across the model-serving stack, designing novel architectures, optimizing inference performance, and shaping Reactor's competitive edge in ultra-low-latency, high-throughput environments.
What You'll Do
Drive our frontier position on real-time model performance for diffusion models
Design and implement a high-performance in-house inference engine
Focus on maximizing throughput and minimizing latency and resource usage
Develop performance monitoring and profiling tools to identify bottlenecks and optimization opportunities
Requirements
About You
Strong foundation in systems programming, with a track record of identifying and resolving performance bottlenecks
Deep expertise in the ML infrastructure stack:
PyTorch, TensorRT, TransformerEngine, Nsight
Model compilation, quantization, and advanced serving architectures
Working knowledge of GPU hardware (NVIDIA) and the ability to dive deep into the stack as needed (e.g., writing custom GEMM kernels with CUTLASS)
Proficient in Triton, or willing to learn it given comparable experience in low-level accelerator programming
Excited by the frontier of multi-dimensional model parallelism (e.g., combining tensor, context, and sequence parallelism)
Familiarity with internals of cutting-edge techniques such as Ring Attention, FA3, and FusedMLP implementations
Minimum Qualifications
Expertise in systems programming (C++, CUDA)
Experience optimizing ML inference on GPUs
Proficient with PyTorch and tools like TensorRT
Deep understanding of NVIDIA GPU architecture
Familiar with model serving, compilation, and quantization
Benefits
Competitive SF salary and foundational team equity
Location: San Francisco, CA, United States
Category: Computer and Mathematical Occupations