Principal Engineer, GPU Platform


About the Team

The Applied Engineering team works across research, engineering, product, and design to bring OpenAI’s technology to consumers and businesses. You’ll join the team responsible for running the infrastructure that supports the models backing ChatGPT and the API. The systems we support include inference Kubernetes clusters, GPU health, InfiniBand performance, node lifecycle, and more. We seek to learn from deployment and distribute the benefits of AI, while ensuring that this powerful tool is used responsibly and safely. Safety is more important to us than unfettered growth.

About the Role

The inference compute team builds and maintains the infrastructure abstractions that allow OpenAI to run models at scale. This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:

Design and build the inference infrastructure that powers our products, enabling reliability and performance
Ensure our infrastructure can scale to the next order of magnitude
Like all other teams, we are responsible for the reliability of the systems we build. This includes an on-call rotation to respond to critical incidents as needed.
You might thrive in this role if you:

Have 10+ years of experience building core infrastructure
Have experience running GPU clusters at scale
Have experience operating orchestration systems such as Kubernetes at scale
Take pride in building and operating scalable, reliable, secure systems
Are comfortable with ambiguity and rapid change
This role is exclusively based in our San Francisco HQ. We offer relocation assistance to new employees.
Location:
San Francisco
