Staff Platform Engineer – Agentic AI Systems, IFS The Loops


We’re seeking a Staff Platform Engineer to help shape the future of agentic AI systems. In this role you will help design the backbone of our real-time, distributed systems. You’ll be at the forefront of building systems that orchestrate massive data flows, reactive services, and agentic workloads: systems that must adapt dynamically and operate reliably under heavy and unpredictable load.

You’ll work with tools like Kafka, Akka, stream processing frameworks, and other core distributed technologies, and collaborate across engineering teams to deliver infrastructure that is elastic, fault-tolerant, and observable by design. If you’re passionate about high-performance computing, resilient architecture, and enabling real-time intelligence at scale, this role is for you.

Responsibilities
Design and implement scalable, distributed platform components with technologies like Kafka, Akka (Typed), and gRPC.
Architect and optimize data pipelines capable of handling billions of messages/events per day with low latency and high reliability.
Lead efforts in agentic scaling: dynamically spawning, routing, and managing autonomous agents (services/functions) in response to workload or demand.
Build resilient systems that self-heal, auto-scale, and degrade gracefully under pressure.
Define and implement metrics, tracing, and observability for end-to-end system behavior and performance.
Collaborate closely with infrastructure, SRE, and product teams to ensure platform scalability aligns with growth and reliability goals.
Drive root-cause analysis of performance bottlenecks and propose long-term architectural improvements.
Participate in on-call rotations, architecture reviews, and deep technical design sessions.

Qualifications
5+ years of experience building distributed systems in a high-throughput production environment.
Deep expertise with Kafka (topics, partitions, consumers, tuning, schema registry, stream processing).
Strong experience with Akka or other actor-based concurrency models; familiarity with Akka Cluster, Sharding, Persistence, or the Typed API.
Solid programming skills in Java.
Understanding of agentic workloads and dynamic system orchestration (e.g., microservices that represent intelligent agents).
Experience designing scalable APIs, message protocols (e.g., Protobuf, Avro), and event-driven architectures.
Familiarity with cloud-native environments (e.g., Kubernetes, service mesh, container orchestration).

Preferred Qualifications
Experience with serverless compute models or function-as-a-service scaling paradigms.
Contributions to open-source projects in the distributed systems ecosystem.
Experience with AI or ML-driven orchestration or agentic frameworks.
Familiarity with operational tooling: Prometheus, Grafana, OpenTelemetry, Kafka monitoring tools, etc.

Location:
San Francisco
