Staff Software Engineer - Production Engineering

RDQ426R178
At Databricks, we are passionate about enabling data teams to solve the world's toughest problems, from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to improve their business. Founded by engineers and customer obsessed, we leap at every opportunity to tackle technical challenges, from designing next-gen UI/UX for interfacing with data to scaling our services and infrastructure across millions of virtual machines. And we're only getting started.
As a production engineer with a backend focus, you will ensure the stable and efficient operation of your service's production environments by proactively monitoring systems, automating routine tasks, optimizing performance, responding to incidents, and managing deployment pipelines. Among other things, this means writing software in Scala/Java and working closely with other engineering teams to maintain high availability and ensure the integrity and security of live systems.
The impact you will have:
Improved System Reliability and Availability: By proactively monitoring and resolving issues across distributed systems, you will significantly reduce downtime and improve SLAs, directly contributing to a more resilient production environment.
Enhanced Operational Efficiency: Through automation of routine operational tasks and deployment processes, you will streamline engineering workflows, reducing manual toil and accelerating release cycles across global infrastructure.
Performance Optimization at Scale: By identifying and addressing performance bottlenecks in backend services and infrastructure, you will improve resource utilization and system throughput, enabling cost-effective scaling across thousands of Kubernetes clusters and millions of VMs.
Strengthened System Security and Integrity: By embedding security best practices into the deployment and operational workflows, you will help ensure compliance and protect production environments against vulnerabilities and threats.
What we look for:
BS/MS/PhD in Computer Science or a related field
10+ years of production-level experience in one of Java, Scala, C++, or a similar language.
Comfortable working towards a multi-year vision with incremental deliverables.
Experience in architecting, deploying, and operating large-scale distributed systems with high availability, scalability, and durability.
Experience in performance and cost optimization, disaster recovery mechanisms, incident management and troubleshooting.
Good knowledge of SQL and operational experience with distributed and single-node database engines.
Experience with software security and systems that handle sensitive data.
Experience with cloud technologies, e.g. AWS, Azure, GCP, Docker, Kubernetes.
Pay Range Transparency
Databricks is committed to fair and equitable compensation practices. The pay range(s) for this role are listed below and represent the expected salary range for non-commissionable roles or on-target earnings for commissionable roles. Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, relevant certifications and training, and specific work location. Based on the factors above, Databricks anticipates utilizing the full width of the range. The total compensation package for this position may also include eligibility for annual performance bonus, equity, and the benefits listed above. For more information regarding which range your location is in, visit our page.
Zone 1 Pay Range: $190,900–$253,750 USD
Location: San Francisco