SRE Engineer

- 5+ years of experience in distributed systems, with deep knowledge of computer science fundamentals
- Deep understanding of Ray and KubeRay
- Experience troubleshooting Ray (the team uses Ray as a service)
- Experience with containerization and orchestration technologies, such as Docker and Kubernetes
- Experience delivering data and machine learning infrastructure in production environments
- Experience configuring, deploying, and troubleshooting large-scale production environments
- Experience designing, building, and maintaining scalable, highly available systems that prioritize ease of use
- Experience with alerting, monitoring, and remediation automation in a large-scale distributed environment
- Extensive programming experience in Java, Python, or Go
- Strong collaboration and communication skills (verbal and written)
- Degree in Computer Science, Computer Engineering, or equivalent practical experience
- Experience with ML training and inference profiling and optimization
Location: Sunnyvale
