Site Reliability Engineer
Job Summary: We are seeking a Contract Site Reliability Engineer to support and enhance the reliability, availability, and performance of our infrastructure. The ideal candidate will collaborate with development and operations teams to build scalable, cost-efficient systems using modern cloud technologies. This is a hybrid role based in Plano, TX, offering exposure to large-scale systems and modern DevOps/SRE practices.

Responsibilities:
- Assist in designing and implementing scalable and reliable systems using Kubernetes, Docker, and Istio
- Monitor system performance and respond to incidents using observability tools like Datadog
- Proactively identify and implement performance and scalability improvements
- Create and maintain automation scripts for deployment and monitoring tasks
- Apply GitOps practices for reliable and smooth production deployments using Argo CD
- Collaborate with developers to resolve system reliability issues
- Conduct load testing to ensure stability under expected workloads
- Implement deployment strategies such as A/B testing, canary releases, and traffic mirroring
- Use Helm charts for managing application deployments
- Support and maintain AWS infrastructure, including EKS, Load Balancers, and routing
- Ensure solutions are cost-effective, highly available, and customer-focused
- Participate in on-call rotations and coordinate with global SRE teams
- Contribute to internal documentation and share knowledge across the team
- Support the adoption of SRE best practices across the organization

Location: Plano, TX, United States