DevOps Engineer - Wa

Do you have any former Walmart employees for this DevOps role? Only former Walmart employees will be considered.
About the Team
Join a specialized Infrastructure Engineering team focused on deploying and managing cloud-hosted TigerGraph clusters. This role spans the full cluster lifecycle, from provisioning and performance testing to observability and operations, ensuring optimal performance and reliability of TigerGraph's hosted environments.
________________________________________
Key Responsibilities
Cluster Provisioning & Setup
- Define and implement cluster sizing strategies based on workload and capacity planning.
- Lead deployment of TigerGraph clusters across supported cloud platforms.
- Manage infrastructure cost approvals and budgeting.
Cloud Infrastructure & Operations
- Provision cloud infrastructure components including compute, storage, and networking.
- Implement secure networking configurations and ensure alignment with security policies.
- Collaborate with architecture and domain teams to fulfill security and deployment requirements.
Performance & Resiliency
- Conduct benchmarking, load testing, and stress simulations to validate readiness.
- Apply best practices for scalable and fault-tolerant cluster configurations.
Observability & Operational Readiness
- Set up monitoring, alerting, and dashboarding tools for real-time operational visibility.
- Develop and maintain runbooks, standard operating procedures (SOPs), and incident response workflows.
Ongoing Cluster Management
- Manage upgrades, scaling activities, and infrastructure right-sizing.
- Optimize shard distribution and maintain balanced cluster performance.
- Monitor and reduce cloud resource consumption for cost efficiency.
________________________________________
Required Skills & Experience
- 5+ years of experience in cloud infrastructure engineering (AWS, GCP, or Azure)
- Hands-on experience with distributed systems or graph databases (TigerGraph preferred)
- Expertise in infrastructure-as-code tools (Terraform, CloudFormation)
- Experience with performance/load testing tools and frameworks
- Proficiency in observability tools (e.g., Prometheus, Grafana, Datadog)
- Strong understanding of operational documentation, incident management, and SOPs
- Familiarity with Kubernetes and container orchestration (a plus)
________________________________________
Preferred Qualifications
- Experience with performance testing tools like JMeter, Locust, or Gatling
- Background in managing medium to large-scale data clusters, with a focus on scalability and fault tolerance
- Prior experience with graph databases, especially TigerGraph or Neo4j
Secondary Skills - Nice to Haves
Location: Chantilly