DevOps Engineer - Wa
Note: Only candidates who have previously worked at Walmart will be considered for this DevOps role.
About the Team
Join a specialized Infrastructure Engineering team focused on deploying and
managing cloud-hosted TigerGraph clusters. This role spans the full cluster
lifecycle, from provisioning and performance testing to observability and
operations, ensuring optimal performance and reliability of TigerGraph's hosted
environments.
________________________________________
Key Responsibilities
Cluster Provisioning & Setup
Define and implement cluster sizing strategies based on workload and capacity
planning.
Lead deployment of TigerGraph clusters across supported cloud platforms.
Manage infrastructure cost approvals and budgeting.
Cloud Infrastructure & Operations
Provision cloud infrastructure components including compute, storage, and
networking.
Implement secure networking configurations and ensure alignment with security
policies.
Collaborate with architecture and domain teams to fulfill security and
deployment requirements.
Performance & Resiliency
Conduct benchmarking, load testing, and stress simulations to validate
production readiness.
Apply best practices for scalable and fault-tolerant cluster configurations.
Observability & Operational Readiness
Set up monitoring, alerting, and dashboarding tools for real-time operational
visibility.
Develop and maintain runbooks, standard operating procedures (SOPs), and
incident response workflows.
Ongoing Cluster Management
Manage upgrades, scaling activities, and infrastructure right-sizing.
Optimize shard distribution and maintain balanced cluster performance.
Monitor and reduce cloud resource consumption for cost efficiency.
________________________________________
Required Skills & Experience
5+ years of experience in cloud infrastructure engineering (AWS, GCP, or
Azure)
Hands-on experience with distributed systems or graph databases (TigerGraph
preferred)
Expertise in infrastructure-as-code tools (Terraform, CloudFormation)
Experience with performance/load testing tools and frameworks
Proficient in observability tools (e.g., Prometheus, Grafana, Datadog)
Strong understanding of operational documentation, incident management, and
SOPs
Familiarity with Kubernetes and container orchestration (a plus)
________________________________________
Preferred Qualifications
Experience with performance testing tools such as JMeter, Locust, or Gatling
Background in managing medium- to large-scale data clusters, with a focus on
scalability and fault tolerance
Prior experience with graph databases, especially TigerGraph or Neo4j
________________________________________
Secondary Skills - Nice to Haves
Location: Chantilly