Software Engineer, Observability Platform (SMTS, LMTS)

This position is based in either the Bellevue or San Francisco office.

Join the team responsible for innovating and maintaining the massive-scale, distributed systems that monitor Salesforce's infrastructure. The Network Visibility and Telemetry team designs, builds, and operates a set of systems and services that deliver metrics, telemetry, and alerting for data center infrastructure (network, storage, etc.). We are part of the Infrastructure Strategy Datacenter Operations organization, a dynamic, global team delivering and supporting technology infrastructure to meet the substantial growth needs of the business.

In this role, you will leverage your experience in building and deploying large-scale systems to automate systems services across all types of infrastructure (storage, network, server), enable the collection of infrastructure telemetry, make the infrastructure visible and accessible, and ensure that alerts are generated where action is needed.

Required Skills:
A related technical degree is required.
Proven experience supporting a codebase for distributed services implemented in Java and/or Python
8+ years of experience for Lead
5+ years of experience for Senior
Experience with automation of systems services and processes.
Excellent analytical and problem-solving skills
A long-standing practice of using source control (e.g., Git) and unit testing
Experience in publishing and consuming REST APIs
CI/CD experience with Jenkins
Knowledge of Linux (Red Hat) including configuration, packages, services, daemons, shells, and troubleshooting
Experience with configuration automation tools such as Ansible, Puppet, and/or Chef.
Experience in fast-paced, technical environments experiencing rapid growth and change
Ability to adapt, to be flexible, and to learn quickly in a dynamic environment
Excellent organizational skills, including the ability to prioritize tasks efficiently with a high level of attention to detail
Ability to work under tight deadlines while coordinating several projects at a time and responding to changing business and technical conditions
Desired Skills:
Development experience in Clojure.
Experience with the monitoring and alerting of network infrastructure - routers, switches, load balancers, etc. - in a high-availability, always-on datacenter environment
Experience with the monitoring and alerting of storage infrastructure - switches, arrays, etc. - in a high-availability, always-on datacenter environment
Experience with container technologies and orchestration systems, e.g., Docker and Kubernetes
Experience with Terraform, Helm, and Spinnaker.
Strong network engineering skills: SNMP, BGP, OSPF or IS-IS, LAN switching technologies, backbone, load balancers, IPv4/IPv6 addressing and subnetting.
Experience with application and transport protocols and troubleshooting for the same (e.g., HTTP, HTTPS, TCP/UDP)
Experience with application databases and document stores, e.g. Elasticsearch, Cassandra
Experience in writing systems automation in a high-level language such as Python.
Previous experience as Scrum Master or Product Owner on an Agile Dev Team is a nice-to-have, especially if you enjoyed it.
Location:
Bellevue, WA, United States
Category:
Computer And Mathematical Occupations
