Site Reliability Engineer (SRE) - SQL & NoSQL Database Operations
Summary
This role bridges software development and IT operations, focusing specifically on ensuring the reliability, scalability, and performance of our SQL and NoSQL database infrastructure. The SRE applies software engineering principles to automate database operations, build robust systems, and proactively address issues to minimize downtime and optimize resource utilization.
Responsibilities
Design and Implementation: Design, implement, and maintain scalable and reliable SQL and NoSQL database systems to support high-performance applications.
Automation and Optimization: Develop automation using scripting languages (e.g., Python) and configuration management and infrastructure-as-code tools (e.g., Ansible, Terraform) to streamline database operations, deployments, and infrastructure management.
Performance Tuning: Monitor, analyze, and optimize SQL queries and NoSQL database configurations (e.g., indexing, sharding, replication) to improve database performance and scalability.
Monitoring and Alerting: Design, build, and maintain comprehensive database monitoring solutions to track key metrics (e.g., availability, latency, errors, saturation) and establish effective alerts.
Incident Response: Participate in on-call rotations to respond to and resolve complex production issues and database-related incidents, conduct root cause analyses, and implement preventative measures.
Infrastructure Modernization: Drive the modernization of existing database infrastructure through migrations, upgrades, and optimization efforts.
Collaboration: Work closely with software developers, data engineers, DevOps teams, and architects to integrate database solutions into applications and optimize database performance and stability.
Capacity Planning: Plan and manage database infrastructure capacity to support increasing data volumes and user traffic.
Security: Implement database security policies, access controls, and encryption techniques to protect sensitive data.
Backup and Recovery: Design and implement robust backup and recovery strategies to ensure data integrity and minimize downtime in case of failures.
Skills & Requirements
Database Expertise: Deep understanding of and hands-on experience with both SQL databases (e.g., SQL Server, MySQL) and NoSQL databases (e.g., MongoDB, Redis).
Software Engineering: Proficiency in programming languages and frameworks such as Python, C#, ReactJS, and NodeJS, along with experience in software development best practices.
Automation and Scripting: Expertise in automation tools (e.g., Ansible, Terraform) and scripting languages (e.g., Python, Bash) for database operations and infrastructure management.
Cloud Platforms: Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and containerization technologies (e.g., Docker, Kubernetes).
System Reliability Principles: Strong understanding of SRE principles and practices (e.g., SLIs, SLOs, error budgets, incident management).
Problem-Solving: Excellent diagnostic and problem-solving skills with the ability to analyze complex systems and troubleshoot issues under pressure.
Communication and Collaboration: Strong communication and collaboration skills to work effectively with cross-functional teams and stakeholders.
Data Modeling and Design: Proficiency in data modeling and schema design for both relational and NoSQL databases.
Performance Tuning: Ability to identify and resolve performance bottlenecks, optimize queries, and tune database configurations.
This SRE role requires a blend of technical depth, a proactive mindset, and a commitment to continuous improvement to ensure the smooth operation and optimal performance of critical database systems.
Qualifications
Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience.
Relevant certifications in database technologies or cloud platforms (a plus).
Location: Plano
Category: Technology