Senior Machine Learning Engineer
Founded in 2012, H2O.ai is on a mission to democratize AI. As the world’s leading agentic AI company, H2O.ai converges Generative and Predictive AI to help enterprises and public sector agencies develop purpose-built GenAI applications on their private data. Its open-source technology is trusted by over 20,000 organizations worldwide, including more than half of the Fortune 500. H2O.ai powers AI transformation for companies like AT&T, Commonwealth Bank of Australia, Singtel, Chipotle, Workday, Progressive Insurance, and NIH.
H2O.ai partners include Dell Technologies, Deloitte, Ernst & Young (EY), NVIDIA, Snowflake, AWS, Google Cloud Platform (GCP) and VAST. H2O.ai’s AI for Good program supports nonprofit groups, foundations, and communities in advancing education, healthcare, and environmental conservation. With a vibrant community of 2 million data scientists worldwide, H2O.ai aims to co-create valuable AI applications for all users.
H2O.ai has raised $256 million from investors, including Commonwealth Bank, NVIDIA, Goldman Sachs, Wells Fargo, Capital One, Nexus Ventures and New York Life.
About This Opportunity
We are seeking a Senior Machine Learning Engineer with exceptional technical expertise in deploying, scaling, and maintaining production ML systems. This role requires a strong combination of software engineering skills, ML/AI knowledge, and system architecture experience to build robust, scalable machine learning infrastructure. The ideal candidate will have experience with end-to-end ML pipelines, modern MLOps practices, and the ability to bridge research and production environments.
What You Will Do
ML System Architecture & Development
Design and implement end-to-end machine learning pipelines from research to production
Build scalable ML infrastructure supporting multiple models and high-throughput inference
Develop automated systems for model training, validation, deployment, and monitoring
Create efficient data processing pipelines with multiprocessing optimization and performance tuning
Architect feature stores, model registries, and ML metadata management systems
Production ML Operations
Deploy and maintain production ML models with focus on reliability, scalability, and performance
Implement MLOps best practices including CI/CD for ML, automated testing, and model versioning
Monitor model performance, data drift, and system health in production environments
Optimize model inference for latency and throughput requirements
Manage model lifecycle including retraining, rollback, and A/B testing strategies
Advanced ML Implementation
Implement cutting-edge ML techniques including generative AI, diffusion models, and large language models
Develop and optimize deep learning models using modern frameworks (TensorFlow, PyTorch)
Build systems for handling multimodal data (text, images, video, time-series)
Create solutions for challenging ML problems including out-of-distribution detection and feature alignment
Implement efficient algorithms achieving significant performance improvements (orders of magnitude speedups)
Technical Leadership & Collaboration
Lead technical design reviews and architecture decisions for ML systems
Mentor junior engineers and data scientists on ML engineering best practices
Collaborate with research teams to transition experimental models to production
Work with infrastructure teams to ensure optimal resource utilization and scaling
Provide technical guidance on complex ML system design and implementation
What We Are Looking For
Education & Experience
Master's degree in Computer Science, Engineering, Physics, Mathematics, or related technical field
7+ years of experience in machine learning engineering, software development, or related roles
5+ years of experience building and deploying production ML systems
Proven track record of leading technical projects and mentoring team members
Core Programming & ML
Expert-level proficiency in Python with strong knowledge of Bash, SQL, C/C++
Deep experience with ML frameworks: TensorFlow, PyTorch, Scikit-learn
Extensive experience with data processing libraries: NumPy, Pandas, Matplotlib
Hands-on experience with Hugging Face ecosystem and modern NLP/LLM tools
ML Ops & Infrastructure
Strong experience with containerization and orchestration: Docker, Kubernetes
Knowledge of cloud platforms: AWS, GCP, Azure and their ML services
Experience with ML workflow orchestration tools: Airflow, Kubeflow, MLflow
Proficiency in Infrastructure as Code: Terraform, CloudFormation
Experience with monitoring and observability tools: Prometheus, Grafana, ELK stack
Advanced ML Technologies
Proven expertise in generative AI including diffusion models, GANs, VAEs, and normalizing flows
Experience with large language models (LLMs) and agentic AI systems
Knowledge of advanced architectures: CNNs, U-Nets, transformers, and attention mechanisms
Experience with model optimization techniques: quantization, pruning, distillation
Understanding of distributed training and inference systems
Software Engineering
Strong software development practices including version control, testing, and code review
Experience with microservices architecture and API development
Knowledge of database systems and data storage solutions
Understanding of distributed systems and concurrent programming
Experience with performance profiling and optimization
System Design & Architecture
Experience designing large-scale ML systems and data pipelines
Knowledge of real-time and batch processing architectures
Understanding of model serving patterns and inference optimization
Experience with auto-scaling and resource management in production environments
Knowledge of security best practices for ML systems
Problem-Solving & Innovation
Track record of solving complex technical problems with innovative engineering solutions
Experience working with real-world, noisy datasets across multiple domains
Ability to achieve significant performance improvements and system optimizations
Strong debugging and troubleshooting skills for production ML systems
Experience with A/B testing and experimentation frameworks
How to Stand Out From the Crowd
PhD in Computer Science, Engineering, Physics, Mathematics, or related quantitative field
Deep background in computational sciences (astrophysics, physics, computational biology)
Experience in technology companies with large-scale ML infrastructure
Knowledge of financial services, healthcare, or other regulated industries
Background in research environments with transition to production systems
Experience building and deploying LLM applications and chatbot systems
Background in computer vision and image processing applications
Knowledge of time-series analysis and forecasting systems
Experience with automated content generation and summarization systems
Understanding of federated learning and privacy-preserving ML techniques
Technical Specializations
Experience with edge deployment and model optimization for mobile/IoT devices
Knowledge of multi-cloud and hybrid cloud architectures
Background in streaming data processing and real-time ML systems
Experience with graph neural networks and knowledge graphs
Understanding of reinforcement learning and multi-agent systems
Leadership & Communication
Experience mentoring engineering teams and establishing technical standards
Strong project management skills with experience in Agile/Scrum methodologies
Ability to communicate complex technical concepts to diverse audiences
Experience with technical writing and documentation
Track record of driving technical innovation and process improvements
Success Metrics
System uptime and reliability of production ML services
Model performance and accuracy in production environments
Deployment velocity and time-to-production for new models
Resource utilization efficiency and cost optimization
Team productivity and knowledge sharing initiatives
Technical innovation and patent applications
Technical Environment
Access to cutting-edge ML infrastructure and computing resources
Opportunity to work with the latest ML frameworks and tools
Collaborative environment with research and product teams
Support for experimentation and technical innovation
Flexible architecture allowing for rapid prototyping and iteration
Why H2O.ai?
Market leader in total rewards
Remote-friendly culture
Flexible working environment
Be part of a world-class team
Career growth
H2O.ai is committed to creating a diverse and inclusive culture. All qualified applicants will receive consideration for employment without regard to their race, ethnicity, religion, gender, sexual orientation, age, disability status or any other legally protected basis.
H2O.ai is an innovative AI cloud platform company, leading the mission to democratize AI for everyone. Thousands of organizations from all over the world have used our cutting-edge technology across a variety of industries. We’ve made it easy for people at all levels to generate breakthrough solutions to complex business problems and advance the discovery of new ideas and revenue streams. We push the boundaries of what is possible with artificial intelligence.
H2O.ai employs the world’s top Kaggle Grandmasters, the community of best-in-the-world machine learning practitioners and data scientists. A strong AI for Good ethos and responsible AI drive the company’s purpose.
Please visit www.H2O.ai to learn more.
Location: Austin
Category: Engineering