SR Lead Software Engineer - High Performance Computing, San Francisco

Be an integral part of an agile team that's constantly pushing the envelope to enhance, build, and deliver top-notch technology products. As a Senior Lead Software Engineer at JPMorgan Chase within the AI Infrastructure team, you are an integral part of an agile team that works to enhance, build, and deliver trusted market-leading technology products in a secure, stable, and scalable way. Drive significant business impact through your capabilities and contributions, and apply deep technical expertise and problem-solving methodologies to tackle a diverse array of challenges that span multiple technologies and applications. You will lead virtual and direct teams of developers, teaching them high-performance computing (HPC) best practices that intersect with AI/ML. The role is therefore collaborative: you will work closely with cross-functional teams composed of data scientists, business analysts, and other engineers. You will infuse the JPMorgan developer community with an appreciation of the impact that HPC can have by delivering software that consistently outperforms other platforms. You will deliver a variety of options to serve our various business needs, sometimes driven by low latency, other times by throughput or low power.

Job responsibilities

- Regularly provides technical guidance and direction to support the business and its technical teams, contractors, and vendors
- Develops secure and high-quality production code, and reviews and debugs code written by others
- Drives decisions that influence the product design, application functionality, and technical operations and processes
- Serves as a function-wide subject matter expert in one or more areas of focus
- Actively contributes to the engineering community as an advocate of firmwide frameworks, tools, and practices of the Software Development Life Cycle
- Influences peers and project decision-makers to consider the use and application of leading-edge technologies
- Adds to the team culture of diversity, equity, inclusion, and respect
- Builds scalable and efficient inferencing and training pipelines using HPC software techniques and patterns
- Working closely with business and data science teams, develops easy-to-use systems that serve their needs
- Using telemetry, creates measurable frameworks for deciding among hardware and software options
- Publishes and supports reusable patterns to optimize training and inference of ML models on various architectures
- Supports the developer community in learning lessons from the high-performance computing (HPC) domain

Required qualifications, capabilities, and skills

- Formal training or certification on software engineering concepts and 5+ years applied experience
- Hands-on practical experience delivering system design, application development, testing, and operational stability
- Advanced in one or more programming language(s)
- Advanced knowledge of software applications and technical processes with considerable in-depth knowledge in one or more technical disciplines (e.g., cloud, artificial intelligence, machine learning, mobile)
- Ability to tackle design and functionality problems independently with little to no oversight
- Practical cloud native experience
- Experience in Computer Science, Computer Engineering, Mathematics, or a related technical field
- Advanced understanding of High-Performance Computing system architectures and network topologies
- Expertise in at least one accelerator type (e.g., GPU, FPGA) and experience mapping LLMs onto these accelerators
- Proficiency in parallel programming and performance analysis of accelerator-based systems
- Familiarity with HPC software (e.g., NCCL, MPI) and resource schedulers (e.g., Kubernetes, SLURM)

Preferred qualifications, capabilities, and skills

- Strong programming skills in Python, scripting, C, and C++, with experience in AI/ML frameworks like PyTorch and LangChain
- Master’s Degree in Computer Science (required)
- 8+ years of experience in high-performance computing software
- 5+ years of experience with accelerators and deep learning, particularly large language models
- Experience in large organizations and regulated industries is a plus
- Excellent communication skills and the ability to work collaboratively in a dynamic team environment

Location: San Francisco