AI Private Cloud Infrastructure Engineering Lead
3 years ago

We are looking for a highly motivated infrastructure systems and network engineer to work on internal cloud, k8s, and HPC clusters in the fastest-growing organization at NVIDIA. This is an excellent opportunity to architect and drive all levels of the on-prem and cloud MLOps infrastructure for the next generation of Perception for Autonomous Vehicles (AV) at NVIDIA! Please apply if you are passionate about Kubernetes networking and would love to dig into sophisticated technical problems spanning IP networks, VMs, Linux systems, HPC schedulers, and k8s.

What you'll be doing:

  • As part of the MagLev Infrastructure Team, you will propose and craft new ways to improve the availability of the largest GPU clusters at NVIDIA

  • The solutions you propose and build will directly impact the efficiency of the AV team!

  • You will be in the driver's seat for a new networking and provisioning architecture for the multi-datacenter, multi-region MLOps platform

  • Join our multi-functional infrastructure team responsible for architecting and engineering k8s infrastructure from the OS level up

  • Triage and resolve complicated system-level problems impacting GPU clusters, and improve observability of such events


What we need to see:

  • BS or MS in CS/CE/EE or equivalent experience

  • Minimum of 10 years of on-prem k8s infrastructure engineering

  • Minimum of 4 years working with large-scale hardware provisioning technologies. Familiarity with end-to-end hardware management in datacenters.

  • Proficiency in at least one programming language, such as Go or Python

  • Complete understanding of Kubernetes, containerization, and cloud native architecture

  • Deep knowledge of L2/L3 networking, and specifically k8s networking

  • Experience with Infrastructure as Code concepts and a track record of successful large-scale projects. Expertise with Ansible.

  • Proficiency with Linux environments

  • Proficiency with virtualization, either KVM or ESXi

  • Expertise in problem solving and complexity analysis of distributed systems

  • Excellent written and verbal interpersonal skills


Ways to stand out from the crowd:

  • Experience implementing on-prem Load Balancing solutions.

  • Expertise with L2/L3 protocols, BGP, iptables, IPVS

  • Previous experience building sophisticated tooling and infrastructure automation on large (100+ node) GPU and CPU clusters

  • Experience with InfiniBand or RoCE solutions

  • Familiarity with all components of provisioning systems and with different bare-metal provisioning systems such as Canonical MAAS and Foreman

  • Extensive experience across a wide range of observability solutions

  • Experience with HPC systems and schedulers such as Slurm and LSF


For two decades, we have pioneered visual computing, the art and science of computer graphics. With our invention of the GPU, the engine of modern visual computing, the field has expanded to encompass video games, movie production, product design, medical diagnosis, and scientific research. Today, we stand at the beginning of the new AI computing era, ignited by a new computing model, GPU deep learning. This new model, in which a deep neural network is trained to recognize patterns from extensive amounts of data, has proven to be deeply effective at solving some of the most daring problems in everyday life.

The Colorado Equal Pay for Equal Work Act requires that NVIDIA provide the compensation range and benefits offered for this position if performed in Colorado. The base salary range for this position in Colorado is $230,400.00 - 316,800.00 USD.
NVIDIA also offers a comprehensive benefits package. We provide health care coverage, dental and vision, 401(k) including company matching and after-tax contributions, Employee Stock Purchase Program (ESPP), Employee Assistance Program (EAP), company-paid holidays, paid sick leave, vacation leave, professional time off, and life and disability protection. Employees in eligible sales positions may also be eligible for commission.

Base pay is based on market location and may vary based on factors including experience, skills, education, and other job-related reasons.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

#deeplearning