Position Information
A Message from the University
All University of Notre Dame faculty and staff are required to be vaccinated against COVID-19 and to provide verification of full vaccination. Faculty and staff may apply for an exemption from the vaccination requirement for medical, religious, or other strongly held beliefs. Those granted an exemption must participate in weekly surveillance testing and continue to mask indoors while on campus.

Job Title: High Performance Computing (HPC) Engineer

Job Description
This position focuses on supporting complex research computing environments under the direction of the Center for Research Computing (CRC). To support these environments, the HPC Engineer designs, builds, and maintains large-scale computing and storage infrastructure, which requires in-depth expertise with Linux, cluster-based networking, grid-enabled middleware, common scientific applications, multi-platform network and system management tools, and distributed/parallel file systems. This position draws on a wide variety of skills to meet research computing users' highly technical needs. When troubleshooting, this position identifies and resolves complex environment issues and clearly communicates the status, progress, and issues related to these environments.
This position requires comprehensive knowledge of HPC principles and practices and solid knowledge of the CRC's technologies and practices, and is able to enforce policies pertaining to the environment, especially security and usage policies.
This position may either act as an engineer, performing the work in the areas listed below, or act as a coordinator, enlisting help from departmental or college allies or from within the CRC, in order to produce a world-class research computing environment.
Essential Duties
Accountable for the system administration and integration of research computing environments. This includes building and maintaining HPC/HTC clusters at the hardware, operating system, and VM/container levels. It also includes account maintenance, as well as the configuration, usage statistics, testing, and operation of the HPC/HTC batch system, and the maintenance and monitoring of the high-performance parallel file systems in coordination with the CRC and the External IT Operations/Engineering groups.
Accountable for the installation, maintenance, upgrading, and troubleshooting of hardware and software on the platforms supported in the research computing environment. These currently include, but are not limited to, hardware from Lenovo, Dell, and HP. This also includes ensuring the correct configuration for optimized operation of the products currently installed.
Accountable for the installation, maintenance, and troubleshooting of grid-enabled middleware. This includes interfacing the CRC's assets, through a common authentication/authorization framework, with grid middleware such as Kubernetes, HTCondor, Open Science Grid, and/or Globus.
Write and maintain documentation on using the research computing environment. Assist in preparing and delivering user short courses on using the HPC/HTC environment. Train and guide junior team members.
Research current and future trends in both hardware and software. Keep current with trade publications, vendor documentation, and newly released books. Attend research computing conferences and/or trade shows such as Supercomputing.
The CRC is part of the Notre Dame Research (NDR) division. Notre Dame Research is committed to creating a community that fosters equity of experience and opportunity and ensures that members of all backgrounds feel safe, welcome, and included. We strive to achieve a culture of openness, autonomy, and belonging, making Notre Dame an exceptional place for our team, partners, and collaborators to flourish.

Minimum Qualifications
Bachelor's degree plus five years of software programming experience. A Master's degree in a related field will substitute for two years of experience, a Ph.D. in a related field will substitute for three years of experience, and a combination of a Master's degree plus a Ph.D. in a related field will substitute for all five years of experience.
Knowledge of and experience with one or more of the following, and the readiness and ability to develop skills in the others: common software development languages and tools; software design and architecture; parallel programming (primarily with MPI); grid computing, programming frameworks; numerical methods and algorithms; software debugging, profiling, and optimization in an HPC environment; scientific visualization.
Excellent oral and written communication skills.
The ability to pick up and learn general concepts and technologies quickly and independently.

Preferred Qualifications
Good understanding of HPC/HTC environments, including data visualization, schedulers, and optimizing and parallelizing compilers (Python, C++, C, Fortran). Good understanding and knowledge of complex networked environments, single- and multi-user computing environments, and how those systems might be used to interact with the research computing environment both here at Notre Dame and at national and international supercomputing centers. A good working knowledge of distributed file systems (AFS, Ceph, etc.) is required. Experience with parallel file systems such as Lustre, GPFS, or Panasas would be a plus.

Department: Center for Research Computing (29055)
Department Website: crc.nd.edu
Family / Sub-Family: IT / HPC
Career Stream/Level: EIC 2 Professional
Pay ID: Semi-Monthly
FLSA Status: S1 - FT Exempt
Job Category: Information Technology
Job Type: Full-time
Schedule: Days of Week & Hours: Monday to Friday
Schedule: Hours/Week: 40
Schedule: # of Months: 12