System Administrator - HPC and Data Analytics Testbeds
·
Sandia National Lab
·
New Mexico
Description
Sandia's Advanced Architecture Platforms Administration Team develops, deploys, and manages testbed hardware in support of high performance computing (HPC) application research and development. The team manages groundbreaking, one-off, and experimental hardware clusters, as well as standard systems that use unique software stacks. Members of the team work with local colleagues at Sandia, as well as collaborators at other Department of Energy laboratories and industry partners.

In this role, you will apply your existing knowledge of Linux systems on scientific or data analysis clusters to maintain and improve existing and future scientific computing testbed resources. You will work with all aspects of cluster management, from cabling new clusters, to compiling and testing kernel drivers for new hardware, to creating automation that keeps the testbeds running efficiently. This is an outstanding opportunity to partner with researchers and vendors to explore and influence technical directions and drive transformation in how scientific and national security applications use computing in the coming decade and beyond.

This position is primarily on-site at Sandia’s Albuquerque, New Mexico site, with some telecommuting permitted. Sandia provides generous relocation benefits for successful candidates. This job is posted at the level of Information Systems Architect (Experienced).

Every day will be different in our team, but typical activities include:

Collaborate with research and development staff, colleagues, and vendors to build and maintain testbed resources for testing novel networking, accelerators, scientific software workflows, and other technologies

Develop new operational methodologies and infrastructure designs to enable efficient operation of multiple concurrent emerging-technology and prototype HPC clusters

With the full testbeds team, maintain all aspects of the systems, including security, networks, filesystems, system software installation, and user support

Participate in all aspects of the HPC system lifecycle including facility integration, standup, acceptance testing, performance benchmarking, operational support, and reclamation.
Requirements

Bachelor’s degree in Computer Science, Computer Engineering, Information Systems Engineering (CIS/MIS), or relevant STEM field plus five more years of relevant IT experience

Minimum of 5 years’ experience managing Linux/Unix clusters dedicated to scientific computing, data analysis, or similar workflows

Experience with one or more of the following: parallel filesystems, high speed networking, accelerators, application code optimization for HPC, HPC resource and job management

Ability to acquire and maintain a DOE Q level clearance
Company Description
Sandia National Laboratories is the nation’s premier science and engineering lab for national security and technology innovation, with teams of specialists focused on cutting-edge work in a broad array of areas. Some of the main reasons we love our jobs:

Challenging work with amazing impact that contributes to security, peace, and freedom worldwide

Extraordinary co-workers

Some of the best tools, equipment, and research facilities in the world

Career advancement and enrichment opportunities

Flexible work arrangements for many positions include 9/80 (work 80 hours every two weeks, with every other Friday off) and 4/10 (work 4 ten-hour days each week) compressed workweeks, part-time work, and telecommuting (a mix of onsite work and working from home)

Generous vacations, strong medical and other benefits, competitive 401k, learning opportunities, relocation assistance and amenities aimed at creating a solid work/life balance*

World-changing technologies. Life-changing careers. Learn more about Sandia at: http://www.sandia.gov

*These benefits vary by job classification.
Event Type
Job Posting
Time
Wednesday, 16 November 2022, 10am - 3pm CST