HPC Storage Infrastructure Engineer
Lawrence Berkeley National Laboratory
Berkeley, CA 94720
Description
HPC Storage Infrastructure Engineer - 97183
Lawrence Berkeley National Lab’s (LBNL, https://www.lbl.gov/) NERSC Division has an opening for an HPC Storage Infrastructure Engineer to join the team.
In this exciting role, you will join the Storage Systems Group, which is made up of system engineers and programmers who provide NERSC’s 300-petabyte High Performance Storage System and 100+ petabyte center-wide parallel file systems. Our storage systems are used by more than 8,000 scientists who rely on NERSC to perform unclassified scientific research across a wide range of disciplines, including climate modeling, research into new materials, simulations of the early universe, high energy physics, and a host of other scientific endeavors. You will participate in regular cross-team efforts to integrate our storage systems with NERSC’s computational and networking infrastructure, troubleshoot performance issues at scale, and develop innovative solutions to continuously optimize operational and user productivity. The HPC Storage Infrastructure Engineer will also work with peers at other leading HPC facilities and vendor engineering teams to evaluate emerging storage technologies and define future directions for deployment.
What You Will Do:
• Monitor, administer, and optimize NERSC’s distributed parallel file systems, block storage arrays, and auxiliary Linux-based storage servers.
• Analyze, troubleshoot, and resolve complex problems that arise in NERSC's production storage hardware, software systems, storage networks and systems that utilize NERSC storage systems.
• Assist with architecting and evaluating storage systems and technologies based on analysis of user requirements, storage industry trends, and system monitoring and telemetry.
• Participate in the planning and execution of cross-team maintenance activities, upgrades, and deployments at scale.
• Provide off-hours emergency support in a shared, on-call rotation for a subset of NERSC storage systems.
Additional Responsibilities as needed:
• Prepare timely documentation, papers, and presentations describing best practices and experiences at scale for dissemination within NERSC and throughout the broader HPC community.
• Assess emerging technologies in architecture, device technology, and high-performance I/O APIs to provide input for HPC system procurements and DOE technology roadmaps.
• Proactively seek opportunities to collaborate with researchers, operators, and vendors across the global HPC community to apply the best ideas and solutions to solving NERSC's technical challenges.
Want to learn more about Berkeley Lab's Culture, Benefits and answers to FAQs?
Please visit: https://recruiting.lbl.gov/
• This is a full-time, career appointment, exempt (monthly paid) from overtime pay.
• This position will be hired at a level commensurate with the business needs and the skills, knowledge, and abilities of the successful candidate.
• This position may be subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.
• Work may be performed in on-site, hybrid, full-time telework, or remote modes. Work must be performed within the United States.
How To Apply
Apply directly online at http://18.104.22.168/counter.php?id=242933 and follow the online instructions to complete the application process.
Based on University of California Policy - SARS-CoV-2 (COVID-19) Vaccination Program and U.S. Federal Government requirements, Berkeley Lab requires that all members of our community obtain the COVID-19 vaccine as soon as they are eligible. As a condition of employment at Berkeley Lab, all Covered Individuals must Participate in the COVID-19 Vaccination Program by providing proof of Full Vaccination or submitting a request for Exception or Deferral. Visit covid.lbl.gov (https://covid.lbl.gov/) for more information.
Berkeley Lab is committed to Inclusion, Diversity, Equity and Accountability (IDEA, https://diversity.lbl.gov/ideaberkeleylab/) and strives to continue building community with these shared values and commitments. Berkeley Lab is an Equal Opportunity and Affirmative Action Employer. We heartily welcome applications from women, minorities, veterans, and all who would contribute to the Lab's mission of leading scientific discovery, inclusion, and professionalism. In support of our diverse global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, or protected veteran status.
Equal Opportunity and IDEA Information Links:
Know your rights: click here (https://www.dol.gov/agencies/ofccp/posters) for the supplement Equal Employment Opportunity is the Law and the Pay Transparency Nondiscrimination Provision (https://www.dol.gov/sites/dolgov/files/ofccp/pdf/pay-transp_%20English_formattedESQA508c.pdf) under 41 CFR 60-1.4.
Requirements
What is Required:
• Bachelor’s degree and a minimum of eight years of related experience; or six years and a Master’s degree; or equivalent experience.
• Experience using one or more interpreted programming or scripting languages such as Python and Bash to automate system management tasks.
• Experience with setup and administration of one or more HPC storage system technologies (e.g., Lustre, Spectrum Scale, HPSS).
• Working knowledge of parallel storage technologies such as distributed storage systems, parallel file systems, object stores, hierarchical storage management, storage networking, and/or relevant hardware technologies.
• Strong written and verbal communication skills and the ability to document and describe complex tasks to audiences of varying familiarity with storage technologies.
• Ability to work effectively and collaboratively on a team, as well as give and receive constructive feedback to foster communication and trust.
• Strong sense of intellectual curiosity, self-direction, and desire to pursue challenging problems and understand complex systems.
• Bachelor’s degree and a minimum of twelve years of computing or storage experience; or eight years and a Master’s degree; or equivalent experience.
• Demonstrated contributions to the high-performance storage community (e.g., conference presentations, open source software).
• Experience leading technical projects in a highly collaborative team environment.
• Strong understanding of Linux fundamentals including file systems, networking, and virtual memory management.
• Understanding of file system internals, prior work developing storage systems, or experience troubleshooting and optimizing parallel I/O.
• Strong organizational skills and ability to effectively manage priorities across many projects ranging from immediate problem resolution to long-term strategic planning.