HPC Systems Software Engineer
Lawrence Berkeley National Laboratory
Berkeley, CA 94720
Description
HPC Systems Software Engineer - 97205
Division: NE-NERSC

Lawrence Berkeley National Lab’s (LBNL, https://www.lbl.gov/) NERSC Division has an opening for an HPC Systems Software Engineer to join the team.

In this exciting role, you will combine software and systems development with more classical systems administration in a cloud-oriented High Performance Computing (HPC) environment. You will work with a diverse team of subject-matter experts, operational staff, and vendor affiliates to assure the operational success of the computational systems. As a member of a team responsible for analysis and 24x7 on-call support of world-class HPC computational systems, including integrated networking and storage, you will also develop novel capabilities that integrate into the system to further scientific productivity and usability. You will maintain existing operational systems and associated test systems, integrate new systems, and contribute to the operational design and development of future systems. You will also work closely with colleagues in cross-functional teams within the organization, as well as with vendors.

What You Will Do:
• Design, deploy, and manage High Performance Computing systems.
• Provide 24x7 on-call support and analysis of operational HPC systems.
• Develop new software and maintain existing software (typically in C and Python, with other languages from time to time) to manage the system or extend its capabilities.
• Participate in team-oriented agile development and management processes for HPC systems.
• Work closely with the Infrastructure, Operations, Networking, Security, Storage, and User Engagement groups to assure smooth system operation and a good user experience.

Additional Responsibilities as needed:
• Define and then lead new collaborative projects influencing new functionality or major vendor interactions.

Want to learn more about Berkeley Lab's Culture, Benefits and answers to FAQs?
Please visit: https://recruiting.lbl.gov/

• This is a full-time, career appointment, exempt (monthly paid) from overtime pay.
• This position will be hired at a level commensurate with the business needs and the skills, knowledge, and abilities of the successful candidate.
• This position may be subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.
• Work will be primarily performed at Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA.

How To Apply
Apply directly online and follow the online instructions to complete the application process.

Based on University of California Policy - SARS-CoV-2 (COVID-19) Vaccination Program and U.S. Federal Government requirements, Berkeley Lab requires that all members of our community obtain the COVID-19 vaccine as soon as they are eligible. As a condition of employment at Berkeley Lab, all Covered Individuals must participate in the COVID-19 Vaccination Program by providing proof of Full Vaccination or submitting a request for Exception or Deferral. Visit covid.lbl.gov (https://covid.lbl.gov/) for more information.

Berkeley Lab is committed to Inclusion, Diversity, Equity and Accountability (IDEA, https://diversity.lbl.gov/ideaberkeleylab/) and strives to continue building community with these shared values and commitments. Berkeley Lab is an Equal Opportunity and Affirmative Action Employer. We heartily welcome applications from women, minorities, veterans, and all who would contribute to the Lab's mission of leading scientific discovery, inclusion, and professionalism. In support of our diverse global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, or protected veteran status.

Equal Opportunity and IDEA Information Links:
Know your rights, click here (https://www.dol.gov/agencies/ofccp/posters) for the supplement: Equal Employment Opportunity is the Law and the Pay Transparency Nondiscrimination Provision (https://www.dol.gov/sites/dolgov/files/ofccp/pdf/pay-transp_%20English_formattedESQA508c.pdf) under 41 CFR 60-1.4.
Requirements
What is Required:
• Typically requires a minimum of 8 years of related experience with a Bachelor’s degree in Computer Science or a STEM-related field; or 6 years and a Master’s degree; or equivalent experience.
• Minimum of 2 years of experience with systems programming in a Linux environment or management of large-scale Linux-based systems in a high-performance computing, cloud computing, or hyper-scale environment.
• Experience with the C, Bourne shell, and Python 3 programming languages.
• Demonstrated ability to work independently as well as collaboratively in large projects, and to contribute to an active and respectful intellectual environment.
• Excellent oral and written communication skills.
• Ability to derive technical solutions in a collaborative environment to meet end-user requirements or needs.

Desired Qualifications:
• A minimum of 12 years of Linux experience, including a minimum of 6 years of experience managing large-scale Linux-based systems in an HPC or cloud environment, or a strong community presence in developing one of the key technologies in use on NERSC systems.
• Demonstrated excellent systems programming skills and strong knowledge of Linux internals.
• Demonstrated ability to successfully lead complex projects.
• Development of Kubernetes microservices using technologies such as Helm or Loftsman for deployment.
• Operations of Kubernetes and etcd.
• Infrastructure-as-code solutions such as Argo, Terraform, Ansible, Puppet, and Salt.
• Rust or Go programming language.
• GitLab or GitHub continuous integration and project management.
• Agile process, Scrum.
• Linux kernel interfaces, cgroups, eBPF.
• Installation, configuration, monitoring, and tuning of workload management systems such as Slurm, PBS Pro, or Grid Engine.
• Monitoring solutions such as Grafana, Prometheus, and LDMS.
• HPC systems administration.
• HPC application analysis, MPI.
• Specialized networking (InfiniBand, Slingshot, high-speed networks).
• Lustre, Spectrum Scale (GPFS), or other parallel file systems.
Event Type: Job Posting
Time: Wednesday, 16 November 2022, 10am - 3pm CST