An Automated Approach to Continuous Acceptance Testing of HPC Systems at NERSC
Description
We demonstrate a continuous acceptance testing strategy used at NERSC that can be adopted by the broader HPC community. To accomplish this, we designed a new framework that handles the complexity of HPC systems and allows us to verify that a system is working optimally. buildtest [1] is an acceptance testing framework that automates the testing of HPC systems and enables HPC support teams to create and run tests with little effort. Testing is triggered by changes to the system or software stack at scheduled system outages, which require NERSC staff to build and run tests and monitor the results using GitLab’s Continuous Integration (CI) [2]. Test results are communicated to developers and users via the CDash [3] web interface, and test failures are documented as GitHub issues. Together, these components form a robust method for verifying that cutting-edge software stacks function in challenging HPC environments.
Event Type
Workshop
Time
Monday, 14 November 2022, 1:35pm - 1:55pm CST
Location
C141
Benchmarking
Cloud and Distributed Computing
Containers
Datacenter
Networks
Privacy
Resource Management and Scheduling
Security
SIGHPC
State of the Practice
System Administration
System Software
Recorded