SC22 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Birds of a Feather Archive

Training the CI Workforce for AI at Scale

Authors: Karen Tomko (Ohio Supercomputer Center), Dhabaleswar K. (DK) Panda (Ohio State University), Mary Thomas (University of California, San Diego (UCSD))

Abstract: The goal of this BoF is to provide a forum to discuss approaches and needs for training the CI workforce in developing, supporting and using large-scale CI effectively for AI workloads. The organizers of this BoF have experience developing training activities ranging from conference tutorials, user workshops, summer institutes and hackathons to bootcamp-style training for AI and CI professionals. We’d like to share our experiences in offering such training and promote discussion in the SC community on AI training more generally. The discussion topics will range from learning outcomes, training delivery and experiential activities to current gaps and future needs.

Long Description: Our goal is to have an open discussion with the community and gather feedback to guide the development of AI training materials and activities. Such training is essential to meet the growing demands for scalable AI model development, training and inference required to address scientific and societal challenges.

Our panelists will describe training activities that they have held for the HPC community. We have experience delivering training at conferences, user workshops, summer institutes, hackathons and bootcamps for AI and CI professionals. For each training experience we will describe the target learners, delivery and duration, the topics, learning activities and corresponding computing environment, assessments and feedback. We will discuss what has gone well, along with challenges such as barriers to learning that arise from the wide range of participant backgrounds and from the computing software and systems required for exercises.

Some of the topics covered in the training activities that will be presented are: Python Tools and Jupyter Notebooks for Data Analysis; Typical Data Types (tables, images, time series, maps and text); Science Case Studies; Fundamentals of Machine Learning: Bayesian Modeling & Neural Networks; Machine Learning and Deep Learning Frameworks; Parallel and Distributed DNN Training; Distributed Machine Learning Algorithms; Data Science using Dask and Spark; Latest Trends in High-Performance Computing Architectures; and Challenges in Exploiting HPC Technologies for DL/ML and Data Science.

Experiential activities range from interactive notebook exercises to benchmarking parallel training and data science analysis on HPC systems.

We expect this BoF to be relevant and of interest to a wide segment of SC attendees. The training programs that will be discussed target researchers, professionals and data science practitioners who develop, support or use HPC systems for AI workloads. Training activities of two current NSF-funded Cybertraining grants and planned activities under the ICICLE NSF-AI Institute will be discussed.

We have the following expected outcomes: 1) Gain an understanding of community requirements and needs for AI training to improve training materials, methods, and approaches. 2) Learn what AI scientists need to know about developing models, software frameworks and stacks, and using HPC infrastructure for their workloads. 3) Identify opportunities for sharing learning materials. 4) Identify common prerequisite skills and existing learning opportunities/materials.

