Search Program


Organizations

24/7 Consulting
2CRSi
A*STAR Computational Resource Centre, Singapore
A*STAR NSCC Singapore
Aalto University, Finland
Aanchal Apparel and Accessories
Academic Computer Centre Cyfronet AGH, Krakow, Poland
Ace Computers
Achronix Semiconductor Corporation
ACM SIGHPC Systems Professionals Virtual Chapter
ACROSS & SEA EuroHPC Projects
Adaptive Computing
ADMIN Enterprise HPC / Linux Magazine / ADMIN Network & Security
Adobe Research
Advanced Clustering Technologies
Advanced Computing Center for Research and Education
Advanced Computing Service for Latin America and the Caribbean (SCALAC)
Advanced Micro Devices (AMD) Inc
Advanced Micro Devices (AMD) Inc, India
AGH University of Science and Technology, Krakow, Poland
Agnostiq
Agnostiq Inc
AI Lab
AIC Inc.
AIST
Alabama Supercomputer Authority
Alces Flight Ltd, UK
Alces Software Ltd, UK
Aleph Alpha, Germany
Alibaba Cloud
Alibaba Group
Alibaba Inc
Allen Institute for Artificial Intelligence
Almalinux
Altair
AMAX
Amazon
Amazon Web Services
AMD
AMD Research
Ames National Laboratory
Amgen Inc
Andong National University, South Korea
ANSYS Inc
Apacer Memory America, Inc.
Institute for Applied Computing, National Research Council (IAC-CNR), Italy
Arcitecta
Argonne National Laboratory (ANL)
Argonne National Laboratory (ANL), Data Science and Learning Division
Argosy Research Inc.
Arista Networks
Arizona Research Computing
Arizona State University
ARM Ltd
ASA Computers
Asia Supercomputer Community
Asian Technology Information Program
Aspen Systems Inc
ASRock Rack Inc.
Astera Labs
Astera Labs Inc
ASUS
ATEMPO
Atipa Technologies
atNorth
Atomic Energy and Alternative Energies Commission (CEA)
Atos
Auras Technology Co., Ltd.
Australian National University
Autonomous University of Barcelona, Spain
AWS
Ayar Labs
Azure Systems Research
Baidu Security
Barcelona Supercomputing Center (BSC)
Barkhausen Institute
Baylor University
BeeGFS, ThinkParQ
BEEHE Electric - Thermal Management
Beihang University
Beijing Information Science and Technology University
Beijing University of Posts and Telecommunications
Belmont University
Ben-Gurion University of the Negev, Israel
Bergen Language Design Laboratory, University of Bergen
Berlin Institute of Technology
Biomedical Sciences Research Center (BSRC), Greece
BIOS-IT
BioTeam, Inc.
BittWare
Boeing
Boise State University
Bordeaux INP
Boston Limited
Boston University
Boyd
Brigham Young University
Brookhaven National Laboratory
Budapest University of Technology and Economics
Bull Atos Technologies
ByteDance Ltd
California Institute of Technology
California State University, Channel Islands
California State University, San Francisco
Calvin University
Carahsoft
Cardiff University, Wales
Carnegie Mellon University
Case Western Reserve University
CCS/JCAHPC, University of Tsukuba
CEA
CEJN North America
Celestica
Center for Development in Advanced Computing
Center for High Performance Computing (CHPC), South Africa
Center for Translational Data Science - Open Commons Consortium
Centrum Wiskunde and Informatica (CWI), Netherlands
Cerebras Systems
CEV Electronic GmbH
Champlain College
Chelsio Communications
Chemnitz University of Technology
Chenbro Micom USA Inc.
Chilldyne, Inc.
China University of Geosciences
Chinese Academy of Engineering
Chinese Academy of Sciences
Chinese University of Hong Kong (CUHK)
Chinese University of Hong Kong, Shenzhen
Ciena Corporation
CINECA
CIQ: Fuzzball/HPC-2.0, Rocky Linux, Apptainer/Singularity, Warewulf
Cisco Systems
City University of New York
Clemson University
Cloud Native Computing Foundation (CNCF)
Cloudian
CMC - Osaka University
Codeplay
Codeplay Software Ltd, UK
Colgate University
College of William & Mary
Colorado School of Mines
Colorado State University
ColorChip Group
Columbia University
Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
Compute Express Link (CXL) Consortium
Computer Architecture and Parallel System Laboratory (CAPSL)
Computer Science
Concordia University, Texas
CoolIT Systems
Cooltera Limited
Cornelis Networks
Cornell University
Costa Rica National High Technology Center
CPC
Cranfield University, England
croit
Cruise
Crusoe Energy Systems
CSC – IT Center for Science
CSC – IT Center for Science Ltd, Finland
CSCS / ETH Zurich
cunoFS
CXL Consortium
CyberCore Technologies
Cyberinfrastructure Integration Research Center
Danfoss Power Solutions
Data in Science Corporation
DataDirect Networks (DDN)
DE-CIX North America
Defense Advanced Research Projects Agency (DARPA)
Deggendorf Institute of Technology, Germany
Dell EMC
Dell Technologies Inc
Department of Computer Science
Department of Electrical and Computer Engineering
DePaul University
DGA Information Literacy, France
Digital Research Alliance of Canada
Discovery Partner Institute, University of Illinois Chicago
DMTF
dNOC
Do IT Now
DoD High Performance Computing Modernization Program
DOD HPCMP
DOE Office of Advanced Scientific Computing Research
Dolphin Interconnect Solutions AS
DUG Technology
Duke University
Durham University, England
Dynatron Corporation
Ecosystem for Research Networking
EDF Research and Development
Edinburgh Parallel Computing Centre (EPCC)
EGI Foundation, Netherlands
Eindhoven University of Technology, Netherlands
Elastic
Electronics and Telecommunications Research Institute, South Korea
Elizabeth City State University
Emenda USA
Emory University
Energy Efficient HPC Working Group
Energy Sciences Network (ESnet)
ENS Lyon
EnterpriseDB
EPCC at the University of Edinburgh
Equifax Inc
Erlangen National High Performance Computing Center
ETH Zürich
ETH Zurich – Swiss Federal Institute of Technology
eTopus Technology Inc.
European Centre for Medium-Range Weather Forecasts (ECMWF)
European Open File System Association (EOFS)
European Organization for Nuclear Research (CERN)
European Processor Initiative
European Technology Platform for High-Performance Computing (ETP4HPC)
Exotanium
Facebook
Facebook AI Research (FAIR)
Federal University of Rio Grande do Sul, Federal University of Rio Grande do Norte, Brazil
Federal University of Sao Carlos, Brazil
Fermi National Accelerator Laboratory
FIBERSTAMP
Flex Logix Inc.
Flinders University, Australia
Fluminense Federal University, Brazil
Forschungszentrum Jülich
Fraunhofer Institute for Electronic Nano Systems
Fraunhofer Institute for Industrial Mathematics
Frederick National Laboratory for Cancer Research
Free University of Berlin
Free University of Bozen-Bolzano, Italy
French Alternative Energies and Atomic Energy Commission (CEA)
French Institute for Research in Computer Science and Automation (INRIA)
French Institute for Research in Computer Science and Automation (INRIA) - Bordeaux
Friedrich Schiller University Jena, Germany
Friedrich-Alexander University, Erlangen-Nuremberg
FUJIFILM Data Management Solutions
Fujitsu Ltd
FuriosaAI, Inc
GAMS Development Corp.
GDIT
GE Aviation
GEM Southwest
Gem State Informatics Inc
GENCI, France
General Atomics
General Dynamics Information Technology Inc
George Mason University (GMU)
George Washington University
Georgia Institute of Technology
German Aerospace Center (DLR)
German Climate Computing Centre (DKRZ)
GIGABYTE
GigaIO
Gironella Ferrer | Global Technia
Globus
Glyndwr University, Wales
Goethe University Frankfurt
Goldman Sachs Inc
Google Cloud
Google LLC
GRAID Technology Inc.
Graphcore
Great Plains Network
Grenoble Alpes University, France
GRI Pumps
Groq Inc
Guest
GWDG, Germany
H3 Platform Inc
Habana Labs
Hammerspace
Hanover College
Harbin Institute of Technology, China
Harrisburg University of Science and Technology
Hartree Centre
Hartree Centre, Science and Technology Facilities Council (STFC), UK
Harvard University
HDF Group
Heidelberg Institute of Theoretical Studies
Heidelberg University
Helmholtz AI
Hewlett Packard Enterprise (HPE)
Hewlett Packard Enterprise (HPE) Networks
Hewlett Packard Labs
High Performance Cloud Computing (HPCC) Lab
High Performance Computing Center (HLRS), Stuttgart
Hitachi Ltd, Japan
HLRS
Hokkaido University
Hong Kong University of Science and Technology
Hong Kong University of Science and Technology, Guangzhou
HPC Advisory Council
HPC AI Technology LLC
HPC Engineer
HPC Modernization Program (HPCMP)
HPC Modernization Program (HPCMP) User Productivity Enhancement and Training (PET)
HPCwire
HPE HPC/AI EMEA Research Lab (ERL), Switzerland
HQS QuantumSimulations GmbH
Huawei Technologies Ltd
Huawei Technologies Switzerland
Huazhong University of Science and Technology (HUST)
I-PEX
IBM
IBM Corporation
IBM Research
IBM Research Europe
IBM Research, Almaden
IBM Research, UK
IBM TJ Watson Research Center
IBM Zurich Research Laboratory
Icahn School of Medicine at Mount Sinai
Idaho National Laboratory
IDC
IEEE Computer Society
IEEE Quantum Week
Illinois Institute of Technology
ILNumerics
Imperial College, London
Independent Board Director
Indian Institute of Technology (IIT), Hyderabad
Indian Institute of Technology (IIT), Kanpur
Indian Institute of Technology, Bhilai
Indiana University
Indiana University, Center for Applied Cybersecurity Research (CACR)
Industrial University of Santander, Colombia
INESC-ID, Portugal
InfiniteTactics
INFN CNAF
Information Sciences Institute
Ingrasys Technology Inc.
innodisk
Innovations, Macedonia
INRIA Grenoble
Inspire Semiconductor Inc
Inspur Electronic Information Industry Co., Ltd
Institute of Computing Technology, Chinese Academy of Sciences
Institute of Industrial Science, The University of Tokyo
Institute of Informatics, University of Warsaw, Poland
Institute of Software, Chinese Academy of Sciences
Intel
Intel Corporation
Intel Labs
Intel Labs, India
Intelligence Advanced Research Projects Activity (IARPA)
Intelligent Light
Internet2
Intersect360 Research
Interuniversity Microelectronics Centre (IMEC), Belgium
IonQ
Iowa State University
Irish Centre for High-End Computing
ISC Group
ISOCPP.org Foundation
Israel Atomic Energy Commission
IT University of Copenhagen
IT4Innovations, Czech Republic
ITBL Community
ITC/JCAHPC, The University of Tokyo
Izmir Institute of Technology, Turkey
Jabberwock Technologies Inc
Jackson Laboratory
JAIST
James Madison University
JAMSTEC
Japan Aerospace Exploration Agency (JAXA)
Japan Agency for Marine-Earth Science and Technology
Japan Atomic Energy Agency
JETCOOL
JMC Global Technologies I, L.P.
Johannes Gutenberg University Mainz
Johnson & Johnson
JPMorgan Chase
Juelich Supercomputing Centre (JSC)
Juelich Supercomputing Centre (JSC), Institute for Advanced Simulation
Jungo Connectivity Ltd
Juniper Networks
Kansai University
Kansas State University
Karlsruhe Institute of Technology
Katana Graph Inc
KAUST
KBR at NASA Ames Research Center
Kean University
KEK
Kent State University
Keysight Technologies
Khronos Group Inc
King Abdullah University of Science and Technology (KAUST)
KISTI
Kitware Inc
Kitware, Europe
Knox College
Kobe University, Japan
Koç University, Turkey
Korea Advanced Institute of Science and Technology (KAIST)
Korea Institute of Science and Technology Information (KISTI)
Korea Semiconductor Industry Association
Krell Institute
KTH Royal Institute of Technology, Sweden
Kubecost
Kyoto University
Kyushu University
LaBRI, France
LAMPS Laboratory
Lancium
Lancium Compute
Lappeenranta University of Technology (LUT), Finland
Lawrence Berkeley National Laboratory
Lawrence Berkeley National Laboratory (LBNL)
Lawrence Livermore National Laboratory
Lawrence Livermore National Laboratory (LLNL), Energy Efficient High Performance Computing Working Group (EEHPCWG)
LDA Technologies
Leibniz Supercomputing Centre
Leiden Institute of Advanced Computer Science (LIACS)
Lenovo
Lifeboat LLC
Lightmatter
LINKS Foundation Inc
Liqid, Inc
LITEON
Los Alamos National Laboratory (LANL)
Louisiana State University
Louisiana State University, Center for Computation and Technology
Lounge
Loyola University, Chicago
Loyola University, Maryland
Lucata Corporation
Ludwig Maximilian University of Munich
Luminary Cloud Inc
Maison de la Simulation
Maitrix
Marquette University
Massachusetts Green High Performance Computing Center (MGHPCC)
Massachusetts Institute of Technology (MIT)
Massachusetts Institute of Technology (MIT) Lincoln Laboratory
Mathematics and Computer Science Division
MathWorks
Maxwell Labs
Mayo Clinic
McGill University
McMaster University, Ontario, Canada
Medical College of Wisconsin
MEGWARE Computer
Mellanox Technologies Ltd
MemComputing, Inc.
MemVerge Inc
Met Office, UK
Meta
Meta Platforms Inc
Metify Inc.
MGHPCC
Michigan State University
Michigan Technological University
Micron Technology Inc
Microsoft Azure
Microsoft Corporation
Microsoft Research
Microsoft Research Asia
Microway Inc.
Mikros Technologies
Mississippi State University
Missouri University of Science and Technology
MIT Press
MITRE Corporation
Mobileye, an Intel Company
Moka Blox LLC
Monash University
Moog, Inc.
Moreh
Morgan State University
Motivair
N8 Centre of Excellence in Computationally Intensive Research (N8 CIR), UK
Nagoya University
Nanook Consulting
Nanyang Technological University, Singapore
Nara Institute of Science and Technology
NASA
NASA Ames Research Center
National Cancer Institute (NCI)
National Center for Atmospheric Research (NCAR)
National Center for High-Performance Computing (NCHC), Taiwan
National Center for Supercomputing Applications (NCSA)
National Centre for Atmospheric Science (NCAS), UK
National Computational Infrastructure, Australian National University
National Energy Research Scientific Computing Center (NERSC)
National Institute of Advanced Industrial Science and Technology (AIST), Japan
National Institute of Advanced Technology (ENSTA Paris)
National Institute of Allergy and Infectious Disease (NIAID)
National Institute of Informatics
National Institute of Standards and Technology (NIST)
National Nuclear Security Administration (NNSA)
National Oceanic and Atmospheric Administration (NOAA), N-Wave
National Radio Astronomy Observatory
National Renewable Energy Laboratory (NREL)
National Research Center of Parallel Computer Engineering and Technology, China
National Science Foundation (NSF)
National Supercomputer Center, Guangzhou
National Supercomputing Center (NSCC), Singapore
National Supercomputing Center in Wuxi
National Tsing Hua University, Taiwan
National University of Defense Technology (NUDT), China
National University of Singapore
Natural Intelligence Systems Inc
Naval Research Laboratory
NCAR
NCI Pawsey Supercomputing Centre Australia
NCSA
NEC Corporation
Netherlands eScience Center
Netherlands Institute for Radio Astronomy (ASTRON)
NetScout Systems
New Jersey Institute of Technology
New Mexico State University
New York University (NYU)
NexGen Analytics
Next Generation Technical Computing Unit, Fujitsu Ltd
NextSilicon Inc
NGA
NHanced Semiconductors Inc
NICT
NIST (National Institute of Standards and Technology)
Nitta Corporation of America
Nokia
Nokia Enterprise
NoMachine Computer Products Corporation
North Carolina State University
Northeastern University
Northern Illinois University
Northwestern University
Np-Complete S.r.l.
NSF ACCESS
NSTDA Supercomputer Center (ThaiSC), Thailand
NTNU
Nuclear Research Center Negev, Israel
Numerical Algorithms Group
nVent
NVIDIA Corporation
NVIDIA Corporation, Deep Learning Institute
NVIDIA Corporation, Helsinki
NVIDIA/Microsoft Azure
NVIDIA/Oracle
Nyriad, Inc.
Oak Ridge National Laboratory (ORNL)
Oak Ridge National Laboratory, University of Manchester
Ohio State University
Ohio Supercomputer Center
Old Dominion University
Omnibond Systems LLC
One Stop Systems
Open Science Grid (OSG)
OpenHPC
OpenMP Architecture Review Board
OpenPOWER Foundation
OpenSFS Inc
Optunity Ltd.
Oracle Corporation
Oregon Health and Science University
Oregon State University
Orthogone
Pacific Northwest National Laboratory (PNNL)
Pacific Wave
Paderborn University, Germany
Panasas
ParaTools Inc
Paris School of Mines
Parker Hannifin
ParTec AG, Germany
Partnership for Advanced Computing in Europe (PRACE)
Passport to Prizes
Pawsey Supercomputing Centre
Pázmány Péter Catholic University, Hungary
Peking University
Peng Cheng Laboratory
Penguin Solutions
Penn State Institute for Computational and Data Sciences
Pennsylvania State University
Perforce Software Inc
Peripheral Component Interconnect Special Interest Group (PCI-SIG)
Phan CodeWorks
Pilot National Laboratory for Marine Science and Technology, Qingdao, China
PIONIER - Polish Optical Internet
Pittsburgh Supercomputing Center (PSC)
Polytechnic University of Catalonia, Spain
Polytechnic University of Turin
Pomona College
Prairie View A&M University
Preferred Networks Inc.
Princeton University
ProDesign Electronic GmbH
Prodrive Technologies
Proviron
Public Health Wales
Public Health Wales NHS Trust
Purdue University
Pure Storage
QCT
Qilu University of Technology, Shandong, China
QNIB Solutions
QScale
QSigma, Inc.
Qualcomm Inc
Quantinuum
Quantum Corporation
Quantum Machines, Israel
Queen's University
Queen's University, Belfast
Queen's University, Canada
Quobyte GmbH
Red Hat Inc
Renaissance Computing Institute (RENCI)
RENCI Fabric
Renmin University of China
Rensselaer Polytechnic Institute (RPI)
Rescale
Research Organization for Information Science and Technology
Retired
Rice University
Rigetti Computing
RIKEN
RIKEN Center for Computational Science (R-CCS)
Rivas.AI
Rochester Institute of Technology
Rolls-Royce
Rolls-Royce Deutschland
Rose-Hulman Institute of Technology
Run:ai Labs USA, Inc.
Run:ai, Israel
Rutgers Office of Advanced Research Computing (OARC)
Rutgers University
Rutherford Appleton Laboratory, Science and Technology Facilities Council (STFC)
RWTH Aachen University
RWTH Aachen University, IT Center
SambaNova Systems Inc
Samsung
Samsung Semiconductor, Inc.
Samtec
San Diego State University
San Diego Supercomputer Center (SDSC)
Sandia National Laboratories
Sano Centre for Computational Medicine, Krakow, Poland
Saudi Aramco
School of Electronic and Computer Engineering, Peking University
School of Information Science and Engineering, Shandong University
School of Mathematical Sciences, Peking University
Science and Technology Facilities Council (STFC)
Scientific Computing World
Scientific Toolworks, Inc (SciTools)
SCinet
SciTools
Scrivner Solutions Inc
SDSC
Seagate Systems
Seagate Technology
Seoul National University
Seqera Labs, Spain
Shanghai Jiao Tong University
Shanghai Research Center for Quantum Sciences
ShanghaiTech University
Shell
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Shenzhen University
Shodor Education Foundation
SHREC (the NSF Center for Space, High-Performance, and Resilient Computing)
SIAM - Society for Industrial and Applied Mathematics
SIDN, Netherlands
Siemens
Siemens Electronic Design Automation
SiFive
Silicon Mechanics
SiliconANGLE Media, Inc.
Singapore Management University
SiPearl
SK hynix
SLAC National Accelerator Laboratory
Slippery Rock University of Pennsylvania
Slurm
Social Networking Lounge
Software Sustainability Institute
Sogang University, South Korea
Soreq Nuclear Research Center (SNRC), Israel
Sourcery Institute
South China University of Technology, Guangzhou, China
Southeast University, Nanjing, China
Southern Methodist University
Southern University of Science and Technology, China
Space Exploration Technologies Corp. (SpaceX)
Speck Industries, LP
Spectra Logic Corporation
Ss. Cyril and Methodius University in Skopje (UKIM), North Macedonia
St Ambrose University
St. John’s University
Stanford University
Starfish Storage
StarLight
Staubli
Stevens Institute of Technology
STFC Daresbury Laboratory, England
Stony Brook University
Storage Networking Industry Association (SNIA)
Storj
Submer Technologies SL
Sun Yat-sen University, Guangzhou, China
Sunshine State Education & Research Computing Alliance
Super Computing Team
SUPERMICRO
SUSE
Sustainable Horizons Institute
Swiss Federal Institute of Technology Lausanne (EPFL)
Swiss National Supercomputing Centre (CSCS)
Sygic, Slovak Republic
Syncious
Synopsys
Synopsys Inc
Tachyum Inc
Tactical Computing Laboratories
Tactical Computing Laboratories LLC
Tampere University, Finland
Tarleton State University
Tatsuta Electric Wire & Cable Co., Ltd.
Technical University Darmstadt
Technical University Darmstadt, Simulation of Reactive Thermo-Fluid Systems
Technical University Dresden
Technical University Dresden, ZIH
Technical University Munich
Technical University Munich, Computer Architecture and Parallel Systems
Technical University of Denmark
Technical University Wien (Vienna University of Technology)
Technion - Israel Institute of Technology
TechSquare Inc
Tennessee Tech University
Tenstorrent
Texas A&M University
Texas A&M University, Corpus Christi
Texas Advanced Computing Center (TACC)
Texas Christian University
Texas State University
Texas Tech University
The George Washington University
The Institute of Statistical Mathematics (ISM)
The Ohio State University
The Siemon Company
The University of Tennessee
The University of Tokyo
ThinkParQ, Germany
TikTok Inc
Tohoku University
Tokyo Institute of Technology
TOP500
TOP500, Green500
TotalEnergies EP Research and Technology USA, LLC
TotalView by Perforce
Tsinghua University, China
Tuxera
Tyan Computer Corporation
UCit
UK Research and Innovation
Unaffiliated
Unifabrix
Unilever Digital R&D
Unitary Fund
United States Army Research Laboratory
United States Research Software Engineer Association
Unity In Diversity Foundation, Netherlands
University Carlos III of Madrid, Spain
University College Dublin
University College London
University College London - Centre for Advanced Research Computing (ARC)
University Institute of Computing, Chandigarh University, India
University of Alabama, Birmingham
University of Amsterdam
University of Arizona
University of Basel, Switzerland
University of Bío Bío, Chile
University of Birmingham, UK
University of Bonn
University of Bristol
University of British Columbia
University of Buffalo
University of California
University of California, Berkeley
University of California, Davis
University of California, Irvine
University of California, Merced
University of California, Riverside
University of California, San Diego (UCSD)
University of California, San Francisco
University of California, Santa Barbara
University of California, Santa Cruz
University of Cambridge
University of Central Florida
University of Chicago
University of Chicago, CERES, Center for Unstoppable Computing
University of Cincinnati
University of Colorado
University of Colorado (retired)
University of Colorado, Boulder
University of Colorado, Denver
University of Copenhagen
University of Delaware
University of Edinburgh
University of Florida
University of Geneva, Switzerland
University of Göttingen
University of Göttingen, GWDG, Germany
University of Hagen
University of Hamburg
University of Hawaii at Manoa
University of Heidelberg
University of Helsinki
University of Houston
University of Iceland
University of Illinois
University of Illinois, Chicago
University of Indonesia
University of Innsbruck
University of Iowa
University of Kansas
University of Kassel
University of Kentucky
University of Klagenfurt, Austria
University of Leeds
University of Louisiana at Lafayette
University of Maine
University of Manchester
University of Maryland
University of Maryland, Laboratory for Physical Sciences (LPS)
University of Massachusetts, Amherst
University of Massachusetts, Lowell
University of Michigan
University of Michigan - Dearborn
University of Minnesota
University of Missouri, Columbia
University of Muenster
University of Nevada, Reno
University of New Mexico
University of New Mexico - CARC
University of North Carolina, Charlotte
University of North Texas
University of Notre Dame
University of Oregon
University of Padova
University of Paris Saclay/Versailles
University of Paris XIII
University of Paris-Saclay
University of Perpignan, France
University of Pittsburgh
University of Reading
University of Reims Champagne-Ardenne (URCA)
University of Rochester
University of Rochester, Laboratory for Laser Energetics
University of São Paulo
University of Saskatchewan
University of Science and Technology of China
University of Science and Technology of China (USTC)
University of South Carolina
University of Southern California
University of Southern California (USC)
University of Southern California, Information Sciences Institute
University of St Andrews, Scotland
University of Strasbourg
University of Strathclyde, Scotland
University of Stuttgart
University of Stuttgart, Institute for Parallel and Distributed Systems
University of Surrey, England
University of Sydney
University of Tennessee
University of Tennessee (retired)
University of Tennessee, Chattanooga
University of Tennessee, Innovative Computing Laboratory
University of Tennessee, Innovative Computing Laboratory (ICL)
University of Texas
University of Texas Arlington - Physics Department
University of Texas, Arlington
University of Texas, Dallas
University of Texas, El Paso
University of Texas, Oden Institute
University of Texas, San Antonio
University of Tokyo
University of Toledo
University of Toronto
University of Trento, Italy
University of Tsukuba
University of Twente
University of Utah
University of Utah, Scientific Computing and Imaging Institute (SCI)
University of Valladolid, Spain
University of Victoria, British Columbia
University of Vienna
University of Virginia
University of Warwick
University of Washington
University of Western Australia
University of Wisconsin, Madison
University of Wyoming
University of York, England
US Army
US Army Combat Capabilities Development Command Aviation & Missile Center
US Army Corps of Engineers
US Army Engineer Research and Development Center (ERDC)
US Department of Defense
US Department of Energy
US Department of Energy Joint Genome Institute
US Food and Drug Administration
VAST Data
Verizon
VIAVI Solutions
Vienna University of Economics and Business Administration
Viking Enterprise Solutions
Virginia Tech
Voltron Data
VSB – Technical University of Ostrava, Czech Republic
Wake Forest University
Warsaw University of Technology
Washington State University
WeBank, China
Weka.io
WekaIO Inc
West Chester University of Pennsylvania
Westfield State University
Westphalian University of Applied Sciences
Whamcloud Inc
Wieland-Werke AG
Wisetek
World Wide Technology Inc
X-ISS
X-ScaleSolutions
Xiamen University
Xilinx Inc
Yale University
Zapata Computing Inc
ZeroPoint Technologies AB
Zhejiang Lab
Zhejiang University
Zhengzhou University of Light Industry, Software Engineering
ZIH, TU Dresden
Zuse Institute Berlin

Contributors

Bartłomiej Łagosz
Lubomír Říha
Martin Šurkovský
Paweł Żuk
Tor M. Aamodt
Ahmad Abdelfattah
Sameh Abdulah
Waqwoya Abebe
Mikhael Abidan Abednego
Mathew Abraham
Gregory Abram
Dennis Abts
Armando Acosta
Panagiotis Adamidis
Emily K. Adams
Joel C. Adams
Ryan Adamson
Rohit Aggarwal
Tushar Agrawal
Hana Ahmed
Kishwar Ahmed
Shehab Ahmed
Dong H. Ahn
James P. Ahrens
Alex Aiken
Tassadit Ait Kaci
Margaret Ajuwon
Kadir Akbudak
Mobayode Akinsolu
Mohammed Al Farhan
Gregorino Al Josan
Hussein Al-Azzawi
Mohammad Al-Tahat
Khairul Alam
Jordi Alcaraz
Vassil Alexandrov
Evguenia Alexandrova
Yuri Alexeev
Tomas Aliaga
Ahmed Aljarro
Robert Allan
Yousef Alnaser
Rabab Alomairy
Slim Alouini
Aksel Alpay
Aymen Alsaadi
Orly Alter
Nestor Alvarez
Lluc Alvarez Marti
Vinay Amatya
Mohsen Amini Salehi
Dario Amirante
Andrei Amza
Dosik An
Han An
Łukasz Anaczkowski
Anima Anandkumar
Rachana Ananthakrishnan
Aaron Anderson
Ryosuke Ando
Jonas Andrulis
Surendra Anne
Gabriel Antoniu
Ali Anwar
Hartwig Anzt
Thomas Applencourt
Pierre-Yves Aquilanti
Carlos Arango Gutierrez
Mauricio Araya
Karina Arcaute
Moiz Arif
Daniel Arndt
Dorian Arnold
Dorian C. Arnold
William Arnold
Rajat Arora
Engin Arslan
Kapil Arya
Yuuichi Asahi
Krste Asanović
Thomas J. Ashby
Saleh Ashkboos
Kevin Assogba
Hammad Ather
Hagit Attiya
Isaac Ault
Olivier Aumage
Brian Austin
Ammar Ahmad Awan
Muaaz Gul Awan
Eduard Ayguade
Sridevi Ayloo
Ariful Azad
Noushin Azami
Ann Backhaus
David Bader
Michael Bader
Rosa M. Badia
Frank Baetke
Hakan Bagci
Abhishek Bagusetty
Anna Maria Bailey
David H Bailey
Allison H. Baker
Nolan C Baker
Jason Bakos
Venkat Bala
Pavan Balaji
D. Balamurugan
Prasanna Balaprakash
Piotr Balcer
Jan Balewski
Oana Balmau
Daniel Balouek-Thomert
Sahan Bandara
Nuno Bandeira
Imon Banerjee
Sergio Baranzini
Lorena Barba
Giuseppe M. J. Barca
Deborah Bard
Izumi Barker
Kevin J. Barker
João Barreto
Carlos Jaime Barrios Hernandez
Daniel Barry
Denis Barthou
Ignacio Bartol
Jim Basney
Rohan Basu Roy
Kees Joost Batenburg
Natalie Bates
Narangerelt Batsoyol
Gregory Bauer
Matthew E. Baughman
Tony Baylis
Olivier Beaumont
Michela Becchi
Torrin Bechtel
Gregory B. Becker
David Beckingsale
Peter Beckman
Mahdi Belcaid
Kristi Belcher
Evgenij Belikov
Jon Belk
Julian Bellavita
Emily Belli
Mehmet E Belviranli
Dorra Ben Khalifa
Shimon Ben-David
Tal Ben-Nun
Luca Benini
John Bent
Florian Berberich
Daniel Berger
Richard Berger
Luc Berger-Vergiat
Keren Bergman
Massimo Bernaschi
David E. Bernholdt
William Berquist
Jakub Beránek
Julie Bessac
Maciej Besta
Timo Betcke
Wes Bethel
Ron Bewtra
Jean Luca Bez
Abhinav Bhatele
Antara Bhowmick
Sanjukta Bhowmick
Debsindhu Bhowmik
Payas Bhutra
Tekin Bicer
Daniel Bielich
Jan Bierbaum
Julien Bigot
Timea Biro
George Biros
Christian Bischof
Julian Bissantz
Rupak Biswas
Andrew Bitar
Stella Bitchebe
Sergey Blagodurov
Andrew Blanchard
Johannes P Blaschke
Matthew Bobbitt
Gianluca Boccardo
Albert Bode
Brett M. Bode
David Boehme
France Boillod-Cerneux
Jay Boisseau
Taisuke Boku
Leonardo Boldrini
Jorge Bolivar
Dan Bonachea
Tommaso Bonato
Philippe Bonnet
David Bonnie
Chris Bording
Kalina M Borkiewicz
Lynn Borkon
George Bosilca
Kristofer Bouchard
Wajih Boukaram
Norman J Bourassa
Aurelien Bouteiller
Anne Dara Bowen
Kurtis Bowman
Matthew Boyd
Mathew Boyer
Alexander Brace
Jim Brandt
Wesley Brashear
David Brayford
Thomas Brettin
Alexander Breuer
Thomas Breuer
Wesley Brewer
Patrick Bridges
Ron Brightwell
Stephanie Brink
André Brinkmann
Gonzalo Brito Gadeschi
Benjamin A. Brock
James C. Brodman
Joel Brogan
Kim Sebastian Brose
Gordon Brown
Maxine Brown
Nick Brown
Oliver Thomson Brown
Phil Brown
Timothy Brown
Tisha Brown-Gaines
Deidre Brucker
Christopher D. Brumgard
Steve Bruno
Dana Brunson
Spencer Bryngelson
Hal Brynteson
Tomáš Brzobohatý
Norm Buchanan
Jeff Buchsbaum
Yakup Koray Budanaz
Hoang Bui
Aydin Buluç
David P. Bunde
Joseph Bungo
Nick Buraglio
William Burke
Bill Burns
Rod Burns
Tobias Burnus
Adam Burrows
James Burton
Ralph Burton
Martin Burtscher
Jan Buschmann
Suren Byna
Aileen Böhme
Paul Caheny
Katharine Cahill
Haipeng Cai
Jon Calhoun
Scott Callaghan
Alexandru Calotoiu
Steven Calvez
Avery Campbell
Scott Campbell
Stuart Campbell
Bugra Can
Ramon Canal
Jeff Candy
Richard Shane Canon
Huanqi Cao
Karl-Kiên Cao
Qinglei Cao
Shiyi Cao
Zhendong Cao
Salvador Capella-Gutiérrez
Franck Cappello
Arlie Capps
Emilio Carcamo
Pedro Augusto Cardoso Cotrim Moreira
Luca Carloni
Richard Carlson
Philip Carns
Eddy Caron
Ilene Carpenter
Jeffrey D Carpenter
Michael Carr
Rocío Carratalá-Sáez
Jesus Carretero
Kate Carter
Hunter Carver
Henri Casanova
Marc Casas Guix
Ralph Castain
Vito Giovanni Castellana
Miguel Castro
Simon Caton
Sebastien Cayrols
Silvina Caíno-Lores
Christophe Cerin
Eduardo Cesar
Zachary Cetinic
Luis Ceze
Kenny Cha
Mohamad Chaarawi
Dhruva Chakravorty
Saiprathik Chalamkuri
Alan Chalker
Prajwal Challa
Noel Chalmers
Bradford Chamberlain
Henry Chan
Jany Chan
Aparna Chandramowlishwaran
Sunita Chandrasekaran
Mayanka Chandrashekar
Fu-Chiang Chang
Thanawat Chanikaphon
Barbara Chapman
Kyle Chard
Ryan Chard
Kenna Chase
Ayush Chaturvedi
Vipin Chaudhary
Geeta Chauhan
Tanny Chavez
Chao Chen
Dan Chen
Dexun Chen
Hang Chen
Hongwei Chen
Jacqueline Chen
Jiangzhuo Chen
Jianhai Chen
Jiaxi Chen
Jie Chen
Jieyang Chen
Jiunn-yeu Chen
Juan Chen
Junfeng Chen
Junshi Chen
Kaiqi Chen
Kang Chen
Peng Chen
Po-Hao Chen
Qi Chen
Quan Chen
Songqing Chen
Wenguang Chen
Winson Chen
Yi Chen
Yong Chen
Yujie Chen
Yuxin Chen
Zizhong Chen
Zuoning Chen
James Cheng
Jiajun Cheng
Wen Cheng
Yue Cheng
Ganesh Chennimalai Sankaran
James Cherry
Mathew J. Cherukara
Kazem Cheshmi
Simbarashe Chidyagwai
Andrew Chien
Hank Childs
Saikeerthi Chirumamilla Hagadur
Sathya Chitturi
Younghyun Cho
Jaemin Choi
Min Choi
Fred Chong
Jerry Chou
Edmond Chow
Marcin Chrapek
AJ Christensen
Steffen Christgau
Hubert Chrzaniuk
Ruilin Chu
Neil Chue Hong
Giri Chukkapalli
I-Hsin Chung
Joaquin Chung
Pascal Chung
Woosuk Chung
Yeh-Ching Chung
Valentin Churavy
Florina M. Ciorba
Isaac Cisneros
Thomas Clark
Noah Clemons
Lennard Clicque
Austin Clyde
Radim Cmar
Carleton Coffrin
Susan Coghlan
Jeremy Cohen
Yonatan Cohen
Joel Coldren
Tainã G D Coleman
Mark Coletti
Mathew Colgrove
Javier Conejero
Guojing Cong
Thomas Richard Connor
Brandon Cook
Jonathan Cook
Gene Cooperman
Marcin Copik
Matteo Corbetta
Stefano Corda
Alexandru Costan
Ian Costello
Jim Cownie
Donna J. Cox
Melissa Cragin
György Cserey
Attila Csikász-Nagy
Loïc Cudennec
Candace Culhane
Massimiliano Culpo
Sean Cunningham
Mikaela Currier
Tony Curtis
Maciej Cytowski
Lumir D'Ambrosio
John D. Davis
Weslley da Silva Pereira
Adel Dabah
Gabi Dadush
Maytal Dahan
Tamara L. Dahlgren
Johann Dahm
Dong Dai
Donglai Dai
Yiqin Dai
Gregor Daiß
Kenneth Dalenberg
Chris Daley
Christopher Daley
Christian Dallago
Bill Dally
John Daly
Patricia Damkroger
Anthony Danalis
Tharun Kumar Dangeti
Stefanie Dao
Nour Daoud
Dibyendu Das
Sajal Das
Debendra Das Sharma
Prasanna Date
Sandeep Dattaprasad
Harish Dattatraya Dixit
Mithil Dave
Eddie Davis
James J. Davis
Joshua H Davis
Johannes de Fine Licht
Bert de Jong
Cees de Laat
Daniele De Sensi
Bronis R. de Supinski
Tom Deakin
Nathan DeBardeleben
Florian Deconinck
Ewa Deelman
Elise Degen
Gerald Degrace
Eliakin del Rosario
Christopher Delay
Robert L. DeLeon
Matt Demas
Dario Dematties
Ali Can Demiralp
James Demmel
Yuntian Deng
Philippe Deniel
Larry Dennison
Joel Denny
Chris DePrater
Luiz DeRose
Ryan DeRue
Sameer Deshmukh
Lizanne DeStefano
Ian Dettwiller
Erik Deumens
Hariharan Devarajan
Aditya Devarakonda
Shaheen Dewji
S. Charlie Dey
Anees Dhabaan
Akash V. Dhruv
Sheng Di
Salvatore Di Girolamo
Giuseppe Di Guglielmo
Olivia Di Matteo
Luis Diaz-Santini
Patrick Diehl
James D. Diffenderfer
Andreas Dilger
Haipeng Ding
Nan Ding
Yufei Ding
Tu Mai Anh Do
Peter Doak
Bahador Dodge
Johannes Doerfert
Prerna Dogra
Jered Dominguez-Trujillio
Jens Domke
Tingzhen Dong
Xishuang Dong
Yong Dong
Jack Dongarra
Matthieu Dorier
Ethan Dorta
Matthew G. F. Dosanjh
Maureen Dougherty
Derek Doyle
Erik Draeger
Justin Drake
Ulrich Drepper
Maurizio Drocco
Oleg Drokin
Nikhil Dronamraju
Nikoli Dryden
Xiaoyong Du
Yun Du
Zhen Du
Anshu Dubey
Philippe Duchon
Daniel Duffy
Diana Dugas
Ming Dun
Ross Duncan
Ann Dunkin
Ngoc Yen Duong
Logan Durham
Akash Dutta
Soumya Dutta
Landon Richard Dyken
David Dykstra
David Eberius
David Eder
Chris Egersdoerfer
Nasir Eisty
Jorge Ejarque
Saliya Ekanayake
Ouadie El Farouki
Kaoutar El Maghraoui
Oliver D. Elbert
Tonia Elengikal
Vadim Elisseev
Sally Ellingson
J. Austin Ellis
Marquita Ellis
Robert Ellison
Nahid Emad
Murali Emani
David R. Emerson
Christian Engelmann
Jussi Enkovaara
Connor Ennis
Mattan Erez
Daniel J. Ernst
Dominik Ernst
Alexis Espinosa
Aniello Esposito
Trilce Estrada
Lionel Eyraud-Dubois
Farah Fahim
Suhaib A Fahmy
Yudho Ahmad Fahreza
Phillipp Falk
Alex Fallin
Jing Fan
Ke Fan
Yi Fan
Bo Fang
Leticia Suellen Farias Machado
Steven Farrell
Luca Fedeli
Scott Feister
David Feller
Boyuan Feng
Wu Feng
Zonghao Feng
John Feo
Charles Ferenbaugh
Michael Ferguson
Nabil Abdelaziz Ferhat Taleb
Mark Fernandez
Milinda Fernando
Enol Fernández
José M. Fernández
Marcel Ferrari
Federica Ferraro
Kurt B. Ferreira
Rafael Ferreira da Silva
Nicola Ferrier
Rosa Filgueira
Weronika Filinger
Henrique Fingler
Hal Finkel
Jesun Firoz
Paul Fischer
Rudy Flores
Patricia Florissi
Thomas Flynn
Fernanda Foertter
Sam Foreman
Anna Fortenberry
Eric Fosler-Lussier
Greg Foss
Ian Foster
Georgios Fourtakas
Jeremy Fowers
Geoffrey Fox
Judy Fox
Chris Francis
Stu Franks
Jaime Freire de Souza
Ulrich Frey
George Matthew Fricke
Yehonatan Fridman
Brian C. Friesen
Kaihua Fu
Yuqi Fu
Joel Fuentes
Oliver Fuhrer
Kohei Fujita
Harrison Fullwood
Daniel Fulton
Mako Furukawa
Chris Fuson
Karl Fürlinger
Henry Gabb
Alan Gadian
Ana Gainaru
Kelly Gaither
Jorge Galvez-Vallejo
Todd Gamblin
Xinbiao Gan
Baskar Ganapathysubramanian
Auroop Ganguly
Jim Ganthier
Shang Gao
Jaime Garcia
Jose Garcia
Simon Garcia De Gonzalo
Maria Garzaran
Derek R. Gaston
Mark Gates
Aditi Gaur
Ada Gavrilovska
Rahulkumar Gayatri
Polo GE
Shijian Ge
Johannes Gebert
Al Geist
Alexander Geiß
Tong Geng
Ann Gentile
Marc Genton
Giorgis Georgakoudis
Evangelos Georganas
Anjus George
Naje George
Rhea George
Antigoni Georgiadou
Svetlozar Georgiev
Pawel Gepner
Timothy C. Germann
Michael Gerndt
Balazs Gerofi
Tim Gerrits
Axel Gerstenberger
Sandra Gesing
Alireza Ghaffarkhah
Sheikh Ghafoor
Aroua Gharbi
Omar Ghattas
Siavash Ghiasvand
Pieter Ghysels
Tom Gibbs
Thomas Gilray
Ethan Gindlesperger
Maria Girone
Jens Glaser
Niels Gleinig
Milos Gligoric
Christian Glusa
Christopher Goddard
William Godoy
Victoria Godsoe
Deepak Goel
Ali Murat Gok
Maya Gokhale
Debra Goldfarb
Mehdi Goli
Orlando Gomez
Rosalia Gomez
Edson Satoshi Gomi
Elsa J. Gonsiorowski
David Gonzalez
Joshua Gonzalez
Arturo Gonzalez-Escribano
Marc Gonzalez-Tallada
John Goodhue
Ganesh Gopalakrishnan
Mark Gordon
Nicholas Gordon
Steven Gordon
Lev Gorenstein
Stefano Gorini
Sergei Gorlatch
Mikaila Gossman
Kevin Gott
Erik Gough
John Gounley
Anish Govind
Zachary E Graber
Richard Graham
Michael Granado
Jose Granados
Virginie Grandgirard
Ryan Grant
Brian Gravelle
Mark Gray
Adam Green
Oded Green
Hugh Greenberg
Cameron Greenwalt
Rickey Gregg
Nicolas Greneche
Gary Grider
Andrew Grimshaw
Linus Groner
William D. Gropp
Pascal Grosset
Paola Grosso
Taylor Groves
Nathan E. Grube
Patricia Grubel
Thomas Gruber
Yongbin Gu
Qiang Guan
Zhongheng Guan
Amal Gueroudji
Tobias Guggemos
Chuangyi Gui
Yuntao Gui
Jonathan Guiang
Giulia Guidi
Nicolas Guidotti
Alicia Guite
Taresh Guleria
Akhil Guliani
Chathika Gunaratne
Dilan Gunawardana
Chu Guo
Minyi Guo
Shengjian Guo
Xiaohu Guo
Yanfei Guo
Yang Guo
Yixin Guo
Zhuoqiang Guo
Marjan Gusev
Pano Gushev
Ethan Gutmann
Raz Gvishi
Bálint Pál Gyires-Tóth
Fritz Göbel
Mert Gürbüzbalaban
Christian Haas
Tom Haber
Bilel Hadri
Georg Hager
Christopher Haine
Mahantesh Halappanavar
David Hall
Mary Hall
Niklas Halonen
Dorit M. Hammerling
Jeff R. Hammond
Simon Hammond
Wei Han
Toshihiro Hanawa
David Hancock
Jan-Willem Handgraaf
Michael Haney
Sean Hanlon
Heidi Hanson
Jeff Hanson
Guanhua Hao
Tianyi Hao
Hussein Harake
Re'em Harel
Paul H. Hargrove
Siva Kumar Sastry Hari
Kevin Harms
Peter Harrington
J. Austin Harris
Veronica Harris
Cyrus Harrison
William Harrod
Michael Hartman
Rebecca Hartman-Baker
Christine Harvey
Yuta Hasegawa
Shintaro Hashimoto
Reza Hassanian
Christian Hasse
Linda Hayden
Brian Haymore
Valérie Hayot-Sasson
Lixin He
YiLiang He
Yun (Helen) He
Yuxiong He
Zhenhua He
Marco Heddes
Lukas Heidemann
Alexander Heinecke
Esa Heiskanen
Ásdís Helgadóttir
Elena Henderson
Jessie Henderson
Matthew L Henderson
Samuel Henderson
Danny Hendler
Michael Hennecke
Ian Henriksen
Troels Henriksen
Gregory M. Henry
Robert Henschel
Thomas Herault
Martin C. Herbordt
Bridger Herman
Benjamin Hernandez
Frank Herold
Rafael Andres Herrera Guaitero
Andreas Herten
Demian Hespe
Braden Hester
Simon Hettrick
V. Hewes
Alexander Hexemer
Tony Hey
Wolf Hey
Jason Hick
Megan L. Hickman Fulp
Dean Hildebrand
Judith C. Hill
Conrad Hillairet
Alex Himmel
Jacob Hinkle
Catherine Hinton
Kyle Hippe
Eric Hirschmann
Torsten Hoefler
Ian Hoffman
Armin Hohenegger
Adolfy Hoisie
Petrina Hollingsworth
Elizabeth Holman
Connor Holmes
Kristina Holsapple
Jonah Holtmann
James Hong
Seung Chan (Daniel) Hong
Taeyoung Hong
Stefan Hoops
Muneo Hori
Takane Hori
Daniel Horta
Gergely Horvath
Tetsuya Hoshino
Muhammad Mainul Hossain
Adam Hough
Paul Hovland
Lee Howes
Markéta Hrabánková
Markus Hrywniak
Chung-Hsing Hsu
Darren Hsu
Aibo Hu
Wei Hu
Yichang Hu
Yihua Hu
Dan Huang
En-Ming Huang
Guyue Huang
Han Huang
Hua Huang
Pengcheng Huang
Songfang Huang
Tai Huang
Xiansong Huang
Yafan Huang
Yeqi Huang
Yu Huang
Joseph Huber
Thomas Huber
Kevin Huck
David Hudak
Stephen Hudson
Axel Huebl
Jan Hueckelheim
Clayton Hughes
Maxime Hugues
Immo Huismann
Geoffrey C Hulette
Travis Humble
Sascha Hunold
Dawn Hunter
James Hurd
Joshua Hursey
Jack Hurst
Scott Hutchison
Wen-mei Hwu
Hermann Härtig
Lukas Hübner
Costin Iancu
Alex Iankoulski
Khaled Ibrahim
Mohannad Ibrahim
Shadi Ibrahim
Tsuyoshi Ichimura
Akihiro Ida
Yasuhiro Idomura
Patrick Iff
Hillary Igharo
Shuichi Ihara
Alexandros-Stavros Iliopoulos
Thomas Ilsche
Neena Imam
Toshiyuki Imamura
Takuya Ina
Hikaru Inoue
Joseph A. Insley
Stephan Irle
Michael Irvin
Kate Isaacs
Mikhail Isaev
Mihailo Isakov
Sergio Iserte
Kelly Isham
Masado Ishii
Kamil Iskra
Tanzima Islam
Sharat Israni
Ken Iwata
Christiane Jablonowski
Adrian Jackson
John Jacobson
Mathias Jacquelin
Julien Jaeger
Arpan Jain
Rajeev Jain
Rutwik Jain
Shalini Jain
Thomas Jakobsche
Safdar Jamil
Tim Jammer
Siddhartha Jana
Kacper Janda
Ali Jannesari
Chris Janson
Michael Jantz
Aaron M Jarmusch
Milan Jaroš
Stephen Jarvis
Vishwesh Jatala
Stephan Jaure
Jean-Baptiste Jeannin
Joachim Jenke
Tre' Jeter
Shantenu Jha
Yuede Ji
Dongning Jia
Menghan Jia
Weile Jia
Zhihao Jia
Guanxian Jiang
Hailong Jiang
Peng Jiang
Qingcai Jiang
Henry Andres Jimenez
Diego Jiménez
Feiyang Jin
Hai Jin
Hongwei Jin
Sian Jin
Tatiana Jin
Zheming Jin
Yuchen Jing
Hans Johansen
Jophin John
Bryan Johnston
David A. Joiner
Andrew Jones
Terry Jones
Kirk E. Jordan
Jithin Jose
Nrushad Joshi
Wayne Joubert
Dylan Jude
William Judge
János Juhasz
Myoungsoo Jung
Pascal Jungblut
Mozhgan Kabiri Chimeh
Hussain Kadhem
Christopher Kadow
Jason Kaelber
Albert Kahira
Srinath Kailasa
Paul Kairys
Hartmut Kaiser
Bharat Kale
Laxmikant Kale
Vivek Kale
Sergei V. Kalinin
Justin Kalloor
Brandin Kammerdiener
Raghavendra Kanakagiri
Kaushik Kandadi Suresh
Yao Kang
Rajgopal Kannan
Ramakrishnan Kannan
Sven Karlsson
Haniye Kashgarani
Aditya Kashi
Ilias Katsardis
Daniel S. Katz
Gagandeep Kaur
Aditya Kaushik
Sahit Kavukuntla
Masatoshi Kawai
Takahiro Kawashima
Fazeleh Sadat Kazemian
Daniel Keefe
Gordon Keeler
Ariel Kellison
Tamar Kellner
Christopher Kelly
Isabelle Kemajou-Brown
Paul Kent
Ronan Keryell
Gokcen Kestor
Raj Kettimuthu
Rajkumar Kettimuthu
Kurt Keville
David E. Keyes
Bence Keömley-Horváth
Daman Khaira
Dounia Khaldi
Ali Khan
Awais Khan
Mohammad Khan
Arindam Khanda
Makrand Khanwale
Moutazbellah Khater
Alireza Kheirkhahan
Amine Khodja
Nesrine Khouzami
Vahdaneh Kiani
Yuma Kikuchi
Byoung-Do Kim
Jeeeun Kim
Juno Kim
Kihyun Kim
Minjun Kim
Seungchan Kim
Youngjae Kim
Jason Kincl
Michel Kinsy
Mariam Kiran
Jack Kirk
Christine Kirkpatrick
Nurit Kirshenbaum
Severin Kistler
Joy Kitson
Fredrik Kjolstad
Mark Klaisoongnoen
Scott Klasky
Kerstin Kleese van Dam
Michael Klemm
Tom Klosterman
Marius Knaust
Maximilian Knespel
Christian Kniep
Mitchell Knight
Robert Knop
Carlton Knox
Shelley Knuth
Hiroaki Kobayashi
Andreas Koch
Marcel Koch
Peter M. Kogge
Scott Kohlert
Bastian Koller
Kazuhiko Komatsu
Alice Koniges
Ralph Koning
Orestis Korakitis
Anton Korzh
Anthony Kougkas
Patricia Kovatch
James Kowalkowski
Adam Kowalski
Kentaro Koyama
Quincey Koziol
Marina Kraeva
Michael Krajecki
William Kramer
Dieter Kranzlmüller
David Krasowska
Jiri Kraus
Patrycja Krawczuk
Nathaniel Kremer-Herman
Rajiv Krishnakumar
Harinarayan Krishnan
Marcel Krüger
John Kubiatowicz
Jeff Kuehn
Shruti Kulkarni
Mandeep Kumar
Mohit Kumar
Nalini Kumar
Sidharth Kumar
Ashwin Kundeti
Adam J. Kunen
Julian Kunkel
Pin-Yi Kuo
Ruth Kurniawati
Thorsten Kurth
Jakub Kurzak
Ryota Kusakabe
Karsten Kutzer
Mateusz Kuzak
JaeHyuk Kwack
Grzegorz Kwasniewski
Nikos Kyrpides
Jesus Labarta
Nicolas Lachiche
Pierre Axel Lagadec
Ignacio Laguna
Junjie Lai
Kartik Lakhotia
Sumathi Lakshmiranganatha
Maddegedera Lalith
Michael O. Lam
Jacob Lambert
Yu-Hsiang Lan
Zhiling Lan
John Lange
Jan Langer
Julien Langou
Eric Lançon
Leigh Lapworth
Michelle Lara
Matthew Larsen
Catherine Larson
Jeffrey Larson
Robert Latham
Scott Lathrop
Jonas Latt
Guillaume Latu
AJ Laurer
Alan Lawrence
Richard Lawrence
Margaret Lawson
Henry Le Berre
Valentin Le Fèvre
Adrien Leblanc
Damien Lebrun-Grandie
David Lecomber
Changgyu Lee
Hochan Lee
Hyungro Lee
Jae-Kook Lee
Jaejin Lee
Jason Lee
Seungjin Lee
Seyong Lee
Wonchan Lee
Daniel Lee Nichols
Laurent Lefevre
Remi Lehe
John Leidel
Jason Leigh
Charles E. Leiserson
Pierre Lemarinier
Kurt Lender
Geoffrey Lentner
Edgar A. Leon
Leslie Leonard
Mary Ann Leung
Harel Levin
Scott Levy
Stuart A. Levy
Daniele Lezzi
Ang Li
Boyang Li
Cheng Li
Cong Li
Dahong Li
Dong Li
Dongsheng Li
Du Li
Fang Li
Guanpeng Li
Hanbing Li
Haoxuan Li
Huizhong Li
Jiajia Li
Jie Li
Jielan Li
Junjie Li
Karen Li
Lei Li
Lingda Li
Liu Li
Mingfan Li
Mingyi Li
Mingzhen Li
Pengcheng Li
Shenggui Li
Sherry Li
Shigang Li
Tracey Li
Xiaoming Li
Xinyi Li
Xueqi Li
Yichao Li
Yicheng Li
Yong Li
Yue Li
Zecheng Li
Zhenyu Li
Dmitry Liakh
Xiao Liang
Xin Liang
Chunhua Liao
Xiaofei Liao
Helena Liebelt
Ron Lieberman
Radita Liem
Hyun Lim
Seung-Hwan Lim
Vincent Lim
Diangen Lin
Meifeng Lin
Pei-Hung Lin
Rongfen Lin
Wei-Chen Lin
Neil Lindquist
LeAnn Lindsey
Peter Lindstrom
John Linford
Melanie Little
Aaron Liu
Fangfang Liu
Frank Liu
Geng Liu
Hang Liu
Honggao Liu
Jane Liu
Jie Liu
Jinyang Liu
Junhong Liu
Minzhao Liu
Sha Liu
Siran Liu
Xian Liu
Xiaoyan Liu
Xin Liu
Xu Liu
Yang Liu
Yi Liu
Zhengchun Liu
ZiFan Liu
Josep Llosa
Glenn K. Lockwood
Jennifer Loe
Jay Lofstead
Bruce Loftis
Luke Logan
Jason Lohrey
Julien Loiseau
Yuan Ren Loke
Johann Lombardi
Christopher Lompa
Bill Long
Stephen M Longshaw
Francesc Lordan
Gerald Lotto
Mike Lowe
Hatem Ltaief
Hao Lu
Kai Lu
Lu Lu
Qinglin Lu
Wenbin Lu
Xiaomin Lu
Zhongzhi Luan
Robert F. Lucas
Piotr Luczynski
Morgan Ludwig
Thomas Ludwig
Jakob Luettgau
Zarija Lukić
Hengrui Luo
Qiong Luo
Tianhuan Luo
Ye Luo
Yingwei Luo
Zhaolong Luo
Guido Lupieri
Piotr Luszczek
Dmitry Lyakh
Danylo Lykov
Isaac Lyngaas
Marc Lyonnais
Konstantin Läufer
Heng Ma
Huan Ma
Hui Ma
Qianxiang Ma
Shaonan Ma
Teng Ma
Xiaolong Ma
Zixuan Ma
Sulthan Zahran Ma’ruf
Raghu Machiraju
Adrian Macias
Lalith Maddegedara
Ravi Madduri
Sandeep Madireddy
Alberto Madonna
Robert Magno
Bill Magro
Ketan Maheshwari
Meghanto Majumder
Preeti Malakar
Maciej Malawski
Nicholas Malaya
Milos Malesevic
Abid Malik
Avik Malladi
Allen Malony
Carlos Maltzahn
Tarun Malviya
Joe Mambretti
Anirban Mandal
Carla Mann
Tyler Mannings
Dominic A. Manno
Robert Manson-Sawko
Nicolau Manubens Gil
Joseph Manzano
Aniruddha Marathe
Madhav Marathe
Dominic Marcello
David Marchant
Daniele Marchisio
Davit Margarian
Andrea Mari
Ruth Marinshaw
Pieter Maris
Georgios Markomanolis
David Markowitz
Andres Marquez
Nicole Marsaglia
Matthieu Martel
David Martin
Philipp Martin
Pino Martin
Maxime Martinasso
David J. Martinez
Mike Martinez
Moises Martinez Herrera
Jan Martinovič
Per-Gunnar Martinsson
Xavier Martorell
Md Hasan Al Maruf
Massimo Mascaro
Victor Mateevitsi
Michael Matheson
Hiroya Matsuba
Satoshi Matsuoka
Ryoya Matsushima
Anne Matsuura
Peter Mattson
Tim Mattson
Jeffrey Mauldin
Christopher Mauney
Avinash Maurya
Jenna May
Wil Mayers
Chris Maynard
François Mazen
Patrick S. McCormick
Ben McDonough
Marty McFadden
Jeremy McGibbon
Brendan McGinty
Lois Curfman McInnes
Gavin McIntosh
Simon McIntosh-Smith
Dylan McReynolds
Ondrej Meca
Maryam Mehri Dehnavi
Susan Mehringer
Apurva Mehta
Neil Mehta
Ulrike Meier Yang
Anne-Ruth Meijer
Verónica G. Melesse Vergara
Ben Menadue
Celso L. Mendes
Chanelle Menefee
Esteban Meneses
Jathinson Meneses
Jie Meng
Ke Meng
Harshitha Menon
Stefano Mensa
Arif Merchant
Anu Mercian
Cristin Merritt
Elia Merzari
Andre Merzky
Peter Messmer
Michael Metcalfe
Burak Mete
Johannes Meuer
Martin Meuer
Lucas Meyer
Marek Michalewicz
George Michelogiannakis
Kristel Michielsen
Simon Michnowicz
Ross Mickens
Cesare Miglioli
Matthew Mikhailov
Petro Junior Milan
Dimitar Mileski
Reed M Milewicz
Muhammad Rizky Millennianno
Cena Miller
Dejan Milojicic
Daniel J Milroy
Josh Milthorpe
Misun Min
Narasinga Rao Miniskar
Ronald Minnich
Marco Minutoli
Claudia Misale
Dmitry Mishin
Stephen Mobbs
Ali Mohammed
Jamal Mohd-Yusof
Bernd Mohr
Rick Mohr
Kathryn Mohror
Shintaro Momose
Inder Monga
Mohammad Alaul Haque Monil
Julien Monniot
Laura Monroe
Jose Manuel Monsalve Diaz
Jose Monteiro
Adam Moody
Aekyeung Moon
Joseph Moore
Shirley Moore
Maxim Moraru
José Moreira
Miquel Moreto
Colin Morey
Laura Morgenstern
Karla Morris
Jack Morrison
Henning Mortveit
Thomas Moschny
William S. Moses
Tarek Mostafa
Chan-Yu Mou
Irene Moulitsas
Baorun Mu
Misbah Mubarak
Gihan Mudalige
Dheevatsa Mudigere
Frank Mueller
Debangshu Mukherjee
Julia Mullen
Julie Mullen
Hausi Muller
Miranda Mundt
Philip Munksgaard
Christian Munley
Todd Munson
Killian Muollo
Sota Murakami
Kazunori Muramatsu
Ina Murphy
Richard Murphy
Bharath Muthiah
Onur Mutlu
Andrew Myers
Matthias S. Müller
Pratik Nag
Harsha Nagarajan
Hidemoto Nakada
Kengo Nakajima
Hai Ah Nam
Zifan Nan
Kumudha Narasimhan
Sri Hari Krishna Narayanan
Akira Naruse
Thomas Naughton
John-Luke Navarro
John-Paul Navarro
Philippe Olivier Navaux
Pratik Nayak
Nagmat Nazarov
Benjamin Nederveld
Reece Neff
David Neilsen
Stephen Neuendorffer
Sarah M. Neuwirth
CJ Newburn
Kevin P. Newmeyer
Esmond G. Ng
Linh Ngo
Andrew Nguyen
Phuong Nguyen
Tan Nguyen
Tri Nguyen
Coleman Nichols
Larkin H Nickle
Bogdan Nicolae
Weili Nie
Zhiwei Nie
Dmitry Nikolaenko
Sandra Nite
Anton Njavro
Antonio Noack
Jorji Nonaka
Anant V. Nori
Andrew Norman
Boyana Norris
Chris North
Douglas Norton
Takafumi Nose
April Novak
Peter Nugent
Werner Nutt
William D. Nystrom
Kenneth O'Brien
Cosmin Eugen Oancea
Kevin Obrejan
Lena Oden
Daniel Olds
Vladyslav Oles
Lenny Oliker
Naoyuki Onodera
Guang Ooi
Hiroyuki Ootomo
Shameema Oottikkal
Sarp Oral
Gal Oren
Fabian Orland
Cindy Orozco Bohorquez
Alessandro Orso
Peter Orth
Dossay Oryspayev
Kazuki Osawa
Daniel Osei-Kuffuor
Tomasz Osinski
Marcin Ostasz
Thomas Otahal
Jennifer Ott
Michael Ott
Sven Ott
Olga Ovchinnikova
Karl Oversteyns
John Owens
Mark Oxley
Mehmet Ozakin
So Ozawa
A. Baris Ozguler
Thomas Padioleau
Ludger Paehler
Brian A Page
Scott Pakin
Erik Palmer
Andrew Palughi
Rachel Palumbo
Brian Pan
Yun Pan
Dhabaleswar K. (DK) Panda
Aashish Pandey
Santosh Pandey
Ronald Pandolfi
Yunfei Pang
Ajay R. Panyala
Jean-Pierre Panziera
George Papadimitriou
Tom Papatheodore
Guillaume Papauré
Michael E. Papka
Manish Parashar
Konstantinos Parasyris
Suzanne T. Parete-Koon
Charlotte Park
Cheolmin Park
EunJung (EJ) Park
Inhyuk Park
Yeohyeon Park
Yoonho Park
Scott Parker
Tharindu Patabandi
Saumil Patel
Tirthak Patel
Marc Paterno
Tapasya Patki
Robert M. Patterson
Robert Patti
Robert Patton
J. Gregory Pauloski
Georgios Pavlopoulos
Maciej Pawlik
Karina Pešatová
Olga Pearce
Carl Pearson
Gabriel Pedraza
Kevin Pedretti
Yu Pei
Yu Pei
Ivy Peng
Liang Peng
John Pennycook
Gabriel Perdue
Lisa Perez
Adrian Perez Dieguez
Danilo Perez-Rivera
Alexis Perry-Holby
Tom Peterka
Arthur Peters
Jimme Peters
Dirk Petersen
N. Anders Petersson
Fabrizio Petrini
Antonio J. Peña
Carlos Peña-Monferrer
Dirk Pflüger
Wileam Phan
Walther Philip
Cynthia Phillips
Malachi Phillips
Christelle Piechurski
Marlon Pierce
Alexander Pinard
Oriol Pineda
Keshav Pingali
Jose Pablo Pinilla Gomez
Yuval Pinter
Christian Pinto
Marco Pistoia
Maksym Planeta
Dirk Pleiter
Darius Plesan-Tohoc
Christian Plessl
Rajan Plumley
Étienne Plésiat
Norbert Podhorszki
Michal Podstawski
Samuel D. Pollard
Julian Pollinger
Sri Priya Ponnapalli
Steve Poole
Swaroop Pophale
Serban D. Porumbescu
Jonas Posner
Thomas E. Potok
Loic Pottier
Line Pouchard
Elena Pourmal
Jeaime Powell
Tyler Powell
Arun Prabhakar
Aurelio Jethro Prahara
Nirmal Prajapati
Kelsey Prantis
Sushil K. Prasad
Viktor K. Prasanna
Prajna Prasetya
Cédric Prigent
Justin Privitera
Radu Prodan
Joachim Protze
Bartłomiej Przybylski
Dave Pugmire
Venkat Pullela
Benjamin Pullman
Satish Puri
Wirawan Purwanto
Marc Pérache
Mathias Pütz
Apan Qasem
Jiaxing Qi
Depei Qian
Wei Qian
Xian Qian
Xingjian Qian
Yingjin Qian
Xinming Qin
Shenghao Qiu
Chengyi Qu
Long Qu
Patrick Quinn
Tiago Quintino
Tiago Quinto
Santosh Radha
Anand Radhakrishnan
Ken Raffenetti
Bruno Raffin
M. Mustafa Rafique
Krishnan Raghavan
Vance Raiti
Sivasankaran Rajamanickam
Samyam Rajbhandari
Vinay Ramakrishnaiah
Karthik Raman
Arvind Ramanathan
Bharath Ramesh
Srinivasan Ramesh
Rajiv Ramnath
Jini Ramprakash
Amanda Randles
Aman Rani
Nageswara S. Rao
Nikhil Harish Rao
Ari Rasch
Kevin Rasch
Md. Hasanur Rashid
Siddhisanket Raskar
Jeff Rasley
Katherine Rasmussen
Thilina Rathnayake
Bhupendra A. Raut
John Ravi
Vaidhyanathan Ravichandran
Marshall Rawson
Michael Rawson
Elaine M. Raybourn
Stephane Raynaud
Alan Real
Jaydon Reap
Adrian Reber
Daniel A. Reed
Istvan Z. Reguly
James Reinders
Rob Reiner
Zhixiang Ren
Luc Renambot
Arnaud Renard
Alistair Rendell
William Reus
Albert Reuther
Zouheir Rezki
Bradley Riapolov
Alejandro Ribes
Chris Richardson
Morris Riedel
Jason Riedy
Eleanor Rieffel
Michael Ringenburg
Sashko Ristov
Marcus Ritter
Pablo Rivas
Francesco Rizzi
Silvio Rizzi
Yves Robert
Eric Roberts
Dana E Robinson
Ricardo Rocha
Cassandra Rocha Barbosa
Laura Rodríguez-Navas
Niklas Roemer
Martin Roetteler
Benedict Rogers
David H. Rogers
David M. Rogers
Manuel Lopez Roland
Alana Romanella
Melissa Romanus
Joshua Romero
Kenton Romero
Isabel Rosa
Caitlin Ross
Robert B. Ross
Christopher J. Rossbach
Philip C. Roth
Barry Rountree
Damian Rouson
Duncan Roweth
Kelly L Rowland
Banani Roy
Chanchal Roy
Indranil Roy
Probir Roy
Shaolun Ruan
Cindy Rubio-González
Martin Ruefenacht
Francesco Ruffino
Maria del Carmen Ruiz Varela
Hakizumwami Birali Runesha
Matan Rusanovsky
Paul Ruth
Olatunji Ruwase
William L. Ruys
Mats Rynge
Hoon Ryu
Brian S. Ryujin
Krzysztof Rzadca
Ola Rønning
Ponnuswamy Sadayappan
Mustafa Emre Sahin
Harshita Sahni
Emmanuelle Saillard
Naohisa Sakamoto
Niko Sakic
Roope Salmi
Manisha Salve
Amit Samanta
Prathmesh Sambrekar
Francesca Samsel
Ahmed Sanaullah
Karissa Sanbonmatsu
Maria Ribera Sancho
Cooper Sanders
Peter Sanders
Michael Sandoval
Rajesh Sankaran
Subramanian Sankaranarayanan
Kentaro Sano
Piyush Sao
Leopekka Saraste
Priya Sarathy
Abhik Sarkar
Vivek Sarkar
Darshan Sarojini
Shima Sasanpour
Varuni Sastry
Hayden Sather
Kento Sato
Masayuki Sato
Mitsuhisa Sato
Kumar Saurabh
Aaron Saxton
Philipp Schaad
Jeremy Schafer
Michel Schanen
Tao B. Schardl
Peter Scheibel
Graham Schelle
Robert Schenck
Gundolf Schenk
Philip Schielke
Benjamin Schlueter
Marc Schlütter
Anna Schmedding
Perry Schmidt
Kevin Schneider
Nadav Schneider
Timo Schneider
William Schonbein
Richard Arnoud Schoonhoven
Martin Schreiber
Joseph Schuchart
Thomas C. Schulthess
Karl Schulz
Laura Schulz
Martin Schulz
Richard Schulze
Catherine Schuman
Benjamin Schwaller
Peter Schwartz
Nicholas Schwarz
Simon Schwitanski
Alessio Sclocco
Thomas R. W. Scogland
Steve Scott
Olga Scrivner
William R. Scullin
Robert Sears
Michael Seaton
Janet Sebastian
Saba Sehrish
Tyler Sellers
Oguz Selvitopi
David Semeraro
Hermes Senger
Shubhabrata Sengupta
Benjamin Sepanski
Alexander Serebrenik
Marc Sergent
Robert E. Settlage
Brad Settlemyer
Jean M. Sexton
Igor Sfiligoi
Aamir Shafi
Justin Shafner
Milan Shah
Niteya Shah
Hafsah Shahzad
Aiman Shaikh
Gilad Shainer
John Shalf
Pavel Shamis
Nathan Shammah
Honghui Shang
Mallikarjun (Arjun) Shankar
Jiaping Shao
Mingtian Shao
Puneet Sharma
Ruslan Shaydulin
Stacey Sheldon
Li Shen
Xipeng Shen
Sameer Shende
Yongning Sheng
Kevin Sheridan
Ben Sherman
Jiuchen Shi
Yongmei Shi
Masaaki Shimizu
Woong Shin
Galen Shipman
Sultan Shoaib
Fumiyoshi Shoji
Ahmedur Rahman Shovon
Sergei Shudler
Chaoyang Shui
Maulik Shukla
Julian Shun
Min Si
Jeremy Siadal
Shahzeb Siddiqui
Eva Siegmann
Bálint Siklósi
Anna Sikora
Alan Sill
Daniel Silver
Francesco Silvestri
Caitlin Sim
Derek Simmel
Christopher Simmons
James Simmons
Horst Simon
Amber Simpson
Matthew D Sinclair
Srinivas Yadav Singanaboina
Anju Singh
Siddharth Singh
Aviraj Sinha
Prasoon Sinha
Raül Sirvent
Ramakrishnan Sivakumar
Seyon Sivarajah
Anthony Skjellum
Kateřina Slaninová
Elliott Slaughter
Patrick Sliwinski
Adrian Small
Simon Smart
Melissa C. Smith
Shaden Smith
Spencer Smith
Winona Snapp-Childs
Addison Snell
Yaniv Snir
Calum Snowdon
Lucas Snyder
Shane Snyder
Mathias Soeken
Andrew Solis
Richard Solomon
Seung Woo Son
Guoli Song
Shuaiwen Leon Song
Ying Song
Zeyu Song
Saeed Soori
Masha Sosonkina
Alexandre Sousa
Wyatt Spear
Gil Speyer
Filippo Spiga
Hanno Spreeuw
Piyawut Srichaikul
Sriram Srinivasan
Shankaran Sriram
Ajitesh Srivastava
Ankur Srivastava
Tom St. John
Eric Stahlberg
Alexandros Stamatakis
Dan Stanzione
Dan C Stanzione
Matt Starr
Korbinian Staudacher
Stefan Kerkemeier
Sebastian Steiner
Thomas Steinke
Nathaniel T. Stemen
Sean Stephens
Savannah Stephenson
Laurie A. Stephey
Rick Stevens
Adam J. Stewart
Alexandra Stewart
Claire Stirm
Martin Stoll
Harmen Stoppels
Quentin F. Stout
Petr Strakoš
Magnus Strengert
Erich Strohmaier
Michelle Strout
Ryan Stutsman
Christodoulos Stylianou
Zhaoyuan Su
Alejandro M. Suarez
Estela Suarez
Hari Subramoni
Joshua Suetterlein
Jan-Friedrich Suhrmann
Dhruv Sujatha
Nitin Sukhija
Dalal Sukkari
Cesar Sul
Baixi Sun
Jingchao Sun
Ninghui Sun
Qingxiao Sun
Xian-He Sun
Xiaoyang Sun
Ying Sun
Hari Sundar
Shiv Sundram
Supreeth Suresh
Frédéric Suter
Toyotaro Suzumura
Gert Svensson
Sriram Swaminarayan
Tyler Swann
Steven Swanson
Christine Sweeney
Paolo Sylos Labini
Gábor Szederkényi
Jakub Tětek
Srinivas C Tadepalli
Claude Tadonki
Doaa Taha
Zaid Tahir
Ryousei Takano
Hikaru Takayashiki
Shinichiro Takizawa
Nathan Tallent
Guangming Tan
Jia Qing Tan
Houjun Tang
Meng Tang
Yu-Hang Tang
Zhe Tang
Muhammad Tanvir
Dingwen Tao
Konstantin Taranov
Ahmad Tarraf
James Tau
Michela Taufer
Ivano Tavernelli
Kheng Tiong Tay
John Taylor
Sam Taylor
Valerie Taylor
Alain Tchana
Roselyne Tchoua
Mary Tedeschi
Ali Tehrani Jamsaz
Antonio Teijeiro
Mohit Tekriwal
Chetan Tekur
Goran Temelkov
Keita Teranishi
Christian Terboven
Xavier Teruel
Olivier Terzo
Francois Tessier
Serges Love Teutu Talla
Vijay Thakkar
Tarak Thakore
Rajeev Thakur
Mathialakan Thavappiragasam
Ryan Theriot
Hannes Thiemann
William W. Thigpen
George K. Thiruvathukal
Mary Thomas
Rollin Thomas
Neil Thompson
Mitchell Thornton
Peter E. Thornton
Jiannan Tian
Min Tian
Shilei Tian
Yonghong Tian
Hsu-Tzu Ting
Jesmin Jahan Tithi
Devesh Tiwari
Emma Tolley
Dmitrii Tolmachev
Karen Tomko
Matthew P Tomlinson
Stanimire Z. Tomov
Felix Tomski
Kálmán Tornai
Yuri Torres
Georgia Tourassi
Brian Towles
Robert Tracey
Brandon Tran
Nhan Tran
Matthew Trappett
Strahinja Trecakov
Huy Trinh
Ana Trisovic
Christian Trott
Lukas Trümper
Karen Tsai
Timothy Tsai
Yaohung Mike Tsai
Yu-Hsiang Tsai
Aristeidis Tsaris
Ronny Tschueter
Alexander Tsyplikhin
Antonino Tumeo
Ashley Tung
Matteo Turilli
Terece Turton
Nicholas Tyler
Gideon Uchehara
Naonori Ueda
Uwe Ulbrich
Dominik Ulmer
Mariam Umar
Didem Unat
Robert R. Underwood
Osman Unsal
Ramakrishna Upadrasta
Alex Upton
Dante Uriostegui
Daniela Ushizima
Tetsuzo Usui
Arash Vahdat
Karan Vahi
Ilmari Vahteristo
Jaideep Vaidya
Edward Valeev
Pedro Valero-Lara
Ruud van der Pas
Brian C. Van Essen
Ben van Werkhoven
Charlie Vanaret
Tom Vander Aa
Tristan Vanderbruggen
Jackson Vanover
Ana Lucia Varbanescu
Tom Richard Vargis
Vas Vasiliadis
Natalia Vassilieva
Jean-Luc Vay
Sudharshan S. Vazhkudai
Bram Veenboer
Marc-André Vef
Flavio Vella
Radha Venkatagiri
S. VenkataKeerthy
Shivaram Venkataraman
Jyothi Venkatesh
Wilfried Verachtert
Mathieu Verite
Jeffrey S. Vetter
Felipe Viana
Tom Vierjahn
Henri Vincenti
Venkatram Vishwanath
Anke Visser
Ryan Vogt
Barr von Oehsen
Kirill Voronin
Richard Vuduc
Mark Wade
Mohamed Wahib
Misty Wahl
Jacob Wahlgren
Genna Waldvogel
Steve Wallach
Bingzhen Wang
Bo Wang
Cong Wang
Dali Wang
Di Wang
Fei Wang
Feiyi Wang
Haiyan Wang
Haojie Wang
Haoliang Wang
Jialei Wang
Jinzhen Wang
Kai Wang
Lesi Wang
Lillian Wang
Lixin Wang
Liyi Wang
Mingxuan Wang
Mingxun Wang
Ruibo Wang
Ruibo Wang
Ruisi Wang
Wei Wang
Xian Wang
Xiao Wang
Xiaohui Wang
Xiaolin Wang
Xin Wang
Yang Wang
Yida Wang
Yinshan Wang
Yong Wang
Yu Wang
Yuanwei Wang
Yuke Wang
Zhe Wang
Zheng Wang
Zheng Wang
Zhenlin Wang
Zhennan Wang
Zhong Wang
Zhuoya Wang
Tim Warburton
Joseph Ward
Logan Ward
Mina Warnet
Yutaka Watanabe
Curran Watson
Vincent Weaver
Nils Wedi
Aaron Weeden
Yang Wei
Yihua Wei
Zhewei Wei
Klaus Weide
Mathias Weiden
Josef Weidendorfer
Michèle Weiland
Marion Weinzierl
Greg Weirs
Rickey Weisner
Brent Welch
Glenn Wellbrock
Gerhard Wellein
Garth Wells
Jack Wells
Wendelin Wemhoener
Kelly Keene Werner
Bert Wesarg
Richard West
Manuel Wetzel
Laurent White
Sam White
Tobias Wicky
Stefan Wild
Torsten Wilde
Grant Wilkins
Finn Wilkinson
Sean Wilkinson
Sebastian Wilkinson
Mario Wille
Allan Williams
Bill Williams
Samuel W. Williams
George Williamson
Boyd Wilson
Bruce Wilson
Ellis Wilson
Steven JE Wilton
Frank Winkler
John Wofford
Lowell Wofford
Felix Wolf
Phillip Wolfram
Bryan Wong
Michael Wong
Michael Wong
Mark Woodhouse
Justin Wozniak
Jack Wright
Less P. Wright
Nicholas J. Wright
Steven A. Wright
Bo Wu
Elynn Wu
Hongyi Wu
Pang-Ning Wu
Weiqi Wu
Wenhao Wu
Wentiao Wu
Xin-Chuan Wu
Xingfu Wu
Yidi Wu
Yongwei Wu
You Wu
Roel Wuyts
Brandon Wyatt
Bryant Wyatt
Brian J. N. Wylie
Frank Würthwein
Wen Xia
Sijie Xiang
Chaowei Xiao
Junmin Xiao
Qian Xiao
Wencong Xiao
Bing Xie
Lei Xie
Qipeng Xie
Zhen Xie
Ying Xiong
David Xu
Erci Xu
Fan Xu
Jie Xu
Shaojun Xu
Wei Xu
Zhiqian Xu
HaoNan Xue
Qing Xue
Rohan Yadav
Yoshiaki Yamaoka
Ichitaro Yamazaki
Qiang Yan
Yineng Yan
Yonghong Yan
Chao Yang
Chao Yang
Dongxu Yang
Hailong Yang
Han Yang
Jinlong Yang
Lishan Yang
Mingjie Yang
Renyu Yang
Soonyeal Yang
Tianxing Yang
Xin Yang
Yuling Yang
Zihan Yang
Howard Yanxon
Zhiming Yao
Asim YarKhan
Reza Yazdani
Yang Ye
Thomas Yeh
Katherine Yelick
Jae-Seung Yeom
Izzet Yildirim
Orcun Yildiz
Junqi Yin
Shu Yin
Wanwang Yin
Yiran Yin
Rio Yokota
Bozhi You
Yang You
Jeff Young
Jeffrey Young
Steven R. Young
Andrew Younge
Ed Younis
Fan Yu
Lechen Yu
Linxiao Yu
Minlan Yu
Sixing Yu
Fengming Yuan
Eduardo Zaborowski
Anissa Zacharias
Mohammad Zaeed
Farid Zakaria
Rohit Zambre
Neil Zaïm
Matthew J. Zekauskas
Stephanie Zeller
Deze Zeng
Lingfang Zeng
Will Zeng
William Zeng
Michael Zentner
Jidong Zhai
Bin Zhang
Chen Zhang
Di Zhang
Feng Zhang
Gongrui Zhang
Jian Zhang
Jinghui Zhang
Jixiao Zhang
Minjia Zhang
Pengmiao Zhang
Raymond Zhang
Ruizhe Zhang
SiWei Zhang
Wei Zhang
Wei Zhang
Weiqun Zhang
Wusheng Zhang
Xiao Zhang
Xiaoyang Zhang
YangLin Zhang
Yi Zhang
Yichi Zhang
Yiming Zhang
Yining Zhang
Yu Zhang
Yucheng Zhang
Yunquan Zhang
Dongfang Zhao
Kai Zhao
Konghao Zhao
Liang Zhao
Meijia Zhao
Tuowen Zhao
Xuncheng Zhao
Yunjian Zhao
Zhengji Zhao
Zhuowen Zhao
Elton Zheng
Long Zheng
Ningxin Zheng
Qing Zheng
Tengyang Zheng
Yingwei Zheng
Shaojie Zhong
Amelie Chi Zhou
Bin Zhou
BinBin Zhou
Hui Zhou
Peng Zhou
Qihui Zhou
Shen Zhou
Wenhao Zhou
Jianan Zhu
Shun an Zhu
Xinran Zhu
Felix Zilk
Christopher Zimmer
Michael Zink
Alexandros Nikolaos Ziogas
Yosef Zlochower
Changfeng Zou
Xiangyu Zou
Kfir Zvi
Max Zvyagin
Petrus H. Zwart
Matias Zwinger
Işıl Öz

Presentations

Posters
Scientific Visualization & Data Analytics Showcase
"Atlas of a Changing Earth" Visualization of the ArcticDEM Survey and Vavilov Ice Cap Collapse
TP
XO/EX
Description: The Advanced Visualization Lab at the National Center for Supercomputing Applications created a cinematic scientific visualization of the ArcticDEM survey and the Vavilov ice cap collapse for the documentary film "Atlas of a Changing Earth", in both digital fulldome and flatscreen television formats. While the ArcticDEM dataset is the main one featured here, the visualization fills in gaps using other datasets, including a climate simulation by Bates et al. and Landsat imagery. The visualization required a number of steps, including manual and algorithmic data cleaning, processing, and alignment; data fusion; virtual scene design; morphing interpolation; lighting design; camera choreography; compositing; and rendering on the Blue Waters supercomputer.
ACM Gordon Bell Finalist
Awards Presentation
2.5 Million-Atom Ab Initio Electronic-Structure Simulation of Complex Metallic Heterostructures with DGDFT
Recorded
Awards
TP
Description: Over the past three decades, ab initio electronic-structure calculations of large, complex, metallic systems have been limited to tens of thousands of atoms by the computational accuracy and efficiency achievable on leadership supercomputers. We present a massively parallel discontinuous Galerkin density functional theory (DGDFT) implementation, which adopts adaptive local basis functions to discretize the Kohn-Sham equation, resulting in a block-sparse Hamiltonian matrix. A highly efficient pole expansion and selected inversion (PEXSI) sparse direct solver is implemented in DGDFT to achieve O(N^1.5) scaling for quasi-two-dimensional systems. DGDFT allows us to compute the electronic structures of complex metallic heterostructures with 2.5 million atoms (17.2 million electrons) using 35.9 million cores on the new Sunway supercomputer. The peak performance of PEXSI reaches 64 PFLOPS (5% of theoretical peak), which is unprecedented for sparse direct solvers. This accomplishment paves the way for quantum mechanical simulations at the mesoscopic scale for designing next-generation electronic devices.
Awards Presentation
2022 ACM/IEEE-CS Ken Kennedy Award
Recorded
Awards
TP
Description: Linking scientific instruments and computation: Patterns, technologies, experiences

Powerful detectors at modern experimental facilities that collect data at multiple GB/s require online computing to process the resulting data flows. I review common patterns associated with such online analyses, and present new methods for configuring and running the resulting distributed computing pipelines. I present experiences with the application of these methods to the processing of data from five scientific instruments, each of which engages powerful computers for data inversion, model training, or other purposes. I also discuss implications of such methods for operators and users of scientific facilities.
Awards Presentation
2022 IEEE Sidney Fernbach Award
Recorded
Awards
TP
Description: From two strong oxen to billions of fleas: orchestrating computation and data in modern high-performance computing


Following Sidney Fernbach's legacy, we will explore how massively parallel distributed supercomputers are designed, programmed, and operated today. We focus on aspects of distributed-memory parallelism using Remote Direct Memory Access through the Message Passing Interface. We will close with an outlook on where technology will lead us and on new problems for the HPC community to tackle in the coming years.
Awards Presentation
2022 IEEE-CS Seymour Cray Computer Engineering Award
Recorded
Awards
TP
Description: Quotes from Seymour Cray—Are we living up to his legacy?

Seymour Cray, often regarded as the “father of supercomputing”, endowed us with valuable quotes during his stellar career, and many of those quotes can now be found online. HPC in general has certainly made massive progress since his unfortunate passing, but has it advanced in a way that lives up to his ideals and his legacy, and is it moving forward properly? Moreover, accurately predicting the future is difficult even for a genius: do the ideals in his quotes hold up in present-day HPC? We review his quotes against some of the historical supercomputing developments I have been involved in to address these questions.
Birds of a Feather
25th Graph500 List
TP
XO/EX
Description: Data-intensive supercomputer applications are increasingly important workloads, especially for “Big Data” problems, but are ill suited to most of today’s computing platforms (at any scale!). The Graph500 list has grown to over 328 entries and has demonstrated the challenges of even simple analytics. The new SSSP kernel introduced at SC17 has increased the benchmark’s overall difficulty. This BoF will unveil the latest Graph500 lists, provide in-depth analysis of the kernels and machines, and enhance the new energy metrics of the Green Graph500. It will offer a forum for the community and provide a rallying point for data-intensive supercomputing problems.
Student Cluster Competition
2MuchCache
TP
XO/EX
Description: The SDSC/UCSD SCC22 team is enthusiasm-driven, technically capable, fast-learning, and deeply experienced across the computer hardware and software stacks. Each team member is uniquely qualified and committed to using HPC to advance their field. We have one returning team member from the SCC21 virtual cluster competition, one team member graduating from previous competition training to the competition team, three former team members serving as team mentors, and four new students joining the competition team. We are confident in our team’s ability to tackle expected and unexpected challenges in the competition, using a combination of rigorous preparation, strong communication, robust planning, detailed learning, and efficient teamwork. Our team training activities are fully supported by SDSC through the HPC Students Program, and we are engaging directly with each of our sponsors for expert sessions on computer architecture, optimizing compilers, HPC in the cloud, containerization, and more.

Our team members exploit the full flexibility of the UCSD computer science, cognitive science, and computer engineering majors. Our technical stack includes: major programming languages (C, C++, Java, Python, Fortran), system administration, firmware engineering, parallel programming (MPI, OpenMP, CUDA), hardware design (SystemVerilog, Tcl, Cadence, Synopsys), scientific applications (LAMMPS, Quantum ESPRESSO, Avogadro, VMD), full-stack web development (Node.js, React, HTML), scripting and batch processing, and machine learning. Many team members have both undergraduate research and industry internship experience.

Edward Burns previously interned at SDSC, and he brings image processing, software engineering, and batch scheduler optimization experience to the team. He hopes that HPC experience will help him build highly scalable computer vision software throughout his career.

Davit Margarian brings a VLSI chip design and firmware background to the team. He hopes to use his HPC experience to accelerate computer-aided design tools for billion-gate integrated circuits.

Stefanie Dao is experienced across operating systems, computer vision, and high-performance software. She plans to apply her HPC experience to server-side processing and updating of augmented reality experiences in real time.

Longtian Bao has strong scripting, software engineering, and web development skills, and he participated in last year’s team training. He is excited to apply his skills to resource budgeting and performance monitoring during the competition.

Yuchen Jing has extensive networking and Linux system administration experience from hosting network proxies, file transfer servers, and version control systems. He is looking forward to strengthening his skills in developing, deploying, and maintaining high performance software.

Matthew Mikhailov competed at SCC21, and is the go-to person for the team. He specializes in VLSI chip design and computational materials science, and he uses the LAMMPS code for his research. He hopes to learn from his SCC experience to design the next generation of supercomputer chips.

Team advisor, Dr. Mary Thomas, SDSC HPC Training Lead, holds degrees in physics, computer science, and computational science, and taught parallel computing for 16 years. She has a personal commitment to the SCC program -- she has led 4 teams: SCC16 and 17 (San Diego State University) and SCC20-21 (UCSD). Her enthusiasm, knowledge, and practical experience will benefit the team.
Posters
Research Posters
A Bayesian Optimization-Assisted, High-Performance Simulator for Modeling RF Accelerator Cavities
TP
XO/EX
Description: Radio-frequency cavities are key components of high-energy particle accelerators, quantum computers, and other devices. Designing cavities poses many computational challenges, such as multi-objective optimization and the high-performance computing (HPC) requirements of handling large cavities. In particular, the multi-objective optimization requires an efficient 3D full-wave electromagnetic simulator, for which we rely on the integral equation (IE) method; this in turn requires a fast solver with HPC and ML algorithms to search for resonance modes.

We propose an HPC-based fast direct matrix solver for the IE method, combined with hybrid optimization algorithms, to attain an efficient simulator for accelerator cavity modeling. First, we solve the linear eigenproblem for each trial frequency with a distributed-memory parallel, fast direct solver. Second, we combine the global optimizer (Gaussian process) with the local optimizer (downhill simplex) to generate the trial frequency samples, successfully optimizing a 1D objective function with multiple sharp minima.
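For illustration, the following is a minimal sketch of the hybrid global/local search strategy described above, pairing a Gaussian-process surrogate with downhill-simplex (Nelder-Mead) refinement. The 1D objective is a synthetic stand-in for the IE-based eigensolver, and all parameter choices are assumptions, not the authors' settings.

```python
# Hybrid global (Gaussian process) + local (Nelder-Mead) search sketch.
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(freq):
    # Placeholder 1D objective with several sharp minima (hypothetical).
    return np.sin(5 * freq) + 0.1 * (freq - 3.0) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 6.0, size=(8, 1))          # initial trial frequencies
y = np.array([objective(x[0]) for x in X])

for _ in range(20):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X, y)
    # Global step: pick the candidate where the GP posterior mean is lowest.
    cand = np.linspace(0.0, 6.0, 600).reshape(-1, 1)
    x0 = cand[np.argmin(gp.predict(cand))]
    # Local step: refine with the downhill-simplex (Nelder-Mead) method.
    res = minimize(lambda x: objective(x[0]), x0, method="Nelder-Mead")
    X = np.vstack([X, res.x.reshape(1, -1)])
    y = np.append(y, res.fun)

print("best frequency:", X[np.argmin(y)][0], "objective:", y.min())
```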
Posters
Research Posters
A C++20 Interface for MPI 4.0
TP
XO/EX
Description: We present a modern C++20 interface for MPI 4.0. The interface utilizes recent language features to ease the development of MPI applications. An aggregate reflection system enables automatic generation of MPI data types from user-defined classes. Immediate and persistent operations are mapped to futures, which can be chained to describe sequential asynchronous operations and task graphs in a concise way. This work introduces the prominent features of the interface with examples. We further measure its performance overhead with respect to the raw C interface.
Workshop
A Case Study on Coupling OpenFOAM with Different Machine Learning Frameworks
Recorded
W
Description: In high-performance computing, new use cases are emerging in which classical numerical simulations are coupled with machine learning as a surrogate for complex physical models that are expensive to compute. In the context of simulating reactive thermo-fluid systems, replacing current state-of-the-art tabulated chemistry with machine learning inference is an active field of research. For this purpose, a simplified OpenFOAM application is coupled with an artificial neural network. In this work, we present a case study focusing solely on the performance of the coupled OpenFOAM-ML application. Our coupling approach features a heterogeneous cluster architecture combining pure CPU nodes and nodes equipped with two NVIDIA V100 GPUs. We evaluate our approach by comparing the inference performance and the communication it induces across various machine learning frameworks. Additionally, we compare the GPUs with the NEC Vector Engine Type 10B regarding inference performance.
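The coupling pattern described above can be illustrated with a schematic sketch: the solver gathers per-cell thermochemical states, a neural surrogate replaces the tabulated-chemistry lookup, and predictions are scattered back. The network shape, feature layout, and batching below are invented for illustration and do not reflect the paper's OpenFOAM code.

```python
# Illustrative simulation-to-surrogate coupling loop (not the paper's code).
import numpy as np
import torch

class ChemSurrogate(torch.nn.Module):
    # Hypothetical stand-in for a trained tabulated-chemistry surrogate.
    def __init__(self, n_in=4, n_out=2, width=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_in, width), torch.nn.ReLU(),
            torch.nn.Linear(width, width), torch.nn.ReLU(),
            torch.nn.Linear(width, n_out))

    def forward(self, x):
        return self.net(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ChemSurrogate().to(device).eval()

def infer_chemistry(cell_states):
    # cell_states: (n_cells, n_features) array gathered from the CFD solver;
    # the result would be scattered back into the solver's fields.
    with torch.no_grad():
        x = torch.as_tensor(cell_states, dtype=torch.float32, device=device)
        return model(x).cpu().numpy()

print(infer_chemistry(np.random.rand(10, 4)).shape)  # (10, 2)
```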
Workshop
A Compiler for Universal Photonic Quantum Computers
Recorded
Quantum Computing
W
Description: Photons are natural resources in quantum information, and the last decade has shown significant progress in high-quality single-photon generation and detection. Furthermore, photonic qubits are easy to manipulate and do not require particularly strongly sealed environments, making them an appealing platform for QC. With the one-way model, the vision of universal and large-scale QCs based on photonics becomes feasible. In one-way computing, the input state is not an initial product state |0>^n, but a so-called cluster state. A series of measurements on the cluster state's individual qubits and their temporal order, together with a feed-forward procedure, determine the quantum circuit to be executed. We propose a pipeline to convert a QASM circuit into a graph representation named measurement-graph (m-graph), which can be directly translated to hardware instructions on an optical one-way QC. Additionally, we optimize the graph using ZX-calculus before evaluating the execution on an experimental discrete-variable photonic platform.
Workshop
A Comprehensive Evaluation of Novel AI Accelerators for Deep Learning Workloads
Recorded
Applications
Architectures
Benchmarking
Exascale Computing
Modeling and Simulation
Performance
Performance Portability
W
Description: Scientific applications are increasingly adopting Artificial Intelligence (AI) techniques to advance science. High-performance computing centers are evaluating emerging novel hardware accelerators to efficiently run AI-driven science applications. With a wide diversity in the hardware architectures and software stacks of these systems, it is challenging to understand how these accelerators perform. The state-of-the-art in the evaluation of deep learning workloads primarily focuses on CPUs and GPUs. In this paper, we present an overview of dataflow-based novel AI accelerators from SambaNova, Cerebras, Graphcore, and Groq.

We present a first-of-a-kind evaluation of these accelerators with a diverse set of workloads, such as deep learning (DL) primitives, benchmark models, and scientific machine learning applications. We also evaluate the performance of collective communication, which is key for distributed DL implementation, along with a study of scaling efficiency. We then discuss key insights, challenges, and opportunities in integrating these novel AI accelerators in supercomputing systems.
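Evaluations like this typically start from deep learning primitives. As a baseline illustration only (the vendor-specific stacks for SambaNova, Cerebras, Graphcore, and Groq are proprietary and not shown), the sketch below times a GEMM primitive with PyTorch on CPU or GPU.

```python
# Time one DL primitive (GEMM) and report achieved TFLOP/s.
import time
import torch

def time_gemm(n=2048, iters=50, device="cpu"):
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    # Warm-up to exclude one-time allocation costs from the measurement.
    for _ in range(5):
        torch.mm(a, b)
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        torch.mm(a, b)
    if device == "cuda":
        torch.cuda.synchronize()
    dt = (time.perf_counter() - t0) / iters
    return 2 * n**3 / dt / 1e12   # 2*n^3 flops per GEMM

print(f"GEMM: {time_gemm():.2f} TFLOP/s")
```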
Invited Talk
A Convergence of Complexities in Climate Systems and the Role of High-Performance Computing
Recorded
TP
XO/EX
Description: Predictive understanding and actionable insights for sustainability in the modern era require an effective blend of theory and data-driven sciences. Relevant theory includes physics, biogeochemistry, and ecology within the natural sciences, and engineering principles, economics, and social and governance principles in human-engineered systems and the social sciences. The data-driven sciences need to consider Big Data, such as archived numerical model simulations along with remotely sensed observations, and relatively small data, such as historical observations or even prehistorical proxy records, as well as prior domain knowledge and lessons learned from rare events and extremes. The underlying spatiotemporal data-generation processes may be nonlinear dynamical, even chaotic, while the variability may be low frequency, even 1/f noise. Data may be sparse or incomplete, prior knowledge and physics may be incomplete or over-parameterized, and falsifiability and comprehensive uncertainty characterization are critical to inform decisions and add to our collective knowledge. Understanding the implications for domain-aware high-performance computing may be critical both for the sciences and engineering and for investments and research directions in supercomputing. The first part of the presentation will describe these challenges and discuss how next-generation artificial intelligence may be able to provide solutions and where further developments may be necessary. The second part will discuss recent research at my Sustainability and Data Sciences Laboratory, specifically on the impacts of climate variability and weather extremes on ecology and biodiversity and on urban and regional critical lifeline infrastructures, with an emphasis on the associated challenges and opportunities in processing earth science data.
Doctoral Showcase
Posters
A Data-Centric Optimization Workflow for the Python Language
TP
XO/EX
Description: Python's extensive software ecosystem leads to high productivity, rendering it the language of choice for scientific computing. However, executing Python code is often slow or impossible in emerging architectures and accelerators. To complement Python's productivity with the performance and portability required in high-performance computing (HPC), we introduce a workflow based on data-centric (DaCe) parallel programming. Python code with HPC-oriented extensions is parsed into a dataflow-based intermediate representation, facilitating analysis of the program's data movement. The representation is optimized via graph transformations driven by the users, performance models, and automatic heuristics. Subsequently, hardware-specific code is generated for supported architectures, including CPU, GPU, and FPGA. We evaluate the above workflow through three case studies. First, to compare our work to other Python-accelerating solutions, we introduce NPBench, a collection of over 50 Python microbenchmarks across a wide range of scientific domains. We show performance results and scaling across CPU, GPU, FPGA, and the Piz Daint supercomputer. DaCe runs 10x faster than the reference Python execution and achieves 2.47x and 3.75x speedups over previous-best solutions and up to 93.16% scaling efficiency. Second, we re-implement in Python and optimize the Quantum Transport Simulator OMEN. The application's DaCe version executes one to two orders of magnitude faster than the original code written in C++, achieving 42.55% of the Summit supercomputer's peak performance. Last, we utilize our workflow to build Deinsum, an automated framework for distributed multilinear algebra computations expressed in Einstein notation. Deinsum performs up to 19x faster over state-of-the-art solutions on the Piz Daint supercomputer.
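A minimal example in the spirit of the described workflow, using DaCe's documented @dace.program decorator: annotated Python is parsed into a dataflow intermediate representation (an SDFG) that can be transformed and compiled for CPU, GPU, or FPGA. The axpy kernel below is illustrative, not taken from the paper.

```python
# Parse, optimize, compile, and run a Python kernel with DaCe.
import dace
import numpy as np

N = dace.symbol('N')          # symbolic size, resolved at call time

@dace.program
def axpy(a: dace.float64, x: dace.float64[N], y: dace.float64[N]):
    y[:] = a * x + y          # dataflow is extracted from this statement

x = np.random.rand(1024)
y = np.random.rand(1024)
axpy(2.0, x, y)               # JIT-compiles hardware-specific code and runs
sdfg = axpy.to_sdfg()         # inspect or transform the dataflow IR directly
```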
Workshop
A Decade of Performance Portability: Lessons Learned
Recorded
W
Workshop
A Domain-Specific Provenance Query Composition Environment for Scientific Workflows
Recorded
Cloud and Distributed Computing
In Situ Processing
Scientific Computing
Workflows
W
Description: Scientific workflow management systems (SWfMSs) systematically capture and store diverse provenance information at various phases. Scientists compose a multitude of queries on this information. Support for integrated query composition and visualization in existing SWfMSs is limited, and most systems do not support custom query composition at all. VisTrails and Taverna introduced the custom query languages vtPQL and TriQL to support limited workflow monitoring. Galaxy only tracks histories of operations and displays them as lists. No SWfMS offers a scientist-friendly user interface for provenance query composition and visualization. We propose a domain-specific composition environment for provenance queries of scientific workflows. As a proof of concept, we developed a provenance system for a bioinformatics workflow management system and evaluated it along multiple dimensions: one measuring participants' subjective perception of its usability using the NASA-TLX and SUS survey instruments, and the other measuring its flexibility through plugin integration using NASA-TLX.
Workshop
A DPU Solution for Container Overlay Networks
Recorded
W
Description: There is an increasing demand to incorporate hybrid environments as part of workflows across edge, cloud, and HPC systems. In such a converging environment of cloud and HPC, containers are starting to play a more prominent role, bringing their networking infrastructure along with them. However, the current body of work shows that container overlay networks, which are often used to connect containers across physical hosts, are ill-suited for the HPC environment. They tend to impose significant overhead and noise, resulting in degraded performance and disturbance to co-located processes on the same host.

This presentation focuses on utilizing a novel class of hardware, the Data Processing Unit (DPU), to offload the networking stack of overlay networks from the host onto the DPU. We intend to show that such ancillary offload is possible and that it results in decreased overhead on host nodes, which in turn improves the performance of running processes.
Workshop
A First Step Toward Support for MPI Partitioned Communication on SYCL-Programmed FPGAs
Recorded
W
Description: Version 4.0 of the Message Passing Interface standard introduced the concept of Partitioned Communication, which adds support for multiple contributions to a communication buffer. Although initially targeted at multithreaded MPI applications, Partitioned Communication is currently receiving attention in the context of accelerators, especially GPUs. In this publication, we demonstrate that this communication concept can be implemented for SYCL-programmed FPGAs. This includes a discussion of the design space and the presentation of a prototype implementation. Experimental results show that a lightweight implementation on top of an existing MPI library is possible. The presented approach also reveals issues in both the SYCL and MPI standards, which need to be addressed for improved support of the intended communication style.
Workshop
A Generalized Tumor Segmentation Algorithm for Varying Breast Cancer Subtypes
Recorded
W
Description: Background. Automated breast tumor segmentation for dynamic contrast-enhanced magnetic resonance (DCE-MR) imaging is a crucial step toward advancing radiomics for image-based, quantitative assessment of breast tumors and cancer phenotyping. Current studies focus on developing tumor segmentation that often requires initial seed points from expert radiologists or atlas-based segmentation methods. We develop a robust, fully automated end-to-end segmentation pipeline for breast cancers on bilateral breast MR studies.

Methods. On IRB-approved, diverse breast cancer MR cases, a deep learning segmentation algorithm was created and trained. The model's backbone is UNet++, which consists of U-Nets of varying depths whose decoders are densely connected at the same resolution via skip connections; all the constituent U-Nets are trained simultaneously to learn a shared image representation. This design not only improves the overall segmentation performance but also enables model pruning at inference time. The model was trained on breast tumors located independently by a radiologist, with consensus review by a second radiologist with at least five years of experience. MRI was performed using a 3.0-T imaging system in the prone position with a dedicated 16-channel breast coil, and T1-weighted DCE-MR images were analyzed for the study. We used an 80:20 random split for training and validation of the model.

Results. A total of 124 breast cancer patients had pre-treatment MR imaging before the start of NST; the cohort comprised 49 HR+HER2-, 37 HR+HER2+, 11 HR-HER2+, and 27 TNBC cases (mean tumor size 2.3 cm +/- 3.1 mm). The model was tested on 2,571 individual images. Overall, the model achieved a Dice score of 0.85 [0.84-0.86, 95% CI] and an IoU score of 0.80 [0.79-0.81, 95% CI]. TNBC tumors scored a Dice of [0.88-0.89, 95% CI], HER2-negative, ER/PR-positive tumors [0.84-0.85, 95% CI], and HER2-positive tumors [0.84-0.85, 95% CI]. The model performed equally well on solid tumors and irregular shapes, and we observed no difference in segmentation performance between residual and non-residual tumor types, with Dice scores of [0.85-0.86, 95% CI] and [0.83-0.84, 95% CI], respectively.

Conclusion. The proposed segmentation model performs equally well across clinical breast cancer subtypes. The model has a high false-positive rate around biopsy clips and high background enhancement, which we plan to address by annotating clips and high non-cancer enhancement in future training data. We will release the trained model under an open-source license to increase the scalability of radiomics studies with fully automated segmentation. Given the importance of breast cancer subtypes as prognostic factors in women with operable breast cancer, automated segmentation of varying breast tumor subtypes will help analyze imaging biomarkers embedded within standard-of-care imaging studies at larger scale, potentially helping radiologists, pathologists, surgeons, and clinicians understand features driving breast cancer phenotypes and paving the way for developing digital twins for breast cancer patients.
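For reference, the two overlap metrics reported above can be computed for binary masks as follows; this is a generic definition, not the authors' evaluation code.

```python
# Dice and IoU (Jaccard) overlap metrics for binary segmentation masks.
import numpy as np

def dice_and_iou(pred, truth, eps=1e-8):
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    dice = (2 * inter + eps) / (pred.sum() + truth.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, truth).sum() + eps)
    return dice, iou

# Toy example: two overlapping square masks.
pred = np.zeros((64, 64), dtype=np.uint8); pred[10:40, 10:40] = 1
truth = np.zeros((64, 64), dtype=np.uint8); truth[15:45, 15:45] = 1
print(dice_and_iou(pred, truth))
```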
Workshop
A GPU Programming Lesson in the Pedagogical Style of the Carpentries
Recorded
W
Description: Graphics processing units are nowadays used to accelerate applications in multiple scientific domains, and it is therefore necessary even for researchers outside of computer science to learn how to use them. However, traditional GPU programming courses are often aimed at people with a computer science or high-performance computing background.

To address this challenge, we developed a GPU programming course following the Carpentries pedagogical style, centered around live coding and the teaching of actionable skills. The course is open source, freely available online in the Carpentries Incubator, and has been successfully taught both online and in person.
Paper
A GPU-Accelerated AMR Solver for Gravitational Wave Propagation
Recorded
Applications
Computational Science
Scientific Computing
TP
Description: Simulations to calculate a single gravitational waveform (GW) can take several weeks, yet thousands of such simulations are needed for the detection and interpretation of gravitational waves, and future detectors will require even more accurate waveforms. Here we present the first large-scale, adaptive-mesh, multi-GPU numerical relativity (NR) code, along with performance analysis and benchmarking. While comparisons are difficult to make, our GPU extension of the Dendro-GR code achieves a 6x speedup over existing state-of-the-art codes. We achieve 800 GFLOP/s on a single NVIDIA A100 GPU, with an overall 2.5x speedup over a two-socket, 128-core AMD EPYC 7763 CPU node running an equivalent CPU implementation. We present detailed performance analyses, parallel scalability results, and accuracy assessments for GWs computed for mass ratios q=1,2,4. We also present strong scaling up to 8 A100s and weak scaling up to 229,376 x86 cores on the Texas Advanced Computing Center's Frontera system.
Workshop
A High-Performance Design for Hierarchical Parallelism in the QMCPACK Monte Carlo Code
Recorded
Algorithms
Architectures
Compilers
Computational Science
Exascale Computing
Heterogeneous Systems
Hierarchical Parallelism
Memory Systems
Parallel Programming Languages and Models
Parallel Programming Systems
Resource Management and Scheduling
W
Description: We introduce a new high-performance design for parallelism within the Quantum Monte Carlo code QMCPACK. We demonstrate that the new design is better able to exploit the hierarchical parallelism of heterogeneous architectures compared to the previous GPU implementation. The new version is able to achieve higher GPU occupancy via the new concept of crowds of Monte Carlo walkers, and by enabling more host CPU threads to effectively offload to the GPU. The higher performance is expected to be achieved independent of the underlying hardware, significantly improving developer productivity and reducing code maintenance costs. Scientific productivity is also improved with full support for fallback to CPU execution when GPU implementations are not available or CPU execution is more optimal.
Posters
Research Posters
A Holistic View of Memory Utilization on Perlmutter
TP
XO/EX
Description: HPC systems are at risk of being underutilized due to the varied resource requirements of applications and the imbalance of utilization among subsystems. This work provides a holistic analysis and view of memory utilization on a leadership computing facility, the Perlmutter system at NERSC, through which we gain insights about the resource usage patterns of the memory subsystem. The results of the analysis can help evaluate current system configurations, offer recommendations for future procurement, provide feedback to users on code efficiency, and motivate research in new architecture and system designs.
Posters
Research Posters
A Light-Weight and Unsupervised Method for Near Real-time Anomaly Detection Using Operational Data Measurement
TP
XO/EX
Description: Monitoring the status of large computing systems is essential to identify unexpected behavior and improve their performance and uptime. However, due to the large-scale, distributed design of such computing systems and the large number of monitoring parameters, automated monitoring methods are required. Such methods should adapt to continuous changes in the computing system, and they should identify behavioral anomalies quickly enough to enable appropriate reactions. This work proposes a general, lightweight, unsupervised method for near real-time anomaly detection using operational data measurements on large computing systems. The proposed model requires as little as 4 hours of data and 50 epochs for each training process to accurately resemble the behavioral pattern of the computing system.
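As a rough illustration of this unsupervised pattern (the poster's exact model is not described here), one can train a small autoencoder on a window of recent telemetry and flag samples whose reconstruction error exceeds a percentile threshold. The architecture, window size, and threshold below are assumptions, echoing only the cited 50-epoch training budget.

```python
# Train an autoencoder on "normal" telemetry; flag high reconstruction error.
import numpy as np
import torch

def train_detector(window, epochs=50, n_features=16):
    model = torch.nn.Sequential(
        torch.nn.Linear(n_features, 4), torch.nn.ReLU(),
        torch.nn.Linear(4, n_features))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.as_tensor(window, dtype=torch.float32)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), x)
        loss.backward()
        opt.step()
    with torch.no_grad():
        err = ((model(x) - x) ** 2).mean(dim=1).numpy()
    return model, np.percentile(err, 99)       # anomaly threshold

def is_anomalous(model, threshold, sample):
    x = torch.as_tensor(sample, dtype=torch.float32)
    with torch.no_grad():
        return float(((model(x) - x) ** 2).mean()) > threshold

window = np.random.rand(512, 16)                # stand-in for node telemetry
model, thr = train_detector(window)
print(is_anomalous(model, thr, np.random.rand(16)))
```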
Birds of a Feather
A Look into the Compute Express Link™ (CXL™) Device Ecosystem
TP
XO/EX
Description: Compute Express Link™ (CXL™) maintains memory coherency between the CPU memory space and memory on CXL-attached devices. CXL enables a high-speed, efficient interconnect between the CPU, platform enhancements, and workload accelerators such as GPUs, FPGAs, and other purpose-built accelerator solutions.

This BoF session will feature a panel of experts from the CXL Consortium to discuss available CXL devices and what devices the industry can expect to see in the next year. The experts will also explore the new features in the CXL 3.0 specification and the new usage models it will enable.
Workshop
A Methodology for Evaluating Tightly-Integrated and Disaggregated Accelerated Architectures
Recorded
Applications
Architectures
Benchmarking
Exascale Computing
Modeling and Simulation
Performance
Performance Portability
W
Description: Tighter integration of computational resources can foster superior application performance by mitigating communication bottlenecks. Unfortunately, not every application can use every compute resource or accelerator all the time; as a result, co-locating resources often leads to their under-utilization. In the next five years, HPC system architects will be presented with a spectrum of accelerated solutions, ranging from tightly coupled, single-package APUs to a sea of disaggregated GPUs interconnected by a global network. In this paper, we detail NEthing, our methodology and tool for evaluating the potential performance implications of such diverse architectural paradigms. We demonstrate our methodology on today's and projected 2026 technologies for three distinct workloads: a compute-intensive kernel, a tightly coupled HPC simulation, and an ensemble of loosely coupled HPC simulations. Our results leverage NEthing to quantify the increased utilization disaggregated systems must achieve in order to match the superior performance of APUs and on-board GPUs.
Posters
Research Posters
A Multifaceted Approach to Automated I/O Bottleneck Detection for HPC Workloads
TP
XO/EX
Description: Real-world HPC workloads are highly data dependent and impose significant pressure on storage systems. At the same time, recent developments in storage hardware mean that storage diversity in upcoming HPC systems is expected to grow. This growing complexity presents challenges to users and often results in I/O bottlenecks due to inefficient usage. There have been several studies on reducing I/O bottlenecks: the earliest attempts combined I/O characteristics with expert insight, while recent attempts rely on performance analysis from I/O characterization tools. However, the problem is multifaceted, with many metrics to consider, and is therefore difficult to address manually, even for experts. In this work, we develop a methodology that produces a multifaceted view of the I/O behavior of a workload to identify potential I/O bottlenecks automatically.
Workshop
A Perspective to Navigate the National Laboratory Environment for RSE Career Growth
Recorded
Career Development
Professional Development
Software Engineering
Workforce
W
Description: This paper shares a perspective to help the research software engineering (RSE) community navigate the national laboratory landscape. The RSE role is a recent concept, and placing RSEs organizationally and evaluating their impact, costs, and benefits has posed challenges. The premise is that RSEs are a natural fit in the current landscape and can use traditional career growth strategies in science: publications, community engagement, and proposals. Projects funding RSEs can benefit from this synergy and be inclusive of these traditional activities. Still, a great deal of introspection is needed to close gaps between the rapidly evolving RSE landscape and the well-established communication patterns in science. This perspective is built upon interactions in industry, academia, and government in high-performance computing (HPC) environments. The goal is to contribute to the conversation around RSE career growth and to understand the return on investment of RSEs for scientific projects and sponsors.
Awards Presentation
Test of Time
A Power-Aware Run-Time System for High-Performance Computing
Recorded
Awards
TP
Description: For decades, the high-performance computing (HPC) community has focused on performance, where performance is defined as speed. To achieve better performance per compute node, microprocessor vendors have not only doubled the number of transistors (and speed) every 18-24 months, but they have also doubled the power densities. Consequently, keeping a large-scale HPC system functioning properly requires continual cooling in a large machine room, thus resulting in substantial operational costs. Furthermore, the increase in power densities has led (in part) to a decrease in system reliability, thus leading to lost productivity.

To address these problems, we propose a power-aware algorithm that automatically and transparently adapts its voltage and frequency settings to achieve significant power reduction and energy savings with minimal impact on performance. Specifically, we leverage a commodity technology called “dynamic voltage and frequency scaling” to implement our power-aware algorithm in the run-time system of commodity HPC systems.
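Conceptually, such a run-time system maps measured slack (time spent waiting on communication or memory) to a frequency step. The sketch below is a simplification of the general DVFS idea, not the authors' algorithm; on Linux, userspace frequency control is exposed through the cpufreq sysfs interface (root privileges and the 'userspace' governor are required, and the frequency steps shown are arbitrary).

```python
# Slack-driven frequency selection, with a helper for the Linux cpufreq
# sysfs interface (requires root and the 'userspace' governor).
CPUFREQ = "/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_setspeed"

def set_frequency_khz(cpu, khz):
    with open(CPUFREQ.format(cpu=cpu), "w") as f:
        f.write(str(khz))

def choose_frequency(slack_fraction, freqs_khz):
    # More communication/memory slack -> a lower frequency saves energy with
    # little performance impact; compute-bound phases keep the highest step.
    steps = sorted(freqs_khz, reverse=True)
    idx = min(int(slack_fraction * len(steps)), len(steps) - 1)
    return steps[idx]

available = [2400000, 2000000, 1600000, 1200000]   # example P-states, in kHz
print(choose_frequency(0.6, available))            # -> a reduced step
```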
Workshop
A PSNR-Based Image Selection Approach Targeting Smart In Situ Visualization
Recorded
Accelerator-based Architectures
Data Analytics
In Situ Processing
Scientific Computing
Visualization
Workflows
W
Description: Although in situ visualization can reduce the amount of data written to storage, it can still generate a large amount of data for subsequent analysis, for instance images rendered from different viewpoints at every visualization time step. Since some of these images can be similar, an appropriate image selection that reduces the total number of images would help minimize the analysis time needed to understand the underlying simulation phenomena without missing important features. As an approach to such smart in situ visualization, we have worked on adaptive time step selection, which skips time steps with a small amount of change between them. In this lightning talk, focusing on the set of images generated from different viewpoints at every time step, we present a PSNR-based image selection approach that eliminates similar images to further reduce the total number of images, targeting smarter in situ visualization.
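The selection rule can be sketched as follows: compute the PSNR between a candidate image and the last kept image, and keep the candidate only when the PSNR falls below a similarity threshold (high PSNR means the images are nearly identical). The threshold value below is an assumption, not the talk's setting.

```python
# PSNR-based filtering of near-duplicate images.
import numpy as np

def psnr(a, b, max_val=255.0):
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(max_val**2 / mse)

def select_images(images, threshold_db=35.0):
    kept = [images[0]]
    for img in images[1:]:
        if psnr(kept[-1], img) < threshold_db:   # sufficiently different
            kept.append(img)
    return kept

frames = [np.random.randint(0, 256, (64, 64), dtype=np.uint8) for _ in range(5)]
print(len(select_images(frames)))
```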
Workshop
A Q# Implementation of a Quantum Lookup Table for Quantum Arithmetic Functions
Recorded
Quantum Computing
W
Description: We present Q# implementations of arbitrary fixed-point arithmetic operations for a gate-based quantum computer, based on lookup tables (LUTs). In general, a LUT is an inefficient way of implementing a function, since the number of inputs can be large or even infinite. However, if the input domain can be bounded and some error tolerance in the output is acceptable (both of which are often the case in practical use cases), the quantum LUT implementation of certain quantum arithmetic functions can be more efficient than the corresponding reversible arithmetic implementations. We discuss the implementation of the LUT in Q#, show examples of how to use the LUT to implement quantum arithmetic functions, and compare the resources required for the implementation with current state-of-the-art bespoke implementations of exponential and Gaussian functions.
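The Q# code itself is not reproduced here, but the classical LUT-construction step that such an approach relies on can be sketched as follows: quantize a function over a bounded input domain into fixed-point entries, trading output error tolerance for table size. The bit widths below are arbitrary illustrative choices.

```python
# Build a fixed-point lookup table over a bounded domain.
import math

def build_lut(f, lo, hi, in_bits, frac_bits):
    # One entry per representable input; outputs stored in fixed point
    # with frac_bits fractional bits.
    n = 1 << in_bits
    step = (hi - lo) / (n - 1)
    return [round(f(lo + i * step) * (1 << frac_bits)) for i in range(n)]

# 4 input bits over [0, 1] and 8 fractional output bits give a 16-entry
# table for a Gaussian, one of the functions compared in the talk.
lut = build_lut(lambda x: math.exp(-x * x), 0.0, 1.0, in_bits=4, frac_bits=8)
print(lut)
```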
Workshop
A Research Software Engineering Development Path for Scientific Applications in Oil and Gas
Recorded
Career Development
Professional Development
Software Engineering
Workforce
W
Description: Research software engineering (RSE) provides methodological tools for developing software to be deployed on high-performance computing (HPC) infrastructures, following good practices and achieving good software quality. RSE also supports the actors involved in development, from developers to users, across development, deployment, interaction, and training. The oil and gas community is one of the most important contexts for scientific applications, from exploration to econometrics and market analysis. In this contribution, following RSE elements, we present a development path for building robust research software.
Workshop
A Selective Nesting Approach for the Sparse Multi-Threaded Cholesky Factorization
Recorded
AI-HPC Convergence
Extreme Scale Computing
Parallel Programming Languages and Models
Performance
Runtime Systems
W
Description: Sparse linear algebra routines are fundamental building blocks of a large variety of scientific applications. Direct solvers, which solve linear systems via the factorization of matrices into products of triangular matrices, are commonly used in many contexts. The Cholesky factorization is the fastest direct method for symmetric positive definite matrices.

This presentation introduces selective nesting, a method to determine the optimal task granularity for the parallel Cholesky factorization based on the structure of sparse matrices. We propose the OPT-D algorithm, which automatically and dynamically applies selective nesting. OPT-D leverages matrix sparsity to drive complex task-based parallel workloads in the context of direct solvers. We run an extensive evaluation campaign considering a heterogeneous set of 35 sparse matrices and a parallel machine featuring the A64FX processor. OPT-D delivers an average speedup of 1.46x over the best state-of-the-art parallel method for running direct solvers.
Workshop
A Separated Model for Running Rootless, Unprivileged PMIx-Enabled HPC Applications in Kubernetes
Recorded
W
Description: High-performance computing (HPC) applications must be containerized to run in a Kubernetes (K8s) environment. The traditional model for running HPC applications in K8s requires the Application Container (APP) to include the runtime environment and the launch support mechanisms in addition to the application itself. This requirement can increase the APP size and introduce security vulnerabilities. The separated model presented here detaches the runtime from the APP, allowing system administrators to define, maintain, and secure the Runtime Environment Container (REC). A PMIx library connects the APP and the REC, serving as a runtime communication conduit for HPC parallel libraries (like MPI) to perform necessary functions like inter-process wire-up. The APP is nested within the REC using unprivileged, rootless Podman. The separated model is demonstrated by running a set of HPC applications on an off-the-shelf K8s system.
Paper
A Taxonomy of Error Sources in HPC I/O Machine Learning Models
Recorded
Reliability and Resiliency
TP
Description: I/O efficiency is crucial to productivity in scientific computing, but the growing complexity of HPC systems and applications complicates efforts to understand and optimize I/O behavior at scale. Data-driven, machine learning-based I/O throughput models offer a solution: they can be used to identify bottlenecks, automate I/O tuning, or optimize job scheduling with minimal human intervention. Unfortunately, current state-of-the-art I/O models are not robust enough for production use and under-perform after being deployed.

We analyze four years of application, scheduler, and storage system logs on two leadership-class HPC platforms to understand why I/O models under-perform in practice. We propose a taxonomy consisting of five categories of I/O modeling errors: poor application modeling, poor system modeling, inadequate dataset coverage, I/O contention, and I/O noise. We develop litmus tests to quantify each category, allowing researchers to narrow down failure modes, enhance I/O throughput models, and improve future generations of HPC logging and analysis tools.
Workshop
A Trigger-Based Approach for Optimizing Camera Placement Over Time
Recorded
Accelerator-based Architectures
Data Analytics
In Situ Processing
Scientific Computing
Visualization
Workflows
W
Description: We contribute a new approach for in situ automation of camera placement over time. Our approach incorporates triggers, regularly evaluating the current camera placement and searching for a new camera placement when a trigger fires. We evaluate our approach running in situ with five data sets from two simulation codes, considering camera placement quality (evaluated using a viewpoint quality metric) and overhead (number of camera positions evaluated). We find that our approach provides a significant benefit – reduced overhead with similar quality – compared to the naive approach of searching for a new camera placement each cycle.
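Schematically, the trigger pattern works as sketched below: each cycle evaluates the current camera with a viewpoint-quality metric, and the expensive search over candidate placements runs only when quality degrades past a tolerance. The metric and tolerance here are placeholders, not those of the paper.

```python
# Trigger-based camera placement: search only when quality degrades.
import numpy as np

def viewpoint_quality(camera, data):
    # Placeholder metric standing in for the paper's viewpoint quality measure.
    return float(np.var(data * np.cos(camera)))

class TriggeredCamera:
    def __init__(self, camera, tol=0.9):
        self.camera, self.tol, self.best = camera, tol, None

    def update(self, data, candidates):
        q = viewpoint_quality(self.camera, data)
        self.best = q if self.best is None else self.best
        if q >= self.tol * self.best:
            return self.camera          # trigger did not fire: no search cost
        # Trigger fired: evaluate candidate placements and move the camera.
        self.best, self.camera = max(
            (viewpoint_quality(c, data), c) for c in candidates)
        return self.camera

data = np.random.rand(1000)
cam = TriggeredCamera(camera=0.5)
print(cam.update(data, np.linspace(0, np.pi, 32)))
```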
Posters
Research Posters
Accelerated COVID-19 CT Image Enhancement via Sparse Tensor Cores
TP
XO/EX
Description: In this work, we use sparse techniques to accelerate DD-Net, a deep learning model designed to enhance CT images of COVID-19 chest scans. The model follows an encoder-decoder architecture in the deep learning paradigm and has high dimensionality, and thus requires many compute-hours to train. We propose a set of techniques that target these two aspects of the model: dimensionality and training time. We implement techniques to prune neurons, making the model sparse and thus reducing its effective dimensionality, with an accuracy loss of no more than 5% and minimal additional retraining overhead. We then propose a set of techniques tailored to the underlying hardware to better utilize its existing components (such as tensor cores) and thus reduce the time and associated cost required to train this model.
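The prune-then-retrain idea can be illustrated with PyTorch's built-in pruning utilities on a stand-in layer (DD-Net itself is not reproduced here; note that NVIDIA's sparse tensor cores ultimately require 2:4 structured sparsity, whereas the unstructured call below only demonstrates the concept).

```python
# Magnitude pruning of a stand-in convolutional layer.
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Conv2d(16, 16, kernel_size=3, padding=1)
prune.l1_unstructured(layer, name="weight", amount=0.5)  # zero 50% of weights
print(float((layer.weight == 0).float().mean()))         # ~0.5 sparsity

# A short retraining pass would follow to recover accuracy (the poster budgets
# at most a 5% accuracy loss); prune.remove makes the pruning mask permanent.
prune.remove(layer, "weight")
```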
Workshop
Accelerated Workflow for Advanced Kinetic Equilibria
Recorded
W
Description: Kinetic equilibria are a fundamental aspect of tokamak plasma analysis, but they are often highly specialized and labor-intensive to produce. This has become a bottleneck to both deeper physics understanding and more sophisticated experiment control. This project aims to remove these barriers by developing a rapid, fully automated workflow to produce better-than-human, high-precision, whole-discharge kinetic equilibria. The required elements of this workflow now exist separately; what is missing is the coupling of the different aspects and overall performance optimization. We have designed this workflow for the DIII-D National Fusion Facility with the goal of producing results quickly enough to be used for experiment planning in the 15-20 minute window between subsequent discharges. The results will also be stored in a database for follow-up analysis and as the foundation for AI/ML surrogate models. Initial results suggest that it may be possible to achieve our goal within a target 10-minute window.
Workshop
Accelerating Data Serialization/Deserialization Protocols with In-Network Compute
Recorded
W
DescriptionEfficient data communication is a major goal for scalable and cost-effective use of datacenter and HPC system resources. To let applications communicate efficiently, exchanged data must be serialized at the source and deserialized at the destination. The serialization/deserialization process enables exchanging data in a language- and machine-independent format. However, serialization/deserialization overheads can negatively impact application performance. For example, a server within a microservice framework must deserialize all incoming requests before invoking the respective microservices. We show how data deserialization can be offloaded to fully programmable SmartNICs and performed on the data path, on a per-packet basis. This solution avoids intermediate memory copies, enabling on-the-fly deserialization. We showcase our approach by offloading Google Protocol Buffers, a widely used framework to serialize/deserialize data. We show through microservice throughput modeling how we can improve the overall throughput by pipelining the deserialization and actual application activities with PsPIN.
Workshop
Accelerating Datalog Applications with cuDF
Recorded
Accelerator-based Architectures
Algorithms
Architectures
Big Data
Data Analytics
Parallel Programming Languages and Models
Productivity Tools
W
DescriptionDatalog, a bottom-up declarative logic programming language, has a wide variety of uses for deduction, modeling, and data analysis across application domains. Datalog can be efficiently implemented using relational algebra primitives such as join, projection, and union. While there exist several multi-threaded and multi-core implementations of Datalog that target CPU-based systems, our work makes an inroad towards developing a Datalog implementation for GPUs. We demonstrate the feasibility of a high-performance relational algebra backend for a small subset of Datalog applications that can effectively leverage the parallelism of GPUs using cuDF. cuDF is a library from the RAPIDS suite that uses the NVIDIA CUDA programming model for GPU parallelism. It provides similar functionality to Pandas, a popular data analysis engine. In this presentation, we analyze and evaluate the performance of cuDF versus Pandas for two graph mining problems implemented in Datalog: (1) triangle counting and (2) transitive closure computation.
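To make the mapping concrete, here is a minimal sketch (ours, not the authors' code) of transitive closure as a relational-algebra fixpoint in Pandas; cuDF exposes a near-identical DataFrame API, so essentially the same loop can run on a GPU by changing the import.

```python
import pandas as pd  # cuDF offers a near-identical API: import cudf as pd

edges = pd.DataFrame({"src": [0, 1, 2], "dst": [1, 2, 3]})

# Datalog:  path(x, y) :- edge(x, y).
#           path(x, z) :- path(x, y), edge(y, z).
# Naive fixpoint evaluation via relational algebra: join + projection + union.
path = edges.copy()
while True:
    step = (path.merge(edges, left_on="dst", right_on="src")
                [["src_x", "dst_y"]]
                .rename(columns={"src_x": "src", "dst_y": "dst"}))
    new = pd.concat([path, step]).drop_duplicates()
    if len(new) == len(path):   # fixpoint reached: no new facts derived
        break
    path = new

print(path.sort_values(["src", "dst"]).to_string(index=False))
```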
Exhibitor Forum
Accelerating Discovery in Supercomputing Environments by Centralizing Multi-Cloud Data Management
Recorded
TP
XO/EX
DescriptionAs supercomputing infrastructures become increasingly distributed, centralizing the management of data that may span multiple data centers, public cloud providers, and edge locations is key to accelerating research. Whether organizations are looking to expand information sharing for science teams; enhance data management practices across collaborative platforms; or unlock access to cloud services for data practitioners, centralizing multi-cloud data management addresses these challenges by seamlessly integrating multiple public clouds and on-premises storage under a single namespace. Modern technologies improve the responsiveness of data workflows by supporting constant movement of data and applications across systems and automating data placement and lifecycle rules. This means that both on-premises and cloud applications can use the same data without negatively impacting performance. It also means that the right data is placed where and when it’s needed for the most effective, agile workflows. Finally, by synchronizing data across multiple cloud-based repositories, multi-cloud data management software enables data to be accessed independent of its physical location to eliminate vendor lock-in and minimize cloud egress fees.

Join us for this technical deep dive into how centralizing multi-cloud data management can maximize the value of cloud initiatives by creating new opportunities for collaboration and innovation across platforms and data-driven ecosystems.
Workshop
Accelerating Drug Discovery with AI Panel
Recorded
W
Paper
Accelerating Elliptic Curve Digital Signature Algorithms on GPUs
Recorded
Applications
Numerical Algorithms
Security
TP
DescriptionThe Elliptic Curve Digital Signature Algorithm (ECDSA) is an essential building block of various cryptographic protocols. In particular, most blockchain systems adopt it to ensure transaction integrity. However, due to its high computational intensity, ECDSA is often the performance bottleneck in blockchain transaction processing. Recent work has accelerated ECDSA algorithms on the CPU; in contrast, success has been limited on the GPU, which has great potential for parallelization but is challenging for implementing elliptic curve functions. In this paper, we propose RapidEC, a GPU-based ECDSA implementation for SM2, a popular elliptic curve. Specifically, we design architecture-aware parallel primitives for elliptic curve point operations, and parallelize the processing of a single SM2 request as well as batches of requests. Consequently, our GPU-based RapidEC outperformed the state-of-the-art CPU-based algorithm by orders of magnitude. Additionally, our GPU-based modular arithmetic functions as well as point operation primitives can be applied to other computation tasks.
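For readers unfamiliar with the underlying primitives, the sketch below shows point addition and double-and-add scalar multiplication on a toy curve; the constants are illustrative, not the real SM2 parameters, but these are exactly the elliptic curve operations a GPU implementation must parallelize across requests.

```python
# Toy short-Weierstrass curve y^2 = x^3 + a*x + b over GF(p); the constants
# here are illustrative and NOT the real SM2 parameters.
p, a, b = 97, 2, 3
O = None  # point at infinity

def point_add(P, Q):
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O
    if P == Q:  # tangent (doubling) slope
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:       # chord slope
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def scalar_mul(k, P):
    # Double-and-add: O(log k) point operations; the hot loop in ECDSA.
    R = O
    while k:
        if k & 1:
            R = point_add(R, P)
        P = point_add(P, P)
        k >>= 1
    return R

G = (3, 6)  # on the curve: 6^2 = 36 = 3^3 + 2*3 + 3 (mod 97)
print(scalar_mul(5, G))
```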
Workshop
Accelerating Flash-X Simulations with Asynchronous I/O
Recorded
W
DescriptionMost high-fidelity physics simulation codes, such as Flash-X, need to save intermediate results (checkpoint files) to restart or to gain insights into the evolution of the simulation. These simulation codes save such intermediate files synchronously, where computation is stalled while the data is written to storage. Depending on the problem size and computational requirements, this file write time can be a substantial portion of the total simulation time. In this paper, we evaluate the overheads and the overall benefit to simulations of asynchronous I/O in HDF5. Results from real-world high-fidelity simulations on the Summit supercomputer show that I/O operations are overlapped with application communication or computation or both, effectively hiding some or all of the I/O latency. Our evaluation shows that while using asynchronous I/O adds overhead to the application, the I/O time reduction is more significant, resulting in an overall performance speedup of up to 1.5X.
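The paper's mechanism is HDF5's asynchronous I/O at the C level; purely as a conceptual sketch of the overlap (assuming h5py and a helper thread rather than the HDF5 async machinery), a checkpoint write can proceed while the next timestep computes:

```python
import threading
import numpy as np
import h5py

def write_checkpoint(path, state):
    # Synchronous HDF5 write, executed off the main thread.
    with h5py.File(path, "w") as f:
        f.create_dataset("state", data=state)

state = np.random.rand(1024, 1024)

# Start the checkpoint write in the background and keep computing; join()
# before the next checkpoint so writes never overlap one another.
writer = threading.Thread(target=write_checkpoint,
                          args=("chk_0000.h5", state.copy()))
writer.start()
state = state * 0.99 + 0.01   # the next timestep proceeds immediately
writer.join()                 # blocks only if the write is still in flight
```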
Workshop
Accelerating Kernel Ridge Regression with Conjugate Gradient Method for Large-Scale Data Using FPGA High-Level Synthesis
Recorded
W
DescriptionIn this work, we accelerate the Kernel Ridge Regression algorithm on an adaptive computing platform to achieve higher performance within faster development time by employing a design approach using high-level synthesis. In order to avoid storing the potentially huge kernel matrix in external memory, the designed accelerator computes the matrix on-the-fly in each iteration. Moreover, we overcome the memory bandwidth limitation by partitioning the kernel matrix into smaller tiles that are pre-fetched to small local memories and reused multiple times. The design is also parallelized and fully pipelined to accomplish the highest performance. The final accelerator can be used for any large-scale data without kernel matrix storage limitations and with an arbitrary number of features. This work is an important first step towards a library for accelerating different Kernel methods for Machine Learning applications for FPGA platforms that can be used conveniently from Python with a NumPy interface.
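A minimal NumPy sketch of the central idea, assuming an RBF kernel: conjugate gradient only ever needs kernel-times-vector products, so each kernel tile can be computed on the fly and discarded, mirroring in software what the FPGA design does with pre-fetched tiles in local memory.

```python
import numpy as np

def rbf_tile(Xi, Xj, gamma):
    # One kernel tile, computed on the fly (the full matrix is never stored).
    d2 = ((Xi[:, None, :] - Xj[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_matvec(X, v, gamma, lam, tile=256):
    # y = (K + lam*I) v, assembled one tile at a time.
    n = len(X)
    y = lam * v
    for i in range(0, n, tile):
        acc = np.zeros(min(tile, n - i))
        for j in range(0, n, tile):
            acc += rbf_tile(X[i:i+tile], X[j:j+tile], gamma) @ v[j:j+tile]
        y[i:i+tile] += acc
    return y

def krr_cg(X, b, gamma=1.0, lam=1e-3, iters=50):
    # Plain conjugate gradient on the matrix-free operator above.
    alpha = np.zeros(len(X)); r = b.copy(); d = r.copy()
    rs = r @ r
    for _ in range(iters):
        Kd = kernel_matvec(X, d, gamma, lam)
        step = rs / (d @ Kd)
        alpha += step * d
        r -= step * Kd
        rs_new = r @ r
        if rs_new < 1e-12:
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return alpha

X = np.random.rand(500, 4); y = np.sin(X.sum(1))
alpha = krr_cg(X, y)
# Residual shrinks with more CG iterations.
print(np.abs(kernel_matvec(X, alpha, 1.0, 1e-3) - y).max())
```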
Paper
Accelerating Parallel Write via Deeply Integrating Predictive Lossy Compression with HDF5
Recorded
Data Management
Storage
TP
DescriptionLossy compression is one of the most efficient solutions to reduce storage overhead and improve I/O performance for HPC applications. However, existing parallel I/O libraries cannot fully utilize lossy compression to accelerate parallel write due to the lack of deep understanding on compression-write performance. To this end, we propose to deeply integrate predictive lossy compression with HDF5 to significantly improve parallel-write performance. Specifically, we propose analytical models to predict the time of compression and parallel write before the actual compression to enable compression-write overlapping. We also introduce an extra space to handle the prediction uncertainty. Moreover, we propose an optimization to reorder the compression tasks to increase the overlapping efficiency. Experiments with up to 4,096 cores show that our solution improves the write performance by up to 4.5x and 2.9x over the non-compression and lossy compression solutions, respectively, with only 1.5% storage overhead (to original data) on two real-world applications.
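As a hedged toy model of the overlap-and-reorder idea (the numbers and the shortest-compression-first heuristic are illustrative, not the paper's analytical models): once per-chunk compression and write times can be predicted, chunks can be ordered so that the write stage starts early and rarely starves.

```python
def pipeline_makespan(chunks):
    """Two-stage pipeline: compress chunk i+1 while writing chunk i.
    Each chunk is a (compress_time, write_time) pair; compression is
    modeled as serial, and a write can start only after its chunk is
    compressed and the previous write has finished."""
    t_compress_done = t_write_done = 0.0
    for c_time, w_time in chunks:
        t_compress_done += c_time
        t_write_done = max(t_write_done, t_compress_done) + w_time
    return t_write_done

chunks = [(3.0, 1.0), (0.5, 2.5), (1.0, 1.0), (2.0, 0.5)]
naive = pipeline_makespan(chunks)
# Shortest-compression-first keeps the writer busy early.
reordered = sorted(chunks, key=lambda cw: cw[0])
print(naive, pipeline_makespan(reordered))   # 8.0 -> 7.5 in this toy case
```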
Tutorial
Accelerating Portable HPC Applications with Standard C++
Recorded
Accelerator-based Architectures
Heterogeneous Systems
Parallel Programming Languages and Models
Performance Portability
Productivity Tools
Software Engineering
TUT
DescriptionThis half-day hands-on tutorial teaches how to accelerate HPC applications using the portable parallelism and concurrency features of the C++17 and C++20 standards, without any language or vendor extensions, such that a single version of the code is portable to multi-core CPU and to GPU systems. We further show how to integrate this approach with MPI to target CPU clusters and multi-GPU platforms. The tutorial exercises follow classical HPC themes like a PDE solver mini-application for the 2D unsteady heat equation. The exercises provide attendees with hands-on experience applying C++ parallel algorithms and execution policies to parallelize and accelerate HPC programs using only standard C++. Attendees are presented with problem-solving strategies for common tasks like computing reductions or running iterative solvers for multi-dimensional problems. Furthermore, the tutorial and exercises give attendees hands-on experience in integrating C++ parallel algorithms into pre-existing MPI applications, teaching how to re-use the pre-existing MPI code to produce MPI/C++ applications that run on multi-CPU and multi-GPU systems. Finally, we conclude with a summary of our professional experience applying the ISO C++ parallel programming model to accelerate large real-world HPC applications and provide an outlook of future topics in C++ standard parallelism.
Workshop
Access to Computing Education Using Micro-Credentials for Cyberinfrastructure
Recorded
HPC Training and Education
W
DescriptionIn response to an increasing demand for digital skills in industry and academia, a series of credentialed short courses covering a variety of topics related to high performance computing was designed and implemented to enable university students and researchers to effectively utilize research computing resources and to bridge the gap for users whose educational backgrounds do not include computational training. The courses cover a diverse array of topics, including subjects in programming, cybersecurity, artificial intelligence/machine learning, bioinformatics, and cloud computing. The courses are designed to enable students to apply the skills they learn to their own research that incorporates the use of large-scale computing systems. These courses offer advantages over generic online courses in that they teach computing skills relevant to academic research programs. Finally, the micro-credentials are transcriptable, may be stacked with existing programs to create a larger degree plan, and add to a student’s resume.
Job Posting
Account Executive-Higher Education & Research
DescriptionWe are currently seeking an Account Executive - MIDWEST EDU. The Account Executive will work on the development of new HPC Higher Education and Research business. The position will be field-based and involves travel approximately 30%–40% of the time. The job involves managing a territory and growing business through the candidate's own experience, including previously established relationships; a successful applicant must therefore have strong industry contacts and demonstrated success in personally closing business in the HPC & EDU space.

Responsibilities for this role include but are not limited to:
  • Create and maintain a customer pipeline, hitting revenue goals and growing the territory.
  • Lead and coordinate complex, team selling efforts (with internal and external partners).
  • Develop a strong understanding of the customers’ technology infrastructure, strategy and business requirements.
  • Partner with internal staff to create successful Proposals and Presentations in response to RFPs and other customer needs.
  • Attend trade shows and other activities to raise DDN’s presence in the industry.
  • Manage customer relationships post-sale; including a strategy to close repeat business.
    Awards Presentation
    SC22 Opening Session & Turing Lecture
    ACM A.M. Turing Award Lecture
    Recorded
    Awards
    Keynote
    Turing
    TP
    W
    TUT
    XO/EX
    DescriptionJoin us for the 2021 ACM A.M. Turing Award Lecture featuring Jack Dongarra. A longtime SC supporter, Jack has made pioneering contributions to the numerical algorithms and libraries that have enabled HPC software to keep pace with exponential hardware improvements for over four decades, accelerating HPC through the years. With our SC22 conference theme, HPC Accelerates, we’re honored that Jack selected SC22 as the location to present his award lecture.

    Be sure to include the ACM A.M. Turing Lecture in your schedule when planning your SC22 conference experience. You won’t want to miss it! This lecture replaces our traditional keynote presentation.
    Paper
    AD for an Array Language with Nested Parallelism
    Recorded
    System Software
    TP
    DescriptionWe present a technique for applying reverse mode automatic differentiation (AD) on a non-recursive second-order functional array language that supports nested parallelism and is primarily aimed at efficient GPU execution.

    The key idea is to eliminate the need for a tape by relying on redundant execution to bring into each new scope all program variables that may be needed by the differentiated code. Efficient execution is enabled by the observation that perfectly nested scopes do not introduce re-execution and that such perfect nests can be readily produced by application of known compiler transformations. Our technique differentiates loops and bulk-parallel operators---e.g., map, reduce(-by-index), scan, and scatter---by specific rewrite rules and aggressively optimizes the resulting nested-parallel code. We report an evaluation that compares with established AD solutions and demonstrates competitive performance on ten common benchmarks from recent applied AD literature.
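    For readers new to reverse-mode AD, the sketch below shows the conventional tape/graph-based formulation that the paper's technique avoids: every operation records its inputs and local derivatives, and a reverse sweep applies the chain rule. The class and operators are illustrative only.

    ```python
    # Minimal tape/graph-based reverse-mode AD for scalar expressions.
    # The paper's contribution is precisely eliminating this recorded
    # structure via redundant re-execution.
    class Var:
        def __init__(self, value, parents=()):
            self.value, self.parents, self.grad = value, parents, 0.0

        def __mul__(self, o):
            return Var(self.value * o.value, [(self, o.value), (o, self.value)])

        def __add__(self, o):
            return Var(self.value + o.value, [(self, 1.0), (o, 1.0)])

    def backward(out):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(out)
        out.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad

    x, y = Var(3.0), Var(4.0)
    z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
    backward(z)
    print(x.grad, y.grad)  # -> 5.0 3.0
    ```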
    Workshop
    Adding Malleability to MPI: Opportunities and Challenges
    Recorded
    AI-HPC Convergence
    Extreme Scale Computing
    Parallel Programming Languages and Models
    Performance
    Runtime Systems
    W
    DescriptionThe Message Passing Interface (MPI) is the most dominant programming model on HPC systems and has been instrumental in developing efficient, large-scale parallel applications. However, it has a rather static view of compute resources, building on the concept of immutable communicators. While this provides some ease of use and simplicity, it is limiting, in particular for modern workflow-based workloads as well as in its support for resource-adaptive systems. The newly introduced concept of MPI Sessions, however, opens the door to more dynamicity and adaptivity. In this talk I will highlight the opportunities that can arise from such directions and discuss novel approaches we are pursuing as part of several EuroHPC projects. Our ultimate goal is to provide full malleability in MPI as well as the surrounding software layers - from system software to applications - and with that enable us to more efficiently harness the computational capabilities of current and future HPC systems.
    Workshop
    Additional Questions, Community Discussion, and Supply Chain Issues ...
    Recorded
    Benchmarking
    Cloud and Distributed Computing
    Containers
    Datacenter
    Networks
    Privacy
    Resource Management and Scheduling
    Security
    SIGHPC
    State of the Practice
    System Administration
    System Software
    W
    DescriptionAdditional Questions, Community Discussion, and Supply Chain Issues ...
    Birds of a Feather
    Addressing HPC's Carbon Footprint
    TP
    XO/EX
    DescriptionLast year's panel "HPC's Growing Sustainability Challenges and Emerging Approaches" gave an excellent introduction to the carbon impact of HPC along with ideas for carbon mitigation. In this BoF we will focus on concrete actions that data center operators and users can undertake to reduce HPC's carbon footprint. These range from using more energy-efficient processors and improved cooling, to extending the lifetime of computing equipment, to shifting load from regions with carbon-intense electricity to regions where the vast majority of electricity comes from renewable resources. Pros and cons of the various approaches will be discussed. Audience participation and ideas will be welcome.
    Paper
    Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers
    Recorded
    Applications
    Numerical Algorithms
    Security
    TP
    DescriptionSeveral scientific applications rely on sparse direct solvers for their numerical robustness. However, performance optimization for these solvers remains a challenging task, especially on GPUs. This is due to workloads of many small dense matrices of varying sizes. Matrix decompositions on such irregular workloads are rarely addressed on GPUs.

    This paper addresses irregular workloads of matrix computations on GPUs and shows their impact on a sparse LU solver. We designed an interface for the basic matrix operations supporting problems of different sizes. The interface enables us to develop irrLU-GPU, an LU decomposition on matrices of different sizes. We demonstrate the impact of irrLU-GPU on sparse LU solvers using NVIDIA and AMD GPUs. Experimental results are shown for a sparse direct solver based on multifrontal sparse LU decomposition applied to linear systems arising from the simulation, using finite element discretization on unstructured meshes, of a high frequency indefinite Maxwell problem.
    Tutorial
    ADIOS-2: A Framework to Enable HPC Tools for Extreme Scale I/O, in situ Visualization, and Performance Analysis
    Recorded
    Big Data
    Cloud and Distributed Computing
    Data Analytics
    Data Management
    Emerging Technologies
    Exascale Computing
    File Systems and I/O
    In Situ Processing
    Performance
    Productivity Tools
    Reliability and Resiliency
    Resource Management and Scheduling
    Software Engineering
    Visualization
    TUT
    DescriptionAs concurrency and complexity continue to increase on high-end machines, storage I/O performance is rapidly becoming a fundamental challenge to scientific discovery. At the exascale, online analysis will become a dominant form of data analytics, and thus scalable in situ workflows will become critical, along with high performance I/O to storage. The many components of a workflow running simultaneously pose another challenge of evaluating and improving the performance of these workflows. Therefore, performance data collection needs to be an integral part of the entire workflow.

    In this tutorial, we present ADIOS-2, which allows for building in situ and file-based data processing workflows for extreme-scale systems, including interactive, on-demand, in situ visualization of the data, as well as performance profiling of the entire workflow. Half of this tutorial will be hands-on sessions, where we provide access to the software and build together a complete MiniApp with in situ analytics and performance analysis that users can run on their laptops and on supercomputers at large scale. We will show how ADIOS-2 is fully integrated into three popular visualization and performance tools: Jupyter Notebook, ParaView, and TAU, creating a software ecosystem for in situ processing of both performance and scientific data.
    Workshop
    Adopting Heterogeneous Computing Modules: Experiences from a ToUCH Summer Workshop
    Recorded
    W
    DescriptionWe present efforts to encourage the adoption of modules for teaching heterogeneous parallel computing through a faculty development workshop. The workshop was held remotely using a novel format to exploit the advantages of a virtual format and mitigate its disadvantages. Adoption at a wide variety of institutions showed module effectiveness and also gathered feedback leading to several module improvements. We also report on the adoptions themselves, which show the importance of supporting adaptation of the modules for diverse settings.
    Job Posting
    Advanced Development Intern
    DescriptionWith over 40 years of semiconductor process control experience, chipmakers around the globe rely on KLA to ensure that their fabs ramp next-generation devices to volume production quickly and cost-effectively. Enabling the movement towards advanced chip design, KLA's Global Products Group (GPG), which is responsible for creating all of KLA’s metrology and inspection products, is looking for the best and the brightest research scientists, software engineers, application development engineers, and senior product technology process engineers. The Film and Scatterometry Technology (FaST) Division provides industry-leading metrology solutions for worldwide semiconductor IC manufacturers. The FaST Division portfolio of metrology products includes hardware and software solutions for optical film thickness, optical critical dimension (CD), composition, and resistivity measurement systems. These products are essential for IC manufacturers as they provide critical metrology capabilities for the development and implementation of their advanced IC processes. The FaST division is committed to supporting our customers to achieve the performance entitlement of our solution, and we effectively partner with our customers from their early research and development phase to the high-volume in-line manufacturing implementation specific to their process needs. The division consists of a global team located in the US, Israel, China, and India.

    The major function of this role is to support investigations into novel metrology technologies. Advanced Development is responsible for both investigating and characterizing new technologies versus current best known methods. Major tasks include the collection of data using KLA tools and software packages, analysis and summary of results, and automation of analysis routines.

    Responsibilities

    Perform guided analysis of KLA tools and software.
    Generate reports on data analysis and findings.
    Support data collection and analysis.
    Tutorial
    Advanced MPI Programming
    Recorded
    Algorithms
    Cloud and Distributed Computing
    Datacenter
    Parallel Programming Languages and Models
    Performance
    TUT
    DescriptionThe vast majority of production parallel scientific applications today use MPI and run successfully on the largest systems in the world. Parallel system architectures are evolving to include complex, heterogeneous nodes comprising general-purpose CPUs as well as accelerators such as GPUs. At the same time, the MPI standard itself is evolving to address the needs and challenges of future extreme-scale platforms as well as applications. This tutorial will cover several advanced features of MPI that can help users program modern systems effectively. Using code examples based on scenarios found in real applications, we will cover several topics including efficient ways of doing 2D and 3D stencil computation, derived datatypes, one-sided communication, hybrid programming (MPI + threads, shared memory, GPUs), topologies and topology mapping, neighborhood and nonblocking collectives, and some of the new performance-oriented features in MPI-4. Attendees will leave the tutorial with an understanding of how to use these advanced features of MPI and guidelines on how they might perform on different platforms and architectures.
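    As a small, hedged taste of one tutorial topic (this sketch is ours, not tutorial material): a Cartesian process topology with a halo exchange for a 2D stencil, written with mpi4py for brevity.

    ```python
    # Run with, e.g.:  mpiexec -n 4 python stencil_halo.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    dims = MPI.Compute_dims(comm.Get_size(), 2)
    cart = comm.Create_cart(dims, periods=[True, True])

    local = np.full((8, 8), float(cart.Get_rank()))  # this rank's subdomain
    top_halo = np.empty(8)
    bottom_halo = np.empty(8)

    above, below = cart.Shift(0, 1)  # (source, dest) of a shift along dim 0
    # Send my bottom row down and receive my top halo from above, then the
    # reverse, using combined send-receives to avoid deadlock.
    cart.Sendrecv(local[-1].copy(), dest=below, recvbuf=top_halo, source=above)
    cart.Sendrecv(local[0].copy(), dest=above, recvbuf=bottom_halo, source=below)
    print(cart.Get_rank(), top_halo[0], bottom_halo[0])
    ```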
    Tutorial
    Advanced OpenMP: Host Performance and 5.2 Features
    Recorded
    Accelerator-based Architectures
    Directive Based Programming
    Heterogeneous Systems
    Parallel Programming Languages and Models
    Performance
    TUT
    DescriptionWith the increasing prevalence of multicore processors, shared-memory programming models are essential. OpenMP is a popular, portable, widely supported, and easy-to-use shared-memory model. Developers usually find OpenMP easy to learn. However, they are often disappointed with the performance and scalability of the resulting code. This disappointment stems not from shortcomings of OpenMP, but rather from the lack of depth with which it is employed. Our “Advanced OpenMP Programming” tutorial addresses this critical need by exploring the implications of possible OpenMP parallelization strategies, both in terms of correctness and performance.

    We assume attendees understand basic parallelization concepts and know the fundamentals of OpenMP. We focus on performance aspects, such as data and thread locality on NUMA architectures, false sharing, and exploitation of vector units. All topics are accompanied by extensive case studies, and we discuss the corresponding language features in-depth. Continuing the emphasis of this successful tutorial series, we focus solely on performance programming for multi-core architectures. Throughout all topics, we present the recent additions of OpenMP 5.0, 5.1 and 5.2 and comment on developments targeting OpenMP 6.0.
    Birds of a Feather
    Advances in FPGA Programming and Technology for HPC
    TP
    XO/EX
    DescriptionFPGAs have gone from niche components to a central part of many data centers worldwide, and are now being considered for core HPC installations. The last year has seen tremendous advances in FPGA programmability and technology, and FPGAs for general HPC are apparently within reach. This BoF has two parts. The first is a series of lightning talks presenting advances in tools and technologies, emphasizing work by new investigators. The second part of the BoF will be a general discussion driven by the interests of the attendees, potentially including additional topics.
    Birds of a Feather
    Advances in Hybrid Quantum-Classical High-Performance Computing
    TP
    XO/EX
    DescriptionThe goal of this BoF session is to bring the HPC and QC communities closer together with the objective to scrutinize HPC codes and workflows for potential hybrid quantum-classical computing.

    The focus will be primarily on the identification of the required tool set, including the infrastructure, and of the potential applications, and less on computational acceleration.

    The format of the BoF will consist of three short impulse talks followed by a moderated panel discussion, inviting substantial contributions from the audience.
    Exhibitor Forum
    Advances in Processor Architecture Driving HPC/AI Convergence for Next-Generation Exascale Systems
    Recorded
    TP
    XO/EX
    DescriptionNext-generation exascale supercomputers are increasingly requiring converged HPC/AI systems, as evidenced by specifications from government, university, and commercial supercomputing labs for future systems, which require AI performance specifications in addition to the traditional specifications for HPC performance.

    HPC and AI workloads are similar: both are compute- and memory-intensive, as well as highly parallel. HPC and AI diverge, however, in the level of precision that is often required. The data analysis required for HPC applications typically needs double precision or possibly single precision. AI, however, frequently requires lower precision, with the reduced precision enabling much higher performance. Another key difference is that AI workloads benefit from sparsity to maximize performance and efficiency, while HPC typically does not exploit sparsity.

    This presentation compares HPC and AI workloads, reviews the trends that are driving AI and HPC convergence for supercomputers, and presents Tachyum’s Prodigy Universal Processor and its revolutionary architecture, which unifies the functionality of CPU, GPU, and TPU to address the demands of both HPC and AI workloads in a single device without needing costly and power-hungry accelerators. Key features that will be highlighted include Prodigy’s advanced HPC and AI subsystems, the benefits of lower-precision and sparse data types for AI applications, and recent innovations Tachyum has made to enhance and accelerate AI processing that are unique to Prodigy.
    Invited Talk
    Advancing HPC with RISC-V
    Recorded
    TP
    XO/EX
    DescriptionRISC-V has grown from a university project into a global open ISA standard with a thriving computing ecosystem comprising hundreds of collaborating organizations, including most major computing companies. This talk will present how RISC-V is well-suited for future HPC computing needs. RISC-V's technical advantages include a greater inherent efficiency than competing architectures, a sophisticated vector processing extension, and natural support for customized instruction set extensions. RISC-V's non-technical advantages include an open standard model that encourages both competition and collaboration, and which ensures long-term stability to protect investment in the software ecosystem.
    Birds of a Feather
    After Covid-19: Building a Public Health Genomics HPC Community for the Future
    TP
    XO/EX
    DescriptionThe Covid-19 pandemic has shone a light on the increasing importance of HPC in public health, particularly with respect to the genomics of key pathogens. This BoF aims to provide a starting point to build a new network of those from academic institutions, healthcare organizations, public health agencies, and industry who are responsible for the emerging HPC infrastructures that will be increasingly important in the delivery of public health. The BoF will be a forum to share experience and best practice, with the aim of creating a new network of professionals working together for global benefit.
    Students@SC
    Afternoon Break
    Posters
    Research Posters
    Agile Acceleration of LLVM Flang Support for Fortran 2018 Parallel Programming
    TP
    XO/EX
    DescriptionThe LLVM Flang compiler ("Flang") is currently Fortran 95 compliant, and the frontend can parse Fortran 2018. However, Flang does not have a comprehensive 2018 test suite and does not fully implement the static semantics of the 2018 standard. We are investigating whether agile software development techniques, such as pair programming and test-driven development (TDD), can help Flang to rapidly progress to Fortran 2018 compliance. Because of the paramount importance of parallelism in high-performance computing, we are focusing on Fortran’s parallel features, commonly denoted “CoArray Fortran”. We are developing what we believe are the first exhaustive, open-source tests for the static semantics of Fortran 2018 parallel features, and contributing them to the LLVM project. A related effort involves writing runtime tests for parallel 2018 features and supporting those tests by developing a new parallel runtime library: the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine).
    Invited Talk
    AGILE: The Future of Data Centric Computing
    Recorded
    TP
    XO/EX
    DescriptionToday’s era of explosive data growth poses serious challenges for society in transforming massive, random, heterogeneous data streams and structures into useful knowledge, applicable to every aspect of modern life, including national security, economic productivity, scientific discovery, medical breakthroughs, and social interactions. The burgeoning data, which is increasing exponentially not only in volume, but in velocity, variety, and complexity, already far outpaces the abilities of current computing systems to execute the complex data analytics needed to extract meaningful insights in a timely manner.

    The key problem with today’s computers is that they were designed to address yesterday’s compute-intensive problems rather than today’s data-intensive problems. Transforming massive data streams and structures into actionable knowledge and meaningful results in near real-time requires a complete rethinking of computing architectures and technologies – one that places the primary focus on data access and data movement rather than on faster compute power. The data of interest today and in the future is typically sparse, random, and heterogeneous, with minimal locality (it is randomly distributed across the computer), and characterized by poor data re-use, streaming updates flowing into the system, and fine-grain data movement and parallelism. The computations to be performed are determined by the data, and multiple applications might need simultaneous access to the same data. These are very different conditions than those characteristic of yesterday’s compute-intensive applications.

    IARPA’s new AGILE Program aims to provide data-analytic results in time for appropriate response, e.g., to predict impending adversarial events rather than forensically analyzing them after the fact. It will accomplish this goal by developing new system-level intelligent mechanisms for moving, accessing, and storing large, random, time-varying data streams and structures that allow for the scalable and efficient execution of dynamic graph analytic applications. The program solicited system designs that emphasize optimizing the fully integrated system, not independent optimization of individual functionalities. AGILE aims to develop scalable, energy-efficient computing system designs that enable solutions to data-intensive problems as well as traditional compute-intensive problems. These designs will be cost-effective and realizable in silicon prior to the year 2030.
    Workshop
    AI for Generating Real-World Evidence in Cancer
    Recorded
    W
    Paper
    AI for Quantum Mechanics: High Performance Quantum Many-Body Simulations via Deep Learning
    Recorded
    Machine Learning and Artificial Intelligence
    TP
    DescriptionSolving quantum many-body problems is one of the most fascinating research fields in condensed matter physics. An efficient numerical method is crucial to understanding the mechanisms of novel physics, such as high-Tc superconductivity, as one has to find the optimal solution in the exponentially large Hilbert space. The development of Artificial Intelligence (AI) provides a unique opportunity to solve quantum many-body problems, but a large gap remains to that goal. In this work, we present a novel computational framework and adapt it to the Sunway supercomputer. With highly efficient scalability up to 40 million heterogeneous cores, we can drastically increase the number of variational parameters, which greatly improves the accuracy of the solutions. The investigations of the spin-1/2 J1-J2 model and the t-J model achieve unprecedented accuracy and time-to-solution far beyond the previous state of the art.
    Birds of a Feather
    AI Is Not Neutral! Ethical Concerns of Coupling AI with HPC
    TP
    XO/EX
    DescriptionHPC is increasingly employed in AI. Although HPC itself is natively ethically neutral, its use to enable AI applications that can have harmful impacts on humans and society can render HPC collusive and ethically liable. This BoF will consider the ethical implications of the coupling of AI and HPC and the formation of guidelines for the HPC community to ensure that researchers consider potentially harmful consequences of their research and adhere to best practices for sustainable and ethical use of HPC resources.
    Job Posting
    AI Solutions Architect
    DescriptionProvide advanced and innovative data and technology leadership and support for a research unit. Hold the role of lead subject matter expert regarding delivery of IT and data services in support of research. Define and implement a sustainable and secure data management strategy which meets the needs of researchers and sponsors. Collaborate with professional peers on campus and nationally. This position will interact on a consistent basis with: Faculty, staff, students, postdocs, and unit management. Some supervision is possible (e.g., students or junior staff). Hybrid remote work options are available.

    Responsibilities
    Provide high-level expertise, advice, and technology leadership; define and implement the vision and strategy for data management for an AI Institute.
    Interact and collaborate with Institute management, multiple research groups across partner organizations, and other stakeholders (e.g., local and external HPC providers).
    Define, evaluate, and implement technical IT and data systems, architecture, applications, and services to serve the AI Institute research mission.
    Coordinate with other data management efforts on campus to plan for and analyze technology investments; identify common needs among research groups and develop associated solutions, focusing on the collaborative use of resources where possible.
    Perform other duties as assigned.
    Workshop
    AI4S – Afternoon Break
    Recorded
    W
    Workshop
    AI4S Featured Speaker
    Recorded
    W
    Workshop
    AI4S Panel
    Recorded
    W
    Job Posting
    Algo Internship
    DescriptionWith over 40 years of semiconductor process control experience, chipmakers around the globe rely on KLA to ensure that their fabs ramp next-generation devices to volume production quickly and cost-effectively. Enabling the movement towards advanced chip design, KLA's Global Products Group (GPG), which is responsible for creating all of KLA’s metrology and inspection products, is looking for the best and the brightest research scientists, software engineers, application development engineers, and senior product technology process engineers. The Surfscan group includes a team of engineers, technology development, apps engineers, and product marketing focused on technology that enables wafer, IC, and equipment manufacturers to develop, qualify, and monitor their process tools. Defects and process non-uniformities detected on Surfscan equipment allow for early identification of yield excursions. The flagship Surfscan products include the SPx platforms for wafer surface quality and wafer defect inspection tools and systems for inspection of polished wafers, epi wafers, and engineered substrates during the wafer fabrication process.

    Job Description/Preferred Qualifications

    The job focuses on the development of image, signal processing and artificial intelligence algorithms for the next generations of optical inspection and metrology systems.

    The position requires a proven innovative track record and solid fundamental knowledge in the related fields of algorithm development, including image segmentation, texture analysis, classification, feature extraction, statistical data analysis, signal processing, filter theory, machine learning, deep learning.

    C/C++ and Matlab/Python programming skills are a must. CPU optimization (SSE/AVX) and GPU (CUDA) programming are also highly desired skills.

    The responsibilities of this position cover the entire life cycle of algorithms, including modeling, proof-of-concept design, production software design and implementation, performance characterization, documentation, and user support. Since algorithms can affect many aspects of the system, a significant amount of time will be spent on cross-functional team collaboration for prototyping and testing.

    The candidate needs to be a self-motivated individual with the ability to work independently and/or in a team. Strong written and verbal communication skills are needed for extensive interactions with members of a multi-disciplinary global team.
    Job Posting
    Algorithm Engineering Intern
    DescriptionResponsibilities:

    An intern with the AI and Modeling Center of Excellence will work in one or more of the following areas. Interns will be technically supported and mentored throughout their stay with KLA.

    Work with traditional machine learning and deep learning techniques to meet and improve results on KLA products.
    Experiment with new and novel techniques to improve results or reduce compute cost of various modeling techniques.
    Build tools for more efficient experimentation.
    Manage data used for training and experimentation of AI and physics modeling systems.
    Image processing.
    Speeding up physics models.
    Developing software tools and solutions for KLA products.
    Job Posting
    Algorithm Engineering Intern (FaST)
    DescriptionWith over 40 years of semiconductor process control experience, chipmakers around the globe rely on KLA to ensure that their fabs ramp next-generation devices to volume production quickly and cost-effectively. Enabling the movement towards advanced chip design, KLA's Global Products Group (GPG), which is responsible for creating all of KLA’s metrology and inspection products, is looking for the best and the brightest research scientists, software engineers, application development engineers, and senior product technology process engineers. The Film and Scatterometry Technology (FaST) Division provides industry-leading metrology solutions for worldwide semiconductor IC manufacturers. The FaST Division portfolio of metrology products includes hardware and software solutions for optical film thickness, optical critical dimension (CD), composition, and resistivity measurement systems. These products are essential for IC manufacturers as they provide critical metrology capabilities for the development and implementation of their advanced IC processes. The FaST division is committed to supporting our customers to achieve the performance entitlement of our solution, and we effectively partner with our customers from their early research and development phase to the high-volume in-line manufacturing implementation specific to their process needs. The division consists of a global team located in the US, Israel, China, and India.

    Responsibilities

    In this role, you will be a part of the SCD Algorithm group for the FaST division at KLA. We are looking for an algorithm engineering intern to work in one or more of the following areas:

    Develop, implement, and improve electromagnetic algorithms
    Research and develop physics-directed ML algorithms
    Research and develop specialized optimization algorithms
    Job Posting
    Algorithm Engineering Intern (RAPID)
    DescriptionWith over 40 years of semiconductor process control experience, chipmakers around the globe rely on KLA to ensure that their fabs ramp next-generation devices to volume production quickly and cost-effectively. Enabling the movement towards advanced chip design, KLA's Global Products Group (GPG), which is responsible for creating all of KLA’s metrology and inspection products, is looking for the best and the brightest research scientists, software engineers, application development engineers, and senior product technology process engineers. The RAPID division is the world-leading provider of reticle inspection solutions for the semiconductor industry. The company provides inspection solutions to both the mask shops and the semiconductor fabs to ensure that lithography yields are consistently high, thus enabling cost-effective manufacturing.

    Responsibilities:

    KLA is seeking a motivated individual for an engineering intern position in a world-class algorithm group within the reticle product division (RAPID). Our intern will work in one or more of the following areas

    Computational geometry
    Image processing
    Work with traditional machine learning and deep learning techniques to meet and improve results on KLA products
    Develop software tools and solutions for KLA products
    Job Posting
    Algorithm/Architecture Research Scientist
    DescriptionLawrence Berkeley National Lab’s (LBNL, https://www.lbl.gov/) Applied Mathematics and Computational Research Division (https://crd.lbl.gov/divisions/amcr/applied-mathematics-dept/) has an opening for an Algorithm/Architecture Research Scientist to join the team.

    In this exciting role, you will lead teams to conduct algorithmic co-design for future HPC architectures and quantify the performance of such systems through modeling, simulation, and numerical analysis. The scientist’s research and expertise must include parallel performance analysis, modeling and simulation of computer architectures, analysis of numerical algorithms, and parallel simulation methodologies such as parallel discrete event simulation.
    Paper
    AlphaSparse: Generating High Performance SpMV Codes Directly from Sparse Matrices
    Recorded
    Accelerator-based Architectures
    Performance
    Visualization
    TP
    DescriptionSparse Matrix-Vector multiplication (SpMV) is an important computational kernel. Tens of sparse matrix formats and implementations have been designed to speed up SpMV performance. We develop AlphaSparse, which goes beyond the scope of human-designed artificial formats and traditional auto-tuners bound to pre-existing artificial formats and implementations, by automatically creating new machine-designed formats and SpMV kernel implementations entirely from the knowledge of input sparsity patterns and hardware architectures. Based on our proposed Operator Graph, which expresses the path of SpMV code design, AlphaSparse takes an arbitrary sparse matrix as input and outputs a machine-designed format and SpMV implementation that achieve high performance. Extensively evaluating 843 matrices from the SuiteSparse Matrix Collection, AlphaSparse achieves performance improvements of up to 22.2 times (3.2 times on average) compared to five state-of-the-art artificial formats and up to 2.8 times (1.5 times on average) over the up-to-date implementation of traditional auto-tuning.
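    For reference, the fixed human-designed baseline that machine-generated formats compete with is the classic CSR SpMV kernel, sketched here in plain Python (illustrative only; not AlphaSparse output):

    ```python
    import numpy as np

    def spmv_csr(vals, col_idx, row_ptr, x):
        """Baseline CSR SpMV: y[i] = sum of A[i,j] * x[j] over stored
        nonzeros of row i. Format-search systems explore many layouts
        beyond this single fixed one."""
        y = np.zeros(len(row_ptr) - 1)
        for i in range(len(y)):
            for k in range(row_ptr[i], row_ptr[i + 1]):
                y[i] += vals[k] * x[col_idx[k]]
        return y

    # A = [[10, 0, 2],
    #      [ 0, 3, 0],
    #      [ 1, 0, 4]]
    vals    = np.array([10.0, 2.0, 3.0, 1.0, 4.0])
    col_idx = np.array([0, 2, 1, 0, 2])
    row_ptr = np.array([0, 2, 3, 5])
    print(spmv_csr(vals, col_idx, row_ptr, np.ones(3)))  # -> [12. 3. 5.]
    ```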
    Workshop
    AMD HACC Program – Overview and Infrastructure at PC2
    Recorded
    W
    DescriptionThe AMD Heterogeneous Accelerated Computing Program (HACC) is an initiative by AMD to provide an infrastructure and exchange platform for studying FPGA acceleration for HPC and data center workloads. The Paderborn Center for Parallel Computing (PC2) was accepted into the HACC initiative in spring 2022, which now comprises five centers worldwide. I will give a brief overview of the HACC program and will highlight the new Alveo U280 partition of our Noctua 2 supercomputer, which is accessible through the HACC program, and provides a particularly flexible software and networking environment.
    Birds of a Feather
    Americas HPC Collaboration
    TP
    XO/EX
    DescriptionThe 2022 edition of the Americas High-Performance Computing (HPC) Collaboration BoF seeks to showcase collaborations that have resulted from the partnerships formed in previous editions. It will also present opportunities and experiences between different HPC Networks and Laboratories from countries in North, Central, South America, and the Caribbean. This BoF aims at showing the current state of the art in continental collaboration in HPC research, the latest developments of regional collaborative networks, and updating the roadmap for the next year for the Americas HPC partnerships.
    Posters
    Research Posters
    An Approach for Large-Scale Distributed FFT Framework on GPUs
    TP
    XO/EX
    DescriptionThe fast Fourier transform (FFT), a reduced-complexity formulation of the discrete Fourier transform (DFT), dominates the computational cost in many areas of science and engineering. Due to the large scale of the data, multi-node heterogeneous systems are needed to meet the increasing demands of parallel FFT computation in the field of High-Performance Computing (HPC). In this work, we present a highly efficient GPU-based distributed FFT framework, built by adapting the Cooley-Tukey recursive FFT algorithm. Two major types of optimizations, automatic low-dimensional FFT kernel generation and an asynchronous strategy for multi-GPUs, are presented to enhance the performance of our approach for large-scale distributed FFT, and numerical experiments demonstrate that our work achieves more than 40x speedup over CPU FFT libraries and about 2x speedup over heFFTe, the currently available state of the art, on GPUs.
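    The Cooley-Tukey decomposition the framework adapts splits an n-point DFT into two n/2-point DFTs on the even- and odd-indexed samples; a minimal single-node radix-2 sketch is below (illustrative only, distributed variants apply the same split across GPUs and nodes).

    ```python
    import cmath

    def fft(x):
        """Radix-2 Cooley-Tukey FFT: an n-point DFT becomes two n/2-point
        DFTs (even/odd indices) combined with twiddle factors.
        Input length must be a power of two."""
        n = len(x)
        if n == 1:
            return list(x)
        even, odd = fft(x[0::2]), fft(x[1::2])
        out = [0j] * n
        for k in range(n // 2):
            t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
            out[k] = even[k] + t
            out[k + n // 2] = even[k] - t
        return out

    print([round(abs(v), 6) for v in fft([1, 1, 1, 1, 0, 0, 0, 0])])
    ```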
    Workshop
    An Artificial Intelligence Bootcamp for Cyberinfrastructure Professionals
    Recorded
    W
    DescriptionOur team is developing a series of AI Bootcamps for Cyberinfrastructure (CI) Professionals to increase support expertise for researchers with Artificial Intelligence (AI) workloads running at research computing facilities. We have completed the first six-week, virtual program covering core foundational topics in AI and machine learning. Our next bootcamp is focused on CI professionals in software and data engineering roles. Our team comprises CI professionals and Computer Science and Engineering faculty, providing a comprehensive curriculum for the professional learner. We saw a great deal of enthusiasm among the CI professional community for this program, and those who attended rated it highly. We plan to refine the materials and make them generally available at the end of the project.
    Workshop
    An Automated Approach to Continuous Acceptance Testing of HPC Systems at NERSC
    Recorded
    Benchmarking
    Cloud and Distributed Computing
    Containers
    Datacenter
    Networks
    Privacy
    Resource Management and Scheduling
    Security
    SIGHPC
    State of the Practice
    System Administration
    System Software
    W
    DescriptionWe demonstrate a continuous acceptance testing strategy used at NERSC that can be implemented in the broader HPC community. To accomplish this task, we designed a new framework that can handle the complex parts of HPC systems, allowing us to verify that a system is working optimally. buildtest [1] is an acceptance testing framework that can automate the testing of HPC systems and enable HPC support teams to painlessly create and run tests. Testing is initiated by changes to the system/software stack or at scheduled system outages, which demand that NERSC staff build, run, and monitor test results using GitLab’s Continuous Integration (CI) [2]. Test results are clearly communicated to developers and users via the CDash [3] web interface, and test failures are documented as GitHub issues. Together this framework forms a robust method for verifying that cutting-edge software stacks function correctly in challenging HPC environments.
    Workshop
    An Automated Cryo-EM Computational Environment on the HPC System Using Pegasus WMS
    Recorded
    Cloud and Distributed Computing
    In Situ Processing
    Scientific Computing
    Workflows
    W
    DescriptionCryogenic electron microscopy (Cryo-EM) is a method applied to samples cooled to cryogenic temperatures that can reach near-atomic resolution of biological molecules. Recent progress in methodology has created an entirely new set of challenges to overcome - among them, the specific environment of the HPC system and the coordination and automation of the initial stages. Our solution is an automated Cryo-EM image pre-processing service tailored to an HPC environment, with close to real-time feedback allowing researchers to interact with a data acquisition session located in a facility remote from the HPC cluster. We automated the data transfer, created a service around the Pegasus Workflow Management System, kept user interaction to a minimum, and offered the researcher the option to start the pre-processing right after initiating the microscope session. The users receive real-time feedback enabling them to interact with the data acquisition, adjust it, and collect a better dataset.
    Workshop
    An Educational and Training Perspective on Integrating Hybrid Technologies with HPC Systems for Solving Real-World Commercial Problems.
    Recorded
    HPC Training and Education
    W
    DescriptionDelivering training and education on hybrid technologies (including AI, ML, GPU, Data and Visual Analytics including VR, and Quantum Computing) integrated with HPC resources is key to enabling individuals and businesses to take full advantage of digital technologies, hence enhancing processes within organizations and providing the enabling skills to thrive in a digital economy. Supercomputing centers focused on solving industry-led problems face the challenge of having a pool of users with little experience in executing simulations on large-scale facilities, as well as limited knowledge of advanced computational techniques and integrated technologies. We aim not only to educate them in using the facilities available, but also to raise awareness of methods that have the potential to increase their productivity. In this presentation, we provide our perspective on how to efficiently train industry users, how to engage them with wider digital technologies, and how these, used effectively together, can benefit their business.
    Workshop
    An Initial Evaluation of Arm's Scalable Matrix Extension
    Recorded
    Applications
    Architectures
    Benchmarking
    Exascale Computing
    Modeling and Simulation
    Performance
    Performance Portability
    W
    DescriptionExpanding upon their Scalable Vector Extension (SVE), Arm have introduced the Scalable Matrix Extension (SME) to improve in-core performance for matrix operations such as matrix multiplication. With the lack of hardware and cycle-accurate simulations available that support SME, it is unclear how effective this new instruction set extension will be, and for what types of applications it will provide the most benefit.

    By adapting The Simulation Engine (SimEng) from the University of Bristol’s High Performance Computing Group to support SME, we aim to compare the simulated performance of a Fujitsu A64FX core (with native SVE support) to a like-for-like hypothetical core with added SME support. By simulating a wide range of Streaming Vector Lengths for our hypothetical SME core model, we provide and discuss first-of-a-kind results for an SME implementation, before discussing future work that will be carried out to further evaluate the suitability of SME.
    Posters
    Research Posters
    Analysis and Visualization of Important Performance Counters to Enhance Interpretability of Autotuner Output
    TP
    XO/EX
    DescriptionAutotuning is a widely used method for guiding developers of large-scale applications to achieve high performance. However, autotuners typically employ black-box optimizations to recommend parameter settings, at the cost of users missing the opportunity to identify performance bottlenecks. Performance analysis fills that gap and identifies problems and optimization opportunities that can result in better runtime and utilization of hardware resources. This work combines the best of both worlds by integrating a systematic performance analysis and visualization approach into a publicly available autotuning framework, GPTune, to suggest to users which configuration parameters are important to tune, to what value, and how tuning the parameters affects hardware-application interactions. Our experiments demonstrate that a subset of the task parameters impacts the execution time of the Hypre application; the memory traffic and page faults cause performance problems in the Plasma-DGEMM routine on Cori-Haswell.
    Workshop
    Analysis of User-Support Tickets in the Lifetime of the Blue Waters System
    Recorded
    W
    DescriptionWe present an analysis of the collection of user-support tickets that were created during nearly nine years of operation of the Blue Waters supercomputer. The analysis was based on information obtained from the Jira ticketing system and its corresponding queues. The paper contains a set of statistics showing, in quantitative form, the distribution of tickets across system areas. It also shows the computed metrics related to the management of the tickets by our staff. Additionally, we present an analysis, based on Machine-Learning and Sentiment Analysis techniques, conducted over the text entered in tickets, aimed at detecting trends in users' views and perspectives about Blue Waters. This kind of study, which is uncommon in the literature, could provide guidance for operators of future large systems about the expected volume of user support demanded by each system area, and about how to allocate support staff such that users receive the best possible assistance.
    ACM Student Research Competition: Graduate Poster
    ACM Student Research Competition: Undergraduate Poster
    Posters
    Analysis of Validating and Verifying OpenACC Compilers 3.0 and Above
    Recorded
    TP
    DescriptionOpenACC is a high-level directive-based parallel programming model that can manage the sophistication of heterogeneity in architectures and abstract it from the users. The portability of the model across CPUs and accelerators has gained the model a wide variety of users. This means it is also crucial to analyze the reliability of the compilers' implementations. To address this challenge, the OpenACC Validation and Verification team has proposed a validation testsuite to verify the OpenACC implementations across various compilers with an infrastructure for a more streamlined execution. This paper will (a) describe the new developments since the last publication on the testsuite, (b) outline the use of the infrastructure, (c) discuss tests that highlight our workflow process, (d) analyze the results from executing the testsuite on various systems, and (e) outline future developments.
    Workshop
    Analysis of Validating and Verifying OpenACC Compilers 3.0 and Above
    Recorded
    Accelerator-based Architectures
    Compilers
    Dataflow and Tasking
    Directive Based Programming
    Heterogeneous Systems
    Parallel Programming Languages and Models
    Runtime Systems
    W
    DescriptionOpenACC is a high-level directive-based parallel programming model that can manage the sophistication of heterogeneity in architectures and abstract it from the users. The portability of the model across CPUs and accelerators has gained the model a wide variety of users. This means it is also crucial to analyze the reliability of the compilers' implementations. To address this challenge, the OpenACC Validation and Verification team has proposed a validation testsuite to verify the OpenACC implementations across various compilers with an infrastructure for a more streamlined execution. This paper will (a) describe the new developments since the last publication on the testsuite, (b) outline the use of the infrastructure, (c) discuss tests that highlight our workflow process, (d) analyze the results from executing the testsuite on various systems, and (e) outline future developments.
    Posters
    Research Posters
    Analyzing NOvA Neutrino Data with the Perlmutter Supercomputer
    TP
    XO/EX
    DescriptionNOvA is a world-leading neutrino physics experiment that is making measurements of fundamental neutrino physics parameters and performing searches for physics beyond the Standard Model. These measurements must leverage high performance computing facilities to perform data-intensive computations and execute complex statistical analyses. We outline the NOvA analysis workflows we have implemented on the NERSC Cori and Perlmutter systems. We have developed an implicitly parallel data-filtering framework for high energy physics data based on pandas and HDF5. We demonstrate the scalability of the framework and the advantages of an aggregated monolithic dataset using a realistic neutrino cross-section measurement. We also demonstrate the performance and scalability of the computationally intensive profiled Feldman-Cousins procedure for statistical analysis. This procedure performs statistical confidence interval construction based on non-parametric Monte Carlo simulation and was applied to the NOvA sterile neutrino search. We show that the NERSC Perlmutter system provides an order-of-magnitude computing performance gain over Cori.
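    The pandas-plus-HDF5 filtering style described above can be sketched in a few lines; the file name, columns, and cuts below are hypothetical stand-ins for NOvA's actual data model:

```python
# Illustrative sketch of pandas + HDF5 event filtering, loosely modeled on the
# kind of selection such a framework might apply. Requires `pip install tables`
# for pandas' HDF5 backend.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
events = pd.DataFrame({
    "run": rng.integers(1, 10, size=100_000),
    "energy": rng.exponential(2.0, size=100_000),   # synthetic values
    "quality": rng.random(size=100_000),
})
events.to_hdf("events.h5", key="events", mode="w")

# The filtering step: each worker could apply the same cut to its own shard.
df = pd.read_hdf("events.h5", "events")
selected = df[(df.energy > 1.0) & (df.quality > 0.5)]
print(f"kept {len(selected)}/{len(df)} events")
```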
    Birds of a Feather
    Analyzing Parallel I/O
    TP
    XO/EX
    DescriptionParallel I/O performance can be a critical bottleneck for applications, yet users are often ill-equipped for identifying and diagnosing I/O performance issues. Increasingly complex hierarchies of storage hardware and software deployed on many systems only compound this problem. Tools that can effectively capture, analyze, and tune I/O behavior for these systems empower users to realize performance gains for many applications.

    In this BoF, we form a community around best practices in analyzing parallel I/O and cover recent advances to help address the problem presented above, drawing on the expertise of users, I/O researchers, and administrators in attendance.
    Workshop
    Analyzing the Energy Consumption of Synchronous and Asynchronous Checkpointing Strategies
    Recorded
    Reliability and Resiliency
    W
    DescriptionWith exascale computing, the number of components that comprise high-performance computing (HPC) systems has increased by more than 70%, leading to a shorter mean time between failure (MTBF) and larger power budgets. These issues induce the need for (1) checkpoint/restart (C/R) and (2) energy reduction techniques. C/R has evolved with different software and hardware advances, thus it is crucial to understand how its energy usage differs under various storage tiers and synchronicity. In this paper, we present a comparison of the energy consumption of leading, state-of-the-art C/R libraries, VELOC and GenericIO. We perform weak and strong scalability tests of the C/R libraries and show that asynchronous C/R provides 4x greater throughput while using 33% less energy than synchronous C/R. Data size and throughput are directly correlated to energy consumption. Therefore, C/R developers should focus on ways to improve/maintain high throughput in order to reduce energy consumption to address exascale needs.
    Workshop
    Analyzing the Impact of Lossy Data Reduction on Volume Rendering of Cosmology Data
    Recorded
    W
    DescriptionCosmology simulations are among the largest simulations currently run on supercomputers, generating terabytes to petabytes of data for each run. Consequently, scientists are seeking to reduce the amount of storage needed while preserving enough quality for analysis and visualization of the data. One of the most commonly used visualization techniques for cosmology simulations is volume rendering. Here, we investigate how different types of lossy error-bounded compression algorithms affect the quality of volume-rendered images generated from reconstructed datasets. We also compute a number of image quality assessment metrics to determine which are most effective at identifying artifacts in the visualizations.
    Birds of a Feather
    Another Step Toward a Sustainable HPC Outreach Ecosystem
    TP
    XO/EX
    DescriptionThe ultimate goal of outreach activities is to connect with individuals outside or at the periphery of the HPC community and empower them to become the next generation of HPC professionals. While most large centers and organizations have some outreach staff, many small HPC centers find the development and maintenance of an outreach program a serious challenge. This BoF session will gather HPC outreach facilitators from across the community to share challenges, experiences, lessons learned, and strategies for developing sustainable outreach programs. The discussion will be captured in a shared document that will guide future community efforts.
    Birds of a Feather
    Anyscale with RISC-V: Powering the Next Generation of (IoT to) HPC Systems
    TP
    XO/EX
    DescriptionThe goal of this BoF is to introduce the HPC community to the RISC-V ecosystem and how it can enable research and development. We will start with a short panel presentation (20 minutes) on the status of the RISC-V HPC ecosystem. This will be followed by a Q&A session with the panel and audience members. There will be directed questions as well as ad hoc questions from the audience.
    Workshop
    AppEKG: A Simple Unifying View of HPC Applications in Production
    Recorded
    Applications
    Architectures
    Benchmarking
    Exascale Computing
    Modeling and Simulation
    Performance
    Performance Portability
    W
    DescriptionWhile many good development-oriented tools exist for analyzing and improving the performance of HPC applications, the capability to capture and analyze the dynamic behavior of applications in real production runs is lacking. Many heavily used applications do keep some internal metrics of their performance, but there is no unified way of using them. In this paper we present the initial idea of AppEKG, both a concept and a prototype tool for providing a unified, understandable view of HPC application behavior in production. Our prototype AppEKG framework can achieve less than 1% overhead, making it usable in production, while still providing dynamic data collection that captures time-varying runtime behavior.
    Workshop
    Application of Privacy Preserving Federated Learning in Biomedical Applications – Lessons Learned from the PALISADE-X project
    Recorded
    W
    Workshop
    Approaching Exascale: Best Practices for Training a Diverse Workforce Using Hackathons
    Recorded
    HPC Training and Education
    W
    DescriptionGiven the anticipated growth of the high-performance computing market, HPC is challenged with expanding the size, diversity, and skill of its workforce while also addressing post-pandemic distributed workforce protocols and an ever-expanding ecosystem of architectures, accelerators and software stacks.

    As we move toward exascale computing, training approaches need to address how best to prepare future computational scientists and enable established domain researchers to stay current and master tools needed for exascale architectures.

    This paper explores adding in-person and virtual hackathons to the training mix to bridge traditional programming curricula and hands-on skills needed among the diverse communities. We outline current learning and development programs available; explain benefits and challenges in implementing hackathons for training; share specific use cases, including training “readiness,” outcomes and sustaining progress; discuss how to engage diverse communities—from early career researchers to veteran scientists; and recommend best practices for implementing these events into their training mix.
    Paper
    Approximate Computing Through the Lens of Uncertainty Quantification
    Recorded
    Post-Moore Computing
    Quantum Computing
    TP
    DescriptionAs computer system technology approaches the end of Moore's law, new computing paradigms that improve performance become a necessity. One such paradigm is approximate computing (AC). AC can present significant performance improvements, but a challenge lies in providing confidence that approximations will not overly degrade the application output quality. In AC, application domain experts manually identify code regions amenable to approximation. However, automatically guiding a developer where to apply AC is still a challenge.

    We propose Puppeteer, a novel method to rank code regions based on amenability to approximation. Puppeteer uses uncertainty quantification methods to measure the sensitivity of application outputs to approximation errors. A developer annotates possible application code regions and Puppeteer estimates the sensitivity of each region. Puppeteer successfully identifies insensitive regions on different benchmarks. We utilize AC on these regions and we obtain speedups of 1.18x, 1.8x, and 1.3x for HPCCG, DCT, and BlackScholes, respectively.
    Workshop
    Argonne Site Report
    Recorded
    Architectures
    Benchmarking
    Cloud and Distributed Computing
    Containers
    Datacenter
    Networks
    Privacy
    Resource Management and Scheduling
    Security
    SIGHPC
    State of the Practice
    System Administration
    System Software
    W
    DescriptionAn update on the status of Argonne's new and expected systems.
    Birds of a Feather
    Arm Diversity Unified: Standardization in Hardware and Software
    TP
    XO/EX
    DescriptionThis BoF brings together the Arm HPC community to discuss how current and future standards will influence the growing diversity of Arm-related hardware and software. A panel composed of government, academic, and industry practitioners and vendors will discuss whether hardware standards (e.g., Armv9 and SBSA) and software standards (e.g., C++ Standard Parallelism and OpenMP) can sufficiently support the growing and diverse Arm hardware ecosystem. Audience participation is strongly encouraged with a focus on answering standards-related questions and facilitating the growth and interoperability of future Arm-based extreme scale systems.
    Posters
    Research Posters
    Artificial Intelligence Reconstructs Missing Climate Information
    TP
    XO/EX
    DescriptionHistorical temperature measurements are the basis of important global climate datasets, such as HadCRUT4 and HadCRUT5, used to analyze climate change. These datasets contain many missing values and use low-resolution grids. Here we demonstrate that artificial intelligence can skillfully fill these observational gaps and upscale the data when combined with numerical climate model output. We show that recently developed image inpainting techniques perform accurate reconstructions via transfer learning. In addition, higher resolution has long been a goal of the weather and climate community. We obtain a neural network that simultaneously reconstructs and downscales the important observational datasets (IPCC AR6), which is unique and state-of-the-art in climate research.
    Birds of a Feather
    ASEAN HPC
    TP
    XO/EX
    DescriptionAs ASEAN's significance in the global landscape has risen, so has its HPC presence. Multiple world-class supercomputers are now being planned and deployed, and a growing base of users is conducting cutting-edge science. ASEAN has officially sanctioned an “HPC Task Force” among its coalition of major stakeholders to formulate a collective HPC infrastructure, federate it with advanced tools, and collaborate with other regions, e.g., with Japan (Fugaku) and through a joint HPC school with Europe and Japan. The BoF will present the status quo of ASEAN HPC and discuss further outreach to the global HPC community.
    Workshop
    Assessing the Current State of AWS Spot Market Forecastability
    Recorded
    W
    DescriptionSince 2009, Amazon has offered its unused compute capacity as AWS Spot Instances. For the first eight years of spot, pure market dynamics and high pricing variability created an ideal environment for time-series prediction. Following a pricing-scheme change in 2017, this extreme variability was removed: pricing is artificially smoothed for the end user, making it significantly easier to predict prices accurately. Nevertheless, the literature demonstrates ongoing efforts to accurately predict spot prices. To show that prediction in the modern spot market is unnecessary, we train nearly 2.2 million ARIMA models on new and old data and demonstrate an order-of-magnitude improvement in accuracy for models trained on new data. Further, we show this new ease of price prediction makes spot instances ideal for large-scale, cost-aware cloud computing, as cost estimation is now trivial. Accordingly, we demonstrate that even naive prediction approaches waste less than $360 per 1,000,000 core hours.
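    For readers unfamiliar with the modeling step, a minimal ARIMA fit over a smooth, spot-like price series looks like the sketch below; the series is synthetic, not real AWS data:

```python
# Illustrative sketch: fitting a small ARIMA model to a smooth, spot-like
# price series, echoing the observation that post-2017 smoothed prices are
# easy to predict. Requires `pip install statsmodels`.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
# Slowly drifting, smoothed "price" series (USD/hour), hypothetical.
prices = 0.10 + np.cumsum(rng.normal(0, 0.0002, size=500))

model = ARIMA(prices, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=24)  # predict the next 24 points
print(forecast[:5])
```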
    Workshop
    Assessing the Memory Wall in Complex Codes
    Recorded
    W
    DescriptionMany of Los Alamos National Laboratory's HPC codes are memory bandwidth bound. These codes exhibit high levels of sparse memory access that differ significantly from standard benchmarks. In this paper we present an analysis of the memory access of some of our most important code-bases. We then generate micro-benchmarks that preserve the memory access characteristics of our codes using two approaches: one based on statistical sampling of relative memory offsets in a sliding time window at the function level, and another at the loop level. The function-level approach is used to assess the impact of advanced memory technologies such as LPDDR5 and HBM3 using the gem5 simulator. Our simulation results show significant improvements for sparse memory access workloads using HBM3 relative to LPDDR5, and better scaling on a per-core basis. Assessment of two different architectures shows that higher peak memory bandwidth results in higher bandwidth on sparse workloads.
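    The offset-sampling idea can be sketched as follows: gather relative offsets from an address trace, histogram them, and replay offsets drawn from that distribution. The trace below is synthetic; a real tool would use an instrumented application trace:

```python
# Illustrative sketch of statistical sampling of relative memory offsets.
import numpy as np

rng = np.random.default_rng(3)
trace = rng.integers(0, 1 << 20, size=10_000)   # synthetic address trace

offsets = np.diff(trace)                        # relative access offsets
hist, edges = np.histogram(offsets, bins=64)    # measured offset distribution
probs = hist / hist.sum()

# Replay: draw offsets from the measured distribution to build a synthetic
# access stream with matching spatial characteristics.
drawn = rng.choice(len(probs), size=1_000, p=probs)
synthetic = np.cumsum(edges[drawn]) % (1 << 20)  # use left bin edges
print(synthetic[:5].astype(int))
```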
    Job Posting
    Assistant Director of Analytics and Data Science
    DescriptionRenaissance Computing Institute (RENCI) is seeking an Assistant Director of Analytics and Data Science to develop an independent research program in Data Science including artificial intelligence, machine learning, knowledge graphs and other analytical methods. The individual is expected to support and manage biomedical and environmental research projects, aid in management of the Analytics and Data Science team, and mentor, guide and evaluate several direct reports.

    The Assistant Director will provide technical leadership, setting the analytical direction of research projects. The Assistant Director will apply their Data Science expertise to help advance RENCI’s research portfolio in Data Science, leading the development and execution of new research projects and proposals, both independently and in collaboration with internal and external partners.

    The Assistant Director will develop algorithms and tools involving, but not limited to: image analysis, natural language processing, graph analysis, question answering, and semantic search. Prior experience in one or more of these areas is essential.

    The Assistant Director will work with colleagues with expertise in data analytics, advanced computing (including cloud computing systems), software engineering, and domain expertise in the biomedical and environmental sciences. The Assistant Director will work in interdisciplinary teams, both within and beyond RENCI, promoting innovation and collaboration.

    Responsibilities:

    - Develop independent research portfolio in area of expertise
    - Manage research projects, and provide analytical direction
    - Work with colleagues to develop and apply tools and methods
    - Collaborate with software engineers and researchers to help design analytical systems to advance scientific discovery
    - Provide leadership in user engagement and experience
    - Contribute to interdisciplinary teams
    - Train data contributors on the application of data standards and relevant tools
    - Develop proposals and business development efforts
    Job Posting
    Associate Director of Integrated Cyberinfrastructure
    DescriptionJob Summary
    The Associate Director of the Integrated Cyberinfrastructure (ICI) Directorate is a senior member of the NCSA Director's Office who works as part of a team both to set NCSA strategy and to execute tactical directions. This position actively engages with leaders across campus on initiative development and follow-through, as well as with other academic institutions and industry leaders. This position provides experienced management of technology development in support of advanced applications and communities across the disciplines, with the goal of delivering functional advanced technologies to NCSA's academic and industrial users.

    Duties & Responsibilities
    Strategic Direction and Leadership (50%)
    • Set strategic direction for and provide leadership of NCSA’s Integrated Cyberinfrastructure (ICI) Directorate and the groups within it. This includes developing strategic plans for ICI and implementing policies and procedures to bring the strategic plans to fruition.
    • Direct the ICI Directorate budgets and Memorandums of Understanding (MOUs) with regards to the NCSA / ICI mission and vision.
    • Create a diverse and inclusive working environment for ICI staff that fosters collaboration, operational excellence, and innovation.
    • Provide growth opportunities for ICI staff by creating transparent career paths and promoting professional development opportunities.
    • Evaluate internal operating policies and procedures relating to the ICI Directorate. Determine what changes or improvements should be made and implement them.
    • Supervise ICI Division managers, including establishing strategic initiatives, assigning project tasks, staffing, setting goals and evaluating performance. The Associate Director will also work to empower managers and employees, define the broader context of their work, and explain how the team’s work contributes not only to the success of the ICI Directorate, but to NCSA as a whole.
    • Participate in discussions with the NCSA Executive Committee and the Director on strategic issues to best position the Center for accomplishing its mission.
    • Coordinate with Senior Associate Directors, Associate Directors, and team leaders to align and leverage staff and expertise that may be useful across multiple NCSA divisions and groups.
    • Direct and support staff in the proper implementation of University policy and procedures.


    Engagement and Outreach (25%)
    • Represent NCSA at key national and international meetings and in the community of cyberinfrastructure developers and relevant standards bodies.
    • Represent NCSA in interactions with existing and prospective collaborators.
    • Serve as an NCSA representative on advanced technology in support of science and engineering.
    • Advance technology collaborations between ICI staff and UIUC faculty and students.
    • Assist in supporting the business technology needs of NCSA.
    • Provide input to national funding agencies for the creation of opportunities appropriate to advance efforts in support of NCSA’s mission and vision.


    Research and Proposal Development (25%)
    • Identify, develop, assess, and pursue funding opportunities that will advance NCSA’s mission, vision and strategic goals.
    • Lead proposals and assist ICI Division managers on proposals to funding agencies, such as the National Science Foundation, Department of Energy, National Institutes of Health, etc. to develop advanced cyberinfrastructure.
    • Lead ICI managers and staff in the development, deployment, and support of an advanced HPC/Data research environment.
    Workshop
    Asynchronous Workload Balancing through Persistent Work-Stealing and Offloading for a Distributed Actor Model Library
    Recorded
    Applications
    Architectures
    Heterogeneous Systems
    Hierarchical Parallelism
    Parallel Programming Languages and Models
    Performance
    Performance Portability
    Scientific Computing
    W
    DescriptionApplications and runtime systems must adapt to dynamic load imbalances caused by both software and ever more complex hardware. We present a diffusion-based, reactive, fully asynchronous, and decentralized dynamic load balancer for a distributed actor library. With its asynchronous execution model, features such as remote procedure calls, and support for serialization of arbitrary types, UPC++ is especially well suited to implementing the actor model. While providing a substantial speedup for small- to medium-sized jobs with both predictable and unpredictable workload imbalances, the scalability of the diffusion-based approaches remains below expectations in most of the presented test cases.
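    The underlying diffusion scheme, independent of the actor library, can be illustrated with a toy ring of ranks that repeatedly shift a fraction of their load difference to their neighbors; this is the generic method, not the paper's UPC++ code:

```python
# Minimal sketch of diffusion-based load balancing on a ring of ranks.
import numpy as np

load = np.array([100.0, 10.0, 10.0, 10.0, 10.0, 60.0])  # hypothetical work units
alpha = 0.25  # diffusion coefficient: fraction of the difference shifted per step

for step in range(50):
    left, right = np.roll(load, 1), np.roll(load, -1)
    # Each rank exchanges load with both neighbors; total load is conserved.
    load = load + alpha * (left - load) + alpha * (right - load)

print(np.round(load, 1))  # converges toward the mean load per rank
```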
    Exhibitor Forum
    Aurora Vector Annealing to Solve Social Issues and Acceleration by NEC’s Supercomputer, SX-Aurora TSUBASA
    Recorded
    TP
    XO/EX
    DescriptionThis presentation consists of two parts: a discussion of the SX-Aurora TSUBASA vector supercomputer and an introduction to Vector Annealing, a digital annealer running on SX-Aurora TSUBASA. The first half of the presentation covers the vector architecture of SX-Aurora TSUBASA, especially its latest vector processors, which offer some of the highest memory bandwidth available. Sustained performance and power efficiency are also discussed, as well as NEC’s future plans and roadmap. The second half presents NEC’s quantum computing strategy and products that provide higher sustained performance in the annealing/optimization field. NEC developed Vector Annealing as a digital annealer and has a strong business relationship with D-Wave, which provides a quantum annealer. NEC aims to solve various social issues by using quantum/digital annealing technologies and by developing a hybrid platform combining supercomputers and quantum/digital annealers to provide much higher sustained performance.
    Workshop
    Automated Continual Learning of Defect Identification in Coherent Diffraction Imaging
    Recorded
    W
    DescriptionX-ray Bragg coherent diffraction imaging (BCDI) is widely used for materials characterization. However, obtaining X-ray diffraction data is difficult and computationally intensive. Here, we introduce a machine learning approach to identify crystalline line defects in samples from raw coherent diffraction data. To automate this process, we compose a workflow coupling coherent diffraction data generation with the training and inference of deep neural network defect classifiers. In particular, we adopt a continual learning approach, in which we generate training and inference data as needed based on the accuracy of the defect classifier, rather than generating all training data a priori. The results show that our approach improves the accuracy of defect classifiers while using far fewer data samples.
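    The continual-learning loop can be sketched with a toy classification task standing in for diffraction data; data is generated only while validation accuracy remains below target (illustrative only, not the authors' workflow):

```python
# Conceptual sketch of on-demand data generation for continual learning.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(6)

def simulate(n):  # hypothetical stand-in for diffraction simulation
    X = rng.normal(size=(n, 10))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic "defect" label
    return X, y

X_val, y_val = simulate(2_000)
model = SGDClassifier(loss="log_loss", random_state=0)

for round_ in range(20):
    X, y = simulate(1_000)                   # generate training data as needed
    model.partial_fit(X, y, classes=[0, 1])  # incremental training step
    acc = model.score(X_val, y_val)
    print(f"round {round_}: accuracy {acc:.3f}")
    if acc >= 0.95:                          # stop generating once good enough
        break
```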
    Workshop
    Automated Error Mitigation Based on Probabilistic Error Reduction
    Recorded
    Quantum Computing
    W
    DescriptionCurrent quantum computers suffer from noise that prohibits extracting useful results directly from longer computations. The figure of merit is often an expectation value, which experiences a noise-induced bias. A systematic way to remove such bias is probabilistic error cancellation (PEC). PEC requires noise characterization and introduces an exponential sampling overhead.

    Probabilistic error reduction (PER) is a related method that systematically reduces the overhead. In combination with zero-noise extrapolation, PER can yield expectation values with an accuracy comparable to PEC. We present an automated quantum error mitigation software framework that includes noise tomography and application of PER to user-specified circuits. We provide a multi-platform Python package that implements a recently developed Pauli noise tomography technique and exploits a noise scaling method to carry out PER. We also provide software that leverages a previously developed toolchain, employing PyGSTi for gate set tomography and Mitiq for PER.
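    As a purely numerical illustration of the quasi-probability sampling underlying PEC and PER (with invented numbers, not the framework's API): an ideal operation is expanded as a signed mixture of implementable noisy operations, samples are drawn in proportion to the absolute quasi-probabilities, and each outcome is weighted by its sign times the overhead factor gamma:

```python
# Toy sketch of quasi-probability sampling (the core mechanic of PEC/PER).
# The quasi-probabilities and per-variant expectation values are hypothetical;
# a real workflow would obtain them from noise tomography.
import numpy as np

rng = np.random.default_rng(4)

quasi = np.array([1.2, -0.1, -0.1])   # signed weights, sum to 1
gamma = np.abs(quasi).sum()           # sampling overhead; variance grows ~ gamma^2
probs = np.abs(quasi) / gamma         # sampling distribution
signs = np.sign(quasi)

exp_vals = np.array([0.8, 0.5, 0.3])  # <O> under each noisy circuit variant

idx = rng.choice(len(quasi), size=200_000, p=probs)
estimates = gamma * signs[idx] * exp_vals[idx]
# Unbiased estimator of sum(quasi * exp_vals) = 0.96 - 0.05 - 0.03 = 0.88
print(f"gamma = {gamma:.2f}, mitigated estimate = {estimates.mean():.3f}")
```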
    Workshop
    Automated Quantum Memory Compilation with Improved Dynamic Range
    Recorded
    Quantum Computing
    W
    DescriptionEmerging quantum algorithms that process data require that classical input data be represented as a quantum state. These data-processing algorithms often follow the gate model of quantum computing (which requires qubits to be initialized to a basis state, typically |0>) and thus often employ state-generation circuits to transform the initialized basis state into a data-representation state. There are many ways to encode classical data in a qubit, and the oft-applied approach of basis encoding does not allow optimization to the extent that other variants do. In this work, we thus consider automatic synthesis of addressable, quantum read-only memory (QROM) circuits, which act as data-encoding state-generation circuits. We investigate three data encoding approaches, one of which we introduce to provide improved dynamic range and precision. We present experimental results that compare these encoding methods for QROM synthesis to better understand the implications of and applications for each.
    Workshop
    Automatic Asynchronous Execution of Synchronously Offloaded OpenMP Target Regions
    Recorded
    W
    DescriptionUse of heterogeneous architectures has steadily increased during the past decade. However, non-homogeneous systems present a challenge to the programming model, as the execution models of CPU and accelerator might differ considerably. Since version 4.0, OpenMP has been bridging this gap by allowing a code block to be offloaded to a target device. Among the subsequent additions to the OpenMP offloading API, the most notable is probably asynchronous execution between device and host. By default, offloaded regions are executed synchronously, so the host thread blocks until their completion. The nowait clause allows work to overlap between the host and the target device. However, nowait must be added manually by the user, along with the tasks' data dependencies and appropriate synchronization to avoid race conditions, increasing program complexity and developer burden.
    Workshop
    Automatic, Efficient, and Scalable Provenance Registration for FAIR HPC Workflows
    Recorded
    Cloud and Distributed Computing
    In Situ Processing
    Scientific Computing
    Workflows
    W
    DescriptionProvenance registration is becoming more and more important as we increase the size and number of experiments performed using computers. In particular, when provenance is recorded in HPC environments, it must be efficient and scalable. We propose a provenance registration method for scientific workflows that is efficient enough to run on supercomputers (and thus could run in environments with more relaxed restrictions, such as distributed ones). It must also be scalable in order to deal with the large workflows typical of HPC. We also target transparency for the user, shielding them from having to specify how provenance must be recorded. We implement our design using the COMPSs programming model as a workflow management system (WfMS) and use RO-Crate as a well-established standard to record provenance. Experiments are provided, demonstrating the efficiency and scalability of our solution.
    Workshop
    Autoscaling of Containerized HPC Clusters in the Cloud
    Recorded
    W
    DescriptionThis presentation introduces a cloud orchestrator controller that enables the autoscaling of containerized HPC clusters in the cloud. The controller triggers the creation or removal of containerized HPC compute nodes according to metrics collected from the containerized HPC scheduler's job queue. Our approach modifies neither the cloud orchestrator nor the HPC scheduler. The scheme is generic and can be applied to any HPC scheduler. Moreover, containerization improves experimental reproducibility by adding the HPC scheduler itself to the environment replayed by the end user. The presentation exemplifies cloud and HPC convergence, allowing a high degree of flexibility for users and community platform developers. It also explores continuous integration/deployment approaches from cloud computing to orchestrate multiple, potentially different, HPC job schedulers that scale under the supervision of the cloud orchestrator.
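    Conceptually, such a controller reduces to a small reconciliation loop; the two helpers below are hypothetical stand-ins for the scheduler-queue query and the orchestrator scaling call:

```python
# Conceptual sketch of a queue-driven autoscaling loop, not the presented tool.
import random
import time

def queued_jobs() -> int:
    # Hypothetical stand-in for querying the HPC scheduler's queue depth.
    return random.randint(0, 100)

def scale_compute_nodes(count: int) -> None:
    # Hypothetical stand-in for setting the orchestrator's replica count.
    print(f"scaling containerized compute nodes to {count}")

JOBS_PER_NODE = 4
MIN_NODES, MAX_NODES = 1, 32

for _ in range(3):  # a few iterations for demonstration
    depth = queued_jobs()
    # One node per JOBS_PER_NODE queued jobs (ceiling division), clamped.
    target = max(MIN_NODES, min(MAX_NODES, -(-depth // JOBS_PER_NODE)))
    scale_compute_nodes(target)
    time.sleep(1)
```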
    Birds of a Feather
    Benchmarking across HPC Architectures
    TP
    XO/EX
    DescriptionHPC centers around the world use benchmarks to evaluate their machines and to engage with vendors during procurement. The goal of this BoF is twofold. First, a series of short presentations will gather information on state-of-the-art methodologies for creating and validating benchmarking sets. Second, an open discussion will gather community feedback on pitfalls of the current methodologies and how they should evolve to accommodate the growing diversity of computational workloads and HPC architectures. The intended audience is HPC application developers and users, teams benchmarking HPC data centers, HPC vendors, and performance researchers.
    Workshop
    Benchmarking Fortran DO CONCURRENT on CPUs and GPUs Using BabelStream
    Recorded
    Applications
    Architectures
    Benchmarking
    Exascale Computing
    Modeling and Simulation
    Performance
    Performance Portability
    W
    DescriptionFortran DO CONCURRENT has emerged as a new way to achieve parallel execution of loops on CPUs and GPUs. This paper studies the performance portability of this construct on a range of processors and compares it with the incumbent models: OpenMP, OpenACC and CUDA. To do this study fairly, we implemented the BabelStream memory bandwidth benchmark from scratch, entirely in modern Fortran, for all of the models considered, which include Fortran DO CONCURRENT, as well as two variants of OpenACC, four variants of OpenMP (2 CPU and 2 GPU), CUDA Fortran, and both loop- and array-based references. BabelStream Fortran matches the C++ implementation as closely as possible, and can be used to make language-based comparisons. This paper represents one of the first detailed studies of the performance of Fortran support on heterogeneous architectures; we include results for AArch64 and x86_64 CPUs as well as AMD, Intel and NVIDIA GPU platforms.
    Tutorial
    Best Practices for HPC in the Cloud
    Recorded
    Cloud and Distributed Computing
    Containers
    Datacenter
    Productivity Tools
    Resource Management and Scheduling
    Software Engineering
    TUT
    DescriptionThe use of cloud computing technologies in HPC has grown considerably over the last few years. The complexity and scale that come with cloud environments can make a first experience a daunting proposition. Cloud technologies offer a number of new capabilities to streamline tasks for HPC users and administrators, but how to use them in HPC may not be immediately clear.

    This tutorial provides a foundation for running HPC workloads in the cloud. It is organized as four series of progressive lectures and labs that provide a hands-on learning experience. It starts with a primer on cloud foundations and how they map to common HPC concepts, dives deeper into core cloud components, and presents best practices for running HPC in the cloud.

    This tutorial uses a combination of lectures and hands-on labs on provided temporary Amazon Web Services (AWS) accounts to provide both conceptual and hands-on learning.
    Workshop
    Best Practices for HPC Training and Education – Morning Break
    Recorded
    W
    Exhibitor Forum
    Best Practices for Running HPC on Google Cloud
    Recorded
    TP
    XO/EX
    DescriptionJoin this technical deep dive into Google Cloud’s latest high-performance computing (HPC) advancements, covering the latest VMs, processors, accelerators, and storage solutions. We’ll also discuss our new HPC tools for deploying and managing your HPC environments, and how our customers are benefiting from running their HPC in the cloud.
    Birds of a Feather
    Best Practices for Training an Exascale Workforce Using Applied Hackathons and Bootcamps
    TP
    XO/EX
    DescriptionGiven the anticipated growth of the HPC market, HPC is challenged with expanding the size, diversity, and skill of its workforce. As we move toward exascale computing, how best do we prepare future computational scientists, and enable established domain researchers to stay current and master tools needed for exascale architectures?

    This BoF invites scientists, researchers, trainers, educators, and the RSEs that support them to discuss current learning and development programs, explore adding in-person and virtual hackathons to existing training modalities, and brainstorm implementation strategies to bridge between traditional programming curricula and hands-on skills needed by diverse communities within different environments.
    Tutorial
    Better Scientific Software
    Recorded
    Applications
    Computational Science
    Productivity Tools
    Software Engineering
    TUT
    DescriptionProducing scientific software is a challenge. The high-performance modeling and simulation community, in particular, faces the confluence of disruptive changes in computing architectures and new opportunities (and demands) for greatly improved simulation capabilities, especially through coupling physics and scales. Simultaneously, computational science and engineering (CSE), as well as other areas of science, are experiencing an increasing focus on scientific reproducibility and software quality.

    Computer architecture changes require new software design and implementation strategies, including significant refactoring of existing code. Reproducibility demands require more rigor across the entire software endeavor. Code coupling requires aggregate team interactions including integration of software processes and practices. These challenges demand large investments in scientific software development and improved practices. Focusing on improved developer productivity and software sustainability is both urgent and essential.

    This tutorial will provide information about software practices, processes, and tools explicitly tailored for CSE and HPC. The goals are to improve the productivity of those who develop CSE software, increase the sustainability of software artifacts, and increase trustworthiness in their use. Topics include software processes for (small) teams, including agile processes, collaboration via version control workflows, reproducibility, and scientific software design, refactoring, and testing (including test design strategies and continuous integration).
    Invited Talk
    Biology Is All You Need
    Recorded
    TP
    XO/EX
    DescriptionStorage and compute technologies are no longer improving at pace with exponentially growing global demand. The world’s largest data storage stakeholders already face hard choices about what data to keep in the face of limited capacity, and compute stakeholders are rapidly approaching the resource scaling limits of massive data centers for training the largest AI models.

    Biology offers a guide for solving these problems. Living systems store information in DNA with extraordinary density, enough to store all the world’s data in one small room. Living systems also implement natural intelligence – still an aspirational goal for AI – using low-power neural circuit “wetware” that fits between our ears. If we can understand and exploit these capabilities, we can overcome the scaling issues facing the HPC field.

    In this talk, I will describe IARPA’s high-risk, high-payoff research programs to address fundamental problems in storage and computing using biology as a guide. This includes the Molecular Information Storage (MIST) program, which is developing DNA data storage technologies that will eventually allow us to store exabytes of data in a tabletop form factor, and the Machine Intelligence from Cortical Networks (MICrONS) program, which has densely mapped the structure and function of neural circuits to guide the development of next-generation computing architectures.
    Paper
    Blaze: Fast Graph Processing on Fast SSDs
    Recorded
    Big Data
    Computational Science
    TP
    DescriptionOut-of-core graph processing is an attractive solution for processing very large graphs that do not fit in the memory of a single machine. The new class of ultra-low-latency SSDs should expand the impact and utility of out-of-core graph processing systems. However, current out-of-core systems cannot fully leverage the high IOPS these devices can deliver.

    We introduce Blaze, a new out-of-core graph processing system optimized for ultra-low-latency SSDs. Blaze offers high-performance out-of-core graph analytics by constantly saturating these fast SSDs with a new scatter-gather technique called online binning, which allows value propagation among graph vertices without atomic synchronization. Blaze offers succinct APIs that allow programmers to write efficient out-of-core graph algorithms without the burden of managing complex IO execution. Our evaluation shows that Blaze outperforms current out-of-core systems by a wide margin on six datasets and a set of representative graph queries on Intel Optane SSDs.
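    The scatter-gather binning idea can be sketched in single-process, illustrative form (not Blaze's implementation) as bucketing updates by destination-vertex range so each bucket is reduced by one owner without atomics:

```python
# Minimal sketch of scatter-gather binning for synchronization-free updates.
import numpy as np

NUM_VERTICES, NUM_BINS = 16, 4
edges_dst = np.array([1, 5, 5, 9, 14, 1, 9, 9])   # destination vertex per edge
contrib = np.ones_like(edges_dst, dtype=float)    # value pushed along each edge

# Scatter phase: route each update to the bin owning its destination range.
bins = [[] for _ in range(NUM_BINS)]
for dst, val in zip(edges_dst, contrib):
    bins[dst * NUM_BINS // NUM_VERTICES].append((dst, val))

# Gather phase: each bin has a single owner, so updates within it need no
# per-vertex atomic operations.
result = np.zeros(NUM_VERTICES)
for b in bins:
    for dst, val in b:
        result[dst] += val
print(result)
```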
    Workshop
    Blending Accelerated Programming Models in the Face of Increasing Hardware Diversity
    Recorded
    W
    DescriptionThe choice of programming model for accelerated computing applications depends on a wide range of factors, which weigh differently across application domains, institutions, and even countries. Why does one application use standard programming languages like C++, while another uses embedded programming models like Kokkos or directives such as OpenACC, and yet another directly programs in vendor-specific languages like CUDA or HIP? This panel will work through a comparison of the various choices, and share hands-on experience from developers in different countries and fields of expertise. We’ll explore both technical and non-technical reasons for how the various approaches are mixed. Join us for a fun and insightful session!
    Workshop
    Blocking Sparse Matrices to Leverage Dense-Specific Multiplication
    Recorded
    Accelerator-based Architectures
    Algorithms
    Architectures
    Big Data
    Data Analytics
    Parallel Programming Languages and Models
    Productivity Tools
    W
    DescriptionResearch to accelerate matrix multiplication, pushed by the growing computational demands of deep learning, has produced many efficient architectural solutions, such as NVIDIA’s Tensor Cores. These accelerators are designed to efficiently process a high volume of small dense matrix products in parallel. However, it is not obvious how to leverage them for sparse matrix multiplication. A natural way to adapt the accelerators to this problem is to divide the matrix into small blocks and then multiply only the nonzero blocks. In this paper, we investigate ways to reorder the rows of a sparse matrix to reduce the number of nonzero blocks and cluster the nonzero elements into a few dense blocks. While this pre-processing can be computationally expensive, we show that the high speed-up provided by the accelerators easily repays the cost, especially when several multiplications follow one reordering.
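    A toy version of the blocking metric is easy to state: count the fixed-size blocks containing nonzeros before and after a row reordering. The simple pattern sort below is a stand-in for the paper's reordering algorithms:

```python
# Illustrative sketch: fewer nonzero blocks after grouping similar rows.
import numpy as np
from scipy import sparse

B = 4  # block size

def nonzero_blocks(A, b):
    coo = A.tocoo()
    return len({(i // b, j // b) for i, j in zip(coo.row, coo.col)})

rng = np.random.default_rng(5)
# Build a matrix whose rows reuse a few column patterns, scattered over rows.
patterns = [rng.choice(64, size=6, replace=False) for _ in range(8)]
rows, cols = [], []
for i in range(64):
    for j in patterns[rng.integers(0, 8)]:
        rows.append(i)
        cols.append(j)
A = sparse.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(64, 64))

# Reorder rows so identical column patterns become adjacent.
keys = [tuple(A.indices[A.indptr[i]:A.indptr[i + 1]]) for i in range(64)]
order = sorted(range(64), key=lambda i: keys[i])
A_perm = A[order, :]

print("nonzero blocks before:", nonzero_blocks(A, B))
print("nonzero blocks after: ", nonzero_blocks(A_perm, B))
```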
    Workshop
    Blue Waters Education, Outreach, and Training Impact
    Recorded
    HPC Training and Education
    W
    DescriptionThe Blue Waters project pursued activities focused on national scale education, outreach, and training. The activities began in 2009. During 2022, the final year of the project, the team is focused on documenting the impact on the national community, lessons learned, and recommendations for programs that adopt/adapt similar activities.

    The presentation to the attendees at this workshop will include the impact, lessons learned, and recommendations based on our experiences. If accepted, a full paper will be submitted for publication in the Journal of Computational Science Education that will expand upon the information provided in the presentation.
    Paper
    Boosting Performance Optimization with Interactive Data Movement Visualization
    Recorded
    Accelerator-based Architectures
    Performance
    Visualization
    TP
    DescriptionOptimizing application performance in today's hardware architecture landscape is an important, but increasingly complex task, often requiring detailed performance analyses. In particular, data movement and reuse play a crucial role in optimization and are often hard to improve without detailed program inspection. Performance visualizations can assist in the diagnosis of performance problems, but generally rely on data gathered through lengthy program executions. In this paper, we present a performance visualization geared toward analyzing data movement and reuse to inform impactful optimization decisions, without requiring program execution. We propose an approach that combines static dataflow analysis with parameterized program simulations to analyze both global data movement and fine-grained data access and reuse behavior, and visualize insights in-situ on the program representation. Case studies analyzing and optimizing real-world applications demonstrate our tool's effectiveness in guiding optimization decisions and making the performance tuning process more interactive.
    Students@SC
    Break
    Exhibitor Forum
    Breaking Down Barriers in HPC with the OpenFlightHPC Open-Source Project
    Recorded
    TP
    XO/EX
    DescriptionAs High Performance Computing (HPC) moves from a specialist science to an everyday commodity, there is still an unreasonably large barrier to entry for new users. Traditionally, getting access to HPC resources is both expensive and time consuming, and once you get access, moving between clusters is equally cumbersome.

    The Alces Flight team has experimented with various concepts in the pursuit of the question, “How can we lower the barrier to entry for HPC users?” Starting in 2015, the team explored a free subscription model and the impact/usage by an individual user on public cloud, from which the base knowledge of the OpenFlightHPC open-source project emerged in 2019.

    OpenFlightHPC is an open-source community developing a flexible, functional and stable HPC stack that can be launched on any platform. The project provides the knowledge and toolsets needed for HPC environment creation in a manner that anyone with basic-level HPC experience can utilize. The toolset assists in helping to create more portable HPC environments using process standardization to promote free interchange of knowledge for shared benefit.

    This presentation covers:
    - The importance of learning through experimentation and successful failures.
    - The community and cultural shifts in people, skills, and sustainability that are feeding the need for greater flexibility in HPC.
    - How OpenFlightHPC works, including bare-metal and cloud deployment techniques, process automation using tools including Ansible and Salt, and portability of workloads both in container and shared environments.
    Workshop
    Bridging the Gap between Education and Research: A Retrospective on Simulating an HPC Conference
    Recorded
    W
    DescriptionHigh Performance Computing (HPC) is playing an increasingly important role in industry, research, and everyday life. A central pillar of the European HPC strategy is the Modular Supercomputing Architecture (MSA), which breaks with traditional HPC architectures by integrating heterogeneous computing resources as system-level modules. Nevertheless, HPC content, and MSA content especially, only rarely finds its way into the computer science curriculum at German universities. In addition, the competencies necessary for independent scientific research are hardly addressed, although these skills are essential for students writing their final theses.

    We present a blended learning based module concept that promotes the understanding and application of modular supercomputing while connecting it with the techniques of scientific project work. The module was first implemented at Goethe University in Summer 2022. The initial feedback and evaluation results are quite encouraging both in terms of learning outcomes and student engagement and interest.
    Birds of a Feather
    Bridging the HPC/Data Divide
    TP
    XO/EX
    DescriptionScientific advances designed to address global challenges require researchers to have seamless access to data and computing, and increasingly to high performance computing. A certain disconnect has characterized the relationship between the HPC and data communities, and this needs to be addressed in order to fully support today's data- and compute-intensive science. The session will openly explore the sociotechnical and technical differences between the two communities and describe open challenges on the path to closer collaboration. One BoF outcome is to draw in ‘HPC-oriented’ colleagues who wish to learn more or become more aligned with the data community.
    Exhibitor Forum
    Bringing AI and HPC Workloads to the Cloud-Native World of Kubernetes
    Recorded
    TP
    XO/EX
    DescriptionKubernetes has become the de-facto tool for orchestrating containerized workloads, and AI workloads are no different. But can an orchestrator built for long-running (micro)-services meet the needs of research experimentation and simulations? Can IT easily incorporate K8s into their AI & HPC workflows?

    Join Gijsbert Janssen van Doorn of Run:ai for a crash course in Kubernetes for AI & HPC. Learn what’s working, what’s not, and some fixes for supporting these demanding environments with K8s.

    In this session we will:

    - Explain how and why Kubernetes is the top choice for AI & HPC workloads
    - See where Kubernetes is challenged when it comes to AI & HPC workloads
    - See how using GPUs instead of CPUs can accelerate your development cycles
    Workshop
    Broad Performance Measurement Support for Asynchronous Multi-Tasking with APEX
    Recorded
    AI-HPC Convergence
    Extreme Scale Computing
    Parallel Programming Languages and Models
    Performance
    Runtime Systems
    W
    DescriptionAPEX (Autonomic Performance Environment for eXascale) is a performance measurement library for distributed, asynchronous multitasking runtime systems. It provides support for both lightweight measurement and high concurrency. To support performance measurement in systems that employ user-level threading, APEX uses a dependency chain in addition to the call stack to produce traces and task dependency graphs. APEX also provides a runtime adaptation system based on the observed system performance. In this paper, we describe the evolution of APEX from its design for HPX to support an array of programming models and abstraction layers and describe some of the features that have evolved to help understand the asynchrony and high concurrency of asynchronous tasking models.
    Workshop
    BTS: Exploring Effects of Background Task-Aware Scheduling for Key-Value CSDs
    Recorded
    W
    DescriptionA computational storage device (CSD) must support background tasks for storage service applications without harming user I/O (foreground I/O) performance. However, in practice, SPDK often increases foreground I/O latencies and under-utilizes CPU cores in the CSD. These problems stem from allocating foreground I/Os and background tasks to the same CPU core, because SPDK processes them as the same kind of request without distinguishing them. To tackle this, we propose a Background Task-aware Scheduler (BTS) for CSDs built using SPDK. BTS solves the following problems: (i) idle CPU cores in the CSD are not used, and (ii) the latency of foreground I/O increases due to interference with background tasks. For evaluation, we implemented a key-value interface CSD using SPDK. With BTS, the results show that idle CPUs are used to process background tasks while guaranteeing low foreground I/O latency when the background task is set to deduplication.
    Paper
    Building Blocks for Network-Accelerated Distributed File Systems
    Recorded
    Architectures
    Networks
    TP
    Best Paper Finalist
    DescriptionHigh-performance clusters and datacenters pose increasingly demanding requirements on storage systems. If these systems do not operate at scale, applications are doomed to become I/O bound and waste compute cycles. To accelerate the data path to remote storage nodes, remote direct memory access (RDMA) has been embraced by storage systems to let data flow from the network to storage targets, reducing overall latency and CPU utilization. Yet, this approach still involves CPUs on the data path to enforce storage policies such as authentication, replication, and erasure coding. We show how storage policies can be offloaded to fully programmable SmartNICs, without involving host CPUs. By using PsPIN, an open-hardware SmartNIC, we show latency improvements for writes (up to 2x), data replication (up to 2x), and erasure coding (up to 2x), when compared to respective CPU- and RDMA-based alternatives.
    Exhibitor Forum
    Building Solutions to Solve the World’s Hardest Problems
    Recorded
    TP
    XO/EX
    DescriptionIn this session, we’ll paint a picture of removing infrastructure constraints to solve complex computational problems. Imagine agile, scalable infrastructure with no fixed assets and no waiting in the queue to start jobs. We’ll share progress on an extraordinary project using Virtual Flow to do extreme-scale screening and computational drug discovery at scale. Together with academic researchers and partners, we’ve built a 5-10 billion molecule database to identify targets, using 2.2 million virtual CPUs. Learn how the most vexing societal problems of our generation will be solved through what we at AWS call Impact Computing.
    Workshop
    Building the Computing Infrastructure of a Modern Scientific User Facility
    Recorded
    W
    Workshop
    Building User-Facing Platforms with Container Orchestration
    Recorded
    Benchmarking
    Cloud and Distributed Computing
    Containers
    Datacenter
    Networks
    Privacy
    Resource Management and Scheduling
    Security
    SIGHPC
    State of the Practice
    System Administration
    System Software
    W
    DescriptionHigh performance computing has always offered batch computing services, but demand is growing for a wider range of workflow and data services. Container orchestration is a perfect candidate for offering scheduling services for these types of workloads in a similar way. By leveraging container orchestration with Kubernetes, you can build a platform that includes a service catalog and also lets users run their own containerized services directly.

    The power of such a platform comes from standing on the shoulders of giants. This starts with leveraging Kubernetes for container orchestration and for running these types of workloads. Next is using Kubernetes' internal paradigms, with Operators providing higher-level scheduling of specific types of applications to create a service catalog. Third is using the Kubernetes API to tie everything together under a single user experience.
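    As a small taste of that third ingredient, the official Kubernetes Python client can submit a batch Job directly; the image, names, and namespace below are hypothetical:

```python
# Minimal sketch using the official Kubernetes Python client to submit a
# batch Job, the kind of user-facing primitive such a platform might wrap.
# Requires `pip install kubernetes` and a cluster reachable via ~/.kube/config.
from kubernetes import client, config

config.load_kube_config()

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="demo-sim"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(
                    name="sim",
                    image="registry.example.org/sim:latest",  # hypothetical image
                    command=["./run_simulation", "--steps", "1000"],
                )],
            )
        )
    ),
)
client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```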
    Exhibitor Forum
    C Band + L Band
    Recorded
    TP
    XO/EX
    DescriptionWhile most deployed networks today use C-Band, the L-Band has been available for decades and is also deployed on Dispersion Shifted Fiber. Using both (C+L) doubles the capacity per fiber pair, but requires additional equipment to be added to an in-service, traffic-bearing system and yields less-than-optimal performance due to band interaction between separate amplifiers. New C&L-band systems are being designed with fewer components and provide better performance by lighting the entire spectrum from day one, for lower cost per bit and superior reach. Hear when, where, and why Verizon is pushing the development of this new technology for its nationwide long-haul network.
    Job Posting
    C++/Java Engineer
    DescriptionPassionate about the protection of critical and high value targets? Join our dynamic team and make a difference providing creative solutions to unique national security challenges!

    We are seeking R&D Computer Science professionals to join highly productive teams that research and develop innovative solutions to a broad spectrum of problems of national importance. In this role, you will collaborate in an innovative environment to architect, design, develop, test, and deploy modern data processing software for sophisticated, real-time decision support systems!

    On any given day, you may be called on to:

    Work on a team developing software systems addressing exciting remote sensing problems, including the capture, processing, exploitation, visualization, and distribution of real-time satellite sensor data.

    Collaborate with architects, developers, technical leads, customers, and end users to collect requirements, design solutions, and deliver extensible software applications.

    Engage with diverse specialists in areas such as data fusion, signal and image processing, analytics, cloud computing, machine learning, modeling and simulation, service-oriented architectures, data management and visualization, and pattern recognition.

    Applicants on this posting may be interviewed and hired by multiple organizations within Center 6300.

    Due to the nature of the work, the selected applicant must be able to work onsite.

    Join our team and achieve your goals while making a difference!
    Paper
    CA3DMM: A New Algorithm Based on a Unified View of Parallel Matrix Multiplication
    Recorded
    Numerical Algorithms
    Scientific Computing
    TP
    DescriptionThis paper presents the Communication-Avoiding 3D Matrix Multiplication (CA3DMM) algorithm, a simple and novel algorithm with optimal or near-optimal communication cost. CA3DMM is based on a unified view of parallel matrix multiplication. This view generalizes 1D, 2D, and 3D matrix multiplication algorithms to reduce the data exchange volume for different shapes of input matrices. CA3DMM further minimizes actual communication costs by carefully organizing its communication patterns. CA3DMM is much simpler than some other generalized 3D algorithms and does not require low-level optimization. Numerical experiments show that CA3DMM has good parallel scalability and similar or better performance compared to state-of-the-art PGEMM implementations for a wide range of matrix dimensions and process counts.
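    The unified view can be caricatured with a brute-force search over 3D process grids using a standard simplified communication model; this is an illustration of the general idea, not CA3DMM's actual cost model or implementation:

```python
# Back-of-the-envelope sketch: for C = A x B with A (m x k) and B (k x n) on
# p processes, pick a grid (pm, pn, pk) minimizing per-process data volume
# under a simplified cost model.
from itertools import product

def best_grid(m, n, k, p):
    best = None
    for pm, pn in product(range(1, p + 1), repeat=2):
        if p % (pm * pn):
            continue  # pm * pn must divide p exactly
        pk = p // (pm * pn)
        # Words touched per process for A, B, and C (simplified model).
        cost = m * k / (pm * pk) + k * n / (pn * pk) + m * n / (pm * pn)
        if best is None or cost < best[0]:
            best = (cost, (pm, pn, pk))
    return best

# A tall-skinny A times a small B favors splitting the large dimension m:
print(best_grid(m=100_000, n=128, k=128, p=64))
```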
    Workshop
    CAFCW22 – Morning Break
    Recorded
    W
    Workshop
    CAFCW22 – Welcome
    Recorded
    W
    Workshop
    Caffeine: CoArray Fortran Framework of Efficient Interfaces to Network Environments
    Recorded
    W
    Description: This paper provides an introduction to the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine), a parallel runtime library built atop the GASNet-EX exascale networking library. Caffeine leverages several non-parallel Fortran features to write type- and rank-agnostic interfaces and corresponding procedure definitions that support parallel Fortran 2018 features, including communication, collective operations, and related services. One major goal is to develop a runtime library that can eventually be considered for adoption by LLVM Flang, enabling that compiler to support the parallel features of Fortran.

    The paper describes the motivations behind Caffeine's design and implementation decisions, details the current state of Caffeine's development, and previews future work. We explain how the design and implementation offer software-sustainability benefits, by lowering the barrier to user contributions and reducing complexity through the use of Fortran 2018 C-interoperability features, as well as high performance, through the use of a lightweight communication substrate.
    Workshop
    CAMP: a Synthetic Micro-Benchmark for Assessing Deep Memory Hierarchies
    Recorded
    Algorithms
    Architectures
    Compilers
    Computational Science
    Exascale Computing
    Heterogeneous Systems
    Hierarchical Parallelism
    Memory Systems
    Parallel Programming Languages and Models
    Parallel Programming Systems
    Resource Management and Scheduling
    W
    Description: We present the open-source CAMP tool for assessing deep memory hierarchies through performance measurements of synthetic kernels. CAMP provides different access patterns and allows users to vary the kernels' operational intensities. We describe the tool's design and implementation, analyse measurements on a compute node of ARCHER2, the UK national supercomputer, and compare them to measurements on a compute node of NEXTGenIO. We report results of a strong-scaling study of contiguous, strided, and stencil access patterns for various operational intensities, and explore thread placement options and data sizes. The results confirm that bandwidth saturation can be achieved with a relatively small number of threads on AMD Rome, and that underpopulation may be beneficial, as performance drops when the node is fully populated for configurations with lower operational intensity; the effect is less pronounced on the less hierarchical Intel Cascade Lake. Finally, we discuss sub-NUMA-node awareness and directions for extending CAMP.
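    For readers unfamiliar with this style of micro-benchmark, here is a toy sketch (ours, not CAMP itself) of the contiguous-versus-strided comparison such tools are built from:

        # Toy sketch (ours, not CAMP itself): time contiguous vs. strided sweeps
        # over an array much larger than the last-level cache, the basic
        # access-pattern comparison a memory micro-benchmark is built from.
        import time
        import numpy as np

        N = 1 << 25                   # ~256 MB of float64, well past cache
        data = np.ones(N)

        def sweep(stride):
            view = data[::stride]
            t0 = time.perf_counter()
            view.sum()                # one read per element (~1 flop per 8 bytes)
            dt = time.perf_counter() - t0
            gbps = view.nbytes / dt / 1e9
            print(f"stride {stride:3d}: {gbps:6.2f} GB/s (useful bytes)")

        for stride in (1, 2, 8, 64):
            sweep(stride)             # useful bandwidth drops as stride grows,
                                      # since whole cache lines are fetched anyway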
    Paper
    Canary: Fault-Tolerant FaaS for Stateful Time-Sensitive Applications
    Recorded
    Cloud and Distributed Computing
    TP
    Description: Function-as-a-Service (FaaS) platforms have recently gained rapid popularity. Many stateful applications have been migrated to FaaS platforms due to their ease of deployment, scalability, and minimal management overhead. However, failures in FaaS have not been thoroughly investigated, making these otherwise desirable platforms unreliable for guaranteeing function execution and meeting performance requirements. In this paper, we propose Canary, a highly resilient and fault-tolerant framework for FaaS that mitigates the impact of failures and reduces the overhead of function restart. Canary utilizes replicated container runtimes and application-level checkpoints to reduce application recovery time on FaaS platforms. Our evaluations using representative stateful FaaS applications show that Canary reduces application recovery time and dollar cost by up to 83% and 12%, respectively, over the default retry-based strategy. Moreover, it improves application availability with an additional average execution-time and cost overhead of 14% and 8%, respectively, compared to ideal failure-free execution.
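    A minimal sketch of the application-level checkpointing idea (our simplification, not Canary's actual API): persist function state at intervals so a restarted instance resumes from the last checkpoint instead of starting over:

        # Minimal sketch of application-level checkpointing (our simplification,
        # not Canary's actual API): a stateful function periodically persists its
        # state so a restarted instance resumes instead of recomputing everything.
        import json, os

        CKPT = "/tmp/canary_demo.ckpt"    # stand-in for durable shared storage

        def load_checkpoint():
            if os.path.exists(CKPT):
                with open(CKPT) as f:
                    return json.load(f)
            return {"next_item": 0, "total": 0}

        def save_checkpoint(state):
            tmp = CKPT + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, CKPT)         # atomic swap: never a torn checkpoint

        def handler(items, ckpt_every=100):
            state = load_checkpoint()     # resume point after any failure/restart
            for i in range(state["next_item"], len(items)):
                state["total"] += items[i]
                state["next_item"] = i + 1
                if state["next_item"] % ckpt_every == 0:
                    save_checkpoint(state)
            save_checkpoint(state)
            return state["total"]

        print(handler(list(range(1000))))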
    Invited Talk
    CANCELLED: Specialization and the End of Moore’s Law
    Recorded
    TP
    XO/EX
    Description: For decades, Moore’s Law made the economics of specialized chips unattractive because the upfront costs couldn’t be justified when the alternative was fast-improving CPUs. As Moore’s Law fades, however, this is changing. Not only is specialization becoming more economically attractive, but it is now one of the best ways to get performance improvements for many applications. In this talk, I will discuss (1) how the economics of specialization have changed, (2) how specialization is fracturing computing in ways commonly seen in other technologies, and (3) how long we can expect the gains from specialization to make up for the slowdown in Moore’s Law.
    Posters
    Research Posters
    CANDY: An Efficient Framework for Updating Properties on Large Scale Dynamic Networks
    TP
    XO/EX
    Description: Queries on large graphs use stored graph properties to generate responses. Most real-world graphs are dynamic, i.e., the graph topology changes with time, and hence the related graph properties are also time-varying. In such cases, maintaining the correctness of stored graph properties requires recomputation or updates to the previous properties. Here, we present CANDY, an efficient framework for updating the properties of large dynamic networks. We prove the efficacy of our general framework by applying it to update graph properties such as Single Source Shortest Path (SSSP), Vertex Coloring, and PageRank. Empirically, we show that our shared-memory parallel and NVIDIA GPU-based data-parallel implementations outperform state-of-the-art implementations.
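    To illustrate the update-versus-recompute idea (a toy example of ours, not CANDY's implementation): when an edge is inserted, SSSP distances can often be repaired by relaxing only the affected region rather than rerunning Dijkstra from scratch:

        # Toy example of update-vs-recompute (not CANDY's implementation): after
        # inserting edge (u, v, w), repair SSSP distances by relaxing only the
        # region whose distance actually improves, not the whole graph.
        import heapq

        def sssp_insert_edge(graph, dist, u, v, w):
            graph.setdefault(u, []).append((v, w))
            if dist.get(u, float("inf")) + w >= dist.get(v, float("inf")):
                return  # insertion changes nothing; no recomputation at all
            dist[v] = dist[u] + w
            heap = [(dist[v], v)]         # propagate the improvement outward
            while heap:
                d, x = heapq.heappop(heap)
                if d > dist.get(x, float("inf")):
                    continue              # stale entry
                for y, wy in graph.get(x, []):
                    if d + wy < dist.get(y, float("inf")):
                        dist[y] = d + wy
                        heapq.heappush(heap, (dist[y], y))

        # dist holds distances from source 0 before the insertion:
        graph = {0: [(1, 5)], 1: [(2, 5)]}
        dist = {0: 0, 1: 5, 2: 10}
        sssp_insert_edge(graph, dist, 0, 2, 3)   # insert shortcut 0 -> 2
        print(dist)                              # {0: 0, 1: 5, 2: 3}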
    Workshop
    CANOPIE – Afternoon Break
    Recorded
    W
    Workshop
    CANOPIE – Introduction
    Recorded
    W
    Workshop
    CANOPIE – Lunch Break
    Recorded
    W
    Workshop
    CANOPIE – Morning Break
    Recorded
    W
    Birds of a Feather
    Carbon-Neutrality and HPC
    TP
    XO/EX
    Description: Data centers consume nearly 1% of global electricity demand and contribute about 0.3% of all global CO2 emissions, figures expected to rise without proactive steps. Tempting as it may be to point the finger at big tech, the truth is that users of all sizes have had a hand in the growth of data centers’ workloads. How can the HPC community do our part to drive down greenhouse gas emissions without sacrificing the computing power needed to support our mission and services as promised?
    Workshop
    CardioHPC: Serverless Approaches for Real-Time Heart Monitoring of Thousands of Patients
    Recorded
    Cloud and Distributed Computing
    In Situ Processing
    Scientific Computing
    Workflows
    W
    Description: We analyze a heart monitoring center for patients wearing electrocardiogram sensors outside hospitals. Such monitoring prevents serious heart damage and increases life expectancy and health-care efficiency. In this paper, we address the problem of providing a scalable infrastructure for a real-time processing scenario covering at least 10,000 patients simultaneously, and an efficient, fast processing architecture for a postponed scenario in which patients upload data after measurements have been taken. CardioHPC is a project to realize a simulation of these two scenarios using digital signal processing algorithms and artificial-intelligence-based detection and classification software for automated reporting and alerting.

    We elaborate on the challenges we met in experimenting with different serverless implementations: 1) container-based on Google Cloud Run, and 2) Function-as-a-Service (FaaS) on AWS Lambda. Experimental results show the effect of overhead on request and transfer time, and the speedup achieved, by analyzing the response time and throughput of both the container-based and FaaS implementations as serverless workflows.
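    For context, the FaaS variant boils down to a per-segment handler like the following sketch (ours; the event shape and helper names are illustrative, not the project's code):

        # Sketch of the FaaS-style entry point (ours; the event shape and helper
        # names are illustrative, not the project's code). AWS Lambda invokes
        # handler(event, context) once per uploaded ECG segment.
        import json

        def classify_beats(samples):
            # Stand-in for the DSP + AI detection/classification pipeline.
            return "normal" if max(samples) - min(samples) < 2.0 else "review"

        def handler(event, context):
            record = json.loads(event["body"])        # one patient's ECG segment
            label = classify_beats(record["samples"])
            if label != "normal":
                pass  # real system: enqueue an alert for the monitoring center
            return {"statusCode": 200,
                    "body": json.dumps({"patient": record["patient_id"],
                                        "label": label})}

        # Local smoke test with a fabricated event:
        evt = {"body": json.dumps({"patient_id": "p-42",
                                   "samples": [0.1, 0.9, 3.2, 0.2]})}
        print(handler(evt, None))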
    Students@SC
    Careers in HPC Panel
    Description: There are so many unique opportunities in HPC! While many core technical skills are applicable across a wide range of careers, there are also a lot of important differences. This panel brings together representatives from diverse career paths including industry, academia, and research labs. Come learn about the differences and similarities, and gain insight regarding the path that is best for you!
    Posters
    Research Posters
    Case Study for Performance-Portability of Lattice Boltzmann Kernels
    TP
    XO/EX
    Description: In this work, we study the performance portability of offloaded lattice Boltzmann kernels and the trade-off between portability and efficiency. The study is based on a proxy application for the lattice Boltzmann method (LBM). The Kokkos performance-portability framework (with CUDA or SYCL backend) is used and compared with the native CUDA and native SYCL programming models. The Kokkos library supports the mainstream GPU products on the market. The performance of the code can vary with the accelerating model, number of GPUs, problem scale, propagation pattern, and architecture. Both the Kokkos library and the CUDA toolkit are studied on the ThetaGPU supercomputer (Argonne Leadership Computing Facility). We find that Kokkos (CUDA) has almost the same performance as native CUDA. Kokkos's automatic data and kernel management may sacrifice efficiency, but its parallelization parameters can also be tuned to optimize performance.
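    For orientation, the kernels in question follow the LBM's stream-and-collide structure; below is a toy NumPy version of a D2Q9 streaming step (ours, not the proxy application's code), the kind of loop nest that each programming model offloads in its own way:

        # Toy NumPy version (ours, not the proxy app) of the D2Q9 streaming step:
        # each of the nine distributions shifts one lattice site along its
        # velocity. This is the loop nest that Kokkos/CUDA/SYCL each offload.
        import numpy as np

        # D2Q9 lattice velocities (cx, cy), one per population.
        C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
             (1, 1), (-1, 1), (-1, -1), (1, -1)]

        def stream(f):
            # f has shape (9, nx, ny); periodic boundaries via np.roll.
            return np.stack([np.roll(np.roll(f[q], cx, axis=0), cy, axis=1)
                             for q, (cx, cy) in enumerate(C)])

        f = np.random.rand(9, 64, 64)
        f = stream(f)                  # one propagation step
        print(f.shape)                 # (9, 64, 64)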
    Student Cluster Competition
    catcat
    TP
    XO/EX
    Description: We are proficient in distributed systems and parallel computing, algorithm optimization, computer operating systems, and other knowledge necessary for HPC, and we have participated in a large number of related research efforts and projects. Beyond the basic knowledge necessary for supercomputers, our team also has very wide-ranging expertise. Bo-Luo Ge has solid knowledge of computer operation and maintenance, networking, and computer systems. He is a main member of the cluster operation and maintenance team of the CUHKsz supercomputer club. Zi-Fan Liu has deep knowledge of reinforcement learning and deep learning, and has also done research on the application of reinforcement learning to smart grids. Yi-Liang He has profound compiler-level insights and is excellent at SIMD and RISC-V. Si-Wei Zhang has a unique comprehension of underlying compilation support and has done related research in the CUHKsz laboratory. Bo-Luo Ge, Yi-Liang He, Si-Wei Zhang, and Zi-Fan Liu also participated in ASC 2021 and won second prize. Yang-Lin Zhang has solid knowledge of computer vision and has worked on GPU parallel threading. Hao-Nan Xue has been involved in many hardware-related projects.

    Beyond professional computer-domain knowledge, our team also has wide non-computer-domain knowledge, in areas such as econometrics, electricity grids, operations management, and data mining. The diversity of our directions gives us th