Authors: John Gounley (Oak Ridge National Laboratory (ORNL)), Thomas Brettin (Argonne National Laboratory (ANL)), Adam Moody (Lawrence Livermore National Laboratory (LLNL))
Abstract: Transformers and other large language models have shown impressive capabilities as 'foundation' models for domains such as natural language processing and computer vision. Self-supervised training on large datasets, which requires huge amounts of scalable compute, is leveraged to develop models applicable to a variety of specialized tasks. Recent efforts in areas such as bioinformatics and protein folding indicate the significant potential of Transformer models in domain science applications. In this session, presenters and attendees have the opportunity to discuss new algorithms, software, or hardware for training Transformers on large domain science datasets, as well as novel ideas for applying Transformers in this space.
Long Description: Transformers and other large language models have shown impressive capabilities as 'foundation' models for domains such as natural language processing and computer vision. Self-supervised training on large datasets, which requires huge amounts of scalable compute, is leveraged to develop models applicable to a variety of specialized tasks. Recent efforts in areas such as bioinformatics and protein folding indicate the significant potential of Transformer models in domain science applications as well.
However, designing and training Transformer models to address scientific questions presents several challenges for the HPC community.

First, new algorithms will be needed to adapt the Transformer architecture to the different characteristics of scientific data and domain science problems. For example, emerging methods such as sparse attention can facilitate working with long input sequences, while custom tokenization schemes will support new input data types (e.g., omics sequences instead of standard text).

Second, training Transformer models for science will require software and hardware innovations so that the compute and energy costs do not become prohibitive for non-commercial enterprises. Enormous progress has been made on software backends for training large Transformers at scale on pre-exascale systems, but this software will need to keep pace with increasingly large Transformer models and ever more complex architectures.

Third, the two previous challenges suggest a clear need for multi-institutional collaborations, with teams from industry, compute facilities, and the domain sciences coming together to train domain-focused Transformer models. Given that the very purpose of foundation models is to be applicable to multiple tasks with limited fine-tuning, this is a natural space for broad collaborations, and the success of the recent BigScience Workshop suggests the potential of such an approach.
The session will have two components. For the first 20 minutes, a series of lightning-style (i.e., single-slide) presentations will introduce current lines of inquiry related to Transformers for Science. For the remainder of the session, we will hold a general discussion focusing both on questions provoked by the presentations and on ideas introduced by the audience.
The expected outcomes of this session are 1) an introduction to, and a lively discussion of, the prospects and challenges of applying Transformer models to scientific problems and 2) bringing together people from disparate parts of the HPC community who are interested in Transformers, planting the seeds of future collaborations.