Authors: Tal Ben-Nun (ETH Zürich); Linus Groner (Swiss National Supercomputing Centre (CSCS)); Florian Deconinck, Tobias Wicky, Eddie Davis, Johann Dahm, Oliver D. Elbert, Rhea George, and Jeremy McGibbon (Allen Institute for Artificial Intelligence); Lukas Trümper (ETH Zürich); Elynn Wu and Oliver Fuhrer (Allen Institute for Artificial Intelligence); Thomas Schulthess (Swiss National Supercomputing Centre (CSCS)); and Torsten Hoefler (ETH Zürich)
Abstract: Earth system models are developed with a tight coupling to target hardware, often containing specialized code predicated on processor characteristics. This coupling stems from using imperative languages that hard-code computation schedules and layout.
We present a detailed account of optimizing the Finite Volume Cubed-Sphere Dynamical Core (FV3), improving both productivity and performance. By using a declarative Python-embedded stencil domain-specific language and data-centric optimization, we abstract hardware-specific details and define a semi-automated workflow for analyzing and optimizing weather and climate applications. The workflow combines local and full-program optimization with user-guided fine-tuning. To prune the infeasible global optimization space, we automatically exploit repeating code motifs via a novel transfer tuning approach. On the Piz Daint supercomputer, we scale to 2,400 GPUs, achieving speedups of up to 3.92x over the tuned production implementation while using a fraction of the original code.
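To illustrate the idea of separating a stencil's definition from its computation schedule, here is a minimal sketch in plain NumPy. It is not the DSL used in the paper: the names `laplacian`, `run_vectorized`, and `run_loops` are hypothetical, and the two "schedules" are stand-ins for the hardware-specific choices that the abstract says are abstracted away.

```python
import numpy as np


def laplacian(f):
    """Declarative stencil: f(di, dj) yields the field at a relative offset.

    Nothing here fixes loop order, memory layout, or target hardware.
    """
    return f(-1, 0) + f(1, 0) + f(0, -1) + f(0, 1) - 4.0 * f(0, 0)


def run_vectorized(stencil, field):
    """Hypothetical schedule A: whole-array slices over the interior."""
    n, m = field.shape

    def f(di, dj):
        # Shifted view of the interior points.
        return field[1 + di : n - 1 + di, 1 + dj : m - 1 + dj]

    out = np.zeros_like(field)
    out[1:-1, 1:-1] = stencil(f)
    return out


def run_loops(stencil, field):
    """Hypothetical schedule B: explicit point-by-point loops."""
    n, m = field.shape
    out = np.zeros_like(field)
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i, j] = stencil(lambda di, dj: field[i + di, j + dj])
    return out


# The same stencil definition runs unchanged under both schedules.
field = np.arange(30, dtype=float).reshape(5, 6) ** 2
result_a = run_vectorized(laplacian, field)
result_b = run_loops(laplacian, field)
```

Because the stencil is written only in terms of relative offsets, the choice of schedule can change without touching the numerical definition, which is the property an optimizing workflow relies on.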