
Research team accelerates multi-physics simulations with El Capitan predecessor systems

A 2D MARBL simulation of the N210808 “Burning Plasma” shot carried out at the National Ignition Facility at the onset of ignition. The calculation consists of 19 million high-order quadrature points and ran on El Capitan predecessor system rzAdams (on AMD MI300A GPUs). Credit: Rob Rieben.

Researchers at Lawrence Livermore National Laboratory (LLNL) have achieved a milestone in accelerating and adding features to complex multi-physics simulations run on Graphics Processing Units (GPUs), a development that could advance high-performance computing and engineering.

As LLNL readies for El Capitan, the National Nuclear Security Administration’s first exascale supercomputer, the team’s efforts have centered on the development of MARBL, a next-generation multi-physics code, for GPUs. El Capitan relies on AMD’s cutting-edge MI300A Accelerated Processing Units (APUs), which combine Central Processing Units (CPUs) with GPUs and high-bandwidth memory in a single package, allowing for more efficient resource sharing.

El Capitan’s heterogeneous (CPU/GPU) computing architecture, along with expectations that most future supercomputers will be heterogeneous, made it imperative that multi-physics codes like MARBL—which targets mission-relevant high-energy-density (HED) physics problems like those involved in inertial confinement fusion (ICF) experiments and stockpile stewardship applications—could perform well across a wide variety of architectures, researchers said.

In a recent paper published in the Journal of Fluids Engineering, the researchers describe how, by harnessing the power of GPUs, specifically AMD’s MI250X GPUs in El Capitan’s early access machines, they successfully extended MARBL’s capabilities to include additional physics essential for HED physics and fusion modeling.

“The big focus of this paper was supporting multi-physics—specifically multi-group radiation diffusion and thermonuclear burn, which are involved in fusion reactions—and the coupling of all of that with the higher-order finite-element moving mesh for simulating fluid motion,” said principal investigator Rob Rieben.

“To get performance on the GPU, there is a lot you have to do in terms of programming, optimizing kernels, balancing memory, and turning your code into a GPU-parallel code, and we were able to accomplish that.”

Rieben’s team has been dedicated to engineering the scalable, GPU-accelerated multi-physics application MARBL for simulating HED physics experimental platforms since 2015, focusing on the simultaneous development of software abstractions and algorithmic advances to enable GPU performance.

The work described in the recent paper is critical for delivering on programmatic tasks that rely heavily on large-scale computational science to answer tough national security questions, said co-author Alejandro Campos, who added that the team faced two main challenges in extending MARBL’s capabilities: verifying that the additional physics modules were accurately implemented and ensuring that those new modules could perform well when running on the next generation of GPU-based machines.

Researchers said the team addressed these challenges through techniques such as new algorithms for solving linear systems with preconditioners, which have traditionally been optimized for CPUs. A breakthrough from LLNL’s Center for Applied Scientific Computing (CASC) led to a new type of preconditioner suited to GPUs, which was integrated into the code and scaled up for production use.

Preconditioners for linear solvers have been difficult to port to GPUs in a performant way, Rieben said. “CASC proposed a new type of preconditioner needed for solving diffusion equations that is specifically designed to deliver high performance for high-order methods on GPUs, which allows us to run large 3D multi-physics simulations on GPU machines like El Capitan.

“Our job was to put their method into a production code, scale it up, and show that it works, not just on benchmarks, but on the actual problems that we care about. We took that hot-off-the-presses research, worked with the researchers in CASC, and got it into our code and did all the necessary tuning to make that perform well on multiple GPU systems,” Rieben said.

In the paper, the team compared traditional distributed CPU approaches with the rapid computing enabled by GPU architectures and focused on developing software that could effectively exploit the Single Instruction/Multiple Data paradigm of GPU hardware. The multi-physics nature of the simulations introduced bottlenecks that added complexity to the task and that could degrade overall performance and scalability if not properly addressed, the team reported.

Researchers said the team’s use of performance portability abstraction layers, such as the LLNL-developed RAJA Portability Suite and the MFEM finite element discretization library, was instrumental in enabling MARBL’s single source code to target multiple GPU/CPU architectures.

“In this paper, we focus on the AMD GPUs because we could leverage other open-source performance portability libraries developed here like RAJA,” said co-author Tom Stitt. “While there were some AMD-specific changes that needed to be made, there weren’t that many, and they didn’t take that much time, so to start our performance portability strategy, that’s a win.”
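
For readers unfamiliar with this style of programming, the sketch below shows what a RAJA-style single-source kernel can look like. It is not MARBL source code; the field names and the update itself are hypothetical, and the example simply illustrates how one loop body can be compiled for AMD GPUs, NVIDIA GPUs, or CPUs by swapping the execution policy.

// Minimal sketch (not MARBL source) of a RAJA single-source kernel.
// The same lambda body runs on CPUs or GPUs; only the execution policy changes.
#include "RAJA/RAJA.hpp"

// Select a backend at compile time; the loop body itself is unchanged.
#if defined(RAJA_ENABLE_HIP)
using exec_policy = RAJA::hip_exec<256>;   // AMD GPUs (e.g., MI250X/MI300A)
#elif defined(RAJA_ENABLE_CUDA)
using exec_policy = RAJA::cuda_exec<256>;  // NVIDIA GPUs
#else
using exec_policy = RAJA::seq_exec;        // portable CPU fallback
#endif

// Hypothetical update of a field over mesh points; the arrays must live in
// memory the chosen backend can access (e.g., device or unified memory).
void scale_field(double* field, const double* weight, int n, double dt)
{
  RAJA::forall<exec_policy>(RAJA::TypedRangeSegment<int>(0, n),
    [=] RAJA_HOST_DEVICE (int i) {
      field[i] += dt * weight[i] * field[i];
    });
}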

Stitt added that getting MARBL to perform on LLNL’s current CPU/GPU flagship Sierra took about six years of employee time, versus about four months to achieve performance on the El Capitan early-access systems, an 18-fold productivity increase.






Credit: Lawrence Livermore National Laboratory

“If we had to invest that six years of time again for this new platform, we wouldn’t have succeeded; we’d still be working on it,” Stitt said. “Our code successes show that the RAJA Portability Suite is a very viable option for writing codes that will work across CPU and GPU architectures and across different GPU vendors.”

In addition to RAJA, Umpire—a programming interface that helped alleviate memory constraints on Sierra—also has helped improve codes for El Capitan, Stitt said. Since El Capitan will have eight times more memory per node than Sierra, researchers will be able to fit much larger problems on a single node and take advantage of the parallelism that the AMD APUs can provide, researchers said.
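
As a rough illustration of the kind of memory management Umpire provides, here is a minimal sketch assuming Umpire’s standard "DEVICE" memory resource and its QuickPool strategy; the pool and function names are hypothetical and are not taken from MARBL.

// Minimal sketch (not MARBL source) of Umpire-style pooled device allocation.
#include <cstddef>
#include "umpire/ResourceManager.hpp"
#include "umpire/strategy/QuickPool.hpp"

void* allocate_field(std::size_t bytes)
{
  auto& rm = umpire::ResourceManager::getInstance();

  // Build the pool once on top of the device memory resource so that
  // frequent allocations avoid repeated expensive hipMalloc/cudaMalloc calls.
  static auto pool = rm.makeAllocator<umpire::strategy::QuickPool>(
      "FIELD_POOL", rm.getAllocator("DEVICE"));

  return pool.allocate(bytes);  // returns device-accessible memory
}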

“The MI300As are the next evolution in AMD GPU processors, and thus, we are very excited to carry out our simulations with those resources,” Campos said. “We’ve relied on various libraries developed at LLNL, such as MFEM, RAJA, Umpire, and others, to abstract away some of the work that went into performance portability, and thus, we hope the transition for MARBL to the newer processors will be as straightforward as possible.”

Co-author Aaron Skinner said methods previously used to run MARBL on CPU-based machines proved challenging on GPUs because of differences in architecture. Recognizing these limitations, Skinner worked with other CASC researchers to develop code and algorithmic improvements suited to GPUs, an effort that has successfully benefited multiple physics modules.

“We’ve known for a while that we need matrix-free methods to gain performance on GPUs, but our best linear solvers don’t lend themselves easily to that formalism, if at all,” Skinner said.

“With CASC, we’ve spent a lot of time implementing and optimizing those matrix-free methods, which have really paid off because the same linear solvers can be used across many different types of modules, including radiation diffusion, thermal conduction, and alpha-particle diffusion. Our approach uses a combination of code optimizations and algorithmic restructuring to gain performance in our linear solvers, which tend to make up the bulk of the computational workload.”
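
To make the matrix-free idea concrete, the following is a deliberately simple, serial sketch of a 1D diffusion solve in which the operator is applied on the fly inside a conjugate gradient loop rather than stored as an assembled sparse matrix. It is purely illustrative and assumes a basic second-difference stencil; MARBL’s production solvers work with high-order finite elements, preconditioning, and GPU kernels.

// Illustrative matrix-free conjugate gradient (not the MARBL/MFEM implementation).
#include <cmath>
#include <vector>

// Apply the 1D diffusion operator A = tridiag(-1, 2, -1) without forming A.
void apply_diffusion(const std::vector<double>& x, std::vector<double>& y)
{
  const int n = static_cast<int>(x.size());
  for (int i = 0; i < n; ++i) {
    const double left  = (i > 0)     ? x[i - 1] : 0.0;
    const double right = (i < n - 1) ? x[i + 1] : 0.0;
    y[i] = 2.0 * x[i] - left - right;   // stencil evaluated on the fly
  }
}

// Unpreconditioned conjugate gradient using only operator applications.
std::vector<double> solve_cg(const std::vector<double>& b,
                             int max_iter = 1000, double tol = 1e-10)
{
  const int n = static_cast<int>(b.size());
  std::vector<double> x(n, 0.0), r = b, p = r, Ap(n);
  double rs_old = 0.0;
  for (double v : r) rs_old += v * v;

  for (int it = 0; it < max_iter && std::sqrt(rs_old) > tol; ++it) {
    apply_diffusion(p, Ap);                       // the only "matrix" access
    double pAp = 0.0;
    for (int i = 0; i < n; ++i) pAp += p[i] * Ap[i];
    const double alpha = rs_old / pAp;

    double rs_new = 0.0;
    for (int i = 0; i < n; ++i) {
      x[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
      rs_new += r[i] * r[i];
    }
    for (int i = 0; i < n; ++i) p[i] = r[i] + (rs_new / rs_old) * p[i];
    rs_old = rs_new;
  }
  return x;
}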

Researchers said the successful GPU acceleration of MARBL represents a leap forward for high-performance computing and could have significant implications, not only for El Capitan but for computational science overall.

Improving performance portability will increase flexibility, while advancing GPU acceleration could lead to more efficient and accurate simulations for real-world scientific problems in high energy density physics—including fusion energy driven by lasers or pulsed power—and codes for aerospace and automotive engineering, materials science, climate, biological applications, and other complex phenomena.

“Performance portability of codes like MARBL will allow for simulations that provide answers much more quickly or simulations that were previously too expensive to carry out even on the largest supercomputers, as it allows for seamless utilization of different GPU hardware without the need for extensive hardware-specific porting,” Campos said.

In the paper, the team carried out scaling studies on key physics benchmark problems to demonstrate the success of their approach on various computing architectures, showing the potential of GPU acceleration for high-order finite element multi-physics simulations and highlighting the versatility and adaptability of their performance portability approach.

“The fact that we have a single source code that can target multiple GPUs from different vendors, that’s a really big deal,” Rieben said. “At the DOE labs, one of our principles has been that we can’t afford to be locked into a specific vendor. That’s baked into how we develop our software, so this is a big win for us. It’s a big multiplier in terms of being able to run the code on as many platforms as we possibly can.”

Researchers said they were able to run problems with MARBL on El Capitan’s early access machines, in which the integrated CPUs/GPUs share a single memory space, at about twice the speed of Sierra, and they aim to reach a factor of five or greater on El Capitan’s advanced MI300A APUs, and a 15- to 20-fold increase over the Lab’s current fastest Commodity Technology Systems.

Rieben said faster computation through GPUs directly correlates with scientific discovery, as researchers learn from running numerous simulations rather than just one. Rapid iteration at high resolution allows users to turn around problems quickly, boosting productivity. In addition, the increased computational power LLNL will get with El Capitan will allow for larger-scale simulations that were previously unattainable and raise the standard for simulation complexity.

“The ability to rapidly iterate at full fidelity and high resolution in 3D is crucial for efficient discovery,” Rieben said. “That’s an immediate benefit; people can turn problems around that much faster. So, that speed increase directly translates into a productivity boost for the user.”

“The other thing it lets you do is, of course, scale, so now you can consider things at a scale that you wouldn’t have considered before. What was once considered cutting-edge will become more commonplace over time.”

More information:
Thomas Stitt et al, Performance Portable Graphics Processing Unit Acceleration of a High-Order Finite Element Multiphysics Application, Journal of Fluids Engineering (2024). DOI: 10.1115/1.4064493

Citation:
Research team accelerates multi-physics simulations with El Capitan predecessor systems (2024, April 24)
retrieved 25 April 2024
from https://techxplore.com/news/2024-04-team-multi-physics-simulations-el.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.


