Dealing With Unmanageable Data Sizes in Cosmology
Description
Modern sky surveys conducted by powerful telescopes are some of the largest data generators in science, and they are highly visible to the broader public. What goes largely unnoticed, however, is that cosmological simulations often have to produce even larger data sets in order to scientifically interpret these observations: we need many models employing different plausible physics, and the statistical errors of our predictions must be smaller than the observational errors. To maintain the desired accuracy, modern simulations track the time evolution of trillions of elements over thousands of timesteps. For such large runs, storing a large number of timesteps for later analysis is no longer a viable strategy, and beyond-exascale forecasts point to growth in flops continuing to outpace growth in disk capacity and network bandwidth, making the post-processing strategy increasingly infeasible.

In this talk, I will go over the difficulties large data sizes create and the major technological roadblocks they present. I will then present some of our existing lines of attack on this problem, including different compression methods, surrogate modeling, and our design for running multiple codes in situ: using coroutines and position-independent executables, we enable cooperative multitasking between simulation and analysis, allowing the same executables to post-process simulation output as well as to process it on the fly, both in situ and in transit.
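For readers unfamiliar with coroutine-based in situ analysis, the sketch below is a minimal, hypothetical C++20 illustration of the core idea, not the speaker's actual implementation: the simulation runs as a coroutine that yields snapshots to analysis code in memory, so the same analysis routine can consume data on the fly rather than from disk. The names SnapshotStream and simulate are illustrative only, and the sketch does not cover the position-independent-executable or in-transit aspects of the real design.

#include <coroutine>
#include <cstdio>
#include <exception>
#include <utility>
#include <vector>

// Minimal generator-style coroutine: the simulation yields snapshots
// to the analysis side in memory instead of writing them to disk.
struct SnapshotStream {
    struct promise_type {
        std::vector<double> current;
        SnapshotStream get_return_object() {
            return SnapshotStream{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        std::suspend_always yield_value(std::vector<double> snap) {
            current = std::move(snap);   // keep a copy of the yielded snapshot
            return {};
        }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    std::coroutine_handle<promise_type> handle;
    ~SnapshotStream() { if (handle) handle.destroy(); }

    // Resume the simulation until it yields the next snapshot (or finishes).
    bool next() { handle.resume(); return !handle.done(); }
    const std::vector<double>& snapshot() const { return handle.promise().current; }
};

// Toy "simulation": advances particle state and hands a snapshot to the
// analysis side every analysis_cadence timesteps.
SnapshotStream simulate(int nsteps, int analysis_cadence) {
    std::vector<double> particles(1000, 0.0);   // stand-in for particle data
    for (int step = 0; step < nsteps; ++step) {
        for (double& p : particles) p += 0.01;  // stand-in for one timestep
        if (step % analysis_cadence == 0)
            co_yield particles;                 // cooperative hand-off to analysis
    }
}

int main() {
    auto stream = simulate(100, 10);
    while (stream.next()) {
        // In situ "analysis": a trivial reduction over the current snapshot.
        double sum = 0.0;
        for (double p : stream.snapshot()) sum += p;
        std::printf("analysis: mean = %f\n", sum / stream.snapshot().size());
    }
    return 0;
}

Because control alternates cooperatively between the two coroutine sides, the same analysis loop could in principle be pointed at stored simulation output instead, which is the property the abstract highlights: one set of executables for post-processing, in situ, and in transit use.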
Event Type
Workshop
Time
Sunday, 13 November 2022, 8:35am - 9:35am CST
Location
C141
Recorded