Workshop: PDSW22: 7th International Parallel Data Systems Workshop
Authors: Radita Liem (RWTH Aachen University, IT Center) and Shadi Ibrahim (French Institute for Research in Computer Science and Automation (INRIA))
Abstract: In this work in progress, we will showcase a comprehensive analysis of current state-of-the-art solutions for data skew mitigation in several environments. Our experiments and evaluation comprise several data-intensive workflows running on Spark on the Grid’5000 testbed. The workflows range from a highly optimized WordCount application and an iterative application such as PageRank to an SQL-based decision support benchmark, TPC-H, with various sizes and configurations. Going forward, we will discuss our current efforts toward heterogeneity-aware multi-stage data partitioning.