Spark SQL is a Spark module for structured data processing. It has been widely deployed in industry, but its performance is difficult to tune.
Current machine learning tuning methods are difficult to apply in practice because of their high time cost and their failure to adapt to changes in the volume of data to be processed.
To address these problems, a research team led by Prof. Yu Zhibin from the Shenzhen Institute of Advanced Technology (SIAT) of the Chinese Academy of Sciences proposed a low-time-cost automatic configuration optimization method named Low-Overhead Online Configuration Auto-Tuning (LOCAT), which can reduce the optimization time and improve the performance of Spark SQL.
The results were published at SIGMOD 2022, an international forum for database researchers, practitioners, developers, and users. The related paper can be found in the Proceedings of the 2022 International Conference on Management of Data.
The researchers first designed query and configuration parameter sensitivity analysis techniques for LOCAT. Queries that were insensitive to configuration parameters were identified and removed from a given workload when training samples were collected.
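The article does not say how query sensitivity is measured. As a minimal sketch, assuming each query has been timed under several sampled configurations, one plausible measure is the coefficient of variation (standard deviation divided by mean) of its runtimes: a query whose runtime barely changes across configurations is treated as insensitive and dropped. The query names, runtimes, and the 0.1 threshold below are illustrative assumptions, not values from the paper.

```python
from statistics import mean, pstdev


def coefficient_of_variation(times):
    """Relative spread of a query's runtimes across configurations."""
    avg = mean(times)
    return pstdev(times) / avg if avg > 0 else 0.0


def sensitive_queries(runtimes, threshold=0.1):
    """Keep only queries whose runtime varies noticeably with the configuration.

    runtimes: dict mapping a query name to its runtimes (seconds),
              one runtime per sampled configuration.
    threshold: hypothetical cut-off; not taken from the LOCAT paper.
    """
    return [q for q, times in runtimes.items()
            if coefficient_of_variation(times) >= threshold]


# Hypothetical measurements: three queries, four sampled configurations each.
runtimes = {
    "q1": [10.0, 10.2, 9.9, 10.1],   # nearly constant: insensitive
    "q2": [30.0, 55.0, 41.0, 80.0],  # varies a lot: sensitive
    "q3": [5.0, 5.1, 5.0, 4.9],      # nearly constant: insensitive
}

kept = sensitive_queries(runtimes)
print(kept)  # ['q2']
```

Dropping `q1` and `q3` here means later training samples only need to run `q2`, which is where the time saving would come from.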
“For the remaining queries, LOCAT calculated correlation coefficients to identify important configuration parameters,” said Prof. Yu. “Then, it applies kernel principal component analysis to reduce the dimension of configuration parameter search.”
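The correlation step can be illustrated with a small sketch. It assumes the Pearson coefficient (the article does not say which correlation coefficient LOCAT uses) and ranks parameters by the absolute correlation between their sampled values and the measured runtime. The parameter names and numbers are placeholders, and the kernel principal component analysis step is not shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_parameters(samples, runtimes):
    """Sort parameters by |correlation| with runtime, strongest first.

    samples:  list of dicts, each mapping a parameter name to a numeric value.
    runtimes: measured runtime for each sample, in the same order.
    """
    scores = {
        name: abs(pearson([s[name] for s in samples], runtimes))
        for name in samples[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical training samples (placeholder parameter names and values).
samples = [
    {"executor_memory_gb": 2, "shuffle_partitions": 200, "executor_cores": 4},
    {"executor_memory_gb": 4, "shuffle_partitions": 50, "executor_cores": 4},
    {"executor_memory_gb": 6, "shuffle_partitions": 400, "executor_cores": 4},
    {"executor_memory_gb": 8, "shuffle_partitions": 100, "executor_cores": 4},
    {"executor_memory_gb": 10, "shuffle_partitions": 300, "executor_cores": 4},
]
runtimes = [100.0, 82.0, 61.0, 39.0, 20.0]

ranking = rank_parameters(samples, runtimes)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this made-up data, memory tracks runtime almost perfectly, the partition count only weakly, and the core count not at all because it never changes; a tuner would keep the top-ranked parameters and fix the rest at defaults.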
Finally, the researchers designed a Bayesian optimization procedure for LOCAT that is aware of the dataset size when it searches for the optimal configuration, so that performance can be automatically optimized based on the size of the dataset.
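One way to make Bayesian optimization aware of the dataset size is to feed the size to the surrogate model as an extra input next to the configuration. The sketch below is an illustration under that assumption, not LOCAT's actual algorithm: it uses a Gaussian process with a squared-exponential kernel and a lower-confidence-bound acquisition rule, with all inputs and runtimes scaled to roughly [0, 1]. The synthetic runtime function, grid, kernel length, and `kappa` are made up for the example.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two equal-length tuples."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior(X, y, x_new, noise=1e-6):
    """Posterior mean and standard deviation of the GP surrogate at x_new."""
    mu = sum(y) / len(y)  # constant prior mean: the average observed runtime
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k = [rbf(a, x_new) for a in X]
    alpha = solve(K, [v - mu for v in y])
    w = solve(K, k)
    mean = mu + sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(1.0 - sum(ki * wi for ki, wi in zip(k, w)), 0.0)
    return mean, math.sqrt(var)


def suggest(X, y, candidates, data_size, kappa=2.0):
    """Pick the candidate configuration with the lowest confidence bound
    on predicted runtime for the given (scaled) dataset size."""
    def lcb(cfg):
        mean, std = gp_posterior(X, y, cfg + (data_size,))
        return mean - kappa * std
    return min(candidates, key=lcb)


# Synthetic observations: one scaled configuration knob, two dataset sizes.
# The made-up runtime (cfg - size)^2 moves its optimum with the data size.
grid = [(0.0,), (0.2,), (0.4,), (0.6,), (0.8,), (1.0,)]
sizes = [0.2, 0.8]
X = [cfg + (s,) for s in sizes for cfg in grid]
y = [(cfg[0] - s) ** 2 for s in sizes for cfg in grid]

best_small = suggest(X, y, grid, data_size=0.2)
best_large = suggest(X, y, grid, data_size=0.8)
print(best_small, best_large)  # (0.2,) (0.8,)
```

Because the dataset size is part of the surrogate's input, the same set of observations yields a different recommended configuration for each size. In a real tuning loop the suggested configuration would be run, its runtime appended to `X` and `y`, and the step repeated, with candidates that include configurations not yet tried.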
The experimental results on the ARM cluster (a cluster of servers for big data computing, in which each server uses CPUs based on the ARM instruction set) showed that LOCAT accelerated the optimization procedures of the state-of-the-art approaches by at least 4.1x and up to 9.7x. Moreover, LOCAT improved application performance by at least 1.9x and up to 2.4x. On the x86 cluster, LOCAT showed results similar to those on the ARM cluster.
Jinhan Xin et al., LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications, Proceedings of the 2022 International Conference on Management of Data (2022). DOI: 10.1145/3514221.3526157
Chinese Academy of Sciences
Novel tuning technique for Spark SQL applications (2022, June 16)
retrieved 16 June 2022