Automatic rescaling and tuning of big data applications on container-based virtual environments

Enes, Jonatan

Supervised by:

Roberto R. Expósito (Co-director)
Juan Touriño (Co-director)

Defence university: Universidade da Coruña

Defence date: 14 October 2020

Committee:

Jesús Carretero Pérez (Chair)
Guillermo L. Taboada (Secretary)
Sabela Ramos Garea (Committee member)

Department:

Computer Engineering

Type: Thesis

Teseo: 638382

Abstract

Current Big Data applications have evolved significantly from their origins, moving from mostly batch workloads to more complex ones that may involve many processing stages using different technologies, or even work in real time. Moreover, to deploy these applications, commodity clusters have in some cases been replaced by newer and more flexible paradigms such as the Cloud, or even emerging ones such as serverless computing, usually involving virtualization techniques. This Thesis proposes two frameworks that provide alternative ways to perform in-depth analysis and improved resource management for Big Data applications deployed on virtual environments based on software containers. On the one hand, the BDWatchdog framework is capable of performing real-time, fine-grained analysis in terms of system resource monitoring and code profiling. On the other hand, a framework for the dynamic, real-time scaling of resources according to several tuning policies is described. The first proposed policy revolves around automatically scaling the containers’ resources according to the real usage of the applications, thus providing a serverless environment. Furthermore, an alternative policy focused on energy management is presented in a scenario where power capping and budgeting functionalities are implemented for containers, applications or even users. Overall, the frameworks proposed in this Thesis aim to showcase that novel ways of analyzing and tuning the resources given to Big Data applications in container clusters are possible, even in real time. The supported use cases presented illustrate this and show how Big Data applications can be adapted to newer technologies or paradigms without losing their distinctive characteristics.