Machine learning techniques for Android malware detection and classification

  1. Martín García, Alejandro
Supervised by:
  1. David Camacho Fernández Supervisor
  2. Raúl Lara Cabrera Supervisor

Defence university: Universidad Autónoma de Madrid

Defence date: 15 March 2019

Examination committee:
  1. Sancho Salcedo Sanz Chair
  2. Antonio González Pardo Secretary
  3. Amparo Alonso Betanzos Member
  4. Julio César Hernández Castro Member
  5. Constantinos Patsakis Member

Type: Thesis

Abstract

Android has been deliberately chosen as the main target by many malware creators designing new malicious applications. Every day, thousands of new malware samples try to circumvent the security measures implemented by Android application stores, aiming to infect new devices. Tackling this problem requires researching and developing mechanisms able to classify large numbers of suspicious samples automatically, detecting those that contain a malicious payload. This thesis studies the application of machine learning techniques to the construction of Android malware detection mechanisms from different perspectives. The classification of Android malware into families is also addressed. A preliminary in-depth study of the Jisut family of Android malware has revealed some of the most important practices employed, which must be considered when facing these two tasks. First, machine learning techniques are applied as the core element of Android malware detection methods aimed at deciding accurately whether an application is malware or benignware. For that purpose, the behaviour of each application is described through groups of static and dynamic features, which are modelled using a Markov chain-based representation. Ensemble classifiers are then applied, showing that static features provide better results than dynamically extracted features. An approach fusing both categories of features is also proposed, showing improved performance over models relying on a single set of features. Second, this dissertation tackles the classification of malicious Android applications into malware families, an essential task which seeks to minimise the damage caused and to properly identify groups of malware.
Deep learning architectures, classic machine learning algorithms, and different techniques for dealing with imbalanced data are tested for this task. The results show that these techniques make it possible to develop accurate family classification methods. The resilience of these methods against adversarial attacks is also analysed. A targeted attack against a state-of-the-art classifier is proposed, showing that it is possible to force the classifier to assign samples to a fictitious, random, new malware family or even to a previously selected destination family. Finally, an open source framework called AndroPyTool is presented. It integrates different state-of-the-art malware analysis tools with the main goal of providing the research community with a single tool for extracting a wide set of static and dynamic features. Using this tool, the OmniDroid dataset is built and publicly released, containing both static and dynamic features extracted from benign and malicious Android applications.
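
As an illustration of the Markov chain-based representation mentioned in the abstract, the following minimal Python sketch turns a sequence of observed events into a fixed-length vector of first-order transition probabilities. It is not the thesis's implementation: the function name `markov_features`, the event names, and the choice of a first-order chain over a fixed state list are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def markov_features(sequence, states):
    """Flatten first-order transition probabilities into a feature vector.

    Hypothetical helper: assumes every item in `sequence` is listed in `states`.
    """
    # Count each consecutive (source, destination) pair in the sequence.
    counts = Counter(zip(sequence, sequence[1:]))

    # Total number of outgoing transitions per source state.
    totals = Counter()
    for (src, _dst), n in counts.items():
        totals[src] += n

    # One feature per ordered state pair, in a fixed order, so that vectors
    # from different applications are directly comparable.
    features = []
    for src, dst in product(states, repeat=2):
        total = totals[src]
        features.append(counts[(src, dst)] / total if total else 0.0)
    return features


# Hypothetical event names standing in for features observed in an application.
states = ["open", "read", "send"]
trace = ["open", "read", "read", "send", "open", "read"]
vector = markov_features(trace, states)
```

Vectors of this kind, one per application, could then serve as input to a classifier such as the ensemble methods the abstract refers to.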