Towards more reliable feature evaluations for classification
- Prat MasRamón, Gabriel
- Lluis Antoni Belanche Muñoz Director/a
Universidad de defensa: Universitat Politècnica de Catalunya (UPC)
Fecha de defensa: 01 de febrero de 2016
- Amparo Alonso Betanzos Presidenta
- Alfredo Vellido Alacena Secretario/a
- Raquel Iniesta Benedicto Vocal
Tipo: Tesis
Resumen
In this thesis we study feature subset selection and feature weighting algorithms. Our aim is to make their output more stable and more useful when used to train a classifier. We begin by defining the concept of stability and selecting a measure to asses the output of the feature selection process. Then we study different sources of instability and propose modifications of classic algorithms that improve their stability. We propose a modification of wrapper algorithms that take otherwise unused information into account to overcome an intrinsic source of instability for this algorithms: the feature assessment being a random variable that depends on the particular training subsample. Our version accumulates the evaluation results of each feature at each iteration to average out the effect of the randomness. Another novel proposal is to make wrappers evaluate the remainder set of features at each step to overcome another source of instability: randomness of the algorithms themselves. In this case, by evaluating the non-selected set of features, the initial choice of variables is more educated. These modifications do not bring a great amount of computational overhead and deliver better results, both in terms of stability and predictive power. We finally tackle another source of instability: the differential contribution of the instances to feature assessment. We present a framework to combine almost any instance weighting algorithm with any feature weighting one. Our combination of algorithms deliver more stable results for the various feature weighting algorithms we have tested. Finally, we present a deeper integration of instance weighting with feature weighting by modifying the Simba algorithm, that delivers even better results in terms of stability