The Mahalanobis distance for functional data with applications in statistical problems
- Joseph, Esdras
- Rosa Elvira Lillo Rodríguez (Supervisor)
- Pedro Galeano (Supervisor)
Defending university: Universidad Carlos III de Madrid
Date of defense: 16 June 2015
- Francisco Javier Prieto Fernández (Chair)
- Ana María Justel Eusebio (Secretary)
- José Vilar (Committee member)
Type: Thesis
Abstract
Functional data are data consisting of curves evaluated at a finite subset of some interval of the real line. In this thesis, we deal with this type of data, focusing on the notion of functional distance. The literature contains few references to the role played by distances between functional data. Recently, Ferraty and Vieu [20] proposed several semi-metrics well adapted to sample functions. However, distances commonly used in multivariate data analysis, such as the Mahalanobis distance proposed by Mahalanobis [39], have not been extended to the functional framework. This gap motivated the thesis, whose main contribution is to enlarge the set of available functional distances by introducing a new semi-distance that generalizes the usual Mahalanobis distance. Functional distances are important in many problems, including supervised classification and hypothesis testing. The other contribution of this dissertation is therefore to propose new procedures that combine such methods with the functional Mahalanobis semi-distance, as is done with the Mahalanobis distance in the multivariate context.

The thesis is organized as follows. In Chapter 1, we review the formal definition of functional data as well as the notion of functional principal components, which are an important tool for several of the concepts used in this dissertation. We also give a brief historical summary of distances in the multivariate context and of how the concept of distance has been extended to functional data analysis (FDA). Finally, we recall some functional methods for which the notion of distance can be very useful, e.g., supervised and unsupervised classification, hypothesis testing, prediction, and the definition of a density function for functional data.

In Chapter 2, we present a new semi-distance for functional observations that generalizes the Mahalanobis distance for multivariate data sets to the functional framework.
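To make the construction concrete, the following Python sketch computes a truncated functional Mahalanobis semi-distance for curves observed on a common, equally spaced grid. The abstract does not give the formula, so the sketch assumes that the semi-distance projects the difference between two curves onto the first K functional principal components and standardizes each projection by the square root of the corresponding eigenvalue, with the principal components approximated by an eigen-decomposition of the discretized sample covariance. The function names `fpca` and `functional_mahalanobis` and the simulated sine curves are hypothetical illustrations, not code from the thesis.

```python
import numpy as np


def fpca(curves, n_components):
    """Discretized functional PCA for curves sampled on a common, equally spaced grid.

    curves: array of shape (n_curves, n_points); row i holds curve i on the grid.
    Returns the mean curve, the K leading eigenvectors of the sample covariance
    matrix (shape (n_points, K)) and their eigenvalues (largest first).
    """
    mean = curves.mean(axis=0)
    cov = np.cov(curves, rowvar=False)              # (n_points, n_points) sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order], eigvals[order]


def functional_mahalanobis(x, y, eigvecs, eigvals):
    """Assumed truncated functional Mahalanobis semi-distance between curves x and y.

    Projects x - y onto the K retained principal directions and divides each
    squared score by its eigenvalue. On an equally spaced grid the quadrature
    weight appears in both the scores and the eigenvalues and cancels out.
    """
    scores = (x - y) @ eigvecs                      # K projections of the difference
    return float(np.sqrt(np.sum(scores**2 / eigvals)))


# Hypothetical example: 100 noisy sine curves on a 50-point grid on [0, 1].
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
curves = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=(100, t.size))
mean, vecs, vals = fpca(curves, n_components=5)
d01 = functional_mahalanobis(curves[0], curves[1], vecs, vals)
```

Because only K directions are used, two different curves whose difference is orthogonal to all K retained components are at distance zero, which is why the measure is a semi-distance rather than a distance. When K equals the number of grid points and the sample covariance is invertible, the value coincides with the multivariate Mahalanobis distance between the discretized curves.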
We also show the main characteristics of the functional Mahalanobis semi-distance. To illustrate the applicability of this measure of proximity between functional observations, we develop new versions of several well-known functional classification procedures based on the functional Mahalanobis semi-distance. We illustrate the performance of the new semi-distance with simulated data and two real data examples, which indicate that the classification methods used in conjunction with the functional Mahalanobis semi-distance give better results than other well-known functional classification procedures.

In Chapter 3, we derive two-sample Hotelling's T² statistics for testing the equality of the means of two samples independently drawn from two functional distributions. The proposed statistics are based on the functional Mahalanobis semi-distance and, under certain conditions, their asymptotic distributions are chi-squared, regardless of the distribution of the functional random samples. We establish the link between the two-sample Hotelling's T² statistics based on the functional Mahalanobis semi-distance and statistics based on the functional principal components semi-distance. The behavior of all these statistics is analyzed through an extensive Monte Carlo study and the analysis of a real climatological data set. The results suggest that the two-sample Hotelling's T² statistics outperform, in terms of power, those based on the functional principal components semi-distance.

Finally, Chapter 4 summarizes the work presented in this thesis and outlines possible lines of future research.
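The two uses of the semi-distance summarized above, classification and two-sample testing, can be sketched under further assumptions. The Python sketch below (all helper names are hypothetical) takes the principal components from a pooled within-group covariance, classifies a curve by the nearest group mean (a simple centroid rule chosen for illustration, not necessarily one of the procedures studied in the thesis), and forms a two-sample statistic as n1 n2 / (n1 + n2) times the squared semi-distance between the sample mean curves, mirroring the multivariate two-sample Hotelling's T². Treating its large-sample reference distribution as chi-squared with K degrees of freedom is likewise an assumption of the sketch, and the simulated curves are made up.

```python
import numpy as np


def pooled_fpca(group_a, group_b, n_components):
    """K leading eigenvectors/eigenvalues of the pooled within-group covariance
    of two samples of curves on a common, equally spaced grid (rows are curves)."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * np.cov(group_a, rowvar=False)
              + (nb - 1) * np.cov(group_b, rowvar=False)) / (na + nb - 2)
    eigvals, eigvecs = np.linalg.eigh(pooled)           # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]    # keep the K largest
    return eigvecs[:, order], eigvals[order]


def fm_distance(x, y, eigvecs, eigvals):
    """Assumed truncated functional Mahalanobis semi-distance (standardized scores of x - y)."""
    scores = (x - y) @ eigvecs
    return float(np.sqrt(np.sum(scores**2 / eigvals)))


def centroid_classify(curve, group_a, group_b, n_components):
    """Return 0 or 1: the group whose mean curve is closer in the semi-distance."""
    vecs, vals = pooled_fpca(group_a, group_b, n_components)
    d_a = fm_distance(curve, group_a.mean(axis=0), vecs, vals)
    d_b = fm_distance(curve, group_b.mean(axis=0), vecs, vals)
    return 0 if d_a <= d_b else 1


def two_sample_t2(group_a, group_b, n_components):
    """T²-type statistic n_a*n_b/(n_a+n_b) * d(mean_a, mean_b)**2 and the
    assumed chi-squared degrees of freedom (K)."""
    na, nb = len(group_a), len(group_b)
    vecs, vals = pooled_fpca(group_a, group_b, n_components)
    d = fm_distance(group_a.mean(axis=0), group_b.mean(axis=0), vecs, vals)
    return na * nb / (na + nb) * d**2, n_components


# Hypothetical simulation: curves built from three sine basis functions plus noise;
# group b is shifted along the first basis function.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 40)
basis = np.array([np.sqrt(2) * np.sin(k * np.pi * t) for k in (1, 2, 3)])


def simulate(n, shift):
    coefs = rng.normal(size=(n, 3)) * np.array([1.0, 0.5, 0.25])
    noise = rng.normal(scale=0.1, size=(n, t.size))
    return shift * basis[0] + coefs @ basis + noise


a = simulate(60, shift=0.0)
b = simulate(60, shift=2.0)
t2, df = two_sample_t2(a, b, n_components=5)
label = centroid_classify(b.mean(axis=0), a, b, n_components=5)
```

To turn the statistic into an approximate p-value under the chi-squared assumption, one could pass it to `scipy.stats.chi2.sf(t2, df)`. Pooling the covariance follows the usual multivariate two-sample convention, so the statistic uses the same K principal directions for both groups.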