Nonparametric Density and Regression Estimation for Samples of Very Large Size

  1. Barreiro Ures, Daniel
Supervised by:
  1. Ricardo Cao Abad Director
  2. Mario Francisco-Fernández Co-director

Defence university: Universidade da Coruña

Defence date: 10 December 2021

Committee:
  1. María Dolores Martínez Miranda Chair
  2. Rubén Fernández Casal Secretary
  3. Philippe Vieu Committee member
Department:
  1. Mathematics

Type: Thesis

Teseo: 697973

Abstract

This dissertation deals mainly with bandwidth selection in nonparametric density and regression estimation for samples of very large size. Some bandwidth selection methods have high computational complexity: the number of operations required to compute the bandwidth grows very rapidly with the sample size, which makes these algorithms unsuitable for very large samples. This thesis addresses the problem through subagging, an ensemble method that combines bootstrap aggregating (bagging) with subsampling. Subsampling reduces the computational cost of bandwidth selection, while bagging aims to achieve significant reductions in the variability of the bandwidth selector. Subagging versions are proposed for bandwidth selection methods based on widely known criteria such as cross-validation and the bootstrap. For the subagged cross-validation bandwidth selector, both for the Parzen–Rosenblatt estimator and for the Nadaraya–Watson estimator, the proposed selectors are studied and their asymptotic properties derived. The empirical behavior of all the proposed bandwidth selectors is illustrated through simulation studies and applications to real datasets.
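The subagging idea described in the abstract can be illustrated with a minimal Python sketch for the density case. This is not the thesis's implementation. It assumes a Gaussian kernel, a least-squares cross-validation criterion minimized over a grid, and subsamples drawn without replacement. The final rescaling by (r/n)^(1/5) is also an assumption, based on the standard n^(-1/5) rate of the optimal bandwidth for second-order kernels; the abstract does not state it. The names `lscv_bandwidth`, `subagged_cv_bandwidth`, `r`, `n_subsamples` and `grid` are hypothetical.

```python
import numpy as np


def _normal_pdf(x, sd):
    """Density of N(0, sd^2) evaluated elementwise at x."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def lscv_bandwidth(x, grid):
    """Least-squares cross-validation bandwidth for a Gaussian-kernel
    density estimator, minimized over a grid. Cost is O(len(x)^2) per
    grid point, which is why it is only applied to small subsamples."""
    n = x.size
    d = x[:, None] - x[None, :]          # all pairwise differences
    off_diag = ~np.eye(n, dtype=bool)    # mask selecting pairs with i != j
    scores = []
    for h in grid:
        # Integral of the squared estimate (closed form for Gaussian kernels).
        int_f2 = _normal_pdf(d, np.sqrt(2.0) * h).sum() / n**2
        # Average of the leave-one-out estimates at the data points.
        loo = _normal_pdf(d[off_diag], h).sum() / (n * (n - 1))
        scores.append(int_f2 - 2.0 * loo)
    return grid[int(np.argmin(scores))]


def subagged_cv_bandwidth(x, r, n_subsamples, grid, rng):
    """Average the CV bandwidths of n_subsamples subsamples of size r,
    then rescale from size r to the full size n (assumed n^(-1/5) rate)."""
    n = x.size
    hs = [
        lscv_bandwidth(rng.choice(x, size=r, replace=False), grid)
        for _ in range(n_subsamples)
    ]
    return (r / n) ** 0.2 * float(np.mean(hs))
```

For example, with `rng = np.random.default_rng(0)` and `x = rng.normal(size=2000)`, calling `subagged_cv_bandwidth(x, r=200, n_subsamples=10, grid=np.linspace(0.05, 1.5, 60), rng=rng)` evaluates the quadratic-cost criterion only on subsamples of size 200 rather than on all 2000 observations, and averaging over the 10 subsamples is what is intended to reduce the variability of the selector.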