Kernel distribution estimation for grouped data

  1. Miguel Reyes 1
  2. Mario Francisco-Fernández 2
  3. Ricardo Cao 2
  4. Daniel Barreiro-Ures 2
  1. 1 Universidad de las Américas-Puebla, Puebla, México
  2. 2 Universidade da Coruña, La Coruña, España (ROR https://ror.org/01qckj285)

Journal:
Sort: Statistics and Operations Research Transactions

ISSN: 1696-2281

Year of publication: 2019

Volume: 43

Issue: 2

Pages: 259-288

Type: Article

Abstract

Interval-grouped data arise when observations are not recorded in continuous time but are monitored at periodic time instants. In this framework, a nonparametric kernel distribution estimator is proposed and studied. The asymptotic bias, variance and mean integrated squared error of the new estimator are derived. A plug-in bandwidth selector is obtained from the asymptotic mean integrated squared error, and a bootstrap bandwidth selector is also designed for this setting. A comprehensive simulation study shows the behaviour of the estimator and of the bandwidth selectors under different data-grouping scenarios. The performance of the different approaches is also illustrated with a real grouped emergence data set of Avena sterilis (wild oat).
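
As a rough illustration of the idea summarised in the abstract, the sketch below smooths interval-grouped counts into a distribution function estimate. It is a minimal sketch under stated assumptions, not necessarily the estimator defined in the article: each interval is represented by its midpoint and weighted by its sample proportion, the kernel is Gaussian, and the bandwidth h is supplied by hand (neither the plug-in nor the bootstrap selector is implemented). The names normal_cdf, grouped_kernel_cdf, edges and counts are hypothetical and are not taken from the binnednp package.

```python
import math

def normal_cdf(u):
    """Standard normal CDF, used here as the integrated Gaussian kernel."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def grouped_kernel_cdf(x, edges, counts, h):
    """Kernel distribution estimate at x from interval-grouped data.

    edges  : interval limits y_0 < y_1 < ... < y_k (assumed layout)
    counts : number of observations in each of the k intervals
    h      : bandwidth, chosen by hand in this sketch
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    n = sum(counts)
    # Assumption: each interval is represented by its midpoint,
    # weighted by the proportion of the sample that fell in it.
    mids = [(a + b) / 2.0 for a, b in zip(edges[:-1], edges[1:])]
    return sum(c / n * normal_cdf((x - t) / h) for t, c in zip(mids, counts))

# Made-up grouped sample: four unit-length intervals with their counts.
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
counts = [5, 20, 15, 10]
grid = [0.5 * i for i in range(-4, 13)]
values = [grouped_kernel_cdf(x, edges, counts, h=0.5) for x in grid]
```

Because the estimate is a weighted sum of normal distribution functions with nonnegative weights adding to one, it is nondecreasing in x and stays between 0 and 1 for any positive h. How much detail within each interval is smoothed away depends on h, which is why the article devotes attention to bandwidth selection.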

Funding information

This research has been supported by MINECO grants MTM2014-52876-R and MTM2017-82724-R, and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-015 and Centro Singular de Investigación de Galicia ED431G/01), all of them through the ERDF.

Bibliographic References

  • Altman, N. and Leger, C. (1995). Bandwidth selection for kernel distribution function estimation. Journal of Statistical Planning and Inference, 46, 195–214.
  • Anastasiou, K., Kechriniotis, A. and Kotsos, B. (2006). Generalizations of the Ostrowski’s inequality. Journal of Interdisciplinary Mathematics, 9, 49–60.
  • Azzalini, A. (1981). A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika, 68, 326–328.
  • Barreiro, D., Fraguela, B., Doallo, R., Cao, R., Francisco-Fernández, M. and Reyes, M. (2019). binnednp: Nonparametric estimation for interval-grouped data. https://cran.r-project.org/package=binnednp. R package version 0.4.0.
  • Blower, G. and Kelsall, J. E. (2002). Nonlinear kernel density estimation for binned data: convergence in entropy. Bernoulli, 8, 423–449.
  • Bowman, A., Hall, P. and Prvan, T. (1998). Bandwidth selection for the smoothing of distribution functions. Biometrika, 85, 799–808.
  • Brown, L., Cai, T., Zhang, R., Zhao, L. and Zhou, H. (2010). The root-unroot algorithm for density estimation as implemented via wavelet block thresholding. Probability Theory and Related Fields, 146, 401–433.
  • Cao, R., Francisco-Fernández, M., Anand, A., Bastida, F. and González-Andújar, J. L. (2011). Computing statistical indices for hydrothermal times using weed emergence data. Journal of Agricultural Science, 149, 701–712.
  • Cao, R., Francisco-Fernández, M., Anand, A., Bastida, F. and González-Andújar, J. L. (2013). Modeling Bromus diandrus seedling emergence using nonparametric estimation. Journal of Agricultural, Biological, and Environmental Statistics, 18, 64–86.
  • Coit, D. and Dey, K. (1999). Analysis of grouped data from field-failure reporting systems. Reliability Engineering & System Safety, 65, 95–101.
  • Dutta, S. (2015). Local smoothing for kernel distribution function estimation. Communications in Statistics, Simulation and Computation, 44, 878–891.
  • González-Andújar, J. L., Francisco-Fernández, M., Cao, R., Reyes, M., Urbano, J. M., Forcella, F. and Bastida, F. (2016). A comparative study between nonlinear regression and nonparametric approaches for modeling Phalaris paradoxa seedling emergence. Weed Research, 56, 367–376.
  • Guo, S. (2005). Analysing grouped data with hierarchical linear modeling. Children and Youth Services Review, 27, 637–652.
  • Hill, P. (1985). Kernel estimation of a distribution function. Communications in Statistics, Theory and Methods, 14, 605–620.
  • Klein, J. P. and Moeschberger, M. (1997). Survival Analysis. New York: Springer-Verlag.
  • Mächler, M. (2017). nor1mix: Normal (1-d) Mixture Models (S3 Classes and Methods). https://CRAN.R-project.org/package=nor1mix. R package version 1.2-3.
  • Mack, Y. (1984). Remarks on some smoothed empirical distribution functions and processes. Bulletin of Informatics and Cybernetics, 21, 29–35.
  • McLachlan, G. and Peel, D. (2000). Finite Mixture Models. New York: John Wiley & Sons, Inc.
  • Minoiu, C. and Reddy, S. (2009). Estimating poverty and inequality from grouped data: How well do parametric methods perform? Journal of Income Distribution, 18, 160–178.
  • Nadaraya, E. (1964). On estimating regression. Theory of Probability and Applications, 10, 186–190.
  • Ostrowski, A. (1938). Über die Absolutabweichung einer differenzierbaren Funktion von ihrem Integralmittelwert. Commentarii Mathematici Helvetici, 10, 226–227.
  • Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33, 1065–1076.
  • Pipper, C. and Ritz, C. (2007). Checking the grouped data version of the Cox model for interval-grouped survival data. Scandinavian Journal of Statistics, 34, 405–418.
  • Polanski, A. and Baker, E. (2000). Multistage plug-in bandwidth selection for kernel distribution function estimates. Journal of Statistical Computation and Simulation, 65, 63–80.
  • Quintela-del-Río, A. and Estévez-Pérez, G. (2012). Nonparametric kernel distribution function estimation with kerdiest: An R package for bandwidth choice and applications. Journal of Statistical Software, 50, 1–21. http://www.jstatsoft.org/v50/i08/.
  • R Core Team (2019). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
  • Reiss, R. (1981). Nonparametric estimation of smooth distribution functions. Scandinavian Journal of Statistics, 8, 116–119.
  • Reyes, M., Francisco-Fernández, M. and Cao, R. (2016). Nonparametric kernel density estimation for general grouped data. Journal of Nonparametric Statistics, 28, 235–249.
  • Reyes, M., Francisco-Fernández, M. and Cao, R. (2017). Bandwidth selection in kernel density estimation for interval-grouped data. TEST, 26, 527–545.
  • Rizzi, S., Thinggaard, M., Engholm, G., Christensen, N., Johannesen, T., Vaupel, J. and Jacobsen, R. (2016). Comparison of non-parametric methods for ungrouping coarsely aggregated data. BMC Medical Research Methodology, 16, 59.
  • Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics, 27, 832–837.
  • Sarda, P. (1993). Smoothing parameter selection for smooth distribution function. Journal of Statistical Planning and Inference, 35, 65–75.
  • Scott, D. and Sheather, S. (1985). Kernel density estimation with binned data. Communications in Statistics, Theory and Methods, 14, 1353–1359.
  • Titterington, D. (1983). Kernel-based density estimation using censored, truncated or grouped data. Communications in Statistics, Theory and Methods, 12, 2151–2167.
  • Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society. Series B (Methodological), 38, 290–295.
  • Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. London: Chapman and Hall/CRC.
  • Wang, B. and Wang, X.-F. (2016). Fitting the generalized lambda distribution to pre-binned data. Journal of Statistical Computation and Simulation, 86, 1785–1797.
  • Wang, B. and Wertelecki, W. (2013). Density estimation for data with rounding errors. Computational Statistics & Data Analysis, 65, 4–12.