Kernel distribution estimation for grouped data

Miguel Reyes; Mario Francisco-Fernández; Ricardo Cao; Daniel Barreiro-Ures

Miguel Reyes ¹
Mario Francisco-Fernández ²
Ricardo Cao ²
Daniel Barreiro-Ures ²

1 Universidad de las Américas-Puebla, Puebla, México
2 Universidade da Coruña

Universidade da Coruña

La Coruña, Spain

ROR https://ror.org/01qckj285

Journal:

Sort: Statistics and Operations Research Transactions

ISSN: 1696-2281

Year of publication: 2019

Volume: 43

Issue: 2

Pages: 259-288

Type: Article

Abstract

Interval-grouped data arise when observations are not recorded in continuous time but are monitored at periodic time instants. In this framework, a nonparametric kernel distribution estimator is proposed and studied. The asymptotic bias, variance and mean integrated squared error of the new estimator are derived. A plug-in bandwidth is proposed from the asymptotic mean integrated squared error, and a bootstrap bandwidth selector for this context is also designed. A comprehensive simulation study shows the behaviour of the estimator and the bandwidth selectors under different data-grouping scenarios. The performance of the different approaches is also illustrated with a real grouped emergence data set of Avena sterilis (wild oat).
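
As a rough illustration of the idea summarised in the abstract, the sketch below computes a kernel distribution estimate from interval counts. It is not taken from the article: it assumes that each interval is represented by its midpoint, weighted by the interval's sample proportion, and that the kernel is Gaussian. The function names (normal_cdf, grouped_kernel_cdf) and the example data are hypothetical, and the bandwidth h is supplied by hand rather than chosen by the plug-in or bootstrap selectors the abstract mentions.

```python
import math


def normal_cdf(u):
    """Standard normal CDF, i.e. the integrated Gaussian kernel."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def grouped_kernel_cdf(x, breaks, counts, h):
    """Kernel distribution estimate at x from interval-grouped data.

    breaks: k + 1 increasing interval limits
    counts: k observation counts, one per interval
    h:      bandwidth (positive), chosen by the user in this sketch

    Assumption (not from the article): every interval is replaced by its
    midpoint and weighted by the proportion of observations it contains.
    """
    if len(breaks) != len(counts) + 1:
        raise ValueError("breaks must have exactly one more entry than counts")
    if h <= 0:
        raise ValueError("the bandwidth h must be positive")
    n = sum(counts)
    if n <= 0:
        raise ValueError("counts must contain at least one observation")
    midpoints = [(a + b) / 2.0 for a, b in zip(breaks[:-1], breaks[1:])]
    return sum(c / n * normal_cdf((x - t) / h) for t, c in zip(midpoints, counts))


# Hypothetical example: 50 observations grouped into four unit-length intervals.
example_breaks = [0.0, 1.0, 2.0, 3.0, 4.0]
example_counts = [5, 20, 15, 10]
example_values = [grouped_kernel_cdf(x, example_breaks, example_counts, h=0.5)
                  for x in (0.5, 1.5, 2.5, 3.5)]
```

The returned values are nondecreasing in x and lie between 0 and 1, as expected of a distribution function. How well they approximate the underlying distribution depends on the bandwidth and on the interval widths, which is the question the article studies.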

Funding information

This research has been supported by MINECO grants MTM2014-52876-R and MTM2017-82724-R, and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-015 and Centro Singular de Investigación de Galicia ED431G/01), all of them through the ERDF.

Funders

Ministerio de Economía y Competitividad Spain
- MTM2017-82724-R
European Regional Development Fund European Union
Xunta de Galicia Spain
- ED431C-2016-015
Centro Singular de Investigación de Galicia Spain
- ED431G/01

Bibliographic References

Altman, N. and Leger, C. (1995). Bandwidth selection for kernel distribution function estimation. Journal of Statistical Planning and Inference, 46, 195–214.
Anastasiou, K., Kechriniotis, A. and Kotsos, B. (2006). Generalizations of the Ostrowski’s inequality. Journal of Interdisciplinary Mathematics, 9, 49–60.
Azzalini, A. (1981). A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika, 68, 326–328.
Barreiro, D., Fraguela, B., Doallo, R., Cao, R., Francisco-Fernandez, M. and Reyes, M. (2019). binnednp: Nonparametric estimation for interval-grouped data. https://cran.r-project.org/package=binnednp R package version 0.4.0.
Blower, G. and Kelsall, J. E. (2002). Nonlinear kernel density estimation for binned data: convergence in entropy. Bernoulli, 8, 423–449.
Bowman, A., Hall, P. and Prvan, T. (1998). Bandwidth selection for the smoothing of distribution functions. Biometrika, 85, 799–808.
Brown, L., Cai, T., Zhang, R., Zhao, L. and Zhou, H. (2010). The root-unroot algorithm for density estimation as implemented via waved block thresholding. Probability Theory and Related Fields, 146, 401–433.
Cao, R., Francisco-Fernández, M., Anand, A., Bastida, F. and González-Andújar, J. L. (2011). Computing statistical indices for hydrothermal times using weed emergence data. Journal of Agricultural Science, 149, 701–712.
Cao, R., Francisco-Fernández, M., Anand, A., Bastida, F. and González-Andújar, J. L. (2013). Modeling Bromus diandrus seedling emergence using nonparametric estimation. Journal of Agricultural, Biological, and Environmental Statistics, 18, 64–86.
Coit, D. and Dey, K. (1999). Analysis of grouped data from field-failure reporting systems. Reliability Engineering & System Safety, 65, 95–101.
Dutta, S. (2015). Local smoothing for kernel distribution function estimation. Communications in Statistics, Simulation and Computation, 44, 878–891.
González-Andújar, J. L., Francisco-Fernández, M., Cao, R., Reyes, M., Urbano, J. M., Forcella, F. and Bastida, F. (2016). A comparative study between nonlinear regression and nonparametric approaches for modeling Phalaris paradoxa seedling emergence. Weed Research, 56, 367–376.
Guo, S. (2005). Analysing grouped data with hierarchical linear modeling. Children and Youth Services Review, 27, 637–652.
Hill, P. (1985). Kernel estimation of a distribution function. Communications in Statistics, Theory and Methods, 14, 605–620.
Klein, J. P. and Moeschberger, M. (1997). Survival Analysis. New York: Springer Verlag.
Mächler, M. (2017). nor1mix: Normal (1-d) Mixture Models (S3 Classes and Methods). https://CRAN.R-project.org/package=nor1mix. R package version 1.2-3.
Mack, Y. (1984). Remarks on some smoothed empirical distribution functions and processes. Bulletin of Informatics and Cybernetics, 21, 29–35.
McLachlan, G. and Peel, D. (2000). Finite Mixture Models. New York: John Wiley & Sons, Inc.
Minoiu, C. and Reddy, S. (2009). Estimating poverty and inequality from grouped data: How well do parametric methods perform? Journal of Income Distribution, 18, 160–178.
Nadaraya, E. (1964). On estimating regression. Theory of Probability and Applications, 10, 186–190.
Ostrowski, A. (1938). Über die Absolutabweichung einer differenzierbaren Funktion von ihrem Integralmittelwert. Commentarii Mathematici Helvetici, 10, 226–227.
Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33, 1065–1076.
Pipper, C. and Ritz, C. (2007). Checking the grouped data version of the Cox model for interval-grouped survival data. Scandinavian Journal of Statistics, 34, 405–418.
Polanski, A. and Baker, E. (2000). Multistage plug-in bandwidth selection for kernel distribution function estimates. Journal of Statistical Computation and Simulation, 65, 63–80.
Quintela-del-Río, A. and Estévez-Pérez, G. (2012). Nonparametric kernel distribution function estimation with kerdiest: An R package for bandwidth choice and applications. Journal of Statistical Software, 50, 1–21. http://www.jstatsoft.org/v50/i08/.
R Core Team (2019). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Reiss, R. (1981). Nonparametric estimation of smooth distribution functions. Scandinavian Journal of Statistics, 8, 116–119.
Reyes, M., Francisco-Fernandez, M. and Cao, R. (2016). Nonparametric kernel density estimation for general grouped data. Journal of Nonparametric Statistics, 28, 235–249.
Reyes, M., Francisco-Fernández, M. and Cao, R. (2017). Bandwidth selection in kernel density estimation for interval-grouped data. TEST, 26, 527–545.
Rizzi, S., Thinggaard, M., Engholm, G., Christensen, N., Johannesen, T., Vaupel, J. and Jacobsen, R. (2016). Comparison of non-parametric methods for ungrouping coarsely aggregated data. BMC Medical Research Methodology, 16, 59.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics, 27, 832–837.
Sarda, P. (1993). Smoothing parameter selection for smooth distribution function. Journal of Statistical Planning and Inference, 35, 65–75.
Scott, D. and Sheather, S. (1985). Kernel density estimation with binned data. Communications in Statistics, Theory and Methods, 14, 1353–1359.
Titterington, D. (1983). Kernel-based density estimation using censored, truncated or grouped data. Communications in Statistics, Theory and Methods, 12, 2151–2167.
Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society. Series B (Methodological), 38, 290–295.
Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. London: Chapman and Hall/CRC.
Wang, B. and Wang, X.-F. (2016). Fitting the generalized lambda distribution to pre-binned data. Journal of Statistical Computation and Simulation, 86, 1785–1797.
Wang, B. and Wertelecki, W. (2013). Density estimation for data with rounding errors. Computational Statistics & Data Analysis, 65, 4–12.

Data source: Dialnet
