Kernel distribution estimation for grouped data
- Miguel Reyes 1
- Mario Francisco-Fernández 2
- Ricardo Cao 2
- Daniel Barreiro-Ures 2
- 1 Universidad de las Américas-Puebla, Puebla, México
- 2 Universidade da Coruña
ISSN: 1696-2281
Year of publication: 2019
Volume: 43
Issue: 2
Pages: 259-288
Type: Article
Journal: SORT: Statistics and Operations Research Transactions
Abstract
Interval-grouped data arise when observations are not recorded in continuous time but are monitored at periodic time instants. In this framework, a nonparametric kernel distribution estimator is proposed and studied. The asymptotic bias, variance and mean integrated squared error of the new approach are derived. From the asymptotic mean integrated squared error, a plug-in bandwidth is proposed. Additionally, a bootstrap bandwidth selector for this setting is designed. A comprehensive simulation study shows the behaviour of the estimator and of the bandwidth selectors under different data-grouping scenarios. The performance of the different approaches is also illustrated with a real grouped emergence data set of Avena sterilis (wild oat).
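The idea of smoothing a distribution function from interval counts can be illustrated with a minimal sketch. It assumes (our illustrative assumption, not necessarily the paper's exact construction) that the n_j observations counted in each interval [t_{j-1}, t_j) are spread uniformly over that interval and that each point is smoothed with the integrated Gaussian kernel Φ((x − u)/h). The function names `grouped_kernel_cdf` and `normal_reference_bandwidth` are hypothetical. The bandwidth helper is the standard normal-reference rule for ungrouped kernel distribution estimation, h = (4/n)^{1/3} σ, swapped in here in place of the paper's grouped-data plug-in and bootstrap selectors, with σ estimated crudely from interval midpoints.

```python
import math
from typing import Sequence


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def phi_antiderivative(z: float) -> float:
    """G(z) = z*Phi(z) + phi(z), an antiderivative of Phi(z)."""
    return z * std_normal_cdf(z) + std_normal_pdf(z)


def grouped_kernel_cdf(x: float, edges: Sequence[float],
                       counts: Sequence[int], h: float) -> float:
    """Hypothetical smoothed CDF at x from interval-grouped data.

    Assumption (illustrative only): counts[j] observations lie uniformly
    in [edges[j], edges[j+1]) and each is smoothed with the integrated
    Gaussian kernel Phi((x - u) / h). The average of Phi over an
    interval [a, b] has the closed form
        h * (G((x - a) / h) - G((x - b) / h)) / (b - a).
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    n = sum(counts)
    total = 0.0
    for a, b, n_j in zip(edges[:-1], edges[1:], counts):
        # Average of the integrated kernel over the j-th interval.
        avg = h * (phi_antiderivative((x - a) / h)
                   - phi_antiderivative((x - b) / h)) / (b - a)
        total += n_j * avg
    return total / n


def normal_reference_bandwidth(edges: Sequence[float],
                               counts: Sequence[int]) -> float:
    """Normal-reference rule h = (4/n)^(1/3) * sigma for a Gaussian-kernel
    distribution estimator (ungrouped AMISE; ignores the grouping).
    sigma is estimated crudely from count-weighted interval midpoints."""
    n = sum(counts)
    mids = [(a + b) / 2.0 for a, b in zip(edges[:-1], edges[1:])]
    mean = sum(c * m for c, m in zip(counts, mids)) / n
    var = sum(c * (m - mean) ** 2 for c, m in zip(counts, mids)) / (n - 1)
    return (4.0 / n) ** (1.0 / 3.0) * math.sqrt(var)
```

A call such as `grouped_kernel_cdf(x, edges, counts, normal_reference_bandwidth(edges, counts))` evaluates the smoothed distribution function at x. The closed form avoids numerical integration, and as h shrinks the estimate approaches the piecewise-linear ogive of the grouped counts.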
Funding information
This research has been supported by MINECO grants MTM2014-52876-R and MTM2017-82724-R, and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-015 and Centro Singular de Investigación de Galicia ED431G/01), all of them through the ERDF.
Funders
- Ministerio de Economía y Competitividad (Spain): MTM2017-82724-R
- European Regional Development Fund (European Union)
- Xunta de Galicia (Spain): ED431C-2016-015
- Centro Singular de Investigación de Galicia (Spain): ED431G/01