Abstract

This study models the Human Development Index (HDI) in Indonesia from education quality indicators using multivariable kernel regression, and compares two bandwidth selection methods: Cross-Validation (CV) and Generalized Cross-Validation (GCV). Following a quantitative modeling design, the research uses secondary data on educational and HDI indicators from 34 Indonesian provinces in 2023. The analysis applies multivariable kernel regression with a triangle kernel function, taking mean years of schooling and expected years of schooling as predictor variables. The bandwidth is optimized with CV and with GCV, and model performance is assessed through the coefficient of determination (R²) and the Mean Absolute Percentage Error (MAPE). The results indicate that the GCV method yields a slightly better model, with an R² of 86.32% and a MAPE of 1.94%, compared to the CV method, which has an R² of 85.92% and a MAPE of 1.96%. Both models achieve a MAPE below 2%, which indicates highly accurate predictions, while the GCV-based model is the more stable and marginally more accurate of the two. These findings suggest that multivariable kernel regression, particularly when its bandwidth is selected with GCV, is an effective approach for modeling complex data patterns such as HDI based on educational indicators in Indonesia.
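The workflow summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the estimator, so a Nadaraya-Watson estimator with a product triangle kernel is assumed here, the bandwidth grid is arbitrary, and the data are synthetic values generated only to make the example run (they are not the 2023 provincial data). All function names are hypothetical.

```python
import numpy as np
from itertools import product


def triangle_kernel(u):
    """Triangle kernel: K(u) = 1 - |u| for |u| <= 1, and 0 otherwise."""
    return np.clip(1.0 - np.abs(u), 0.0, None)


def smoother_matrix(X, h):
    """Nadaraya-Watson weight matrix S (assumed estimator), so that fitted = S @ y.

    A product kernel with one bandwidth per predictor is an assumption.
    """
    h = np.asarray(h, dtype=float)
    U = (X[:, None, :] - X[None, :, :]) / h      # scaled differences, shape (n, n, p)
    W = triangle_kernel(U).prod(axis=2)          # product kernel weights, shape (n, n)
    return W / W.sum(axis=1, keepdims=True)      # each row sums to 1 (K(0) = 1 > 0)


def cv_score(X, y, h):
    """Leave-one-out CV via the linear-smoother shortcut (y_i - yhat_i) / (1 - S_ii)."""
    S = smoother_matrix(X, h)
    d = np.diag(S)
    if np.any(d >= 1.0 - 1e-12):                 # a point with no neighbours: undefined
        return np.inf
    return float(np.mean(((y - S @ y) / (1.0 - d)) ** 2))


def gcv_score(X, y, h):
    """GCV: mean squared residual divided by (1 - trace(S) / n)^2."""
    S = smoother_matrix(X, h)
    n = len(y)
    tr = np.trace(S)
    if tr >= n - 1e-12:
        return np.inf
    return float(np.mean((y - S @ y) ** 2) / (1.0 - tr / n) ** 2)


def select_bandwidth(X, y, grid, score):
    """Grid search: return the bandwidth vector that minimises the given score."""
    candidates = list(product(grid, repeat=X.shape[1]))
    scores = [score(X, y, h) for h in candidates]
    return np.array(candidates[int(np.argmin(scores))])


def r_squared(y, fitted):
    return float(1.0 - np.sum((y - fitted) ** 2) / np.sum((y - np.mean(y)) ** 2))


def mape(y, fitted):
    return float(np.mean(np.abs((y - fitted) / y)) * 100.0)


# Synthetic stand-in data (NOT the study's data): 34 observations, 2 predictors.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(7, 11, 34),     # stand-in for mean years of schooling
                     rng.uniform(11, 15, 34)])   # stand-in for expected years of schooling
y = 40 + 2.0 * X[:, 0] + 1.2 * X[:, 1] + rng.normal(0, 0.8, 34)

grid = np.linspace(0.5, 4.0, 8)                  # arbitrary candidate bandwidths
h_cv = select_bandwidth(X, y, grid, cv_score)
h_gcv = select_bandwidth(X, y, grid, gcv_score)

for name, h in [("CV", h_cv), ("GCV", h_gcv)]:
    fitted = smoother_matrix(X, h) @ y
    print(name, "bandwidth:", h,
          "R2:", round(r_squared(y, fitted), 4),
          "MAPE (%):", round(mape(y, fitted), 4))
```

In this sketch both criteria are computed from the same weight matrix; CV rescales each residual by its own leverage `1 - S_ii`, whereas GCV uses the average leverage `1 - trace(S)/n`, which is why the two can select different bandwidths on the same data.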

Keywords: Cross-Validation, Education, Generalized Cross-Validation, Human Development Index, Kernel Regression, Nonparametric Regression

How to Cite

Rasyid, M.R. et al. 2025. Optimizing Education-Based HDI Modeling in Indonesia: A Multivariable Kernel Regression Approach with CV and GCV. International Journal of Science and Engineering Invention. 11, 04 (Jun. 2025), 52–61. DOI: https://doi.org/10.23958/ijsei/vol11-i04/284.
