Comparison of Imputation Methods for Univariate Time Series
Abstract
Handling missing values in time series data is crucial for accurate prediction and forecasting, both of which depend on complete and accurate historical data. Many studies address imputation for multivariate time series; imputation for univariate time series, however, is rarely considered, because a univariate series has no associated variables from which the missing values can be estimated. Missing values arise naturally, since almost every scientific discipline that collects, stores, and monitors data works with time series observations. An effective and acceptable method for dealing with missing data must therefore take the characteristics of the time series into account. This work uses the statistical package R to assess the effectiveness of imputation methods for univariate time series data. The imputation algorithms are evaluated using root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), and four types of time series are considered. According to the experimental findings, seasonal decomposition performs best on time series with a seasonal component, followed by linear interpolation, while Kalman smoothing produces values that are closer to the original time series and has lower error rates than the other imputation techniques.
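The evaluation procedure described in the abstract (remove known values, impute them, then score the imputed values against the originals) can be illustrated with a minimal sketch. The study itself was carried out in R; the Python below is only an illustration, and the function names `linear_interpolate` and `imputation_errors`, the handling of gaps at the series edges, and the toy values are assumptions made here, not taken from the article.

```python
import math


def linear_interpolate(series):
    """Fill None gaps by linear interpolation between the nearest observed neighbours.

    Assumption for this sketch: a gap at either edge of the series is
    filled with the nearest observed value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def imputation_errors(truth, imputed, missing_idx):
    """Return RMSE, MAE and MAPE computed over the imputed positions only."""
    errors = [imputed[i] - truth[i] for i in missing_idx]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a true value is zero; the toy data avoids this.
    mape = 100 * sum(abs(e / truth[i]) for e, i in zip(errors, missing_idx)) / n
    return {"rmse": rmse, "mae": mae, "mape": mape}


# Toy series (illustrative values, not data from the study).
truth = [10.0, 12.0, 15.0, 14.0, 18.0, 20.0, 19.0, 23.0]
missing_idx = [2, 5]  # positions removed on purpose
observed = [None if i in missing_idx else v for i, v in enumerate(truth)]

imputed = linear_interpolate(observed)
scores = imputation_errors(truth, imputed, missing_idx)
print(scores)
```

Scoring only the artificially removed positions keeps the observed values from diluting the error. The same loop could be repeated for each imputation method and each missing-data rate to build a comparison of the kind the abstract reports.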