Statistical Modeling to Improve Time Series Forecasting Using Machine Learning, Time Series, and Hybrid Models: A Case Study of Bitcoin Price Forecasting

Qureshi, Moiz; Iftikhar, Hasnain; Rodrigues, Paulo Canas; Rehman, Mohd Ziaur; Salar, S. A. Atif

doi:10.3390/math12233666



by Moiz Qureshi ¹,², Hasnain Iftikhar ²,*, Paulo Canas Rodrigues ³, Mohd Ziaur Rehman ⁴ and S. A. Atif Salar ⁵

¹ Government Degree College, Tando Jam, Hyderabad 70060, Pakistan
² Department of Statistics, Quaid-i-Azam University, Islamabad 45320, Pakistan
³ Department of Statistics, Federal University of Bahia, Salvador 40170-110, Brazil
⁴ Department of Finance, College of Business Administration, King Saud University, P.O. Box 71115, Riyadh 11587, Saudi Arabia
⁵ Al-Barkaat Institute of Management Studies, Aligarh 202122, Dr. A. P. J. Abdul Kalam Technical University, Lucknow 226010, India
* Author to whom correspondence should be addressed.

Mathematics 2024, 12(23), 3666; https://doi.org/10.3390/math12233666

Submission received: 19 October 2024 / Revised: 14 November 2024 / Accepted: 20 November 2024 / Published: 22 November 2024

(This article belongs to the Special Issue Time Series Forecasting for Economic and Financial Phenomena)


Abstract

Bitcoin (BTC-USD) is a virtual currency that has grown in popularity since its inception in 2008. Bitcoin runs on an internet communication network that makes using digital money, including digital payments, easy, and it offers decentralized clearing of transactions and a decentralized money supply. This study attempts to accurately forecast BTC-USD closing prices using data from September 2023 to September 2024, comprising 390 observations. Four machine learning models (Multi-layer Perceptron, Extreme Learning Machine, Neural Network AutoRegression, and Extreme-Gradient Boost) and four time series models (Auto-Regressive Integrated Moving Average, Auto-Regressive, Non-Parametric Auto-Regressive, and Simple Exponential Smoothing) are used to achieve this end. Hybrid models based on simple averaging of these models are then proposed. The data-splitting technique, commonly used in comparative analysis, divides the data into training and testing sets. Through comparisons with testing sets comprising 30%, 20%, and 10% of the data, the present work demonstrates that the proposed hybrid model outperforms the individual approaches in terms of error metrics (MAE, RMSE, MAPE, and SMAPE) and direction-accuracy measures (correlation and MDA) for BTC. Furthermore, the Diebold–Mariano (DM) test is used to assess differences in model performance, and a graphical evaluation of the models is also provided. The practical implication of this study is that financial analysts gain a tool (the proposed model) that can yield insightful information about potential investments.

Keywords:

bitcoin prices forecasting; time series analysis; investment; time series; machine learning; hybrid models; decision making

MSC:

62M10; 68T07; 68T09; 03H10; 37N40; 62P20; 91G15; 91G30; 91B84

1. Introduction

Investment mediums are evolving as the world advances technologically. Cryptocurrency, or digital currency, is one of the contemporary investment tools and also functions as an alternative means of payment. Easy access to digital currencies and many user-friendly investment platforms make them a significant source of income. Cryptocurrency is a form of virtual money that can be transferred between individuals or organizations. Because of its widespread use and high profitability, digital cash has been revolutionary as a medium of exchange in the financial industry in recent years, and its price behavior has been shown to share characteristics with stock prices [1,2,3].

The world is changing due to digitization, which brings new financial channels and cutting-edge technologies such as cryptocurrencies, which are applications of blockchain. Because cryptocurrencies are inherently volatile, researchers are particularly interested in predicting their prices. Many digital cryptocurrencies exist today, and bitcoin is one of them [4,5]. Since it first appeared in 2009, bitcoin has garnered considerable interest [6], and digital currencies like bitcoin have played a significant role in certain economic operations [7]. Bitcoin runs on an internet communication network that makes using virtual money, including electronic payments, easier [8,9,10], and it has been predicted that digital currencies will eventually replace paper currency in the global economy [11,12,13,14].

Ref. [15] used Linear Regression (LR) and Support Vector Machine (SVM) models to forecast the price of bitcoin (BTC) from daily closing prices between 2012 and 2018; the SVM model outperformed the LR model when 10-fold cross-validation was used in the training phase. Ref. [16] applied Bayesian optimization to RNN, LSTM, and ARIMA models to forecast the future trend of the BTC price in US dollars; the results indicate that LSTM (Long Short-Term Memory) performs better than the ARIMA (Autoregressive Integrated Moving Average) and RNN (recurrent neural network) models. Ref. [17] compared several models against two benchmarks for predicting the direction of bitcoin price movements, a momentum-based strategy and random guessing. The models examined are Prophet, Random Forest (RF), RF with Lagged Auto-Regression, Multi-layer Perceptron (MLP), and ARIMA; among them, the MLP achieved the highest accuracy in predicting the direction of bitcoin prices. Ref. [18] used the BPNN (Back Propagation Neural Network), ARIMA, and Generalized Auto-Regressive Conditional Heteroskedasticity (GARCH) models to forecast changes in the price of bitcoin. Their empirical results show that BPNN has higher predictive power than the ARIMA models, that the ARIMA-EGARCH model yields the best price predictions, and that the ARIMA-GARCH model predicts more accurately than the ARIMA-GJR-GARCH model.

The study in [19] aims to identify the optimal ML methods for predicting the BTC price using three additional well-known coins. The study in [20] uses a deep learning technique to forecast BTC prices, analyzing the available data as time series and extracting significant historical characteristics; three algorithms, ARIMA, LSTM, and FB-prophet, are examined to address the limitations of traditional forecasting, and the findings highlighted the ability of the FB-prophet model to capture significant differences in the data. Ref. [21] proposed a novel hybrid model for forecasting BTC prices that uses variational mode decomposition (VMD) to divide the daily BTC prices into simpler modes; the empirical analysis shows that this approach works noticeably better than conventional prediction techniques, reducing prediction error rates by over 50%. In [22], hourly prices of major cryptocurrencies are predicted using deep learning (DL) models combined with the three most widely used ensemble techniques: averaging, bagging, and stacking. According to the results, forecasting techniques that combine DL and ensemble learning (EL) can be reliable, stable, and robust. Ref. [23] used RF and LSTM to build a mathematical framework that accurately predicts the next-day BTC price and identifies the factors that affect its value; the model containing only one lag of the explanatory variables achieved the best prediction accuracy.

The study in [24] proposed a forecasting model based on a DL ensemble built from CNN, LSTM, and ARIMA models; according to empirical findings on historical financial data, the proposed approach outperformed the individual models in forecasting precision and robustness. To improve model interpretability, ref. [25] compared several hybrid ML techniques, employing DT regressors, linear regression (OLS, LASSO), LSTM, and others.

On the other hand, ref. [26] combined LSTM, SARIMA, and FB-prophet models to forecast the price of BTC and its volatility estimated with the Garman–Klass (GK) approach. The results demonstrate a discernible improvement in MSE and MAE for the LSTM model compared to SARIMA and FB-prophet [27]. The study in [28] applied five deep learning (DL) techniques, RNN, LSTM, GRU, Bi-LSTM, and CONV-1D, to predict BTC prices; in terms of prediction accuracy metrics, LSTM performed better than RNN, GRU, Bi-LSTM, and CONV-1D. Ref. [29] used four ML algorithms (SVM, ANN, NB, and RF), with logistic regression as the benchmark model, to estimate the price movements of BTC; in their experiments, RF achieved the best prediction accuracy. Ref. [30] proposed a novel methodology for modeling the closing price of BTC using three ML techniques, RF, XG-Boost, and LSTM, and the evaluation measures show that it outperformed alternative modeling procedures. Similarly, DL and ML forecasting models have been employed to assess financial markets [31,32,33,34,35].

In the same manner, ref. [36] proposed a new hybrid model called Hy-Bi-LSTM. The results showed that the GARCH and ARIMA models performed better when external variables were included, and that, when paired with the ARIMA-X and GARCH-X models, the Bi-LSTM variant performed better than other LSTM variants. Modern ML approaches are tested in [37] and compared with econometric models; the results show that RNN offers a reasonable approach to predicting BTC return values. Ref. [38] evaluated the predictive abilities of the Jordan NN and NNAR methods using internal and external variables, alongside the ARIMA and GARCH approaches; although all models produced similar volatility predictions, the NNs gave the best forecasts. Modeling and forecasting BTC remains an active topic; see, for example, refs. [39,40,41,42,43].

The remaining sections of the study are arranged as follows. Section 2 describes the data and methods. Section 3 presents the results and their interpretation, and Section 4 compares them with previous research. Finally, Section 5 concludes with recommendations, including policy recommendations, for bitcoin users.

2. Material and Methods

This work uses ML and traditional TS models to forecast BTC prices over time. A time series is a collection of data values observed at consecutive points in time, and time series analysis aims to forecast future values of an outcome variable, z, from its past values. The BTC price in USD used in this work was obtained from a publicly available data source: https://finance.yahoo.com/chart/BTC-USD (accessed on 20 March 2024). Pre-processing consists of data collection, cleaning, and checks for missing values, after which the BTC data are split into training and testing sets. This study then applies the ML and TS models AR, NP-AR, SES, ARIMA, MLP, ELM, XG-Boost, and NNAR to the BTC data to forecast future BTC-USD prices. A detailed explanation of each model is given below.

2.1. Time Series Models

2.1.1. Auto-Regressive Model (AR)

The AR model is a statistical framework used in time series analysis to forecast future values from prior values of the series. Under this framework, the present value of the series is a linear function of its prior values plus a random (unpredictable) term [44,45]. Mathematically, the model can be written as

z_{v} = c + ϕ_{1} z_{v - 1} + ϕ_{2} z_{v - 2} + \dots + ϕ_{p} z_{v - p} + ϵ_{v}

(1)

In Equation (1), c is the intercept, ϕ_{i} (i = 1, 2, \dots, p) are the autoregressive coefficients, and ϵ_{v} is a white-noise error term.
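As a minimal sketch of Equation (1), the code below estimates c and ϕ_1, …, ϕ_p by ordinary least squares and produces a one-step-ahead forecast. The paper does not state which estimator was used for the AR model, so OLS is an assumption, and `fit_ar` and `forecast_ar` are hypothetical helper names.

```python
import numpy as np


def fit_ar(z, p):
    """Estimate the intercept c and coefficients phi_1..phi_p of Equation (1) by OLS."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Row for target z_v (v = p..n-1) is [1, z_{v-1}, ..., z_{v-p}].
    X = np.column_stack([np.ones(n - p)] + [z[p - i:n - i] for i in range(1, p + 1)])
    y = z[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]


def forecast_ar(z, c, phi):
    """One-step-ahead forecast c + phi_1 z_{v-1} + ... + phi_p z_{v-p}."""
    p = len(phi)
    recent = np.asarray(z[-p:], dtype=float)[::-1]  # z_{v-1}, ..., z_{v-p}
    return c + float(phi @ recent)


# Noise-free AR(1) series z_v = 2 + 0.5 z_{v-1}: OLS should recover c = 2 and phi_1 = 0.5.
series = [10.0]
for _ in range(20):
    series.append(2.0 + 0.5 * series[-1])
c_hat, phi_hat = fit_ar(series, p=1)
next_value = forecast_ar(series, c_hat, phi_hat)
```

With real BTC-USD prices, the lag order p would be chosen from the correlogram, and the series would contain noise, so the estimates would only approximate the true coefficients.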

2.1.2. Non-Parametric Auto-Regressive Model (NP-AR)

The NP-AR model extends the conventional parametric AR model. Unlike the AR model, which estimates a fixed set of parameters and assumes a particular linear form, the NP-AR model does not impose these rigid requirements. Instead, it offers greater flexibility in modeling complex and possibly nonlinear relationships in historical data. Mathematically, the model can be expressed as

z_{v} = f (z_{v - 1}, z_{v - 2}, \dots, z_{v - p}) + ϵ_{v}

(2)

In Equation (2), z_{v}, f(\cdot), and ϵ_{v} represent the current value, a non-parametric function of the lagged values, and the error term, respectively. Because typical AR frameworks cannot properly represent nonlinear behavior, NP-AR models are useful for handling nonlinear data collected over time [46].
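The text does not specify how f(·) in Equation (2) is estimated. As one illustrative choice (an assumption, not necessarily the authors' estimator), the sketch below uses a Nadaraya–Watson kernel regression with a single lag (p = 1) and a Gaussian kernel; the bandwidth `h` and the helper name `np_ar_forecast` are hypothetical.

```python
import math


def np_ar_forecast(z, h):
    """One-step NP-AR(1) forecast using a Nadaraya-Watson estimate of f(z_{v-1}).

    f(x) is estimated as a Gaussian-kernel-weighted average of the observed
    successors z_{t+1}, with weights depending on how close z_t is to x.
    """
    x = z[-1]  # most recent value, the input to f(.)
    pairs = list(zip(z[:-1], z[1:]))  # historical (z_t, z_{t+1}) pairs
    weights = [math.exp(-0.5 * ((x - prev) / h) ** 2) for prev, _ in pairs]
    total = sum(weights)
    if total == 0.0:  # every weight underflowed: fall back to a naive forecast
        return x
    return sum(w * nxt for w, (_, nxt) in zip(weights, pairs)) / total


# A series alternating between 1 and 2: historically, a 1 is followed by a 2.
alternating = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
prediction = np_ar_forecast(alternating, h=0.1)
```

The bandwidth h controls how smooth the estimated f is; in practice it would typically be tuned, for example by cross-validation on the training set.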

2.1.3. Auto-Regressive Integrated Moving-Average Model (ARIMA)

The ARIMA model is a popular statistical model for analyzing and forecasting time series data [47]. It integrates three elements:

Auto-Regressive (AR): uses the dependence between the current value and past values of the series.

Integrated (I): differences consecutive observations to make the series stationary.

Moving Average (MA): similar to a regression, it models the current value using past forecast errors.

Mathematically, using the backshift operator B, defined by B z_{v} = z_{v - 1}, the model can be written as

ϕ (B) {(1 - B)}^{d} z_{v} = θ (B) ϵ_{v}

(3)

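In Equation (3), ϕ(B) = 1 - ϕ_{1} B - \dots - ϕ_{p} B^{p} is the AR polynomial, θ(B) = 1 + θ_{1} B + \dots + θ_{q} B^{q} is the MA polynomial, {(1 - B)}^{d} is the differencing (integrated) term of order d, and ϵ_{v} is a white-noise stochastic term [48,49].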
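To illustrate Equation (3) for the ARIMA(1,1,1) specification used later, the sketch below computes a one-step-ahead forecast by hand. With d = 1 the model applies to w_v = z_v − z_{v−1}, giving w_v = ϕ_1 w_{v−1} + ϵ_v + θ_1 ϵ_{v−1}; the forecast is then undifferenced by adding back the last observed level. The coefficients are assumed to be already estimated (the values below are hypothetical), no constant term is included, and the first residual is initialised to zero; all three are simplifying assumptions.

```python
def arima111_forecast(z, phi1, theta1):
    """One-step-ahead ARIMA(1,1,1) forecast without a constant term."""
    w = [b - a for a, b in zip(z[:-1], z[1:])]  # first differences (1 - B) z_v
    eps = [0.0]  # residuals; the first one is initialised to zero
    for v in range(1, len(w)):
        w_fit = phi1 * w[v - 1] + theta1 * eps[v - 1]  # fitted difference
        eps.append(w[v] - w_fit)  # residual = observed minus fitted difference
    w_next = phi1 * w[-1] + theta1 * eps[-1]  # forecast of the next difference
    return z[-1] + w_next  # undo differencing: z_{n+1} = z_n + w_{n+1}


# Hypothetical prices and coefficients: differences are 2, 1, 2, giving a forecast of 106.3.
forecast = arima111_forecast([100.0, 102.0, 103.0, 105.0], phi1=0.5, theta1=0.2)
```

In practice ϕ_1 and θ_1 would be estimated from the training data, for example by maximum likelihood.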

2.1.4. Simple Exponential Smoothing Model (SES)

Simple exponential smoothing (SES) is a forecasting approach for univariate time series without a clear trend or seasonal pattern. Its only parameter is the smoothing factor α, which takes values between 0 and 1 and controls how quickly the influence of past observations decays [50,51]. Mathematically, the forecast can be written as

{\hat{z}}_{v} = (α) z_{v - 1} + (1 - α) {\hat{z}}_{v - 1}

(4)

In Equation (4), z_{v - 1} and {\hat{z}}_{v - 1} represent the observed and predicted values for the previous time period, and α is the smoothing coefficient.
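The recursion in Equation (4) can be sketched directly. This minimal version assumes the first forecast is initialised with the first observation and that α is fixed rather than optimised; `ses_forecasts` is a hypothetical helper name.

```python
def ses_forecasts(z, alpha, initial=None):
    """One-step-ahead SES forecasts following Equation (4).

    forecasts[v] is the forecast of z[v] made at time v - 1; the final
    element is the forecast for the first unobserved period.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    level = z[0] if initial is None else initial  # assumed initialisation
    forecasts = [level]
    for obs in z:
        # z_hat_v = alpha * z_{v-1} + (1 - alpha) * z_hat_{v-1}
        level = alpha * obs + (1.0 - alpha) * level
        forecasts.append(level)
    return forecasts


smoothed = ses_forecasts([10.0, 12.0, 11.0], alpha=0.5)  # [10.0, 10.0, 11.0, 11.0]
```

A larger α makes the forecast react faster to recent prices, while a smaller α averages over a longer history.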

2.2. Machine Learning (ML) Models

2.2.1. Extreme-Learning-Machine Model (ELM)

The Extreme Learning Machine (ELM), first presented in [52,53], is an ML approach for classification and regression tasks. It is a single-hidden-layer feed-forward neural network (FF-NN) that differs from conventional approaches in its training methodology: the input weights and hidden biases are assigned randomly and kept fixed, and only the output weights β are estimated, typically by least squares. Mathematically, the model can be presented as

\hat{z} = h (x) \cdot β

(5)

In Equation (5), \hat{z} is the predicted value, h(x) = g(W \cdot x + b) is the hidden-layer output obtained by applying the activation function g to the randomly weighted input x, and β is the output weight matrix.
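A minimal ELM sketch following Equation (5) is given below. The tanh activation, the standard normal initialisation, the number of hidden units, and the use of the Moore–Penrose pseudoinverse for β are assumptions for illustration; the function names are hypothetical.

```python
import numpy as np


def elm_fit(X, y, n_hidden=20, seed=0):
    """Train an ELM: random, fixed hidden layer; output weights beta by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (never updated)
    b = rng.normal(size=n_hidden)  # random hidden biases (never updated)
    H = np.tanh(X @ W + b)  # hidden-layer output h(x) = g(W x + b) with g = tanh
    beta = np.linalg.pinv(H) @ y  # Moore-Penrose least-squares solution for beta
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Equation (5): z_hat = h(x) . beta."""
    return np.tanh(X @ W + b) @ beta


# Toy regression: learn y = 3x + 1 on [-1, 1].
X_train = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
y_train = 3.0 * X_train[:, 0] + 1.0
W, b, beta = elm_fit(X_train, y_train, n_hidden=20)
fitted = elm_predict(X_train, W, b, beta)
```

Because only β is solved for, training reduces to one linear least-squares problem, which is what makes the ELM fast compared with backpropagation.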

2.2.2. Multi-Layer Perceptron Model (MLP)

A multi-layer perceptron (MLP) is a neural network consisting of multiple distinct layers of neurons. Neurons in an MLP typically apply nonlinear activation functions, which help the network recognize complex structures in data. Because it can represent nonlinear relationships, the MLP is central to ML and well suited to regression, pattern recognition, and classification. MLPs are trained by backpropagation, which iteratively adjusts the weights and biases to reduce a loss function [54,55]. Mathematically, the model can be written as

\hat{z} = g_{2} (W_{2} \cdot g_{1} (W_{1} \cdot z + b_{1}) + b_{2})

(6)

In Equation (6), \hat{z} is the output, g_{1} and g_{2} are the activation functions of the hidden and output layers, respectively, W_{1} and W_{2} are weight matrices, and b_{1} and b_{2} are the bias terms of the hidden and output nodes.
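The sketch below implements Equation (6) with one hidden layer and trains it by backpropagation with full-batch gradient descent on the mean squared error. The tanh hidden activation, identity output activation, hidden size, learning rate, and number of epochs are assumptions, not the configuration used in the study.

```python
import numpy as np


def mlp_forward(X, W1, b1, W2, b2):
    """Equation (6) with g1 = tanh (hidden layer) and g2 = identity (output layer)."""
    A1 = np.tanh(X @ W1 + b1)
    return A1, A1 @ W2 + b2


def mlp_train(X, y, n_hidden=8, lr=0.05, epochs=500, seed=0):
    """Fit W1, b1, W2, b2 by gradient descent on the MSE; returns params and loss history."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        A1, y_hat = mlp_forward(X, W1, b1, W2, b2)
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagation: gradients of the MSE with respect to each parameter.
        g_out = 2.0 * err / len(X)
        gW2 = A1.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_hidden = (g_out @ W2.T) * (1.0 - A1 ** 2)  # tanh'(u) = 1 - tanh(u)^2
        gW1 = X.T @ g_hidden
        gb1 = g_hidden.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses


X_demo = np.linspace(-1.0, 1.0, 30).reshape(-1, 1)
y_demo = np.sin(2.0 * X_demo[:, 0])
params, losses = mlp_train(X_demo, y_demo)
```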

2.2.3. Neural Network Auto-Regression Model (NN-AR)

Neural network auto-regression (NNAR) models are neural network frameworks used in numerous DL applications, including regression and classification, and they perform well when modeling both binary and real-valued observations. In contrast to conventional AR models, NNAR models can readily capture nonlinear patterns in historical data. An NNAR model usually has an input layer containing lagged values of the series, one or more hidden nodes with nonlinear activation functions, and an output node that forecasts the next value in the series [56,57,58]. Mathematically, the model can be represented as

{\hat{z}}_{v} = g (W \cdot [z_{v - 1}, z_{v - 2}, \dots, z_{v - p}] + b)

(7)

In Equation (7), {\hat{z}}_{v} is the forecast value, W is the weight matrix, b is the bias, g is the activation function (e.g., the sigmoid), and [z_{v - 1}, z_{v - 2}, \dots, z_{v - p}] are the lagged values.
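The two ingredients of Equation (7), building the lagged input vectors and applying a sigmoid to their weighted sum, can be sketched as follows. This is a minimal single-layer reading of Equation (7); the weights are hypothetical rather than fitted. Since the sigmoid output lies in (0, 1), prices would need to be rescaled (for example, min-max scaled) before such a model is fitted.

```python
import math


def lag_matrix(z, p):
    """Inputs [z_{v-1}, ..., z_{v-p}] paired with targets z_v, for v = p..len(z)-1."""
    X = [[z[v - i] for i in range(1, p + 1)] for v in range(p, len(z))]
    y = z[p:]
    return X, y


def nnar_output(lags, W, b):
    """Equation (7) with a sigmoid activation: g(W . lags + b)."""
    u = sum(w * x for w, x in zip(W, lags)) + b
    return 1.0 / (1.0 + math.exp(-u))


X_lags, targets = lag_matrix([1.0, 2.0, 3.0, 4.0], p=2)
# X_lags == [[2.0, 1.0], [3.0, 2.0]], targets == [3.0, 4.0]
```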

2.2.4. Extreme-Gradient Boost Model (XG-Boost)

XG-Boost is a powerful ML technique for modeling complex data. It is an optimized implementation of gradient-boosted (GB) decision trees, and its main applications are supervised regression and classification [59]. The model is trained by minimizing the regularized objective

L (ϕ) = \sum_{i = 1}^{n} l (z_{i}, {\hat{z}}_{i}) + \sum_{k = 1}^{K} Ω (f_{k})

(8)

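In Equation (8), L(ϕ) is the regularized objective, l(z_{i}, {\hat{z}}_{i}) is a differentiable loss between the actual value z_{i} and the forecast value {\hat{z}}_{i}, and Ω(f_{k}) is the regularization term that penalizes the complexity of the k-th tree f_{k}, k = 1, \dots, K.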
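To make Equation (8) concrete, the sketch below evaluates the objective for given predictions and trees. The text does not define Ω, so the sketch assumes the standard XG-Boost penalty Ω(f) = γT + ½λ‖w‖², where T is the number of leaves and w the leaf weights, together with a squared-error loss; each tree is represented only by its list of leaf weights, and `xgb_objective` is a hypothetical name.

```python
def xgb_objective(z, z_hat, trees, gamma=1.0, lam=1.0):
    """Regularized objective of Equation (8) with squared-error loss.

    trees is a list of leaf-weight lists, one per tree f_k; the penalty
    Omega(f_k) = gamma * T_k + 0.5 * lam * sum(w^2) is the standard XG-Boost form.
    """
    loss = sum((a - p) ** 2 for a, p in zip(z, z_hat))
    penalty = sum(gamma * len(w) + 0.5 * lam * sum(x * x for x in w) for w in trees)
    return loss + penalty


# Loss 0.25; penalties 2.25 (two leaves) and 1.5 (one leaf); total 4.0.
value = xgb_objective([1.0, 2.0], [1.5, 2.0], trees=[[0.5, -0.5], [1.0]])
```

The penalty explains why XG-Boost prefers trees with fewer leaves and smaller leaf weights unless they reduce the loss enough.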

2.2.5. Hybrid Model (Equal Weighting)

This work proposes a hybrid model based on simple averaging of the ML and TS models: the final hybrid forecast is obtained by averaging the predictions of the individual models. Let {\hat{z}}_{j}^{(i)}, j = 1, \dots, 8, denote the forecast of observation i produced by the j-th of the eight models described in Section 2.1 and Section 2.2. Mathematically, the proposed hybrid forecast {\hat{z}}_{H}^{(i)} can be written as:

{\hat{z}}_{H}^{(i)} = \frac{1}{8} \sum_{j = 1}^{8} {\hat{z}}_{j}^{(i)} = \frac{1}{8} ({\hat{z}}_{1}^{(i)} + {\hat{z}}_{2}^{(i)} + {\hat{z}}_{3}^{(i)} + {\hat{z}}_{4}^{(i)} + {\hat{z}}_{5}^{(i)} + {\hat{z}}_{6}^{(i)} + {\hat{z}}_{7}^{(i)} + {\hat{z}}_{8}^{(i)})

(9)

By combining eight distinct models, each capturing a particular set of features, the proposed hybrid model in Equation (9) aims to reduce the variance and offset the biases of the individual models.
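Equation (9) amounts to an element-wise mean across the models' forecast paths. A minimal sketch, assuming each model's forecasts are stored as an equal-length list (the eight forecast paths below are hypothetical numbers, and `hybrid_forecast` is a hypothetical name):

```python
def hybrid_forecast(forecasts_by_model):
    """Equation (9): equal-weight average of the individual model forecasts.

    forecasts_by_model holds one equal-length forecast sequence per model.
    """
    m = len(forecasts_by_model)
    if m == 0:
        raise ValueError("need at least one model")
    # zip(*...) groups the m forecasts of the same observation i together.
    return [sum(values) / m for values in zip(*forecasts_by_model)]


model_forecasts = [[float(j), 2.0 * j] for j in range(1, 9)]  # hypothetical 8 models
hybrid = hybrid_forecast(model_forecasts)  # [4.5, 9.0]
```

Because every model receives weight 1/8, a single poorly performing model shifts the hybrid forecast by one eighth of its error, which is the trade-off discussed in the limitations.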

2.2.6. Key Performance Indicators (KPIs)

This section presents the evaluation metrics used to compare the accuracy of the ML, TS, and hybrid models on the BTC-USD data set. The metrics are listed in Table 1 and discussed briefly below.

In addition to the KPIs defined in Table 1, this work performs the Diebold–Mariano (DM) test, a widely used statistical test for comparing forecasts obtained from different models in the time series literature [60,61,62]. Consider two forecasts, {\hat{z}}_{1t} and {\hat{z}}_{2t}, of the time series z_{t}, t = 1, \dots, T, with forecast errors h_{1t} = z_{t} - {\hat{z}}_{1t} and h_{2t} = z_{t} - {\hat{z}}_{2t}. Let \mathcal{L}(h_{j,t}), j = 1, 2, be the loss associated with the forecast error h_{j,t}; for instance, the absolute loss at time t is \mathcal{L}(h_{j,t}) = | h_{j,t} |. The loss differential between the forecasts of model 1 and model 2 at time t is d_{t} = \mathcal{L}(h_{1t}) - \mathcal{L}(h_{2t}), and the null hypothesis that the two forecasts have equal predictive accuracy is E[d_{t}] = 0. The DM test requires the loss differential to be covariance stationary, that is,

E [d_{t}] = μ, \quad \forall t

(10)

cov (d_{t}, d_{t - τ}) = γ (τ), \quad \forall t

(11)

var (d_{t}) = σ_{d}^{2}, \quad 0 < σ_{d}^{2} < \infty

(12)

Under these assumptions, the DM statistic for the test of equal forecast accuracy is

DM = \frac{\bar{d}}{{\hat{σ}}_{\bar{d}}} \xrightarrow{d} N (0, 1)

where \bar{d} = \frac{1}{T} \sum_{t = 1}^{T} d_{t} is the sample mean loss differential and {\hat{σ}}_{\bar{d}} is a consistent estimate of the standard error of \bar{d}.
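A minimal sketch of the DM statistic with absolute loss is shown below. The text does not state which estimator of the standard error of the mean loss differential was used, so this sketch assumes the simplest one, the sample variance of d_t divided by T, which ignores autocorrelation in d_t and is most defensible for one-step-ahead forecasts; small-sample corrections are not included. `dm_test` is a hypothetical name.

```python
import math


def dm_test(actual, f1, f2):
    """Diebold-Mariano test of equal forecast accuracy with absolute loss.

    Returns the DM statistic and a two-sided p-value from the N(0, 1)
    approximation. The standard error of d_bar uses only the sample variance
    of d_t, i.e. it ignores autocorrelation in d_t.
    """
    d = [abs(a - p1) - abs(a - p2) for a, p1, p2 in zip(actual, f1, f2)]
    T = len(d)
    d_bar = sum(d) / T
    var_d = sum((x - d_bar) ** 2 for x in d) / T
    if var_d == 0.0:
        raise ValueError("the loss differential has zero variance")
    dm = d_bar / math.sqrt(var_d / T)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|DM|))
    return dm, p_value


stat, p_val = dm_test([0.0] * 4, [1.0, 2.0, 1.0, 2.0], [0.5] * 4)
```

Here the loss differentials are 0.5, 1.5, 0.5, 1.5, so the statistic is 4.0; a positive value means model 1 has the larger average loss, and a p-value below 0.05 rejects equal accuracy at the 5% level.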

The MAPE, RMSE, MAE, correlation, SMAPE, and MDA metrics presented in Table 1, together with the DM test, are used to assess the performance of the TS, ML, and hybrid models. The MAPE is widely used to assess the performance of financial forecasts: it averages the absolute percentage deviations between the predicted and actual values. For example, a MAPE of 10% means that the predicted values deviate from the actual values by 10% on average, in either direction. Because it is expressed in percentage terms, MAPE allows forecast accuracy to be compared across financial series of different scales.

The RMSE is used to assess the accuracy of TS models. It measures, in the units of the series, how closely the model's forecasts match the observed values, and because errors are squared, it penalizes large errors more heavily. A lower RMSE indicates a better model.

The MAE is another widely used statistic for assessing TS models. Because it weights all errors linearly, it is less sensitive to large errors than the RMSE, which makes it simple, robust, and intuitive for financial forecasts. Correlation analysis is used to assess the strength and direction of the linear relationship between the observed and predicted values, with values ranging from −1 to 1.

The SMAPE is another commonly used metric in time series forecasting. It measures the relative difference between the predicted and actual values, scaled by their combined magnitude. Lower SMAPE values indicate higher accuracy: a SMAPE of 0% indicates an exact match, and larger values indicate greater deviation from the actual values.

MDA measures how effective a forecasting approach is at predicting the direction of change: it is the proportion of periods in which the predicted direction of movement matches the actual direction [63].
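Table 1 is not reproduced in this excerpt, so the sketch below assumes common textbook definitions of the metrics: MAPE and SMAPE in percent (SMAPE with the (|A| + |F|)/2 denominator), Pearson correlation, and MDA comparing the sign of the actual change A_t − A_{t−1} with the predicted change F_t − A_{t−1}, treating a zero change as "not up". The exact formulas in Table 1 may differ.

```python
import math


def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, pred)) / len(actual)


def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, pred)) / len(actual))


def mape(actual, pred):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, pred)) / len(actual)


def smape(actual, pred):
    """Symmetric MAPE in percent, with denominator (|A| + |F|) / 2."""
    terms = [abs(f - a) / ((abs(a) + abs(f)) / 2.0) for a, f in zip(actual, pred)]
    return 100.0 * sum(terms) / len(terms)


def correlation(actual, pred):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    ma, mf = sum(actual) / n, sum(pred) / n
    cov = sum((a - ma) * (f - mf) for a, f in zip(actual, pred))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    sf = math.sqrt(sum((f - mf) ** 2 for f in pred))
    return cov / (sa * sf)


def mda(actual, pred):
    """Share of periods t where the predicted move from A_{t-1} matches the actual move."""
    hits = [(actual[t] > actual[t - 1]) == (pred[t] > actual[t - 1])
            for t in range(1, len(actual))]
    return sum(hits) / len(hits)


actual = [100.0, 110.0, 105.0, 120.0]
pred = [102.0, 108.0, 107.0, 118.0]
scores = {name: fn(actual, pred) for name, fn in
          [("MAE", mae), ("RMSE", rmse), ("MAPE", mape),
           ("SMAPE", smape), ("corr", correlation), ("MDA", mda)]}
```

In this toy example every error has magnitude 2, so MAE and RMSE are both 2.0, and all three predicted directions are correct, so MDA is 1.0.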

The Diebold–Mariano (DM) test compares the predictive accuracy of two forecasting models. It tests the null hypothesis that the two competing forecasts have no difference in expected loss, where the loss is a function of each model's forecast errors. The DM statistic is easy to compute and asymptotically normally distributed [64].

3. Forecasting Results and Interpretations

This research evaluates the feasibility of a hybrid model based on simple averaging, compared with the individual ML and TS models, using the daily BTC-USD data set [31]. The daily closing price of BTC-USD is plotted in Figure 1, which shows an increasing trend over the sample period. This work also explores the behavior of the data by calculating basic statistics and applying the ADF test [65]. The results in Table 2 indicate that n = 390, the minimum is 25,162.65, the maximum is 73,083.50, and the mean, S.D., skewness, and kurtosis are 51,812.17, 14,574.24, −0.41, and 1.78, respectively. To check stationarity, this study applied a unit-root (ADF) test to the BTC-USD series and found that the series is non-stationary in levels but stationary after first differencing (Table 2). The modeling phase for the comparative analysis begins by plotting the correlogram (ACF and PACF) of BTC-USD, shown in Figure 2. The correlogram guides the selection of lags and, hence, of the candidate ML and TS models.
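The basic statistics in Table 2 and the first differencing applied before re-testing stationarity can be sketched as follows. The moment-based skewness and the non-excess (Pearson) kurtosis, which equals 3 for a normal distribution, are assumptions about the definitions behind the reported values; the ADF test itself is not reimplemented here (statsmodels provides one as `statsmodels.tsa.stattools.adfuller`).

```python
import math


def describe(z):
    """n, min, max, mean, SD, skewness and kurtosis of a series."""
    n = len(z)
    mean = sum(z) / n
    m2 = sum((x - mean) ** 2 for x in z) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in z) / n
    m4 = sum((x - mean) ** 4 for x in z) / n
    return {
        "n": n,
        "min": min(z),
        "max": max(z),
        "mean": mean,
        "sd": math.sqrt(m2 * n / (n - 1)),  # sample standard deviation
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess: equals 3 for a normal distribution
    }


def first_difference(z):
    """(1 - B) z_v = z_v - z_{v-1}, the series on which stationarity is re-tested."""
    return [b - a for a, b in zip(z[:-1], z[1:])]


stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
diffs = first_difference([1.0, 3.0, 6.0])  # [2.0, 3.0]
```

A kurtosis below 3, as for the evenly spread toy series (1.7), indicates a flatter distribution than the normal, which is typical of a strongly trending price series.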

Data splitting is a commonly used empirical approach for comparing the effectiveness of two or more frameworks. The BTC-USD data set is divided into a learning (training) phase and a validation (testing) phase using three ratios: 70%, 80%, and 90% for training, and correspondingly 30%, 20%, and 10% for testing. The results for the 70% training and 30% testing split are presented in Table 3 [33]. They show that the proposed hybrid model (9) outperforms the other models in terms of RMSE, MAE, MAPE, correlation, and SMAPE, with values of 184.2901, 136.0214, 0.2411, 0.9989, and 0.2313, respectively. The AR(1,0) model ranks second in terms of minimum errors and high correlation, followed by the ARIMA(1,1,1) model. These error metrics are shown graphically in Figure 3a. Similarly, the results for the 80% training and 20% testing split, given in Table 4, indicate that the proposed hybrid model yields slight improvements in the error metrics, with RMSE = 182.7972, MAE = 134.7952, MAPE = 0.2241, and SMAPE = 0.2293. In contrast, the ELM, MLP, and NP-AR models give fairly high correlations between the observed and fitted values. These metrics are displayed in Figure 3b. Lastly, for the 90% training and 10% testing split, the proposed model surpasses the competing models and shows improved error results, with RMSE = 147.6284, MAE = 112.0043, and MAPE = 0.1882 (Table 5). However, for correlation and MDA, the ELM, MLP, and NNAR(1,1) models produce better results than the hybrid model. These KPIs are shown graphically in Figure 3c.
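For time series, the split must preserve temporal order so that every test observation comes after the training period. A minimal sketch of the three chronological splits, assuming the cut point is the rounded product of the ratio and the sample size (the exact split points are not reported) and using `list(range(390))` as a stand-in for the 390 closing prices:

```python
def chronological_split(z, train_ratio):
    """Split a series into training and testing parts without shuffling.

    The first round(train_ratio * n) observations form the training set, so
    every test observation comes after the training period.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    cut = round(train_ratio * len(z))
    return z[:cut], z[cut:]


series = list(range(390))  # stand-in for the 390 daily closing prices
split_sizes = {}
for ratio in (0.7, 0.8, 0.9):
    train, test = chronological_split(series, ratio)
    split_sizes[ratio] = (len(train), len(test))
# split_sizes == {0.7: (273, 117), 0.8: (312, 78), 0.9: (351, 39)}
```

Shuffling before splitting, as is common for non-temporal data, would leak future prices into training and overstate accuracy.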

Once the performance indicators (MAE, MAPE, RMSE, SMAPE, correlation, and mean directional accuracy (MDA)) have been computed, the next step is to apply the DM test [64], which is used here to statistically evaluate the competing models within the suggested approach. Using the DM test, the eight base forecasting models (four ML and four TS) and the proposed hybrid model are compared at the 5% significance level; the p-values for the 30%, 20%, and 10% testing splits are shown in Table 6, Table 7 and Table 8. The DM test results indicate that the hybrid model performed similarly to the other competitive models; however, in terms of the error metrics, the hybrid model outperformed the others, demonstrating superior accuracy and consistency in BTC-USD forecasts.

Finally, forecasts for the next fifteen days are produced using the models estimated in the training phase, with the same parameters, for the 30%, 20%, and 10% testing splits. The results for the three ratios are presented in Table 9, which shows that the forecast values are close to the observed values, supporting the superiority of the hybrid model.

4. Discussion

After comparing the nine models under study using three types of evaluation, namely graphical evaluation (correlation and error plots), a forecast accuracy test (the DM test), and mean accuracy measures (MAPE, MAE, RMSE, SMAPE, correlation, and MDA), the optimal model (the hybrid) is identified. This section compares the proposed methodology with the best models reported in the existing literature on BTC-USD. Such comparisons are challenging because different authors use different indicators, forecasting periods, and forecasting horizons. For instance, in [48], the ARIMA model gives the lowest average errors, MAPE = 3.59 and RMSE = 78.68, for the period November 2017 to February 2022, whereas this study covers a more recent period in which the mean level fluctuates considerably and obtains comparatively better results. The results reported in [38] showed an RMSE of 4802.24 for BTC-USD, far higher than that of the proposed method. Similarly, [20] found that the best RMSE and MAE values, 322.599 and 229.254, were obtained by the FB-prophet model; these are higher than those of the proposed model. Lastly, [34] reported RMSE = 6117.16, MAPE = 1.77, and MAE = 4008.28, which are also higher than those of the proposed model. The proposed model therefore outperformed these models in terms of mean accuracy measures, which indicates that it can better forecast future BTC-USD prices. Precise and efficient daily BTC-USD forecasts offer further benefits related to avoiding overfitting, generalization, handling volatility, accuracy, risk management, and investment decision-making.

5. Conclusions

In conclusion, this study explored a hybrid model that combines TS and ML models through simple averaging to predict BTC-USD prices, comparing the models on historical bitcoin data. The proposed hybrid approach produced remarkable results in identifying distinct patterns in bitcoin prices over time, which shows how well the suggested framework predicts the fluctuating bitcoin market. The numerical findings confirmed the hybrid model's effectiveness, with RMSE = 184.2901, 182.7972, and 147.6287, MAE = 136.0214, 134.7952, and 112.0043, and MAPE = 0.2411, 0.2241, and 0.1882 for the 30%, 20%, and 10% testing sets, respectively. Like all research, this study has certain limitations. First, although various ML and TS models were employed, the analysis is restricted to BTC-USD prices. Second, the proposed model uses equal weights, so models that perform better in particular scenarios receive the same weight as weaker ones. Third, the individual models in the hybrid are highly correlated with one another (i.e., they typically produce similar results). Lastly, the hybrid is sensitive to outliers, whose influence may not be sufficiently reduced by simple averaging. Future work could extend this study to weighted averaging, meta-learning, and ANN ensembles to improve the results.

Finally, by offering insightful information about market dynamics, risks, and opportunities, the proposed hybrid model for BTC-USD price forecasting can be invaluable for policymaking. It can help investors and stakeholders deal with the challenges of rapidly changing bitcoin market conditions and make well-informed judgments on monetary policy, economic stability, and entrepreneurship.

Author Contributions

Conceptualization, methodology, and software, M.Q. and H.I.; validation, M.Q., H.I., M.Z.R. and P.C.R.; formal analysis, H.I.; investigation, H.I., M.Q. and M.Z.R.; resources, M.Z.R. and P.C.R.; data curation, H.I. and M.Q.; writing—original draft preparation, M.Q., H.I. and M.Z.R.; writing—review and editing, M.Q., H.I., M.Z.R., P.C.R. and S.A.A.S.; visualization, S.A.A.S., P.C.R. and H.I.; supervision, P.C.R. and H.I.; project administration, H.I., S.A.A.S. and P.C.R.; funding acquisition, M.Z.R. and P.C.R. All authors have read and agreed to the published version of the manuscript.

Funding

Researchers Supporting Project number (RSPD2024R1038), King Saud University, Riyadh, Saudi Arabia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available on Yahoo Finance at https://finance.yahoo.com/chart/BTC-USD (accessed on 20 March 2024).

Acknowledgments

The authors extend their sincere appreciation to the Researchers Supporting Project number (RSPD2024R1038), King Saud University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Koo, E.; Kim, G. Centralized decomposition approach in LSTM for Bitcoin price prediction. Expert Syst. Appl. 2024, 237, 121401.
Zhou, Z.; Zhou, X.; Qi, H.; Li, N.; Mi, C. Near miss prediction in commercial aviation through a combined model of grey neural network. Expert Syst. Appl. 2024, 255, 124690.
Rodrigues, P.C.; Awe, O.O.; Pimentel, J.S.; Mahmoudvand, R. Modelling the behaviour of currency exchange rates with singular spectrum analysis and artificial neural networks. Stats 2020, 3, 137–157.
Gohwong, S.G. The state of the art of cryptocurrencies. Asian Adm. Manag. Rev. 2018, 1.
Iftikhar, H.; Zafar, A.; Turpo-Chaparro, J.E.; Canas Rodrigues, P.; López-Gonzales, J.L. Forecasting day-ahead brent crude oil prices using hybrid combinations of time series models. Mathematics 2023, 11, 3548.
Velde, F. Bitcoin: A Primer; Federal Reserve Bank of Chicago: Chicago, IL, USA, 2013.
Li, X.; Whinston, A.B. Analyzing cryptocurrencies. Inf. Syst. Front. 2020, 22, 17–22.
John, K.; O’Hara, M.; Saleh, F. Bitcoin and beyond. Annu. Rev. Financ. Econ. 2022, 14, 95–115.
Iftikhar, H.; Khan, M.; Turpo-Chaparro, J.E.; Rodrigues, P.C.; Lopez-Gonzales, J.L. Forecasting stock prices using a novel filtering-combination technique: Application to the Pakistan stock exchange. AIMS Math. 2024, 9, 3264–3289.
Luo, J.; Zhao, C.; Chen, Q.; Li, G. Using deep belief network to construct the agricultural information system based on Internet of Things. J. Supercomput. 2022, 78, 379–405.
Fauzi, M.A.; Paiman, N.; Othman, Z. Bitcoin and cryptocurrency: Challenges, opportunities and future works. J. Asian Financ. Econ. Bus. 2020, 7, 695–704.
Farell, R. An analysis of the cryptocurrency industry. Whart. Res. Sch. 2015, 130, 1–23.
Chiu, J.; Koeppl, T.V. The economics of cryptocurrency: Bitcoin and beyond. Can. J. Econ. Can. D’économique 2022, 55, 1762–1798.
Xu, K.; Chen, L.; Patenaude, J.M.; Wang, S. Rhine: A regime-switching model with nonlinear representation for discovering and forecasting regimes in financial markets. In Proceedings of the 2024 SIAM International Conference on Data Mining (SDM), Houston, TX, USA, 18–20 April 2024; pp. 526–534.
Karasu, S.; Altan, A.; Saraç, Z.; Hacioğlu, R. Prediction of Bitcoin prices with machine learning methods using time series data. In Proceedings of the 2018 26th Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey, 2–5 May 2018; pp. 1–4.
McNally, S.; Roche, J.; Caton, S. Predicting the price of bitcoin using machine learning. In Proceedings of the 2018 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Cambridge, UK, 21–23 March 2018; pp. 339–343.
Ibrahim, A.; Kashef, R.; Corrigan, L. Predicting market movement direction for bitcoin: A comparison of time series modeling methods. Comput. Electr. Eng. 2021, 89, 106905.
Lian, Y.M.; Chen, J.L.; Cheng, H.C. Predicting bitcoin prices via machine learning and time series models. J. Appl. Financ. Bank. 2022, 12, 25–43.
Maleki, N.; Nikoubin, A.; Rabbani, M.; Zeinali, Y. Bitcoin price prediction based on other cryptocurrencies using machine learning and time series analysis. Sci. Iran. 2023, 30, 285–301.
Tripathy, N.; Hota, S.; Mishra, D. Performance analysis of bitcoin forecasting using deep learning techniques. Indones. J. Electr. Eng. Comput. Sci. 2023, 31, 1515–1522.
Zhao, L.; Li, Z.; Ma, Y.; Qu, L. A novel cryptocurrency price time series hybrid prediction model via machine learning with MATLAB/Simulink. J. Supercomput. 2023, 79, 15358–15389.
Rao, K.R.; Prasad, M.L.; Kumar, G.R.; Natchadalingam, R.; Hussain, M.M.; Reddy, P.C.S. Time-Series Cryptocurrency Forecasting Using Ensemble Deep Learning. In Proceedings of the 2023 International Conference on Circuit Power and Computing Technologies (ICCPCT), Kollam, India, 10–11 August 2023; pp. 1446–1451.
Chen, J. Analysis of bitcoin price prediction using machine learning. J. Risk Financ. Manag. 2023, 16, 51.
He, K.; Yang, Q.; Ji, L.; Pan, J.; Zou, Y. Financial time series forecasting with the deep learning ensemble model. Mathematics 2023, 11, 1054.
Liu, S.; Wu, K.; Jiang, C.; Huang, B.; Ma, D. Financial time-series forecasting: Towards synergizing performance and interpretability within a hybrid machine learning approach. arXiv 2023, arXiv:2401.00534.
Cheng, J.; Tiwari, S.; Khaled, D.; Mahendru, M.; Shahzad, U. Forecasting Bitcoin prices using artificial intelligence: Combination of ML, SARIMA, and Facebook Prophet models. Technol. Forecast. Soc. Chang. 2024, 198, 122938.
Qureshi, M.; Khan, A.; Daniyal, M.; Tawiah, K.; Mehmood, Z. A comparative analysis of traditional SARIMA and machine learning models for CPI data modelling in Pakistan. Appl. Comput. Intell. Soft Comput. 2023, 2023, 3236617.
Nair, M.; Marie, M.I.; Abd-Elmegid, L.A. Prediction of Cryptocurrency Price using Time Series Data and Deep Learning Algorithms. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 338–347.
Pabuçcu, H.; Ongan, S.; Ongan, A. Forecasting the movements of Bitcoin prices: An application of machine learning algorithms. arXiv 2023, arXiv:2303.04642.
Khosravi, M.; Ghazani, M.M. Novel insights into the modeling financial time-series through machine learning methods: Evidence from the cryptocurrency market. Expert Syst. Appl. 2023, 234, 121012.
Ampountolas, A. Comparative analysis of machine learning, hybrid, and deep learning forecasting models: Evidence from European financial markets and bitcoins. Forecasting 2023, 5, 472–486.
He, Q.; Xia, P.; Hu, C.; Li, B. Public Information, Actual Intervention and Inflation Expectations. Transform. Bus. Econ. 2022, 21, 644.
Murray, K.; Rossi, A.; Carraro, D.; Visentin, A. On forecasting cryptocurrency prices: A comparison of machine learning, deep learning, and ensembles. Forecasting 2023, 5, 196–209.
Noviandy, T.R.; Maulana, A.; Idroes, G.M.; Suhendra, R.; Adam, M.; Rusyana, A.; Sofyan, H. Deep learning-based bitcoin price forecasting using neural prophet. Ekon. J. Econ. 2023, 1, 19–25.
Iftikhar, H.; Khan, M.; Żywiołek, J.; Khan, M.; López-Gonzales, J.L. Modeling and Forecasting Carbon Dioxide Emission in Pakistan Using a Hybrid Combination of Regression and Time Series Models. Heliyon 2024, 10, e33148.
Mardjo, A.; Choksuchat, C. HyBiLSTM: Multivariate Bitcoin Price Forecasting using Hybrid Time Series Models with Bidirectional LSTM. IEEE Access 2024, 12, 50792–50808.
Berger, T.; Koubová, J. Forecasting Bitcoin returns: Econometric time series analysis vs. machine learning. J. Forecast. 2024, 43, 2904–2916.
Šestanović, T. A Comprehensive Approach to Bitcoin Forecasting Using Neural Networks. Ekon. Pregl. 2024, 75, 62–85.
Alizadegan, H.; Radmehr, A.; Ilani, M.A. Forecasting Bitcoin Prices: A Comparative Study of Machine Learning and Deep Learning Algorithms. Res. Square 2024.
Fadhil, H.M.; Makhool, N.Q. Forecasting Cryptocurrency Market Trends with Machine Learning and Deep Learning. In Proceedings of the BIO Web of Conferences; EDP Sciences: Les Ulis, France, 2024; Volume 97, p. 00053.
Gonzales, S.M.; Iftikhar, H.; López-Gonzales, J.L. Analysis and forecasting of electricity prices using an improved time series ensemble approach: An application to the Peruvian electricity market. AIMS Math. 2024, 9, 21952–21971.
Tang, X.; Song, Y.; Jiao, X.; Sun, Y. On forecasting realized volatility for bitcoin based on deep learning PSO–GRU model. Comput. Econ. 2024, 63, 2011–2033.
Carbo-Bustinza, N.; Iftikhar, H.; Belmonte, M.; Cabello-Torres, R.J.; De La Cruz, A.R.H.; López-Gonzales, J.L. Short-term forecasting of Ozone concentration in metropolitan Lima using hybrid combinations of time series models. Appl. Sci. 2023, 13, 10514.
Garg, S.; Anupriya. Autoregressive integrated moving average model based prediction of bitcoin close price. In Proceedings of the 2018 International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 13–14 December 2018; pp. 473–478.
Shah, I.; Iftikhar, H.; Ali, S. Modeling and forecasting electricity demand and prices: A comparison of alternative approaches. J. Math. 2022, 2022, 3581037.
López-Gonzales, J.L.; Castro Souza, R.; Leite Coelho da Silva, F.; Carbo-Bustinza, N.; Ibacache-Pulgar, G.; Calili, R.F. Simulation of the energy efficiency auction prices via the markov chain monte carlo method. Energies 2020, 13, 4544.
Iftikhar, H.; Turpo-Chaparro, J.E.; Canas Rodrigues, P.; López-Gonzales, J.L. Forecasting day-ahead electricity prices for the Italian electricity market using a new decomposition—Combination technique. Energies 2023, 16, 6669.
Qureshi, M.; Ahmed, N. Forecasting Cryptocurrencies using the Classical Time Series Approach. KASBIT Bus. J. 2022, 15, 15–27.
Song, L.; Chen, S.; Meng, Z.; Sun, M.; Shang, X. FMSA-SC: A Fine-grained Multimodal Sentiment Analysis Dataset based on Stock Comment Videos. IEEE Trans. Multimed. 2024, 26, 7294–7306.
Qureshi, M.; Ahmad, N.; Ullah, S.; ul Mustafa, A.R. Forecasting real exchange rate (REER) using artificial intelligence and time series models. Heliyon 2023, 9, e16335.
Chang, X.; Gao, H.; Li, W. Discontinuous distribution of test statistics around significance thresholds in empirical accounting studies. J. Account. Res. 2023.
Huang, G.B.; Wang, D.H.; Lan, Y. Extreme learning machines: A survey. Int. J. Mach. Learn. Cybern. 2011, 2, 107–122.
Zhu, C. An adaptive agent decision model based on deep reinforcement learning and autonomous learning. J. Logist. Inform. Serv. Sci. 2023, 10, 107–118.
Pinkus, A. Approximation theory of the MLP model in neural networks. Acta Numer. 1999, 8, 143–195.
Yang, X.; Liu, Q.; Su, R.; Tang, R.; Liu, Z.; He, X.; Yang, J. Click-through rate prediction using transfer learning with fine-tuned parameters. Inf. Sci. 2022, 612, 188–200.
Sako, K.; Mpinda, B.N.; Rodrigues, P.C. Neural networks for financial time series forecasting. Entropy 2022, 24, 657.
Sulandari, W.; Subanar, S.; Lee, M.H.; Rodrigues, P.C. Time series forecasting using singular spectrum analysis, fuzzy systems and neural networks. MethodsX 2020, 7, 101015.
Iftikhar, H.; Zywiołek, J.; López-Gonzales, J.L.; Albalawi, O. Electricity consumption forecasting using a novel homogeneous and heterogeneous ensemble learning. Front. Energy Res. 2024, 12, 1442502.
Al Hawi, L.; Sharqawi, S.; Al-Haija, Q.A.; Qusef, A. Empirical evaluation of machine learning performance in forecasting cryptocurrencies. J. Adv. Inf. Technol. 2023, 14, 639–646.
Iftikhar, H.; Turpo-Chaparro, J.E.; Canas Rodrigues, P.; López-Gonzales, J.L. Day-Ahead Electricity Demand Forecasting Using a Novel Decomposition Combination Method. Energies 2023, 16, 6675.
Shah, I.; Lisi, F. Forecasting of electricity price through a functional prediction of sale and purchase curves. J. Forecast. 2020, 39, 242–259.
Iftikhar, H.; Bibi, N.; Canas Rodrigues, P.; López-Gonzales, J.L. Multiple novel decomposition techniques for time series forecasting: Application to monthly forecasting of electricity consumption in Pakistan. Energies 2023, 16, 2579.
Costantini, M.; Cuaresma, J.C.; Hlouskova, J. Forecasting errors, directional accuracy and profitability of currency trading: The case of EUR/USD exchange rate. J. Forecast. 2016, 35, 652–668.
Diebold, F.X.; Mariano, R.S. Comparing predictive accuracy. J. Bus. Econ. Stat. 2002, 20, 134–144.
Qureshi, M.; Daniyal, M.; Tawiah, K. Comparative Evaluation of the Multilayer Perceptron Approach with Conventional ARIMA in Modeling and Prediction of COVID-19 Daily Death Cases. J. Healthc. Eng. 2022, 2022, 4864920.

Figure 1. BTC-USD closing price over time, September 2023 to September 2024. Orange dots mark the individual observations, and the blue line connects them to show the overall trend.

Figure 2. Correlograms of the BTC-USD level series and its first difference.
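
Figure 2 compares the autocorrelations of the price level with those of its first difference. The sketch below computes both with the standard sample ACF estimator; that estimator, and the synthetic random walk standing in for the 390 BTC-USD closes, are assumptions, since the paper does not give its code. A slowly decaying ACF at the level combined with near-zero autocorrelations after differencing is the usual pattern for a series with a unit root.

```python
import random


def diff(series, lag=1):
    """Lagged difference x_t - x_{t-lag} (lag=1 gives the first difference)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag (autocovariance at lag k divided by variance)."""
    n = len(series)
    m = sum(series) / n
    dev = [x - m for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0 for k in range(max_lag + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    walk = [0.0]
    for _ in range(389):  # 390 points, matching the sample size in Table 2
        walk.append(walk[-1] + rng.gauss(0, 1))
    print("level ACF:", [round(r, 2) for r in acf(walk, 5)])
    print("diff ACF: ", [round(r, 2) for r in acf(diff(walk), 5)])
```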

Figure 3. Comparisons of models using key performance indicators.

Table 1. Key performance indicators (KPIs) for bitcoin (USD).

S. No	KPI	Equation
i	MAPE	$\frac{1}{n} \sum_{t=1}^{n} \left\lvert \frac{z_t - \hat{z}_t}{z_t} \right\rvert \times 100$
ii	MAE	$\frac{1}{n} \sum_{t=1}^{n} \lvert z_t - \hat{z}_t \rvert$
iii	RMSE	$\sqrt{\frac{1}{n} \sum_{t=1}^{n} (z_t - \hat{z}_t)^2}$
iv	Correlation	$\frac{\sum_{t=1}^{n} (z_t - \bar{z})(\hat{z}_t - \bar{\hat{z}})}{\sqrt{\sum_{t=1}^{n} (z_t - \bar{z})^2 \sum_{t=1}^{n} (\hat{z}_t - \bar{\hat{z}})^2}}$
v	SMAPE	$\frac{100\%}{n} \sum_{t=1}^{n} \frac{\lvert z_t - \hat{z}_t \rvert}{(\lvert z_t \rvert + \lvert \hat{z}_t \rvert)/2}$
vi	MDA	$\frac{1}{n-1} \sum_{t=2}^{n} I\left[(z_t - z_{t-1})(\hat{z}_t - \hat{z}_{t-1}) > 0\right]$

Here $z_t$ and $\hat{z}_t$ are the observed and forecast prices at time $t$, $\bar{z}$ and $\bar{\hat{z}}$ are their means over the $n$ test observations, and $I[\cdot]$ equals 1 when its argument holds and 0 otherwise.
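
The Table 1 definitions translate directly into code. The following plain-Python sketch is illustrative only (the function names are our own); it assumes `z` holds the observed test values and `z_hat` the aligned forecasts, and it averages the MDA indicator over the n − 1 consecutive changes.

```python
import math


def mape(z, z_hat):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(z, z_hat)) / len(z)


def mae(z, z_hat):
    """Mean absolute error, in the units of the series (USD here)."""
    return sum(abs(a - f) for a, f in zip(z, z_hat)) / len(z)


def rmse(z, z_hat):
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(z, z_hat)) / len(z))


def correlation(z, z_hat):
    """Pearson correlation between observed and forecast values."""
    mz, mf = sum(z) / len(z), sum(z_hat) / len(z_hat)
    cov = sum((a - mz) * (f - mf) for a, f in zip(z, z_hat))
    sz = math.sqrt(sum((a - mz) ** 2 for a in z))
    sf = math.sqrt(sum((f - mf) ** 2 for f in z_hat))
    return cov / (sz * sf)


def smape(z, z_hat):
    """Symmetric MAPE, in percent."""
    terms = (abs(a - f) / ((abs(a) + abs(f)) / 2) for a, f in zip(z, z_hat))
    return 100 * sum(terms) / len(z)


def mda(z, z_hat):
    """Share of consecutive changes where observed and forecast moves have the same sign."""
    hits = [(z[t] - z[t - 1]) * (z_hat[t] - z_hat[t - 1]) > 0 for t in range(1, len(z))]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    z = [100.0, 102.0, 101.0, 105.0]      # toy observed values
    z_hat = [101.0, 101.0, 102.0, 104.0]  # toy forecasts
    for name, kpi in [("MAPE", mape), ("MAE", mae), ("RMSE", rmse),
                      ("Correlation", correlation), ("SMAPE", smape), ("MDA", mda)]:
        print(f"{name}: {kpi(z, z_hat):.4f}")
```

Following Table 1, `mda` compares the forecast change `z_hat[t] - z_hat[t-1]` with the observed change; some studies instead use `z_hat[t] - z[t-1]`, which can give different values.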

Table 2. Descriptive statistics for BTC-USD.

Statistic	Values
Sample Size (n)	390
Min	25,162.65
25%	40,368.76
50%	57,387.97
75%	64,234.18
Max	73,083.50
Mean	51,812.17
Standard deviation	14,574.24
Skewness	−0.41
Kurtosis	1.78
ADF test p-value (level)	0.71
ADF test p-value (first difference)	0.01
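
A sketch of how the Table 2 summaries could be computed with the Python standard library. The paper does not state its conventions, so the following are assumptions: quartiles by linear interpolation (`statistics.quantiles` with `method="inclusive"`), a sample standard deviation with an n − 1 denominator, moment-based skewness, and Pearson (non-excess) kurtosis. The ADF rows are left out, since they need a dedicated unit-root test implementation such as `adfuller` in statsmodels.

```python
import statistics


def describe(x):
    """Summary statistics in the layout of Table 2 (conventions are assumptions)."""
    n = len(x)
    m = statistics.fmean(x)
    q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
    # Central moments with 1/n normalisation, used for skewness and kurtosis.
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return {
        "n": n,
        "min": min(x),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(x),
        "mean": m,
        "sd": statistics.stdev(x),   # sample SD (n - 1 denominator)
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,    # Pearson kurtosis; subtract 3 for excess kurtosis
    }


if __name__ == "__main__":
    print(describe([1.0, 2.0, 3.0, 4.0, 5.0]))
```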

Table 3. Key performance indicators (KPIs) for 30% BTC-USD validation data.

Model	MAPE (%)	MAE (USD)	RMSE (USD)	Correlation	SMAPE (%)	MDA
AR	0.2416	145.9313	195.1101	0.9984	0.2580	0.9700
NP-AR	0.5801	354.9212	401.4765	0.9983	0.5379	1.0000
ARIMA	0.2426	150.0234	199.8098	0.9979	0.2692	0.9600
SES	0.3730	246.7761	306.4533	0.9977	0.4180	0.9500
NNAR	0.5730	348.5911	388.1201	0.9983	0.5656	1.0000
MLP	0.6681	400.2624	446.3543	0.9986	0.6809	1.0000
ELM	0.3723	240.1560	328.0201	0.9989	0.3488	0.9900
XG-boost	1.6317	987.7198	1299.4456	0.9425	1.7882	0.7000
Hybrid	0.2411	136.0214	184.2901	0.9989	0.2313	0.9800

Table 4. Key performance indicators (KPIs) for 20% BTC-USD validation data.

Model	MAPE (%)	MAE (USD)	RMSE (USD)	Correlation	SMAPE (%)	MDA
AR	0.2416	146.3113	199.2608	0.9986	0.2470	0.9481
NP-AR	0.5201	308.9151	344.5681	0.9996	0.5269	1.0000
ARIMA	0.2526	153.0345	206.6616	0.9985	0.2592	0.9351
SES	0.3730	252.4403	306.5377	0.9987	0.4070	0.9610
NNAR	0.5530	328.8220	363.1769	0.9993	0.5546	1.0000
MLP	0.6781	402.4721	442.3189	0.9996	0.6708	1.0000
ELM	0.3423	216.1259	291.4463	0.9999	0.3387	1.0000
XG-boost	1.7793	1070.7159	1351.0790	0.9425	1.7081	0.6623
Hybrid	0.2241	134.7952	182.7972	0.9988	0.2293	0.9740

Table 5. Key performance indicators (KPIs) for 10% BTC-USD validation data.

Model	MAPE (%)	MAE (USD)	RMSE (USD)	Correlation	SMAPE (%)	MDA
AR	0.2066	122.8504	162.3767	0.9984	0.2066	0.9737
NP-AR	0.5005	293.8465	317.9673	0.9997	0.5008	1.0000
ARIMA	0.1877	111.7080	147.2674	0.9987	0.1878	0.9737
SES	0.3401	227.5421	281.0904	0.9987	0.3709	0.9737
NNAR	0.5790	340.7026	359.9551	0.9995	0.5687	1.0000
MLP	0.6982	411.0493	433.1569	0.9997	0.6885	1.0000
ELM	0.2345	141.8653	178.7457	0.9999	0.2284	1.0000
XG-boost	1.5898	934.7921	1244.8303	0.9157	1.5192	0.5789
Hybrid	0.1882	112.0043	147.6284	0.9987	0.1888	0.9737
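
Tables 3 to 5 evaluate the models on the final 30%, 20%, and 10% of the series. A minimal sketch of such a chronological hold-out split follows; taking the test block from the end of the series is standard for forecasting, but the rounding rule for the test size is an assumption, as the paper does not state one.

```python
def chrono_split(series, test_frac):
    """Split a time series into an initial training block and a final test block.

    No shuffling: every test observation comes after every training observation,
    so each model is evaluated only on data from its 'future'.
    """
    if not 0 < test_frac < 1:
        raise ValueError("test_frac must be between 0 and 1")
    n_test = round(len(series) * test_frac)
    if n_test < 1:
        raise ValueError("series too short for the requested test fraction")
    return series[:-n_test], series[-n_test:]


if __name__ == "__main__":
    prices = list(range(390))  # stand-in for the 390 daily closing prices
    for frac in (0.30, 0.20, 0.10):
        train, test = chrono_split(prices, frac)
        print(f"{frac:.0%} test: {len(train)} train / {len(test)} test")
```

Shuffled splits, common for cross-sectional machine learning, would leak future prices into training, which is why the split preserves time order.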

Table 6. DM-test for comparison of models on 30% BTC-USD validation data.

	AR	NP-AR	ARIMA	NNAR	MLP	ELM	XG-Boost	Hybrid
AR	0	1	0.92	1	1	1	1	1
NP-AR	0	0	0	0.03	1	0.02	1	0
ARIMA	0.08	1	0	1	1	1	1	1
SES	0	1	0	1	1	0.83	1	0
NNAR	0	0.97	0	0	1	0.04	1	0
MLP	0	0	0	0	0	0	1	0
ELM	0	0.98	0	0.96	1	0	1	0.26
XG-boost	0	0	0	0	0	0	0	0
Hybrid	0	1	0	1	1	0.74	1	0

Table 7. DM-test for comparison of models on 20% BTC-USD validation data.

	AR	NP-AR	ARIMA	SES	NNAR	MLP	ELM	XG-Boost	Hybrid
AR	0	1	0.94	0	1	1	0.99	1	1
NP-AR	0	0	0	0	1	1	0.07	1	0.04
ARIMA	0.06	1	0	0.01	1	1	0.99	1	1
SES	1	1	0.99	0	1	1	1	1	1
NNAR	0	0	0	0	0	1	0.02	1	0
MLP	0	0	0	0	0	0	0	1	0
ELM	0.01	0.93	0.01	0	0.98	1	0	1	0.66
XG-boost	0	0	0	0	0	0	0	0	0
Hybrid	0	0.96	0	0	1	1	0.34	1	0

Table 8. DM-test for comparison of models on 10% BTC-USD validation data.

	AR	NP-AR	ARIMA	SES	NNAR	MLP	ELM	XG-Boost	Hybrid
AR	0	1	0	0	1	1	0.72	1	0.99
NP-AR	0	0	0	0	1	1	0	1	0.12
ARIMA	1	1	0	1	1	1	0.88	1	1
SES	1	1	0	0	1	1	0.88	1	1
NNAR	0	0	0	0	0	1	0	0.99	0
MLP	0	0	0	0	0	0	0	0.99	0
ELM	0.28	1	0.12	0.12	1	1	0	1	0.98
XG-boost	0	0	0	0	0.01	0.01	0	0	0
Hybrid	0.01	0.88	0	0	1	1	0.02	1	0
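
Tables 6 to 8 summarise pairwise Diebold–Mariano (DM) comparisons. The paper does not state the loss function, forecast horizon, or alternative hypothesis behind these entries, so the sketch below is only a generic illustration: it assumes squared-error loss, builds the long-run variance from autocovariances of the loss differential up to lag h − 1, and reports a two-sided p-value from the standard normal approximation without a small-sample correction.

```python
import math


def dm_test(e1, e2, h=1):
    """Diebold-Mariano test of equal accuracy for forecast errors e1 and e2.

    Loss differential d_t = e1_t**2 - e2_t**2 (squared-error loss, assumed).
    Returns (statistic, two-sided p-value); a negative statistic favours model 1.
    """
    if len(e1) != len(e2):
        raise ValueError("error series must have equal length")
    if h < 1:
        raise ValueError("horizon h must be at least 1")
    n = len(e1)
    d = [a * a - b * b for a, b in zip(e1, e2)]
    d_bar = sum(d) / n

    def gamma(k):
        # Sample autocovariance of d at lag k (1/n normalisation).
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    long_run_var = gamma(0) + 2 * sum(gamma(k) for k in range(1, h))
    if long_run_var <= 0:
        raise ValueError("non-positive variance estimate; test undefined")
    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value


if __name__ == "__main__":
    # Toy forecast errors: model 1 is uniformly more accurate than model 2.
    e1 = [0.5, -0.4, 0.3, -0.6, 0.2, -0.3, 0.4, -0.5]
    e2 = [1.2, -0.9, 1.5, -1.1, 0.8, -1.3, 1.0, -1.4]
    stat, p = dm_test(e1, e2)
    print(f"DM = {stat:.2f}, p = {p:.2g}")
```

For h > 1 this truncated variance estimator can turn negative (it does for the toy errors above with h = 2); a Bartlett-kernel (Newey–West) estimator is a common remedy, and the Harvey, Leybourne and Newbold correction is often applied in small samples.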

Table 9. Hybrid-model forecasts versus observed BTC-USD prices (USD) for the next 15 days, using the 30%, 20%, and 10% data ratios.

Date	Hybrid (30% ratio)	Hybrid (20% ratio)	Hybrid (10% ratio)	Observed
25/09/2024	63,933.79	63,932.13	63,926.07	63,143.14
26/09/2024	63,931.45	63,895.78	63,971.86	65,181.02
27/09/2024	63,935.87	63,922.60	63,977.60	65,790.66
28/09/2024	63,969.28	63,971.81	63,967.33	65,887.65
29/09/2024	63,967.35	63,925.77	63,972.97	65,635.30
30/09/2024	63,922.74	63,975.30	63,985.41	63,329.50
01/10/2024	63,933.83	63,920.44	63,968.44	60,837.01
02/10/2024	63,974.16	63,973.62	63,975.61	60,632.79
03/10/2024	63,980.35	63,972.56	63,970.42	60,759.40
04/10/2024	63,966.97	63,989.06	63,982.17	62,067.48
05/10/2024	63,968.02	63,923.40	63,934.41	62,089.95
06/10/2024	63,986.10	63,968.04	63,935.22	62,818.95
07/10/2024	63,974.36	63,977.52	63,965.84	62,236.66
08/10/2024	63,932.70	63,978.15	63,976.76	62,131.97
09/10/2024	63,930.86	63,976.93	63,935.19	60,582.10
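
The Table 9 forecasts can be scored against the observed prices with the MAE and MAPE definitions of Table 1. The sketch below does this for the 30% column only, with values copied from Table 9; the other columns can be substituted in the same way. Because these forecasts extend 15 days beyond the data, their errors need not match the test-set values in Tables 3 to 5.

```python
# Values copied from Table 9 (USD), 25/09/2024 to 09/10/2024.
observed = [63143.14, 65181.02, 65790.66, 65887.65, 65635.30, 63329.50, 60837.01,
            60632.79, 60759.40, 62067.48, 62089.95, 62818.95, 62236.66, 62131.97,
            60582.10]
hybrid_30 = [63933.79, 63931.45, 63935.87, 63969.28, 63967.35, 63922.74, 63933.83,
             63974.16, 63980.35, 63966.97, 63968.02, 63986.10, 63974.36, 63932.70,
             63930.86]


def mae(z, z_hat):
    """Mean absolute error in USD."""
    return sum(abs(a - f) for a, f in zip(z, z_hat)) / len(z)


def mape(z, z_hat):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(z, z_hat)) / len(z)


if __name__ == "__main__":
    print(f"15-day MAE:  {mae(observed, hybrid_30):,.2f} USD")
    print(f"15-day MAPE: {mape(observed, hybrid_30):.2f} %")
```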

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Qureshi, M.; Iftikhar, H.; Rodrigues, P.C.; Rehman, M.Z.; Salar, S.A.A. Statistical Modeling to Improve Time Series Forecasting Using Machine Learning, Time Series, and Hybrid Models: A Case Study of Bitcoin Price Forecasting. Mathematics 2024, 12, 3666. https://doi.org/10.3390/math12233666

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
