Achieving Sales Forecasting with Higher Accuracy and Efficiency: A New Model Based on Modified Transformer
Abstract
1. Introduction
- (1) We propose a sales forecasting model based on the Transformer architecture, with an encoder–decoder structure and multi-head attention mechanisms. The model adjusts the standard Transformer architecture to the requirements of sales forecasting, most notably by removing the Softmax layer, because sales forecasting is a regression task.
- (2) The model is adapted to the characteristics of sales data. In the input embedding layer, a linear sub-layer replaces the Word2Vec sub-layer found in the original Transformer model, and a Lookup table sub-layer maps features related to influencing factors. For positional encoding, we incorporate Time2Vec [8], which extends the traditional Sin/Cos position encoding and lets the model capture both periodic and non-periodic patterns in historical sales data. Within the feedforward network sub-layer, the Exponential Linear Unit (ELU) replaces the standard Transformer's Rectified Linear Unit (ReLU) as the activation function. ELU keeps a nonzero output and gradient for negative inputs, which helps prevent neuronal inactivation during training, handles negative data, and improves training speed.
- (3) We provide, for the first time, a comprehensive formula representation of the model. It is intended to help readers understand the model's principles and to make the model easier to implement.
- (4) Our sales forecasting dataset includes factors that strongly influence sales, including holidays, seasons, promotions, and special events. These factors introduce volatility and complex nonlinear patterns into sales data. The model uses attention mechanisms to compute correlations among sales data points directly, without intermediary hidden layers. This exposes complex nonlinear patterns and improves both interpretability and predictive accuracy. The attention computation is also parallelizable, so the model can use GPU resources to improve training speed and prediction efficiency.
- (5) To evaluate the model, we conducted experiments on the sales dataset provided by a Kaggle competition. We compared it against seven benchmarks: ARMA (Autoregressive Moving Average), ARIMA (Autoregressive Integrated Moving Average), SARIMAX (Seasonal Autoregressive Integrated Moving Average with Exogenous Regressors), SVR (Support Vector Regression), RNN, GRU (Gated Recurrent Units), and LSTM. The proposed model achieved average improvement rates of approximately 48.2%, 48.5%, 45.2%, and 63.0% across four evaluation metrics: RMSLE (Root Mean Squared Logarithmic Error), RMSWLE (Root Mean Squared Weighted Logarithmic Error), NWRMSLE (Normalized Weighted Root Mean Squared Logarithmic Error), and RMALE (Root Mean Absolute Logarithmic Error), respectively. We also performed ablation experiments to investigate the impact of the multi-head attention mechanism and of the number of encoder–decoders. Together, these experiments show that the proposed model outperforms the selected benchmarks on all four metrics.
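
Contribution (2) names two component swaps that are easy to state concretely. The following is a minimal NumPy sketch, not the authors' implementation. `time2vec` follows the definition in Kazemi et al. [8] (one linear component plus sine components), and `elu` is the standard ELU with an assumed `alpha = 1.0`. In a trained model the frequencies `omega` and phases `phi` would be learned parameters; here they are random placeholders, and the sequence length of 14 and the encoding width of 5 are arbitrary choices for illustration.

```python
import numpy as np


def time2vec(tau, omega, phi):
    """Time2Vec encoding: component 0 is linear in time, the rest are sinusoidal.

    tau:   (T,) array of time indices
    omega: (k+1,) frequencies (learned in practice; placeholders here)
    phi:   (k+1,) phase shifts (learned in practice; placeholders here)
    Returns an array of shape (T, k+1).
    """
    z = np.outer(tau, omega) + phi   # affine map of time, shape (T, k+1)
    out = np.sin(z)                  # periodic components
    out[:, 0] = z[:, 0]              # component 0 stays linear (non-periodic trend)
    return out


def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    # np.minimum keeps expm1 from overflowing on large positive inputs.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def relu(x):
    """ReLU, shown only for comparison with ELU."""
    return np.maximum(np.asarray(x, dtype=float), 0.0)


rng = np.random.default_rng(0)
tau = np.arange(14, dtype=float)     # hypothetical 14-step history
omega = rng.normal(size=5)
phi = rng.normal(size=5)
pe = time2vec(tau, omega, phi)       # shape (14, 5)

x = np.array([-2.0, -0.5, 0.0, 1.5])
elu_out = elu(x)                     # negative inputs map to small negative values
relu_out = relu(x)                   # negative inputs are clipped to zero
```

The linear component lets the encoding represent a non-periodic trend, while the sine components represent periodic behavior such as weekly or seasonal cycles. For negative inputs ReLU outputs exactly zero, whereas ELU returns a negative value with a nonzero slope, which is the property the contribution list refers to as preventing neuronal inactivation.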
2. Related Works
3. Model Design
3.1. Model Architecture
3.2. Embedding Layer
3.3. Positional Encoding
3.4. Encoder
3.4.1. Multi-Head Attention
3.4.2. Feedforward Sub-Layer
3.4.3. Add and Norm Sub-Layer
3.5. Decoder
3.5.1. Masked Multi-Head Attention
3.5.2. Encoder–Decoder Attention
4. Experiments
4.1. Prediction Steps
4.2. Datasets and Benchmarks
4.3. Experimental Settings and Model Assessment
4.4. Model Training
4.5. Experimental Results
4.6. Ablation Experiment
4.6.1. Effect of Multi-Head Attention
4.6.2. Effect of the Number of Encoder–Decoders
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Krishna, A.; Akhilesh, V.; Aich, A.; Hegde, C. Sales-forecasting of retail stores using machine learning techniques. In Proceedings of the 2018 3rd International Conference on Computational Systems and Information Technology for Sustainable Solutions (CSITSS), Bengaluru, India, 20–22 December 2018; pp. 160–166.
- Gould, P.G.; Koehler, A.B.; Ord, J.K.; Snyder, R.D.; Hyndman, R.J.; Vahid-Araghi, F. Forecasting time series with multiple seasonal patterns. Eur. J. Oper. Res. 2008, 191, 207–222.
- Yan, Y.; Jiang, J.; Yang, H. Mandarin prosody boundary prediction based on sequence-to-sequence model. In Proceedings of the 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 12–14 June 2020; pp. 1013–1017.
- Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
- Xie, H.H.; Li, C.; Ding, N.; Gong, C. Walmart Sale Forecasting Model Based On LSTM And LightGBM. In Proceedings of the 2021 2nd International Conference on Education, Knowledge and Information Management (ICEKIM), Xiamen, China, 29–31 January 2021; pp. 366–369.
- Joshuva, A.; Sugumaran, V. A machine learning approach for condition monitoring of wind turbine blade using autoregressive moving average (ARMA) features through vibration signals: A comparative study. Prog. Ind. Ecol. Int. J. 2018, 12, 14–34.
- Efat, M.I.A.; Hajek, P.; Abedin, M.Z.; Azad, R.U.; Jaber, M.A.; Aditya, S.; Hassan, M.K. Deep-learning model using hybrid adaptive trend estimated series for modelling and forecasting sales. Ann. Oper. Res. 2022, 1–32.
- Kazemi, S.M.; Goel, R.; Eghbali, S.; Ramanan, J.; Sahota, J.; Thakur, S.; Wu, S.; Smyth, C.; Poupart, P.; Brubaker, M. Time2vec: Learning a vector representation of time. arXiv 2019, arXiv:1907.05321.
- Choi, T.-M.; Yu, Y.; Au, K.-F. A hybrid SARIMA wavelet transform method for sales forecasting. Decis. Support Syst. 2011, 51, 130–140.
- Abiodun, O.I.; Jantan, A.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.; Arshad, H. State-of-the-art in artificial neural network applications: A survey. Heliyon 2018, 4, e00938.
- Box, G.E.; Pierce, D.A. Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J. Am. Stat. Assoc. 1970, 65, 1509–1526.
- Kharfan, M.; Chan, V.W.K.; Firdolas Efendigil, T. A data-driven forecasting approach for newly launched seasonal products by leveraging machine-learning approaches. Ann. Oper. Res. 2021, 303, 159–174.
- Kadam, V.; Vhatkar, S. Design and Develop Data Analysis and Forecasting of the Sales Using Machine Learning. In Intelligent Computing and Networking: Proceedings of IC-ICN 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 157–171.
- Benvenuto, D.; Giovanetti, M.; Vassallo, L.; Angeletti, S.; Ciccozzi, M. Application of the ARIMA model on the COVID-2019 epidemic dataset. Data Brief 2020, 29, 105340.
- Ampountolas, A. Modeling and forecasting daily hotel demand: A comparison based on sarimax, neural networks, and garch models. Forecasting 2021, 3, 580–595.
- Montero-Manso, P.; Hyndman, R.J. Principles and algorithms for forecasting groups of time series: Locality and globality. Int. J. Forecast. 2021, 37, 1632–1653.
- Berry, L.R.; Helman, P.; West, M. Probabilistic forecasting of heterogeneous consumer transaction–sales time series. Int. J. Forecast. 2020, 36, 552–569.
- Ni, Y.; Fan, F. A two-stage dynamic sales forecasting model for the fashion retail. Expert Syst. Appl. 2011, 38, 1529–1536.
- Junaeti, E.; Wirantika, R. Implementation of Automatic Clustering Algorithm and Fuzzy Time Series in Motorcycle Sales Forecasting. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2018; p. 012126.
- Yücesan, M. Forecasting Monthly Sales of White Goods Using Hybrid Arimax and Ann Models. Atatürk Üniversitesi Sos. Bilim. Enstitüsü Derg. 2018, 22, 2603–2617.
- Parbat, D.; Chakraborty, M. A python based support vector regression model for prediction of COVID-19 cases in India. Chaos Solitons Fractals 2020, 138, 109942.
- Hong, L.; Lamberson, P.; Page, S.E. Hybrid predictive ensembles: Synergies between human and computational forecasts. J. Soc. Comput. 2021, 2, 89–102.
- Zhang, S.; Abdel-Aty, M.; Wu, Y.; Zheng, O. Modeling pedestrians’ near-accident events at signalized intersections using gated recurrent unit (GRU). Accid. Anal. Prev. 2020, 148, 105844.
- Ma, S.; Fildes, R. Retail sales forecasting with meta-learning. Eur. J. Oper. Res. 2021, 288, 111–128.
- Zhao, K.; Wang, C. Sales forecast in e-commerce using convolutional neural network. arXiv 2017, arXiv:1708.07946.
- Pham, D.-H.; Le, A.-C. Learning multiple layers of knowledge representation for aspect based sentiment analysis. Data Knowl. Eng. 2018, 114, 26–39.
- Pan, H.; Zhou, H. Study on convolutional neural network and its application in data mining and sales forecasting for E-commerce. Electron. Commer. Res. 2020, 20, 297–320.
- Shih, Y.-S.; Lin, M.-H. A LSTM Approach for Sales Forecasting of Goods with Short-Term Demands in E-Commerce. In Proceedings of the Intelligent Information and Database Systems: 11th Asian Conference, ACIIDS 2019, Yogyakarta, Indonesia, 8–11 April 2019; pp. 244–256.
- Wong, W.K.; Guo, Z.X. A hybrid intelligent model for medium-term sales forecasting in fashion retail supply chains using extreme learning machine and harmony search algorithm. Int. J. Prod. Econ. 2010, 128, 614–624.
- Qi, Y.; Li, C.; Deng, H.; Cai, M.; Qi, Y.; Deng, Y. A deep neural framework for sales forecasting in e-commerce. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 299–308.
- Xin, S.; Ester, M.; Bu, J.; Yao, C.; Li, Z.; Zhou, X.; Ye, Y.; Wang, C. Multi-task based sales predictions for online promotions. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 2823–2831.
- Eachempati, P.; Srivastava, P.R.; Kumar, A.; Tan, K.H.; Gupta, S. Validating the impact of accounting disclosures on stock market: A deep neural network approach. Technol. Forecast. Soc. Chang. 2021, 170, 120903.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
- Qi, X.; Hou, K.; Liu, T.; Yu, Z.; Hu, S.; Ou, W. From known to unknown: Knowledge-guided transformer for time-series sales forecasting in Alibaba. arXiv 2021, arXiv:2109.08381.
- Rao, Z.; Zhang, Y. Transformer-based power system energy prediction model. In Proceedings of the 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 12–14 June 2020; pp. 913–917.
- Vallés-Pérez, I.; Soria-Olivas, E.; Martínez-Sober, M.; Serrano-López, A.J.; Gómez-Sanchís, J.; Mateo, F. Approaching sales forecasting using recurrent neural networks and transformers. Expert Syst. Appl. 2022, 201, 116993.
- Yoo, J.; Kang, U. Attention-based autoregression for accurate and efficient multivariate time series forecasting. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), SIAM, Virtual, 29 April–1 May 2021; pp. 531–539.
- Yang, Y.; Lu, J. Foreformer: An enhanced transformer-based framework for multivariate time series forecasting. Appl. Intell. 2023, 53, 12521–12540.
- Papadopoulos, S.-I.; Koutlis, C.; Papadopoulos, S.; Kompatsiaris, I. Multimodal Quasi-AutoRegression: Forecasting the visual popularity of new fashion products. Int. J. Multimed. Inf. Retr. 2022, 11, 717–729.
- Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430.
- Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286.
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115.
- Liu, S.; Yu, H.; Liao, C.; Li, J.; Lin, W.; Liu, A.X.; Dustdar, S. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In Proceedings of the International Conference on Learning Representations, Virtual, Vienna, Austria, 3–7 May 2021.
- Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306.
- Wright, D.J. Decision support oriented sales forecasting methods. J. Acad. Mark. Sci. 1988, 16, 71–78.
Number | Model | RMSLE | RMSWLE | NWRMSLE | RMALE |
---|---|---|---|---|---|
1 | ARMA | 1.4302 ± 0.0031 | 1.4795 ± 0.0031 | 1.2708 ± 0.0032 | 1.2691 ± 0.0035 |
2 | ARIMA | 1.2759 ± 0.0042 | 1.3016 ± 0.0042 | 1.1662 ± 0.0041 | 1.0733 ± 0.0043 |
3 | SARIMAX | 1.0556 ± 0.0106 | 1.0701 ± 0.0106 | 0.9581 ± 0.0097 | 0.9042 ± 0.0109 |
4 | SVR | 0.9019 ± 0.0028 | 0.9145 ± 0.0028 | 0.8513 ± 0.0028 | 0.8820 ± 0.0030 |
5 | RNN | 0.8537 ± 0.0079 | 0.8592 ± 0.0079 | 0.7066 ± 0.0071 | 0.7455 ± 0.0081 |
6 | GRU | 0.7743 ± 0.0006 | 0.7834 ± 0.0006 | 0.6890 ± 0.0013 | 0.6917 ± 0.0011 |
7 | LSTM | 0.6491 ± 0.0010 | 0.6679 ± 0.0010 | 0.5737 ± 0.0015 | 0.5706 ± 0.0015 |
8 | Proposed model | 0.5136 ± 0.0014 | 0.5204 ± 0.0014 | 0.4862 ± 0.0011 | 0.3245 ± 0.0012 |
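
The contribution list reports average improvement rates of roughly 48.2, 48.5, 45.2, and 63.0 percent, but the excerpt does not say how the average is taken. The short sketch below assumes one plausible definition: the reduction of the proposed model's error relative to the mean error of the seven benchmarks, computed per metric from the mean values in the table above (the ± spreads are ignored). The function and variable names are illustrative only.

```python
# Mean metric values copied from the benchmark table above.
# Column order: RMSLE, RMSWLE, NWRMSLE, RMALE.
metrics = ("RMSLE", "RMSWLE", "NWRMSLE", "RMALE")

baselines = {
    "ARMA":    (1.4302, 1.4795, 1.2708, 1.2691),
    "ARIMA":   (1.2759, 1.3016, 1.1662, 1.0733),
    "SARIMAX": (1.0556, 1.0701, 0.9581, 0.9042),
    "SVR":     (0.9019, 0.9145, 0.8513, 0.8820),
    "RNN":     (0.8537, 0.8592, 0.7066, 0.7455),
    "GRU":     (0.7743, 0.7834, 0.6890, 0.6917),
    "LSTM":    (0.6491, 0.6679, 0.5737, 0.5706),
}
proposed = (0.5136, 0.5204, 0.4862, 0.3245)


def improvement_over_mean_baseline(col):
    """Percent error reduction of the proposed model vs. the mean baseline error."""
    mean_baseline = sum(row[col] for row in baselines.values()) / len(baselines)
    return 100.0 * (mean_baseline - proposed[col]) / mean_baseline


rates = {name: improvement_over_mean_baseline(i) for i, name in enumerate(metrics)}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
```

Under this assumption the computed rates agree with the reported figures to one decimal place. Averaging the seven per-benchmark improvement percentages instead gives a different number (about 44.7% for RMSLE), so the mean-baseline reading is the one that matches the table.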
Number | Model | RMSLE | RMSWLE | NWRMSLE | RMALE |
---|---|---|---|---|---|
1 | Single-head | 0.8470 ± 0.0016 | 0.8632 ± 0.0016 | 0.7519 ± 0.0017 | 0.7406 ± 0.0018 |
2 | Proposed model (8-head) | 0.5136 ± 0.0014 | 0.5204 ± 0.0014 | 0.4862 ± 0.0011 | 0.3245 ± 0.0012 |
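
The ablation above compares a single-head variant with the 8-head model. As a rough illustration of what changes between the two, here is a minimal NumPy sketch of standard multi-head scaled dot-product self-attention in the style of Vaswani et al. It is not the authors' code: the model width of 64, the sequence length of 10, and the random weights are all assumptions, and the sketch omits masking, dropout, and biases.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over a (T, d_model) sequence, split into num_heads heads.

    Returns the (T, d_model) output and the (num_heads, T, T) attention weights.
    """
    seq_len, d_model = x.shape
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    def split(m):
        # (T, d_model) -> (num_heads, T, d_head)
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, T, T)
    weights = softmax(scores, axis=-1)                     # each row sums to 1
    heads = weights @ v                                    # (heads, T, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights


rng = np.random.default_rng(0)
seq_len, d_model = 10, 64                                  # hypothetical sizes
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (
    rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4)
)

out_1, attn_1 = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads=1)
out_8, attn_8 = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads=8)
```

With one head the model produces a single T × T weight matrix over time steps; with eight heads it produces eight such matrices, each computed in a lower-dimensional subspace. The parameter count is the same in both cases, so any accuracy difference comes from how the attention is partitioned rather than from model size.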
Number of Encoder–Decoders | 2 | 4 | 6 |
---|---|---|---|
RMSLE | 1.0168 ± 0.0043 | 0.6058 ± 0.0092 | 0.5136 ± 0.0014 |
RMSWLE | 1.2182 ± 0.0085 | 0.5253 ± 0.0094 | 0.5204 ± 0.0014 |
NWRMSLE | 0.9828 ± 0.0082 | 0.6185 ± 0.0081 | 0.4862 ± 0.0011 |
RMALE | 0.8251 ± 0.1059 | 0.3382 ± 0.0013 | 0.3245 ± 0.0012 |
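
To make the trend in the table above easier to read, the following sketch computes the relative error reduction at each step in depth (2 to 4, then 4 to 6 encoder–decoders) from the mean values only. It is plain arithmetic on the reported numbers, and the helper name `relative_reduction` is illustrative.

```python
# Mean metric values copied from the encoder–decoder ablation table above,
# keyed by the number of encoder–decoders.
depth_results = {
    "RMSLE":   {2: 1.0168, 4: 0.6058, 6: 0.5136},
    "RMSWLE":  {2: 1.2182, 4: 0.5253, 6: 0.5204},
    "NWRMSLE": {2: 0.9828, 4: 0.6185, 6: 0.4862},
    "RMALE":   {2: 0.8251, 4: 0.3382, 6: 0.3245},
}


def relative_reduction(before, after):
    """Percent reduction in error when moving from `before` to `after`."""
    return 100.0 * (before - after) / before


# For each metric: (reduction from 2 to 4, reduction from 4 to 6).
gains = {
    metric: (
        relative_reduction(values[2], values[4]),
        relative_reduction(values[4], values[6]),
    )
    for metric, values in depth_results.items()
}

for metric, (first, second) in gains.items():
    print(f"{metric}: 2->4 {first:.1f}%, 4->6 {second:.1f}%")
```

For every metric the step from 2 to 4 yields a larger relative reduction than the step from 4 to 6, which points to diminishing returns from additional depth on this dataset.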
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, Q.; Yu, M. Achieving Sales Forecasting with Higher Accuracy and Efficiency: A New Model Based on Modified Transformer. J. Theor. Appl. Electron. Commer. Res. 2023, 18, 1990–2006. https://doi.org/10.3390/jtaer18040100