
Robust explainer recommendation for time series classification

Published: 20 June 2024

Abstract

Time series classification deals with temporal sequences, a prevalent data type in domains such as human activity recognition, sports analytics and general sensing. In this area, interest in explainability has been growing, as explanation is key to better understanding both the data and the model. Recently, a great variety of techniques (e.g., LIME, SHAP, CAM) have been proposed and adapted to time series to provide explanations in the form of saliency maps, where the importance of each data point in the time series is quantified with a numerical value. However, the saliency maps can and often do disagree, so it is unclear which one to use. This paper provides a novel framework to quantitatively evaluate and rank explanation methods for time series classification. We show how to robustly evaluate the informativeness of a given explanation method (i.e., its relevance for the classification task), and how to compare explanations side by side. The goal is to recommend the best explainer for a given time series classification dataset. We propose AMEE, a Model-Agnostic Explanation Evaluation framework, for recommending saliency-based explanations for time series classification. In this approach, data perturbation is added to the input time series guided by each explanation. Our results show that perturbing discriminative parts of the time series leads to significant changes in classification accuracy, which can be used to evaluate each explanation. To be robust to different types of perturbations and different types of classifiers, we aggregate the accuracy loss across perturbations and classifiers. This novel approach allows us to recommend the best explainer among a set of different explainers, including random and oracle explainers. We provide a quantitative and qualitative analysis for synthetic datasets, a variety of time series datasets, as well as a real-world case study with known expert ground truth.
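The perturbation-based evaluation idea described in the abstract can be sketched in a few lines of code. The following is a minimal illustration, not the authors' AMEE implementation: the helper names (perturb_top_salient, evaluate_explainer), the choice of referee classifiers, and the perturbation settings are assumptions made for the example.

```python
# Minimal sketch: score an explainer by the accuracy loss its saliency-guided
# perturbations cause, averaged over perturbation types and referee classifiers.
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.neighbors import KNeighborsClassifier


def perturb_top_salient(X, saliency, fraction=0.1, mode="zero", seed=0):
    """Replace the most salient time points of each series with a perturbation."""
    rng = np.random.default_rng(seed)
    Xp = X.copy()
    k = max(1, int(fraction * X.shape[1]))
    for i in range(X.shape[0]):
        idx = np.argsort(saliency[i])[-k:]  # indices of the k most salient time points
        if mode == "zero":
            Xp[i, idx] = 0.0
        else:  # "noise": replace with Gaussian noise matched to the series
            Xp[i, idx] = rng.normal(X[i].mean(), X[i].std(), size=k)
    return Xp


def evaluate_explainer(saliency, X_train, y_train, X_test, y_test,
                       fractions=(0.05, 0.1, 0.2), modes=("zero", "noise")):
    """Mean accuracy loss from explanation-guided perturbation, aggregated
    over perturbation types, perturbed fractions and referee classifiers."""
    referees = [KNeighborsClassifier(n_neighbors=1), RidgeClassifierCV()]
    losses = []
    for clf in referees:
        clf.fit(X_train, y_train)
        base_acc = clf.score(X_test, y_test)
        for mode in modes:
            for frac in fractions:
                Xp = perturb_top_salient(X_test, saliency, frac, mode)
                losses.append(base_acc - clf.score(Xp, y_test))
    return float(np.mean(losses))  # larger loss => more informative explanation
```

Under this sketch, an explainer whose saliency maps point at genuinely discriminative regions should yield a larger average accuracy loss than a random explainer and at most that of an oracle explainer, which is the basis on which the candidate explainers can be ranked.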

Published In

Data Mining and Knowledge Discovery  Volume 38, Issue 6
Nov 2024
893 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 20 June 2024
Accepted: 27 May 2024
Received: 06 September 2023

Author Tags

  1. Time series classification
  2. Explainable AI
  3. Explanation recommendation
  4. Trustworthy AI

Qualifiers

  • Research-article
