DOI: 10.1145/2983323.2983784

Model-Based Oversampling for Imbalanced Sequence Classification

Published: 24 October 2016


Sequence classification is a critical task in the data mining community. It becomes more challenging when the class distribution is imbalanced, which occurs in many real-world applications. Oversampling algorithms try to rebalance the skewed classes by generating synthetic data for the minority classes, but most existing oversampling approaches either cannot take the temporal structure of sequences into account or cannot handle multivariate and long sequences. To address these problems, this paper proposes a novel oversampling algorithm based on 'generative' models of sequences. In particular, a recurrent neural network is employed to learn the generative mechanism of each sequence, and the learned model serves as the representation of that sequence. These generative models are then used to form a kernel that captures the similarity between different sequences. Finally, oversampling is performed in the kernel feature space to generate synthetic data. The proposed approach can handle highly imbalanced sequential data and is robust to noise. Its competitiveness is demonstrated by experiments on both synthetic and benchmark data, including univariate and multivariate sequences.
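The three-stage pipeline described above (fit a recurrent model to each sequence, compare sequences through a kernel on the fitted models, then oversample in the kernel feature space) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The recurrent model is assumed to be an echo state network with a fixed random reservoir, whose linear readout is fitted per sequence by ridge regression for one-step-ahead prediction. The kernel is assumed to be an RBF kernel on the readout weights. Oversampling is assumed to be SMOTE-style interpolation between a minority sequence and one of its nearest minority neighbours, carried out implicitly in the feature space so that only Gram-matrix entries are ever needed. The function names (`reservoir_states`, `fit_readout`, `rbf_gram`, `kernel_oversample`), all hyperparameters, and the toy data are hypothetical.

```python
import numpy as np


def reservoir_states(seq, W_in, W_res):
    """Run a fixed (untrained) reservoir over seq of shape (T, d); return (T, n) states."""
    x = np.zeros(W_res.shape[0])
    states = np.empty((seq.shape[0], W_res.shape[0]))
    for t in range(seq.shape[0]):
        x = np.tanh(W_in @ seq[t] + W_res @ x)
        states[t] = x
    return states


def fit_readout(seq, W_in, W_res, ridge=1e-2):
    """Assumed per-sequence 'generative model': a ridge-regression readout that
    predicts the next input from the current reservoir state. Returns it flattened."""
    S = reservoir_states(seq, W_in, W_res)[:-1]  # states at t = 0..T-2
    Y = seq[1:]                                  # targets: inputs at t = 1..T-1
    W_out = np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ Y)
    return W_out.ravel()


def rbf_gram(A, B, gamma):
    """Assumed model-space kernel: RBF on readout vectors (rows of A and B)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_oversample(K, minority_idx, n_new, k=5, rng=None):
    """SMOTE-style interpolation done implicitly in the kernel feature space.

    Each synthetic point is z = (1 - lam) * phi(x_a) + lam * phi(x_b), stored as a
    row of coefficients C over the N training points, so every inner product it
    needs follows from K by linearity. Returns the augmented Gram matrix and C.
    """
    rng = np.random.default_rng(rng)
    N = K.shape[0]
    Km = K[np.ix_(minority_idx, minority_idx)]
    d = np.diag(Km)
    D2 = d[:, None] + d[None, :] - 2 * Km        # squared feature-space distances
    np.fill_diagonal(D2, np.inf)                 # a point is not its own neighbour
    k = min(k, len(minority_idx) - 1)
    nbrs = np.argsort(D2, axis=1)[:, :k]         # k nearest minority neighbours
    C = np.zeros((n_new, N))
    for s in range(n_new):
        a = rng.integers(len(minority_idx))
        b = nbrs[a, rng.integers(k)]
        lam = rng.random()
        C[s, minority_idx[a]] += 1 - lam
        C[s, minority_idx[b]] += lam
    K_sz = K @ C.T                               # <phi(x_i), z_s>
    K_zz = C @ K @ C.T                           # <z_s, z_t>
    return np.block([[K, K_sz], [K_sz.T, K_zz]]), C


# Hypothetical toy data: 40 noisy sine waves (majority) vs 5 noisy square waves (minority).
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 60)
major = [np.sin(t + rng.uniform(0, np.pi))[:, None] + 0.05 * rng.standard_normal((60, 1))
         for _ in range(40)]
minor = [np.sign(np.sin(t + rng.uniform(0, np.pi)))[:, None] + 0.05 * rng.standard_normal((60, 1))
         for _ in range(5)]
seqs = major + minor
y = np.array([0] * 40 + [1] * 5)

n_res = 30
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
W_res = rng.standard_normal((n_res, n_res))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # rescale to spectral radius 0.9

R = np.array([fit_readout(s, W_in, W_res) for s in seqs])  # one model vector per sequence
K = rbf_gram(R, R, gamma=1.0 / R.shape[1])
K_aug, C = kernel_oversample(K, np.flatnonzero(y == 1), n_new=35, k=3, rng=1)
y_aug = np.concatenate([y, np.ones(35, dtype=int)])      # classes are now 40 vs 40
```

Because each synthetic point is a convex combination of mapped minority sequences, its kernel values follow from the original Gram matrix by linearity, so `K_aug` and `y_aug` can be passed to any SVM that accepts a precomputed kernel (for instance scikit-learn's `SVC(kernel="precomputed")`). Scoring a new sequence would then require extending its kernel row `k_new` against the training sequences with the synthetic entries `C @ k_new`.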




Published In

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages



Association for Computing Machinery

New York, NY, United States





Author Tags

  1. imbalanced learning
  2. model space
  3. oversampling
  4. sequence classification


  • Research-article

CIKM'16: ACM Conference on Information and Knowledge Management
October 24 - 28, 2016
Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 Paper Acceptance Rate 160 of 701 submissions, 23%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Bibliometrics & Citations


Article Metrics

  • Downloads (last 12 months): 37
  • Downloads (last 6 weeks): 3
Reflects downloads up to 08 Feb 2025
